Motivation: Loop portions in proteins are involved in many molecular interaction processes. They
often exhibit a high degree of flexibility, which can be essential for their function. However, molecular
modeling approaches usually represent loops using a single conformation. Although this conformation
may correspond to a (meta-)stable state, it does not always provide a realistic representation.
Results: In this paper, we propose a method to exhaustively sample the conformational space of protein
loops. It exploits structural information encoded in a large library of three-residue fragments, and enforces
loop-closure using a closed-form inverse kinematics solver. A novel reinforcement learning approach is
applied to accelerate sampling while preserving diversity. The performance of our method is showcased
on benchmark datasets involving 9-, 12- and 15-residue loops. In addition, more detailed results presented
for streptavidin illustrate the ability of the method to exhaustively sample the conformational space of loops
presenting several meta-stable conformations.
Availability: We are developing a software package called MoMA (for Molecular Motion Algorithms), which
includes modeling tools and algorithms to sample conformations and transition paths of biomolecules,
including the application described in this work. The binaries can be provided upon request and a web
application will also be implemented in the near future.
@article{Barozet2020MoMALoop,
abstract = {{Loop portions in proteins are involved in many molecular interaction processes. They often exhibit a high degree of flexibility, which can be essential for their function. However, molecular modeling approaches usually represent loops using a single conformation. Although this conformation may correspond to a (meta-)stable state, it does not always provide a realistic representation. In this paper, we propose a method to exhaustively sample the conformational space of protein loops. It exploits structural information encoded in a large library of three-residue fragments, and enforces loop-closure using a closed-form inverse kinematics solver. A novel reinforcement-learning-based approach is applied to accelerate sampling while preserving diversity. The performance of our method is showcased on benchmark datasets involving 9-, 12- and 15-residue loops. In addition, more detailed results presented for streptavidin illustrate the ability of the method to exhaustively sample the conformational space of loops presenting several meta-stable conformations. We are developing a software package called MoMA (for Molecular Motion Algorithms), which includes modeling tools and algorithms to sample conformations and transition paths of biomolecules, including the application described in this work. The binaries can be provided upon request and a web application will also be implemented in the near future. Supplementary data are available at Bioinformatics online.}},
author = {Barozet, Am{\'e}lie and Molloy, Kevin and Vaisset, Marc and Sim{\'e}on, Thierry and Cort{\'e}s, Juan},
doi = {10.1093/bioinformatics/btz684},
eprint = {https://academic.oup.com/bioinformatics/article-pdf/36/4/1099/32527509/btz684.pdf},
issn = {1367-4803},
journal = {Bioinformatics},
month = {08},
number = {4},
pages = {1099-1106},
title = {{A reinforcement-learning-based approach to enhance exhaustive protein loop sampling}},
url = {https://doi.org/10.1093/bioinformatics/btz684},
volume = {36},
year = {2019},
Bdsk-Url-1 = {https://doi.org/10.1093/bioinformatics/btz684}}
A. Estaña, K. Molloy, M. Vaisset, N. Sibille, T. Siméon, P. Bernadó, J. Cortés.
The study of the conformational energy landscape of a molecule is essential for the understanding of its physicochemical properties. This requires the exploration of a continuous, high-dimensional space to identify the most probable conformations and the transition paths between them. The problem is computationally difficult, in particular for highly-flexible biomolecules such as Intrinsically Disordered Proteins (IDPs). In recent years, a robotics-inspired algorithm called Transition-based Rapidly-exploring Random Tree (TRRT) has been proposed to solve this problem, and has been shown to provide good results with small and middle-sized biomolecules. Aiming to treat larger systems, we propose a hybrid strategy for the efficient parallelization of a multi-tree variant of TRRT, called Multi-TRRT, enabling an efficient execution in (possibly large) computer clusters. The parallel algorithm uses OpenMP multi-threading for computation inside each multi-core processor and MPI to perform the communication between processors. Results show a near-linear speedup for a wide range of cluster configurations. Although the paper mainly deals with the application of the proposed parallel algorithm to the investigation of biomolecules, the explanations concerning the methods are general, aiming to inspire future work on the parallelization of related algorithms.
@article{estana_MultiTreeIDPs_2018,
title = {Hybrid parallelization of a multi-tree path search algorithm: Application to highly-flexible biomolecules},
author = {Esta{\~n}a, Alejandro N and Molloy, Kevin and Vaisset, Marc and Sibille, Nathalie and Sim{\'e}on, Thierry and Bernad{\'o}, Pau and Cort{\'e}s, Juan},
journal = "Parallel Computing",
volume = "77", pages = "84--100", year = "2018", issn = "0167-8191",
doi = "10.1016/j.parco.2018.06.005",
url = "http://www.sciencedirect.com/science/article/pii/S0167819118301893",
keywords = "High Performance Computing (HPC), Hybrid parallelization, Path planning algorithms, Molecular energy landscape exploration, Intrinsically Disordered Proteins (IDPs)",
abstract = "The study of the conformational energy landscape of a molecule is essential for the understanding of its physicochemical properties. This requires the exploration of a continuous, high-dimensional space to identify the most probable conformations and the transition paths between them. The problem is computationally difficult, in particular for highly-flexible biomolecules such as Intrinsically Disordered Proteins (IDPs). In recent years, a robotics-inspired algorithm called Transition-based Rapidly-exploring Random Tree (TRRT) has been proposed to solve this problem, and has been shown to provide good results with small and middle-sized biomolecules. Aiming to treat larger systems, we propose a hybrid strategy for the efficient parallelization of a multi-tree variant of TRRT, called Multi-TRRT, enabling an efficient execution in (possibly large) computer clusters. The parallel algorithm uses OpenMP multi-threading for computation inside each multi-core processor and MPI to perform the communication between processors. Results show a near-linear speedup for a wide range of cluster configurations. Although the paper mainly deals with the application of the proposed parallel algorithm to the investigation of biomolecules, the explanations concerning the methods are general, aiming to inspire future work on the parallelization of related algorithms."
}
Kevin Molloy, Laurent Denarie, Marc Vaisset,
Thierry Siméon, Juan Cortés.
This paper addresses the simultaneous design and path-planning problem, in which features associated to the bodies of a mobile system must be selected to find the best design that optimizes its motion between two given configurations. Solving individual path-planning problems for all possible designs and selecting the best result would be straightforward only for very simple cases. We propose a more efficient approach that combines discrete (design) and continuous (path) optimization in a single stage. It builds on an extension of a sampling-based algorithm, which simultaneously explores the configuration-space costmap of all possible designs, aiming to find the best path-design pair. The algorithm filters out unsuitable designs during the path search, which breaks down the combinatorial explosion. Illustrative results are presented for relatively simple (academic) robotic examples, showing that even in these simple cases, the computational cost can be reduced by two orders of magnitude with respect to the naïve approach. A preliminary application to challenging problems in computational biology related to protein design is also discussed.
@article{Molloy_SysDesignPlanning_2018,
author = {Kevin Molloy and Laurent Denarie and Marc Vaisset and Thierry Siméon and Juan Cortés},
title ={Simultaneous system design and path planning: A sampling-based algorithm},
journal = {The International Journal of Robotics Research},
volume = {0},
number = {0},
pages = {0278364918783054},
year = {2018}, month = {Jul},
doi = {10.1177/0278364918783054},
URL = { https://doi.org/10.1177/0278364918783054 },
eprint = { https://doi.org/10.1177/0278364918783054 },
abstract = { This paper addresses the simultaneous design and path-planning problem, in which features associated to the bodies of a mobile system must be selected to find the best design that optimizes its motion between two given configurations. Solving individual path-planning problems for all possible designs and selecting the best result would be straightforward only for very simple cases. We propose a more efficient approach that combines discrete (design) and continuous (path) optimization in a single stage. It builds on an extension of a sampling-based algorithm, which simultaneously explores the configuration-space costmap of all possible designs, aiming to find the best path-design pair. The algorithm filters out unsuitable designs during the path search, which breaks down the combinatorial explosion. Illustrative results are presented for relatively simple (academic) robotic examples, showing that even in these simple cases, the computational cost can be reduced by two orders of magnitude with respect to the naïve approach. A preliminary application to challenging problems in computational biology related to protein design is also discussed. }
}
Precious information on protein function can be extracted from a detailed characterization of protein equilibrium dynamics. This remains elusive in wet and dry laboratories, as function-modulating transitions of a protein between functionally-relevant, thermodynamically-stable and meta-stable structural states often span disparate time scales. In this paper we propose a novel, robotics-inspired algorithm that circumvents time-scale challenges by drawing analogies between protein motion and robot motion. The algorithm adapts the popular roadmap-based framework in robot motion computation to handle the more complex protein conformation space and its underlying rugged energy surface. Given known structures representing stable and meta-stable states of a protein, the algorithm yields a time- and energy-prioritized list of transition paths between the structures, with each path represented as a series of conformations. The algorithm balances computational resources between a global search aimed at obtaining a global view of the network of protein conformations and their connectivity and a detailed local search focused on realizing such connections with physically-realistic models. Promising results are presented on a variety of proteins that demonstrate the general utility of the algorithm and its capability to improve the state of the art without employing system-specific insight.
Obtaining accurate representations of energy landscapes of biomolecules such as proteins and peptides is central to the study of their physicochemical properties and biological functions. Peptides are particularly interesting, as they exploit structural flexibility to modulate their biological function. Despite their small size, peptide modeling remains challenging due to the complexity of the energy landscape of such highly-flexible dynamic systems. Currently, only stochastic sampling-based methods can efficiently explore the conformational space of a peptide. In this paper, we suggest to combine two such methods to obtain a full characterization of energy landscapes of small yet flexible peptides. First, we propose a simplified version of the classical Basin Hopping algorithm to reveal low-energy regions in the landscape, and thus to identify the corresponding metastable structural states of a peptide. Then, we present several variants of a robotics-inspired algorithm, the Transition-based Rapidly-exploring Random Tree, to quickly determine transition path ensembles, as well as transition probabilities between metastable states. We demonstrate this combined approach on met-enkephalin.
@article{Devaurs_CharEnergyLandscapes_2015,
author={D. Devaurs and K. Molloy and M. Vaisset and A. Shehu and T. Siméon and J. Cortés},
journal={IEEE Transactions on NanoBioscience},
title={Characterizing Energy Landscapes of Peptides Using a Combination of Stochastic Algorithms},
year={2015},
month={July},
volume={14},
number={5},
pages={545-552},
keywords={biology computing;molecular biophysics;molecular configurations;proteins;stochastic processes;energy landscapes;biomolecules;proteins;peptides;stochastic sampling-based methods;classical Basin Hopping algorithm;metastable structural states;robotics-inspired algorithm;transition-based rapidly-exploring random tree;met-enkephalin;Peptides;Minimization;Clustering algorithms;Space exploration;Nanobioscience;Proteins;Energy landscape;peptides;stochastic algorithms;Algorithms;Computational Biology;Models, Theoretical;Peptides;Stochastic Processes;Thermodynamics},
doi={10.1109/TNB.2015.2424597},
ISSN={1536-1241}
}
Evidence is emerging that the role of protein structure in disease needs to be rethought. Sequence mutations in proteins are often found to affect the rate at which a protein switches between structures. Modeling structural transitions in wildtype and variant proteins is central to understanding the molecular basis of disease. This paper investigates an efficient algorithmic realization of the stochastic roadmap simulation framework to model structural transitions in wildtype and variants of proteins implicated in human disorders. Our results indicate that the algorithm is able to extract useful information on the impact of mutations on protein structure and function.
@article{molloy_clausen_shehu_2016,
author={Molloy, Kevin and Clausen, Rudy and Shehu, Amarda},
title={A stochastic roadmap method to model protein structural transitions},
volume={34}, DOI={10.1017/S0263574715001058},
number={8}, journal={Robotica}, publisher={Cambridge University Press},
year={2016}, pages={1705--1733}}
Kevin Molloy, M. Jennifer Van, Daniel Barbará and Amarda Shehu.
Background: Due to rapid sequencing of genomes, there are now millions of deposited protein sequences with no known function. Fast sequence-based comparisons allow detecting close homologs for a protein of interest to transfer functional information from the homologs to the given protein. Sequence-based comparison cannot detect remote homologs, in which evolution has adjusted the sequence while largely preserving structure. Structure-based comparisons can detect remote homologs but most methods for doing so are too expensive to apply at a large scale over structural databases of proteins. Recently, fragment-based structural representations have been proposed that allow fast detection of remote homologs with reasonable accuracy. These representations have also been used to obtain linearly-reducible maps of protein structure space. It has been shown, as additionally supported from analysis in this paper that such maps preserve functional co-localization of the protein structure space. Methods: Inspired by a recent application of the Latent Dirichlet Allocation (LDA) model for conducting structural comparisons of proteins, we propose higher-order LDA-obtained topic-based representations of protein structures to provide an alternative route for remote homology detection and organization of the protein structure space in few dimensions. Various techniques based on natural language processing are proposed and employed to aid the analysis of topics in the protein structure domain. Results: We show that a topic-based representation is just as effective as a fragment-based one at automated detection of remote homologs and organization of protein structure space. We conduct a detailed analysis of the information content in the topic-based representation, showing that topics have semantic meaning. The fragment-based and topic-based representations are also shown to allow prediction of superfamily membership.
Conclusions: This work opens exciting venues in designing novel representations to extract information about protein structures, as well as organizing and mining protein structure space with mature text mining tools.
@article{MolloyBarbaraShehuBMCBioinf14,
abstract = {BACKGROUND: Due to rapid sequencing of genomes, there are now millions of deposited protein sequences with no known function. Fast sequence-based comparisons allow detecting close homologs for a protein of interest to transfer functional information from the homologs to the given protein. Sequence-based comparison cannot detect remote homologs, in which evolution has adjusted the sequence while largely preserving structure. Structure-based comparisons can detect remote homologs but most methods for doing so are too expensive to apply at a large scale over structural databases of proteins. Recently, fragment-based structural representations have been proposed that allow fast detection of remote homologs with reasonable accuracy. These representations have also been used to obtain linearly-reducible maps of protein structure space. It has been shown, as additionally supported from analysis in this paper that such maps preserve functional co-localization of the protein structure space. METHODS: Inspired by a recent application of the Latent Dirichlet Allocation (LDA) model for conducting structural comparisons of proteins, we propose higher-order LDA-obtained topic-based representations of protein structures to provide an alternative route for remote homology detection and organization of the protein structure space in few dimensions. Various techniques based on natural language processing are proposed and employed to aid the analysis of topics in the protein structure domain. RESULTS: We show that a topic-based representation is just as effective as a fragment-based one at automated detection of remote homologs and organization of protein structure space. We conduct a detailed analysis of the information content in the topic-based representation, showing that topics have semantic meaning. The fragment-based and topic-based representations are also shown to allow prediction of superfamily membership. 
CONCLUSIONS: This work opens exciting venues in designing novel representations to extract information about protein structures, as well as organizing and mining protein structure space with mature text mining tools.}, an = {25080993}, author = {Molloy, Kevin and Van, M Jennifer and Barbar{\'a}, Daniel and Shehu, Amarda}, date-added = {2021-03-21 18:10:00 -0400}, date-modified = {2021-03-21 18:10:00 -0400}, db = {PubMed}, doi = {10.1186/1471-2105-15-S8-S4}, et = {2014/07/14}, issn = {1471-2105}, j2 = {BMC Bioinformatics}, journal = {BMC Bioinformatics}, keywords = {Algorithms; Amino Acid Sequence; Automation; Computational Biology/instrumentation/*methods; Natural Language Processing; Proteins/*chemistry}, l2 = {https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4120149/}, la = {eng}, number = {Suppl 8}, pages = {S4}, publisher = {BioMed Central}, title = {Exploring representations of protein structure for automated remote homology detection and mapping of protein structure space}, ty = {JOUR}, u1 = {25080993{$[$}pmid{$]$}}, u2 = {PMC4120149{$[$}pmcid{$]$}}, u4 = {1471-2105-15-S8-S4{$[$}PII{$]$}}, url = {https://pubmed.ncbi.nlm.nih.gov/25080993}, volume = {15}, year = {2014}, Bdsk-Url-1 = {https://pubmed.ncbi.nlm.nih.gov/25080993}, Bdsk-Url-2 = {https://doi.org/10.1186/1471-2105-15-S8-S4}}
Adequate sampling of the conformational space is a central challenge in ab-initio protein structure prediction. In the absence of a template structure, a conformational search procedure guided by an energy function explores the conformational space, gathering an ensemble of low-energy decoy conformations. If the sampling is inadequate, the native structure may be missed altogether. Even if reproduced, a subsequent stage that selects a subset of decoys for further structural detail and energetic refinement may discard near-native decoys if they are high-energy or insufficiently represented in the ensemble. Sampling should produce a decoy ensemble that facilitates the subsequent selection of near-native decoys. In this paper, we investigate a robotics-inspired framework that allows directly measuring the role of energy in guiding sampling. Testing demonstrates that a soft energy bias steers sampling towards a diverse decoy ensemble less prone to exploiting energetic artifacts and thus more likely to facilitate retainment of near-native conformations by selection techniques. We employ two different energy functions, the Associative Memory Hamiltonian with Water (AMW) and Rosetta. Results show that enhanced sampling provides a rigorous testing of energy functions and exposes different deficiencies in them, thus promising to guide development of more accurate representations and energy functions.
Many proteins tune their biological function by transitioning between different functional states, effectively acting as dynamic molecular machines. Detailed structural characterization of transition trajectories is central to understanding the relationship between protein dynamics and function. Computational approaches that build on the Molecular Dynamics framework are in principle able to model transition trajectories in great detail but also at considerable computational cost. Methods that delay consideration of dynamics and focus instead on elucidating energetically-credible conformational paths connecting two functionally-relevant structures provide a complementary approach. Effective sampling-based path planning methods originating in robotics have been recently proposed to produce conformational paths. These methods largely model short peptides or address large proteins by simplifying conformational space. We propose a robotics-inspired method that connects two given structures of a protein by sampling conformational paths. The method focuses on small- to medium-size proteins, efficiently modeling structural deformations through the use of the molecular fragment replacement technique. In particular, the method grows a tree in conformational space rooted at the start structure, steering the tree to a goal region defined around the goal structure. We investigate various bias schemes over a progress coordinate for balance between coverage of conformational space and progress towards the goal. A geometric projection layer promotes path diversity. A reactive temperature scheme allows sampling of rare paths that cross energy barriers. Experiments are conducted on small- to medium-size proteins of length up to 214 amino acids and with multiple known functionally-relevant states, some of which are more than 13 Å apart from each other. Analysis reveals that the method effectively obtains conformational paths connecting structural states that are significantly different.
A detailed analysis on the depth and breadth of the tree suggests that a soft global bias over the progress coordinate enhances sampling and results in higher path diversity. The explicit geometric projection layer that biases the exploration away from over-sampled regions further increases coverage, often improving proximity to the goal by forcing the exploration to find new paths. The reactive temperature scheme is shown effective in increasing path diversity, particularly in difficult structural transitions with known high-energy barriers.
@article{MolloyShehuBMCStructBiol13,
author = {Molloy, K. and Shehu, A.},
journal = {BMC Structural Biology},
volume = {13},
title = {Elucidating the Ensemble of Functionally-relevant Transitions in Protein Systems with a Robotics-inspired Method},
number = {Suppl 1}, pages = {S8}, year = 2013,
ISSN={1472-6807}, doi={10.1186/1472-6807-13-S1-S8},
url={https://doi.org/10.1186/1472-6807-13-S1-S8}
}
Brian Olson, Irina Hashmi, Kevin Molloy, and Amarda Shehu
Since its introduction, the basin hopping (BH) framework has proven useful for hard nonlinear optimization problems with multiple variables and modalities. Applications span a wide range, from packing problems in geometry to characterization of molecular states in statistical physics. BH is seeing a reemergence in computational structural biology due to its ability to obtain a coarse-grained representation of the protein energy surface in terms of local minima. In this paper, we show that the BH framework is general and versatile, allowing to address problems related to the characterization of protein structure, assembly, and motion due to its fundamental ability to sample minima in a high-dimensional variable space. We show how specific implementations of the main components in BH yield algorithmic realizations that attain state-of-the-art results in the context of ab initio protein structure prediction and rigid protein-protein docking. We also show that BH can map intermediate minima related with motions connecting diverse stable functionally relevant states in a protein molecule, thus serving as a first step towards the characterization of transition trajectories connecting these states.
@article{OlsonShehuAdvAI12,
author = {Olson, B. and Hashmi, I. and Molloy, K. and Shehu, A.},
journal = {Advances in Artificial Intelligence},
number = {674832},
title = {Basin Hopping as a General and Versatile Optimization Framework for the Characterization of Biological Macromolecules},
doi={10.1155/2012/674832},
year = 2012
}
Brian Olson, Kevin Molloy, S. Farid Hendi, and Amarda Shehu
The roughness of the protein energy surface poses a significant challenge to search algorithms that seek to obtain a structural characterization of the native state. Recent research seeks to bias search toward near-native conformations through one-dimensional structural profiles of the protein native state. Here we investigate the effectiveness of such profiles in a structure prediction setting for proteins of various sizes and folds. We pursue two directions. We first investigate the contribution of structural profiles in comparison to or in conjunction with physics-based energy functions in providing an effective energy bias. We conduct this investigation in the context of Metropolis Monte Carlo with fragment-based assembly. Second, we explore the effectiveness of structural profiles in providing projection coordinates through which to organize the conformational space. We do so in the context of a robotics-inspired search framework proposed in our lab that employs projections of the conformational space to guide search. Our findings indicate that structural profiles are most effective in obtaining physically realistic near-native conformations when employed in conjunction with physics-based energy functions. Our findings also show that these profiles are very effective when employed instead as projection coordinates to guide probabilistic search toward undersampled regions of the conformational space.
@article{OlsonMolloyShehuJBCB12,
author = {Olson, Brian and Molloy, Kevin and Hendi, S. Farid and Shehu, Amarda},
title = {Guiding Probabilistic Search of the Protein Conformational Space with Structural Profiles},
journal = {Journal of Bioinformatics and Computational Biology},
volume = {10}, number = {03}, pages = {1242005},
year = {2012},
doi = {10.1142/S021972001242005X},
note ={PMID: 22809381},
URL = { https://doi.org/10.1142/S021972001242005X },
eprint = { https://doi.org/10.1142/S021972001242005X } ,
abstract = { The roughness of the protein energy surface poses a
significant challenge to search algorithms that seek to obtain a
structural characterization of the native state. Recent research
seeks to bias search toward near-native conformations through
one-dimensional structural profiles of the protein native state.
Here we investigate the effectiveness of such profiles in a
structure prediction setting for proteins of various sizes and folds.
We pursue two directions. We first investigate the contribution of
structural profiles in comparison to or in conjunction with
physics-based energy functions in providing an effective energy bias.
We conduct this investigation in the context of Metropolis Monte Carlo
with fragment-based assembly. Second, we explore the effectiveness of
structural profiles in providing projection coordinates through which
to organize the conformational space. We do so in the context of a
robotics-inspired search framework proposed in our lab that employs
projections of the conformational space to guide search. Our findings
indicate that structural profiles are most effective in obtaining
physically realistic near-native conformations when employed in
conjunction with physics-based energy functions. Our findings
also show that these profiles are very effective when employed
instead as projection coordinates to guide probabilistic search
toward undersampled regions of the conformational space. }
}
The three-dimensional structure of a protein is a key determinant of its biological function. Given the cost and time required to acquire this structure through experimental means, computational models are necessary to complement wet-lab efforts. Many computational techniques exist for navigating the high-dimensional protein conformational search space, which is explored for low-energy conformations that comprise a protein's native states. This work proposes two strategies to enhance the sampling of conformations near the native state. An enhanced fragment library with greater structural diversity is used to expand the search space in the context of fragment-based assembly. To manage the increased complexity of the search space, only a representative subset of the sampled conformations is retained to further guide the search towards the native state. Our results make the case that these two strategies greatly enhance the sampling of the conformational space near the native state. A detailed comparative analysis shows that our approach performs as well as state-of-the-art ab initio structure prediction protocols.
@article{OlsonMolloyShehuJBCB11,
author = {Olson, Brian and Molloy, Kevin and Shehu, Amarda},
title = {In Search of the Protein Native State with a Probabilistic Sampling Approach},
journal = {Journal of Bioinformatics and Computational Biology},
volume = {09}, number = {03}, pages = {383-398}, year = {2011},
doi = {10.1142/S0219720011005574},
URL = { https://doi.org/10.1142/S0219720011005574},
eprint = {https://doi.org/10.1142/S0219720011005574},
abstract = { The three-dimensional structure of a protein is a
key determinant of its biological function. Given the cost and time required to acquire this structure through experimental means, computational models are necessary to complement wet-lab efforts. Many computational techniques exist for navigating the high-dimensional protein conformational search space, which is explored for low-energy conformations that comprise a protein's native states. This work proposes two strategies to enhance the sampling of conformations near the native state. An enhanced fragment library with greater structural diversity is used to expand the search space in the context of fragment-based assembly. To manage the increased complexity of the search space, only a representative subset of the sampled conformations is retained to further guide the search towards the native state. Our results make the case that these two strategies greatly enhance the sampling of the conformational space near the native state. A detailed comparative analysis shows that our approach performs as well as state-of-the-art ab initio structure prediction protocols. }
}
Conference Publications
Laurent Denarie, Kevin Molloy, Marc Vaisset, Thierry Siméon, and Juan Cortés.
This paper addresses the simultaneous design and path planning problem, in which features associated with the bodies of a mobile system have to be selected to find the best design that optimizes its motion between two given configurations. Solving individual path planning problems for all possible designs and selecting the best result would be a straightforward approach for very simple cases. We propose a more efficient approach that combines discrete (design) and continuous (path) optimization in a single stage. It builds on an extension of a sampling-based algorithm, which simultaneously explores the configuration-space costmap of all possible designs aiming to find the best path-design pair. The algorithm filters out unsuitable designs during the path search, which breaks down the combinatorial explosion. Illustrative results are presented for relatively simple (academic) examples. While our work is currently motivated by problems in computational biology, several applications in robotics can also be envisioned.
@inproceedings{DenarieWAFR2016,
author = {Laurent Denarie and Kevin Molloy and Marc Vaisset and Thierry Siméon and Juan Cortés},
title = {Combining System Design and Path Planning},
year={2016},
booktitle ={Proc. Workshop on the Algorithmic Foundations of Robotics (WAFR)}
}
We propose a novel robotics-inspired algorithm to compute physically-realistic motions connecting thermodynamically-stable and semi-stable structural states in protein molecules. Protein motion computation is a challenging problem due to the high-dimensionality of the search space involved and ruggedness of the potential energy surface underlying the space. To handle the multiple local minima issue, we propose a novel algorithm that is not based on the traditional Molecular Dynamics or Monte Carlo frameworks but instead adapts ideas from robot motion planning. In particular, the algorithm balances computational resources between a global search aimed at obtaining a global view of the network of protein conformations and their connectivity and a detailed local search focused on realizing such connections with physically-realistic models. We present here promising results on a variety of proteins and demonstrate the general utility of the algorithm and its capability to improve the state of the art without employing system-specific insight.
@inproceedings{MolloyShehuISBRA15,
AUTHOR = {K. Molloy AND A. Shehu},
TITLE = {Interleaving Global and Local Search for Protein Motion Computation},
BOOKTITLE = {LNCS: Bioinformatics Research and Applications},
EDITOR = { R. Harrison AND Y. Li AND I. Mandoiu},
YEAR = {2015},
VOLUME = {9096},
PAGES = {175--186},
PUBLISHER = {Springer International Publishing},
ADDRESS = {Norfolk, VA}
}
Obtaining a detailed microscopic view of protein transitions among key structural states is central to obtaining a deeper understanding of the relationship between protein dynamics and function. Doing so in the wet laboratory is currently not possible. It is also infeasible to model conformational switching through computational treatments based on Molecular Dynamics, particularly when the objective is expanded to model switching of medium-sized proteins among an arbitrary number of given states. In this paper, we consider this expanded objective and propose a novel probabilistic method to sample conformational paths connecting functionally-relevant structures of a protein. The method achieves this without launching expensive simulations but instead by mapping the connectivity of the conformational space around given thermodynamically-stable and semi-stable structural states. This is achieved through an adaptation of the probabilistic roadmap framework that has been shown successful at planning motions of articulated mechanisms in robotics. Preliminary analysis shows the method is promising and efficient in modeling motions among various states for medium-size proteins.
@inproceedings{MolloyShehuBICOB14,
author = {K. Molloy AND A. Shehu},
title = {A Probabilistic Roadmap-based Method to Model Conformational Switching of a Protein Among Many Functionally-relevant Structures},
booktitle = {Intl Conf on Bioinf and Comp Biol (BICoB)},
year = {2014}, address = {Las Vegas, NV}
}
Kevin Molloy, Jennifer M. Van, Daniel Barbará, and Amarda Shehu.
Fragment-based representations of protein structure have recently been proposed to identify remote homologs with reasonable accuracy. The representations have also been shown through PCA to elucidate low-dimensional maps of protein structure space. In this work we conduct further analysis of these representations, showing that the low-dimensional maps preserve functional co-localization. Moreover, we employ Latent Dirichlet Allocation to investigate a new, topic-based representation. We show through various techniques adapted from text mining that the topics have unique signatures over structural classes and allow a complementary yet informative organization of protein structure space.
@article{MolloyBarbaraShehuBMCBioinf14,
abstract = {BACKGROUND: Due to rapid sequencing of genomes, there are now millions of deposited protein sequences with no known function. Fast sequence-based comparisons allow detecting close homologs for a protein of interest to transfer functional information from the homologs to the given protein. Sequence-based comparison cannot detect remote homologs, in which evolution has adjusted the sequence while largely preserving structure. Structure-based comparisons can detect remote homologs but most methods for doing so are too expensive to apply at a large scale over structural databases of proteins. Recently, fragment-based structural representations have been proposed that allow fast detection of remote homologs with reasonable accuracy. These representations have also been used to obtain linearly-reducible maps of protein structure space. It has been shown, as additionally supported from analysis in this paper that such maps preserve functional co-localization of the protein structure space. METHODS: Inspired by a recent application of the Latent Dirichlet Allocation (LDA) model for conducting structural comparisons of proteins, we propose higher-order LDA-obtained topic-based representations of protein structures to provide an alternative route for remote homology detection and organization of the protein structure space in few dimensions. Various techniques based on natural language processing are proposed and employed to aid the analysis of topics in the protein structure domain. RESULTS: We show that a topic-based representation is just as effective as a fragment-based one at automated detection of remote homologs and organization of protein structure space. We conduct a detailed analysis of the information content in the topic-based representation, showing that topics have semantic meaning. The fragment-based and topic-based representations are also shown to allow prediction of superfamily membership. 
CONCLUSIONS: This work opens exciting venues in designing novel representations to extract information about protein structures, as well as organizing and mining protein structure space with mature text mining tools.},
an = {25080993},
author = {Molloy, Kevin and Van, M Jennifer and Barbara, Daniel and Shehu, Amarda},
date-added = {2021-03-21 18:10:00 -0400},
date-modified = {2021-03-21 18:10:00 -0400},
db = {PubMed},
doi = {10.1186/1471-2105-15-S8-S4},
et = {2014/07/14},
isbn = {1471-2105},
j2 = {BMC Bioinformatics},
journal = {BMC bioinformatics},
keywords = {Algorithms; Amino Acid Sequence; Automation; Computational Biology/instrumentation/*methods; Natural Language Processing; Proteins/*chemistry},
l2 = {https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4120149/},
la = {eng},
number = {Suppl 8},
pages = {S4--S4},
publisher = {BioMed Central},
title = {Exploring representations of protein structure for automated remote homology detection and mapping of protein structure space},
ty = {JOUR},
u1 = {25080993{$[$}pmid{$]$}},
u2 = {PMC4120149{$[$}pmcid{$]$}},
u4 = {1471-2105-15-S8-S4{$[$}PII{$]$}},
url = {https://pubmed.ncbi.nlm.nih.gov/25080993},
volume = {15 Suppl 8},
year = {2014},
Bdsk-Url-1 = {https://pubmed.ncbi.nlm.nih.gov/25080993},
Bdsk-Url-2 = {https://doi.org/10.1186/1471-2105-15-S8-S4}}
Characterization of transition trajectories that take a protein between different functional states is an important yet challenging problem in computational biology. Approaches based on Molecular Dynamics can obtain the most detailed and accurate information but at considerable computational cost. To address the cost, sampling-based path planning methods adapted from robotics forego protein dynamics and seek instead conformational paths, operating under the assumption that dynamics can be incorporated later to transform paths to transition trajectories. Existing methods focus either on short peptides or large proteins; on the latter, coarse representations simplify the search space. Here we present a robotics-inspired tree-based method to sample conformational paths that connect known structural states of small- to medium-size proteins. We address the dimensionality of the search space using molecular fragment replacement to efficiently obtain physically-realistic conformations. The method grows a tree in conformational space rooted at a given conformation and biases the growth of the tree to steer it to a given goal conformation. Different bias schemes are investigated for their efficacy. Experiments on proteins up to 214 amino acids long with known functionally-relevant states more than 13 Å apart show that the method effectively obtains conformational paths connecting significantly different structural states.
@inproceedings{MolloyShehuCSBW12,
author = {Molloy, K. AND Shehu, A.},
booktitle = {IEEE Intl Conf on Bioinf and Biomed Workshops (BIBMW)},
title = {A Robotics-inspired Method to Sample Conformational Paths Connecting Known Functionally-relevant Structures in Protein Systems},
year = {2012}, month = {October}, pages = {56--63},
address = {Philadelphia, PA},
}
@inproceedings{OlsonMolloyShehuBionetics10,
address = {Boston, MA},
author = {Olson, B. AND Molloy, K. AND Shehu, A.},
booktitle = {Intl ICST Conf on Bio-Inspired Models of Network, Information, and Computing Systems},
pages = {103--117}, publisher = {Springer},
title = {Enhancing Sampling of the Conformational Space Near the Protein Native State},
year = 2010
}