Friday, 8 March 2013

Respiratory Complex 1, ND1 component


Structural evolutionary module detected by analysis of evolutionary couplings

A 3D structure of a full respiratory complex 1 from bacteria has now been solved by X-ray crystallography (Nature 494; 443, 2013) and includes the structure of the previously enigmatic membrane protein component (NQO8). Based on this tour-de-force of structural biology, the authors can now propose a plausible model for how the long-range conformational changes in the membrane components are coupled to the redox reactions in the non-membrane portion of the complex. Despite the lack of significant sequence similarity, the newly solved subunit NQO8 contains a half-antiporter-like fold, with similarity in 3D to a repeated module in three of the other membrane proteins in the complex. Interestingly, this 3D similarity was suggested by Hopf et al. (Cell 149; 1607, 2012), who used a novel statistical analysis of evolutionary co-variation of sequences to predict that the ND1 subunit of the human complex, which is orthologous to NQO8, has a fold highly similar to that of the other complex 1 membrane components.
The confirmation of this prediction by crystallographic experiment indicates that analysis of evolutionary relationships in terms of conservation of residue-residue interactions is potentially much more powerful than the canonical analysis in terms of conservation of just single-residue properties.




Sunday, 3 March 2013

Theory Lunch reading list


An incomplete, personal collection of readings on the maximum entropy approach to a few biological problems.

That correlation does not imply causation is obvious. Correlation between variables in data, e.g. frequencies of residue co-occurrence in protein families or gene expression levels correlated over many experiments, does not mean that the variables are causally linked. This is because correlation between non-causal variables may be induced by the chaining of correlations through a set of intervening, directly causal variables. Such "non-causal correlation" is well known in statistical physics, and much has been written about it.
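The chaining effect can be seen in a toy example. The sketch below is not taken from any of the papers listed here; it assumes a made-up three-variable Gaussian chain x1 -> x2 -> x3 (the function names and the coupling value 0.8 are illustrative). The Gaussian is the maximum entropy distribution for given means and covariances, so inverting the covariance matrix is the continuous analogue of the global approach: the observed correlation between x1 and x3 is sizeable, while their partial correlation (the direct coupling) is close to zero.

```python
import numpy as np

def simulate_chain(n_samples=20000, coupling=0.8, seed=0):
    """Toy chain x1 -> x2 -> x3; x1 and x3 are linked only through x2."""
    rng = np.random.default_rng(seed)
    x1 = rng.standard_normal(n_samples)
    x2 = coupling * x1 + rng.standard_normal(n_samples)
    x3 = coupling * x2 + rng.standard_normal(n_samples)
    return np.column_stack([x1, x2, x3])

def partial_correlations(data):
    """Partial correlations from the inverse covariance (precision) matrix."""
    precision = np.linalg.inv(np.cov(data, rowvar=False))
    d = np.sqrt(np.diag(precision))
    pcorr = -precision / np.outer(d, d)
    np.fill_diagonal(pcorr, 1.0)
    return pcorr

data = simulate_chain()
corr = np.corrcoef(data, rowvar=False)   # pairwise view: each pair in isolation
pcorr = partial_correlations(data)       # global view: all variables at once

print("correlation x1-x3:        ", round(float(corr[0, 2]), 3))
print("partial correlation x1-x3:", round(float(pcorr[0, 2]), 3))
```

For sequence data the variables are discrete and the corresponding model is a Potts model rather than a Gaussian, but the logic of separating direct from chained correlation is the same.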
The problem of weeding out non-causal correlation in observed biological data is especially important when we are dealing with highly coupled networks of variables. For instance, a protein residue might contact many other residues in 3D, and a gene may similarly affect very many other genes. Mutual information, where each pair of variables is looked at in isolation from all other pairs, will produce many falsely high correlations, because it treats the pairs as independent of each other. In some problems, e.g. predicting RNA secondary structure, this is not important. In contrast, for residue contacts within and across proteins and for gene expression networks it is crucial. As yet, the maximum entropy approach is not that widespread in computational biology, but I think it will become so, especially as the amount of data increases. There are formal mathematical links from MaxEnt to "graphical models" and to the Bayesian network approach, but in my hands the latter is not as powerful. This may be because the maximum entropy approach has an additional feature on top of being global: it does not need to assume that the lack of observation of a correlation means it could never happen. In that sense it is 'maximum entropy': one assumes the flattest possible distribution over all possible combinations, except where the data say otherwise. Or rather, many probability distributions are consistent with the data, and maximizing the entropy ensures that the global probability model is the flattest, least biased one for the data in hand.
I can offer much more literature if you would like; below is a very limited set. Books are at the end.
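For concreteness, the pairwise maximum entropy model for an alignment of length L can be written as below. The notation is chosen here for illustration and varies between the papers in the list: \sigma_i is the residue in column i, f_i and f_{ij} are the observed single-column and pair frequencies, and h_i and e_{ij} are the Lagrange multipliers that enforce them.

```latex
% Constraints: the model marginals must reproduce the observed frequencies
P_i(a) = f_i(a), \qquad P_{ij}(a,b) = f_{ij}(a,b)

% Maximizing the entropy S = -\sum_{\sigma} P(\sigma) \ln P(\sigma)
% subject to these constraints gives
P(\sigma_1,\dots,\sigma_L) = \frac{1}{Z}
  \exp\Big( \sum_{i} h_i(\sigma_i) + \sum_{i<j} e_{ij}(\sigma_i,\sigma_j) \Big)
```

Here Z is the normalization constant. Combinations that were never observed still receive non-zero probability, which is the "flattest possible" property described above.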

1. Papers: Applications
Using Sequence Alignments to Predict Protein Structure and Stability With High Accuracy. Alan Lapedes, Bertrand Giraud, Christopher Jarzynski. arXiv:1207.2484 [q-bio.QM]
This is a succinct exposition of the use of the maximum entropy formalism to predict 3D contacts in proteins and stability. The paper was written in 2000/2001 but then buried at LANL. After we found out about the work (via Gary Stormo), we called Alan Lapedes and encouraged him to put his 1999/2000 work on arXiv, which he then did, in July 2012. The original, unpublished paper dates from 2001, which makes it the first successful use of MaxEnt for prediction of residue contacts in proteins. It is also worth listening to a recorded talk he gave on the subject, which can be downloaded from the archives at http://online.itp.ucsb.edu/online/infobio01/lapedes/

Protein 3D Structure Computed from Evolutionary Sequence Variation. Debora S. Marks, Lucy J. Colwell, Robert Sheridan, Thomas A. Hopf, Andrea Pagnani, Riccardo Zecchina, Chris Sander. PLoS One 2011
Read supplement too.

Direct-coupling analysis of residue coevolution captures native contacts across many protein families. Morcos F, Pagnani A, Lunt B, Bertolino A, Marks DS, Sander C, Zecchina R, Onuchic JN, Hwa T, Weigt M. Proc Natl Acad Sci U S A. 2011 Dec 6;108(49):E1293-301. doi: 10.1073/pnas.1111471108.
Read supplement too.

Three-dimensional structures of membrane proteins from genomic sequencing.
Hopf TA, Colwell LJ, Sheridan R, Rost B, Sander C, Marks DS.
Cell. 2012 Jun 22;149(7):1607-21. doi: 10.1016/j.cell.2012.04.012. Epub 2012 May 10.

Weak pairwise correlations imply strongly correlated network states in a neural population.
Schneidman E, Berry MJ 2nd, Segev R, Bialek W.
Nature. 2006

Using the principle of entropy maximization to infer genetic interaction networks from gene expression patterns. Lezon TR, Banavar JR, Cieplak M, Maritan A, Fedoroff NV. Proc Natl Acad Sci U S A. 2006 Dec 12;103(50)


Correlated Mutations in models of protein sequences: phylogenetic and structural effects. Alan S. Lapedes, Bertrand G. Giraud, LonChang Liu and Gary D. Stormo. Statistics in Molecular Biology,
IMS Lecture Notes Monograph Series (1999) Volume 33
A nice toy model that shows the effect and the solution.

2. Papers: Theory

Information Theory and Statistical Mechanics. E. T Jaynes 1957  part one
Information Theory and Statistical Mechanics. E. T Jaynes 1957  part two

These two papers will give you all you need to understand the use of the maximum entropy formalism, but they may have to be read more than once.
Superadditive correlation. Giraud BG, Heumann JM, Lapedes AS. Phys Rev E Stat Phys Plasmas Fluids Relat Interdiscip Topics. 1999 May;59(5 Pt A):4983-91.
The very nice figure on page 2 says it all intuitively.
E. T. Jaynes. Information Theory and Statistical Physics. Brandeis Summer Institute Lectures in Theoretical Physics, 1962. Vol 3, page 181
Nice background.

3. Recent
Improved contact prediction in proteins: Using pseudolikelihoods to infer Potts models. Magnus Ekeberg, Cecilia Lövkvist, Yueheng Lan, Martin Weigt, Erik Aurell arXiv:1211.1281 [q-bio.QM]

This is still MaxEnt, but with an alternative approach to solving the equations for the Lagrange multipliers once the objective function is set up. To find correlated columns, if that's what one wants to do, it also uses the norm of the coupling matrix rather than plugging the computed probabilities into a corrected MI formulation. Erik shared this method of solving with us; we have an implementation of our code that uses it and we are testing it. One thing is that it is much slower. Not sure yet whether that matters.
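As a rough illustration of what "using the norm of the matrix" means, the sketch below scores each column pair by the Frobenius norm of its q-by-q coupling block. It is not the code from the paper, and it leaves out any further corrections the paper applies to the scores; the function names, the (L, L, q, q) array layout, and the toy couplings are all assumptions made for this example.

```python
import numpy as np

def frobenius_scores(couplings):
    """Score every column pair (i, j) by the Frobenius norm of its coupling block.

    couplings: array of shape (L, L, q, q), where couplings[i, j] is the
    q x q matrix of inferred couplings between columns i and j.
    """
    scores = np.sqrt((couplings ** 2).sum(axis=(2, 3)))
    np.fill_diagonal(scores, 0.0)  # a column is not paired with itself
    return scores

def top_pairs(scores, k):
    """Return the k highest-scoring pairs (i, j) with i < j."""
    i_idx, j_idx = np.triu_indices(scores.shape[0], k=1)
    order = np.argsort(scores[i_idx, j_idx])[::-1][:k]
    return [(int(i_idx[o]), int(j_idx[o])) for o in order]

# Toy input: weak random couplings plus one strong pair (0, 3).
rng = np.random.default_rng(1)
L, q = 5, 3
J = 0.01 * rng.standard_normal((L, L, q, q))
J[0, 3] += 1.0
J[3, 0] = J[0, 3].T  # keep the two orientations of the pair consistent

toy_scores = frobenius_scores(J)
best = top_pairs(toy_scores, 1)
print(best)
```

In real use the couplings would come from the fitted model rather than from random numbers, and the top-ranked pairs would be read as candidate contacts.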

4. Papers: Connection to Bayes

Bayesian methods: General Background: An Introductory Tutorial
E. T. Jaynes 1996 (he died in 1998)
Lovely philosophical background, from Herodotus, Bernoulli, Bayes, Laplace, Jeffreys, Cox and Shannon

Nice list
http://bayes.wustl.edu/etj/articles/


5. Books
E T  Jaynes.  Probability Theory: The Logic of Science. 
Buhlman. 
Murphy