Compression of Protein Conformational Space
Y. Shao, M. Magdon-Ismail, D. Freedman, S. Akella, and C. Bystroff
International Conference on Research in Computational Molecular Biology (RECOMB), 2002

Compressing conformational space is the process of defining a subspace of minimal dimensionality in which any point may represent a protein-like structure. This is similar to the problem of image compression, where the goal is to reconstruct an image from a small amount of information. In this case, the similarity (in atomic detail) between a true protein and the protein-like structure obtained by projecting the compressed protein back into real space is the measure of the success of the compression algorithm. If proteins can be accurately compressed to a space that is efficiently searchable, and then decompressed back to real space, existing energy functions that use atomic detail may finally be rigorously tested in an exhaustive conformational search. (Note: Unlike the threading approach, such an exhaustive simulation search may consider the folding pathway, and therefore the folding kinetics.) In this paper, we apply compression techniques to various representations of proteins of known structure. We apply principal component analysis (PCA) to the coordinate, backbone angle, and distance matrix representations. Additionally, we apply Fourier transform techniques to distance matrix space. The success of the compression is measured by the structural difference between the original and the reconstructed coordinates for proteins that were not used in the development of the compression algorithm. We find that some representations of the model are more easily compressed than others. We find, unexpectedly, that the models that retain the most atomic detail may be compressed to the smallest subspace.
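
The compress-then-decompress evaluation described in the abstract can be illustrated with a minimal PCA sketch. This is not the authors' implementation: the data are synthetic vectors standing in for a fixed-length structure representation, the function names (fit_pca, compress, decompress, rms_error) are hypothetical, and the error is a plain root-mean-square difference over vector entries rather than a superposition-based structural RMSD. It only shows the general pattern of fitting a subspace on one set and measuring reconstruction error on a held-out set.

```python
import numpy as np

def fit_pca(train, k):
    """Return the mean vector and the top-k principal axes (as rows) of `train`."""
    mean = train.mean(axis=0)
    # SVD of the centered data: rows of vt are principal axes, ordered by variance.
    _, _, vt = np.linalg.svd(train - mean, full_matrices=False)
    return mean, vt[:k]

def compress(x, mean, axes):
    """Project full-dimensional vectors onto the k-dimensional subspace."""
    return (x - mean) @ axes.T

def decompress(z, mean, axes):
    """Map subspace coordinates back to the full-dimensional space."""
    return z @ axes + mean

def rms_error(a, b):
    """Root-mean-square difference over all entries (not a superposed RMSD)."""
    return float(np.sqrt(np.mean((a - b) ** 2)))

# Synthetic stand-in data (an assumption for illustration, NOT protein structures):
# 60-dimensional vectors that lie near a 5-dimensional linear subspace.
rng = np.random.default_rng(0)
latent_dim, full_dim = 5, 60
basis = rng.normal(size=(latent_dim, full_dim))

def sample(n):
    clean = rng.normal(size=(n, latent_dim)) @ basis
    return clean + 0.01 * rng.normal(size=(n, full_dim))

train, held_out = sample(200), sample(50)

# Fit on the training set only, then measure reconstruction error on held-out data.
errors = {}
for k in (2, 5, 10):
    mean, axes = fit_pca(train, k)
    rebuilt = decompress(compress(held_out, mean, axes), mean, axes)
    errors[k] = rms_error(held_out, rebuilt)
    print(f"k={k:2d}  held-out RMS error = {errors[k]:.4f}")
```

In this toy setting the held-out error drops sharply once k reaches the true dimensionality of the data, which is the kind of behavior one would look for when comparing how compressible different representations are.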