Identificación de sitios en proteínas usando máquinas con vectores de soporte

Jaime Leonardo Bobadilla; Tobías Mojica; Luis Fernando Niño

doi:10.22490/24629448.1058

Sites Identification in Proteins, using machines with support vectors

Issue

Vol. 1 No. 1 (2003)

Section

Original Article

How to Cite

Bobadilla, J. L., Mojica, T., & Niño, L. F. (2014). Sites Identification in Proteins, using machines with support vectors. NOVA, 1(1). https://doi.org/10.22490/24629448.1058

License

NOVA (http://www.unicolmayor.edu.co/publicaciones/index.php/nova) is distributed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International license.

Furthermore, the authors retain the intellectual property rights to their articles.

Jaime Leonardo Bobadilla

Departamento de Ingeniería de Sistemas, Universidad Nacional de Colombia.

Tobías Mojica

Instituto de Genética, Universidad Nacional de Colombia, Bogotá

Luis Fernando Niño

Departamento de Ingeniería de Sistemas, Universidad Nacional de Colombia

Machine learning

Annotated

Protein structure

Sites

Algorithm

Non-sites

The growing number of three-dimensional (3D) protein structures determined by X-ray and NMR techniques, together with structures predicted by computational methods, creates a need for automated methods that provide initial annotations. We have developed a new method for recognizing sites in 3D protein structures. It builds on a previously reported algorithm that describes protein microenvironments using physical and chemical properties at multiple levels of detail. The recognition method takes three inputs: (1) a set of sites that share a structural or functional role, (2) a set of control non-sites that lack this role, and (3) a single query site. A support vector machine classifier is built from feature vectors in which each component represents a property in a given volume. Validation against an independent test set shows that this recognition approach has high sensitivity and specificity. We also describe the results of scanning four calcium-binding proteins (with the calcium removed) using a three-dimensional grid of probe points at 1.25 Å spacing. Our results show that property-based descriptions, combined with support vector machines, can be used to recognize protein sites in unannotated structures.
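The pipeline described in the abstract (train on sites and non-sites, classify query points, report sensitivity and specificity, scan a probe grid) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: it swaps in a linear support vector machine trained by subgradient descent on the regularized hinge loss, and the two-component feature vectors, all numeric values, and the function names `train_linear_svm`, `predict`, `sensitivity_specificity`, and `probe_grid` are assumptions made up for the example.

```python
import math
from itertools import product

def train_linear_svm(X, y, lr=0.01, lam=0.01, epochs=50):
    """Fit weights w and bias b by subgradient descent on the
    L2-regularized hinge loss (a simple linear SVM, assumed here)."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            # The regularization term shrinks the weights on every step.
            w = [wj * (1.0 - lr * lam) for wj in w]
            if margin < 1.0:
                # Hinge loss is active: push the boundary away from xi.
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b

def predict(w, b, x):
    """Return +1 (site) or -1 (non-site) for one feature vector."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == -1 for t, p in zip(y_true, y_pred))
    tn = sum(t == -1 and p == -1 for t, p in zip(y_true, y_pred))
    fp = sum(t == -1 and p == 1 for t, p in zip(y_true, y_pred))
    return tp / (tp + fn), tn / (tn + fp)

def probe_grid(min_corner, max_corner, spacing=1.25):
    """Regular 3D grid of probe points covering a bounding box,
    with coordinates and spacing in angstroms."""
    axes = []
    for lo, hi in zip(min_corner, max_corner):
        n = int(math.floor((hi - lo) / spacing + 1e-9)) + 1
        axes.append([lo + i * spacing for i in range(n)])
    return list(product(*axes))

# Hypothetical toy data: each component stands for one property
# measured in one volume around a point (values are invented).
train_X = [[6, 1], [7, 2], [5, 1], [1, 6], [2, 7], [1, 5]]
train_y = [1, 1, 1, -1, -1, -1]          # +1 = site, -1 = non-site
test_X = [[6, 2], [5, 0], [1, 7], [0, 4]]
test_y = [1, 1, -1, -1]                  # independent toy test set

w, b = train_linear_svm(train_X, train_y)
test_pred = [predict(w, b, x) for x in test_X]
sens, spec = sensitivity_specificity(test_y, test_pred)
print("sensitivity:", sens, "specificity:", spec)

# Probe points for a small box; a real scan would compute a feature
# vector at each point and pass it to predict().
grid = probe_grid((0.0, 0.0, 0.0), (2.5, 2.5, 1.25))
print("probe points:", len(grid))
```

In a real scan, each probe point would be turned into a feature vector by the microenvironment description step, which is not reproduced here, and a kernel SVM from an established library would likely replace the hand-written trainer.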
