Automatic analysis of scientific texts for constructing semantic networks of proteins
Dissertation
Postgenomic technologies make it possible to identify, in a high-throughput manner, groups of genes and proteins that respond in a coordinated way to changes in experimental conditions. At the same time, it has proved difficult to explain the biological effects produced by an ensemble of genes in terms of the functional properties of biomacromolecules. Interpretation of experimental…
References
- Aerts, S. et al. (2008). Text-mining assisted regulatory annotation //Genome Biol. 9: R31.
- Al-Shahrour, F. et al. (2006). BABELOMICS: a systems biology perspective in the functional annotation of genome-scale experiments //Nucleic Acids Res. 34: W472−476.
- Al-Shahrour, F., R. Diaz-Uriarte, J. Dopazo (2004). FatiGO: a web tool for finding significant associations of Gene Ontology terms with groups of genes //Bioinformatics. 20: 578−580.
- Archakov, A. I. et al. (2003). Protein-protein interactions as a target for drugs in proteomics //Proteomics. 3: 380−391.
- Ashburner, M. et al. (2000). Gene ontology: tool for the unification of biology. The Gene Ontology Consortium //Nat Genet. 25: 25−29.
- Azuaje F., Dopazo J. (2005) Data Analysis and Visualization in Genomics and Proteomics. England: John Wiley & Sons Ltd.
- Bader, G. D., C. W. Hogue (2002). Analyzing yeast protein-protein interaction data obtained from different sources //Nat Biotechnol. 20: 991−997.
- Bader, G. D., D. Betel, C. W. Hogue (2003). BIND: the Biomolecular Interaction Network Database //Nucleic Acids Res. 31: 248−250.
- Barabasi, A. L., Z. N. Oltvai (2004). Network biology: understanding the cell’s functional organization //Nat Rev Genet. 5: 101−113.
- Becker, K. G. et al. (2003). PubMatrix: a tool for multiplex literature mining //BMC Bioinformatics. 4: 61.
- Beissbarth, T. (2006). Interpreting experimental results using gene ontologies //Methods Enzymol. 411: 340−352.
- Blaschke, C., A. Valencia (2002). Automatic ontology construction from the literature //Genome Inform. 13: 201−213.
- Blaschke, C., M. A. Andrade, C. Ouzounis, A. Valencia (1999). Automatic extraction of biological information from scientific text: protein-protein interactions //Proc Int Conf Intell Syst Mol Biol: 60−67.
- Bodenreider, O. (2004). The Unified Medical Language System (UMLS): integrating biomedical terminology //Nucleic Acids Res. 32: D267−270.
- Boeckmann, B. et al. (2003). The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003 //Nucleic Acids Res. 31(1): 365−70.
- Boutet, E. et al (2007). UniProtKB/Swiss-Prot //Methods Mol Biol. 406: 89−112.
- Brill, E. (1995). Transformation-based error-driven learning and natural language processing: A case study in part-of-speech tagging //Comput. Linguistics. 21.
- Bundschus, M. et al (2008). Extraction of semantic biomedical relations from text using conditional random fields //BMC Bioinformatics. 9: 207.
- Chabalier, J., J. Mosser, A. Burgun (2007). A transversal approach to predict gene product networks from ontology-based similarity//BMC Bioinformatics. 8: 235.
- Chang, J. T., H. Schutze, R. B. Altman (2004). GAPSCORE: finding gene and protein names one word at a time //Bioinformatics. 20: 216−225.
- Chen, H., B. M. Sharp (2004). Content-rich biological network constructed by mining PubMed abstracts //BMC Bioinformatics. 5: 147.
- Chen, J. et al (2003). Biosynthesis of 3-O-sulfated heparan sulfate: unique substrate specificity of heparan sulfate 3-O-sulfotransferase isoform 5 //Glycobiology. 13: 785−794.
- Clegg, A. B., A. J. Shepherd (2008). Text mining //Methods Mol Biol. 453: 471−491.
- Couto, F.M., Silva, M.J., Coutinho, P.M. (2005) Semantic Similarity over the Gene Ontology: Family Correlation and Selecting Disjunctive Ancestors //Proc of the ACM Conference in Information and Knowledge Management as a short paper.
- Deerwester, S.C., Dumais, S.T., Landauer, T.K., Furnas, G.W., Harshman, R.A. (1990). Indexing by latent semantic analysis //J. Inform. Sci. 41: 391−407.
- Donaldson, I. et al. (2003). PreBIND and Textomy—mining the biomedical literature for protein-protein interactions using a support vector machine //BMC Bioinformatics. 4: 11.
- Duan, Z. H. et al. (2006). The relationship between protein sequences and their gene ontology functions //BMC Bioinformatics. 7 Suppl 4: S11.
- Eisen, M. B., P. T. Spellman, P. O. Brown, D. Botstein (1998). Cluster analysis and display of genome-wide expression patterns //Proc Natl Acad Sci U S A. 95: 14863−14868.
- Erhardt, R. A., R. Schneider, C. Blaschke (2006). Status of text-mining techniques applied to biomedical text //Drug Discov Today. 11: 315−325.
- Flybase Consortium. (2003). The FlyBase database of the Drosophila genome projects and community literature //Nucleic Acids Res. 31: 172−175.
- Formstecher, E. et al. (2005). Protein interaction mapping: a Drosophila case study //Genome Res. 15: 376−384.
- Fukuda, K., A. Tamura, T. Tsunoda, T. Takagi (1998). Toward information extraction: identifying protein names from biological papers //Pac Symp Biocomput: 707−718.
- Glenisson, P. et al. (2003). Evaluation of the vector space representation in text-based gene clustering //Pac Symp Biocomput: 391−402.
- Govorun, V. M., A. I. Archakov (2002). Proteomic technologies in modern biomedical science //Biochemistry (Mosc). 67: 1109−1123.
- Guo, X. et al. (2006). Assessing semantic similarity measures for the characterization of human regulatory pathways //Bioinformatics. 22: 967−973.
- Guo, X., C. D. Shriver, H. Hu, M. N. Liebman (2005). Analysis of metabolic and regulatory pathways through Gene Ontology-derived semantic similarity measures //AMIA Annu Symp Proc: 972.
- Harris, M. A. et al. (2004). The Gene Ontology (GO) database and informatics resource //Nucleic Acids Res. 32: D258−261.
- Harris, T. W. et al. (2003). WormBase: a cross-species database for comparative genomics //Nucleic Acids Res. 31: 133−137.
- He, M., Y. Wang, W. Li (2009). PPI finder: a mining tool for human protein-protein interactions //PLoS ONE. 4: 4554.
- Hoffmann, R., A. Valencia (2004). A gene network for navigating the literature //Nat Genet. 36: 664.
- Homayouni, R., K. Heinrich, L. Wei, M. W. Berry (2005). Gene clustering by latent semantic indexing of MEDLINE abstracts //Bioinformatics. 21: 104−115.
- Hsing, M., J. L. Bellenson, C. Shankey, A. Cherkasov (2004). Modeling of cell signaling pathways in macrophages by semantic networks //BMC Bioinformatics. 5: 156.
- Hunter, L., K. B. Cohen (2006). Biomedical language processing: what’s beyond PubMed? //Mol Cell. 21: 589−594.
- Jensen, L., J. Saric, P. Bork (2003). Utilizing literature for biological discovery //Proceedings of E-BioSci/ORIEL, Villa Monastero, Varenna, Italy
- Jensen, L. J. et al. (2009). STRING 8--a global view on proteins and their functional interactions in 630 organisms //Nucleic Acids Res. 37: D412−6.
- Jenssen, T. K., A. Laegreid, J. Komorowski, E. Hovig (2001). A literature network of human genes for high-throughput analysis of gene expression //Nat Genet. 28: 21−28.
- Kanehisa, M., S. Goto (2000). KEGG: kyoto encyclopedia of genes and genomes //Nucleic Acids Res. 28: 27−30.
- Khatri, P. et al. (2005). A semantic analysis of the annotations of the human genome//Bioinformatics. 21: 3416−3421.
- Kim, W., A. R. Aronson, W. J. Wilbur (2001). Automatic MeSH term assignment and quality assessment //Proc AMIA Symp: 319−323.
- Klie, S. et al. (2008). Analyzing large-scale proteomics projects with latent semantic indexing //J Proteome Res. 7: 182−191.
- Krallinger, M., A. Valencia (2005). Text-mining and information-retrieval services for molecular biology //Genome Biol. 6: 224.
- Krallinger, M., A. Valencia, L. Hirschman (2008). Linking genes to literature: text mining, information extraction, and retrieval applications for biology //Genome Biol. 9 Suppl 2: S8.
- Landauer, T.K., Laham, D., Derr, M. (2004) From paragraph to graph: latent semantic analysis for information visualization //Proc. Natl. Acad. Sci. 101: 5214−5219.
- Lee, P. H., D. Lee (2005). Modularized learning of genetic interaction networks from biological annotations and mRNA expression data //Bioinformatics. 21: 2739−2747.
- Lei, Z., Y. Dai (2006). Assessing protein similarity with Gene Ontology and its use in subnuclear localization prediction //BMC Bioinformatics. 7: 491.
- Li, H., Y. Sun, M. Zhan (2007). Analysis of Gene Coexpression by B-Spline Based CoD Estimation //EURASIP J Bioinform Syst Biol. 49: 478.
- Lin, J., W. J. Wilbur (2007). PubMed related articles: a probabilistic topic-based model for content similarity//BMC Bioinformatics. 8: 423.
- Lord, P. W., R. D. Stevens, A. Brass, C. A. Goble (2003). Investigating semantic similarity measures across the Gene Ontology: the relationship between sequence and annotation//Bioinformatics. 19: 1275−1283.
- Maglott, D., J. Ostell, K. D. Pruitt, T. Tatusova (2007). Entrez Gene: gene-centered information at NCBI //Nucleic Acids Res. 35: D26−31.
- Manning, C., H. Schutze (1999). Foundations of Statistical Natural Language Processing.
- Mao, X., T. Cai, J. G. Olyarchuk, L. Wei (2005). Automated genome annotation and pathway identification using the KEGG Orthology (KO) as a controlled vocabulary //Bioinformatics. 21: 3787−3793.
- Marcotte, E. M., I. Xenarios, D. Eisenberg (2001). Mining literature for protein-protein interactions//Bioinformatics. 17: 359−363.
- Mika, S., B. Rost (2004). NLProt: extracting protein names and sequences from papers //Nucleic Acids Res. 32: W634−637.
- Mungall, C. J. (2004). Obol: integrating language and meaning in bio-ontologies //Comp Funct Genomics. 5: 509−520.
- Nadanaka, S., H. Kitagawa (2008). Heparan sulphate biosynthesis and disease //J Biochem. 144: 7−14.
- Nelson, D. R. (2006). Cytochrome P450 nomenclature, 2004 //Methods Mol Biol. 320: 1−10.
- Newman, M. (2003). The structure and function of complex networks //SIAM Rev. 45: 167−256.
- Onogi, Y. (2007). Assigning categorical information to Japanese medical terms using MeSH and MEDLINE //Stud Health Technol Inform. 129: 694−698.
- Paul, M., A. Poyan Mehr, R. Kreutz (2006). Physiology of local renin-angiotensin systems //Physiol Rev. 86: 747−803.
- Petrak, J. et al. (2008). Deja vu in proteomics. A hit parade of repeatedly identified differentially expressed proteins //Proteomics. 8: 1744−1749.
- Pruitt, K. D., D. R. Maglott (2001). RefSeq and LocusLink: NCBI gene-centered resources //Nucleic Acids Res. 29: 137−140.
- Quentin, Y., J. Chabalier, G. Fichant (2002). Strategies for the identification, the assembly and the classification of integrated biological systems in completely sequenced genomes //Comput Chem. 26: 447−457.
- Raychaudhuri, S. (2006) Computational Text Analysis for Functional Genomics and Bioinformatics. Oxford University Press.
- Raychaudhuri, S., R. B. Altman (2003). A literature-based method for assessing the functional coherence of a gene group //Bioinformatics. 19: 396−401.
- Regev, Y., M. Finkelstein-Landau, R. Feldman (2003). Rule-based extraction of experimental evidence in the biomedical domain: The KDD Cup 2002 (task 1) //ACM SIGKDD Explorations Newsletter. 4: 90−92.
- Rogers, D. J., T. T. Tanimoto (1960). A Computer Program for Classifying Plants //Science. 132: 1115−1118.
- Safran, M. et al. (2002). GeneCards 2002: towards a complete, object-oriented, human gene compendium//Bioinformatics. 18: 1542−1543.
- Settles, B. (2005). ABNER: an open source tool for automatically tagging genes, proteins and other entity names in text //Bioinformatics. 21: 3191−3192.
- Shi, M., D. Edwin, R. Menon (2002). A machine learning approach for the curation of biomedical literature-KDD Cup 2002 (task 1) //ACM SIGKDD Explorations Newsletter. 4: 93−94.
- Song, Y., E. Kim, G. G. Lee, B. K. Yi (2005). POSBIOTM-NER: a trainable biomedical named-entity recognition system //Bioinformatics. 21: 2794−2796.
- Spirin, V., M. S. Gelfand, A. A. Mironov, L. A. Mirny (2006). A metabolic network in the evolutionary context: multiscale structure and modularity //Proc Natl Acad Sci U S A. 103: 8774−8779.
- Stapley, B. J., G. Benoit (2000). Biobibliometrics: information retrieval and visualization from co-occurrences of gene names in Medline abstracts //Pac Symp Biocomput: 529−540.
- Sun, H. et al (2006). GOFFA: Gene Ontology For Functional Analysis A FDA Gene Ontology Tool for Analysis of Genomic and Proteomic Data //BMC Bioinformatics. 7 Suppl 2: S23.
- Ulitsky, I., R. Shamir (2007). Identification of functional modules using network topology and high-throughput data //BMC Syst Biol. 1: 8.
- UniProt Consortium. (2009). The Universal Protein Resource (UniProt) //Nucleic Acids Res. 37: D169−174.
- Wang, Y., P. A. Marsden (1995). Nitric oxide synthases: gene structure and regulation //Adv Pharmacol. 34: 71−90.
- Wang, Z., J. Zhang (2007). In search of the biological significance of modular structures in protein networks //PLoS Comput Biol. 3: 107.
- Wilbur, W., L. Coffee (1994). The Effectiveness of Document Neighboring in Search Enhancement //Inf. Process. Manage. 30: 253−266.
- Wu, C. H. et al (2003). The Protein Information Resource //Nucleic Acids Res. 31: 345−347.
- Wu, X. et al (2006). Prediction of yeast protein-protein interaction network: insights from the Gene Ontology and annotations //Nucleic Acids Res. 34: 2137−2150.
- Xenarios, I. et al (2002). DIP, the Database of Interacting Proteins: a research tool for studying cellular networks of protein interactions //Nucleic Acids Res. 30: 303−305.
- Xu, D., D. Song, L. C. Pedersen, J. Liu (2007). Mutational study of heparan sulfate 2-O-sulfotransferase and chondroitin sulfate 2-O-sulfotransferase //J Biol Chem. 282: 8356−8367.
- Zhao, J. et al. (2007). Modular co-evolution of metabolic networks //BMC Bioinformatics. 8: 311.
- Zheng, B., X. Lu (2007). Novel metrics for evaluating the functional coherence of protein groups via protein semantic network //Genome Biol. 8: R153.