Methods for Aligning Biological Sequences That Do Not Use Deletion Penalties
Dissertation
It is shown that, for weakly homologous sequences and matrices of the PAM family, the most accurate Smith-Waterman alignment is obtained when the parameter GEP = 1.0. A database of reference alignments, PREFAB-P, has been built; it can be used to assess the accuracy of various algorithms for pairwise global sequence alignment. Also proposed is a new…
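To make the role of such a parameter concrete, here is a minimal sketch of Smith-Waterman local alignment scoring with affine gap costs. It is an illustration only, not the dissertation's code. It assumes that GEP stands for the gap extension penalty. It uses a simple match/mismatch score in place of a PAM matrix. The function name, the `gop` (gap opening) parameter and all default values are hypothetical.

```python
def smith_waterman_score(a, b, match=2.0, mismatch=-1.0, gop=2.0, gep=1.0):
    """Best local alignment score; a gap of length k costs gop + gep * k.

    Assumptions (not from the source): match/mismatch scoring instead of
    a PAM matrix, and `gop`/`gep` as opening/extension penalties.
    """
    neg = float("-inf")
    n, m = len(a), len(b)
    # H[i][j]: best score of a local alignment ending at a[i-1], b[j-1]
    # E[i][j]: best score ending with a gap that consumes b[j-1] only
    # F[i][j]: best score ending with a gap that consumes a[i-1] only
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    E = [[neg] * (m + 1) for _ in range(n + 1)]
    F = [[neg] * (m + 1) for _ in range(n + 1)]
    best = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Either extend an existing gap (pay gep) or open a new one
            # (pay gop plus gep for its first position).
            E[i][j] = max(E[i][j - 1] - gep, H[i][j - 1] - gop - gep)
            F[i][j] = max(F[i - 1][j] - gep, H[i - 1][j] - gop - gep)
            s = match if a[i - 1] == b[j - 1] else mismatch
            # The 0.0 option lets a local alignment restart anywhere.
            H[i][j] = max(0.0, H[i - 1][j - 1] + s, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best


if __name__ == "__main__":
    for gep in (0.5, 1.0, 3.0):
        print(gep, smith_waterman_score("ACGTACGT", "ACGTTACGT", gep=gep))
```

Running the script for several `gep` values shows how the extension penalty trades gapped alignments against shorter ungapped ones. Measuring accuracy, as the dissertation does, would additionally require reference alignments such as those in PREFAB-P.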
References
- Needleman S. B., Wunsch C. D. A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol. 1970, vol. 48, pp. 443−453
- Smith T. F., Waterman M. S. Identification of common molecular subsequences. J. Mol. Biol. 1981, vol. 147, pp. 195−197
- http://server2.lpm.org.ru/static/download/parca/
- http://server2.lpm.org.ru/bio/pareto/
- http://server2.lpm.org.ru/static/prefab-p/
- Roytberg M. A. Pareto-optimal alignments of symbol sequences. Pushchino: ONTI NTsBI, 1994. Preprint, 10 p. (in Russian)
- Roytberg M. A., Semionenkov M. N., Tabolina O. Yu. Pareto-optimal alignments of biological sequences. Biofizika. 1998, vol. 44, N. 4, pp. 581−594 (in Russian)
- Pareto V. Manual of political economy. New York: A. M. Kelley, 1972
- Asthana S., Roytberg M., Stamatoyannopoulos J., Sunyaev S. Analysis of sequence conservation at nucleotide resolution. PLoS Comput Biol. 2007, vol. 3, N. 12, p. 254
- Korzinov O. M., Astakhova T. V., Vlasov P. K., Roytberg M. A. Statistical analysis of DNA regions in the vicinity of splice sites. Molekulyarnaya Biologiya. 2008, vol. 42, N. 1, pp. 150−162 (in Russian)
- Lesk A. M. Introduction to protein architecture. Oxford, N.Y.: Oxford Univ. Press. 2001. P. 360
- Pirovano W., Feenstra K. A., Heringa J. The meaning of alignment: lessons from structural diversity. BMC Bioinformatics. 2008, vol. 9, p. 556
- Kister A. E., Roytberg M. A., Chothia C., Gelfand I. M. The sequence determinants of cadherin molecules. Protein Science. 2001, vol. 10, pp. 1801−1810
- Notredame C., Higgins D. G., Heringa J. T-Coffee: a novel method for fast and accurate multiple sequence alignment. J. Mol. Biol. 2000, vol. 302, pp. 205−217
- Thompson J. D., Higgins D. G., Gibson T. J. CLUSTALW: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 1994, vol. 22, pp. 473−480
- Katoh K., Kuma K., Toh H., Miyata T. MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucleic Acids Research. 2005, vol. 3, pp. 511−518
- Edgar R. C. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research. 2004, vol. 32, N. 5, pp. 1792−1797
- Edgar R. C., Batzoglou S. Multiple sequence alignment. Current Opinion in Structural Biology. 2006, vol. 16, pp. 368−373
- Russell D., Otu H., Sayood K. Grammar-based distance in progressive multiple sequence alignment. BMC Bioinformatics. 2008, vol. 9, p. 306
- Rice P., Longden I., Bleasby A. EMBOSS: The European Molecular Biology Open Software Suite. Trends in Genetics. 2000, vol. 16, N. 6, pp. 276−277
- Nazipova N. N., Shabalina S. A., Ogurtsov A. Yu., Kondrashov A. S., Roytberg M. A., Buryakov G. V., Vernoslov S. E. SAMSON: a software package for the biopolymer primary structure analyses. Comput. Appl. Biosci. 1995, vol. 11, N. 4, pp. 423−426
- Levenshtein V. I. Binary codes correcting deletions, insertions, and substitutions of symbols. Doklady Akademii Nauk SSSR. 1965, vol. 163, N. 4, pp. 845−848 (in Russian)
- Ulam S. M. Some ideas and prospects in biomathematics. Annu Rev Biophys Bioeng. 1972, vol. 1, p. 277
- Kruskal J. B. An overview of sequence comparison: time warps, string edits, and macromolecules. SIAM Rev. 1983, vol. 25, N. 26, pp. 201−238
- Rochkind M. J. The Source Code Control System. IEEE Transactions on Software Engineering. 1975, vol. 1, N. 4, pp. 364−370
- Waterman M. S. Introduction to Computational Biology. London: Chapman and Hall Press. 1995
- Cartwright R. A. Logarithmic gap costs decrease alignment accuracy. BMC Bioinformatics. 2006, vol. 7, p. 527
- Gonnet G. H., Cohen M. A., Benner S. A. Exhaustive matching of the entire protein sequence database. Science. 1992, vol. 256, N. 5062, pp. 1443−1445
- Zvelebil M., Baum J. O. Understanding bioinformatics. London: Garland Science. 2007. 800 p.
- Waterman M. S. Sequence alignments in the neighborhood of the optimum with general application to dynamic programming. PNAS. 1983, vol. 80, N. 10, pp. 3123−3124
- Byers T. M., Waterman M. S. Determining all optimal and near-optimal solutions when solving shortest path problems by dynamic programming. Oper Res. 1984, vol. 32, pp. 1381−1384
- Vingron M., Argos P. Determination of reliable regions in protein sequence alignments. Protein Engineering. 1990, vol. 3, N. 7, pp. 565−569
- Zuker M. Suboptimal sequence alignment in molecular biology. Alignment with error analysis. J. Mol. Biol. 1991, vol. 221, N. 2, p. 403
- Fitch W. M., Smith T. F. Optimal sequence alignments. Proc. Natl. Acad. Sci. USA. 1983, vol. 80, pp. 1382−1386
- Waterman M. S., Eggert M., Lander E. Parametric sequence comparisons. Proc. Natl. Acad. Sci. USA. 1992, vol. 89, pp. 6090−6093
- Fernandez-Baca D., Srinivasan S. Constructing the minimization diagram of a two-parameter problem. Operat. Res. Letters. 1991, vol. 10, pp. 87−93
- Gusfield D., Balasubramanian K., Naor K. Proc. 3rd Ann. ACM-SIAM Discrete Algorithms. 1992, pp. 432−439
- Bellman R. On the theory of dynamic programming. Proc. Natl. Acad. Sci. USA. 1952, vol. 38, pp. 716−719
- Kleene S. C. Representation of events in nerve nets and finite automata. In: Shannon C. E., McCarthy J. (Eds.), Automata Studies. Princeton University Press. 1956. P. 3−41
- McNaughton R., Yamada H. Regular expressions and state graphs for automata. IRE Transactions on Electronic Computers. 1960, vol. 9, N. 1, pp. 39−47
- Kramers H. A., Wannier G. H. Statistics of the one-dimensional ferromagnet. Phys. Rev. 1941, vol. 60, pp. 252−276
- Aho A., Hopcroft J., Ullman J. The design and analysis of computer algorithms. Reading, MA, USA: Addison-Wesley. 1974. P. 470
- Gelfand M. S., Podolsky L. I., Astakhova T. V., Roytberg M. A. Prediction of the exon-intron structure and multicriterial optimization. In: Bioinformatics and Genome Research (H. A. Lim, C. R. Cantor, eds.). World Scientific Publ. Co., Singapore. 1995. pp. 173−183
- Roytberg M. A., Astakhova T. V., Gelfand M. S. Combinatorial approaches to gene recognition. Computers and Chemistry. 1998, vol. 1, N. 21, pp. 229−236
- Ramensky V. E., Makeev V. Ju., Roytberg M. A., Tumanyan V. G. DNA segmentation through the Bayesian approach. J. Comput. Biol. 2000, vol. 7, N. 1−2, pp. 215−231
- Ramensky V. E., Makeev V. Y., Roytberg M. A., Tumanyan V. G. Segmentation of long genomic sequences into domains with homogeneous composition with BASIO software. Bioinformatics. 2001, vol. 17, N. 11, pp. 1065−1066
- Vogt G., Etzold T., Argos P. An assessment of amino acid exchange matrices in aligning protein sequences: the twilight zone revisited. J. Mol. Biol. 1995, vol. 249, pp. 816−831
- Gribskov M., Veretnik S. Identification of sequence pattern with profile analysis. Methods Enzymol. 1996, vol. 266, pp. 198−212
- Sunyaev S. R., Bogopolsky G. A., Oleynikova N. V., Vlasov P. K., Finkelstein A. V., Roytberg M. A. From analysis of protein structural alignments toward a novel approach to align protein sequences. Proteins. 2004, vol. 54, pp. 569−582
- Huang X., Chao K.-M. A generalized global alignment algorithm. Bioinformatics. 2003, vol. 19, N. 2, pp. 228−233
- Dayhoff M. O., Schwartz R. M., Orcutt B. C. A model of evolutionary change in proteins. Atlas of Protein Sequence and Structure. 1978, vol. 5, suppl. 3, pp. 345−352
- http://www.biorecipes.com/Dayhoff/code.html
- Polyanovsky V., Roytberg M., Tumanyan V. Reconstruction of genuine pair-wise sequence alignment. Computational Biology. 2008, vol. 15, N. 4, pp. 379−391
- Henikoff S., Henikoff J. G. Amino acid substitution matrices from protein blocks. PNAS. 1992, vol. 89, N. 22, pp. 10915−10919
- Smith R. F., Smith T. F. Pattern-induced multi-sequence alignment (PIMA) algorithm employing secondary structure-dependent gap penalties for use in comparative protein modelling. Protein Eng. 1992, vol. 5, pp. 35−41
- Depiereux E., Baudoux G., Briffeuil P., Reginster I., De Bolle X., Vinals C., Feytmans E. MATCH-BOX server: a multiple sequence alignment tool placing emphasis on reliability. Comput. Appl. Biosci. 1997, vol. 13, pp. 249−256
- Thompson J. D., Gibson T. J., Plewniak F., Jeanmougin F., Higgins D. G. The CLUSTALX windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 1997, vol. 24, pp. 4876−4882
- Edgar R. C. Quality measures for protein alignment benchmarks. Nucleic Acids Res. 2010, vol. 38, pp. 2145−2153
- Berman H. M. et al. The Protein Data Bank. Nucleic Acids Res. 2010, vol. 28, N. 1, pp. 235−242
- Cochrane G. et al. Petabyte-scale innovations at the European Nucleotide Archive. Nucleic Acids Research. 2009, vol. 37, pp. 19−25
- http://www.uniprot.org/
- Murzin A. G., Brenner S. E., Hubbard T., Chothia C. SCOP: a structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol. 1995, vol. 247, pp. 536−540
- Orengo C., Michie A., Jones S., Jones D., Swindells M., Thornton J. CATH: a hierarchic classification of protein domain structures. Structure. 1997, vol. 5, pp. 1093−1108
- http://www.drive5.com/muscle/prefab.htm
- Thompson J. D., Plewniak F., Poch O. BAliBASE: a benchmark alignment database for the evaluation of multiple alignment programs. Bioinformatics. 1998, vol. 15, pp. 87−88
- Van Walle I., Lasters I., Wyns L. SABmark: a benchmark for sequence alignment that covers the entire known fold space. Bioinformatics. 2005, vol. 21, pp. 1267−1268
- Mizuguchi K., Deane C. M., Blundell T. L., Overington J. P. HOMSTRAD: a database of protein structure alignments for homologous families. Protein Sci. 1998, vol. 7, pp. 2469−2471
- Raghava G. P., Searle S. M., Audley P. C., Barber J. D., Barton G. J. OXBench: a benchmark for evaluation of protein multiple sequence alignment accuracy. BMC Bioinformatics. 2003, vol. 4, p. 47
- http://www.ncbi.nlm.nih.gov/BLAST/tutorial/Altschul-1.html
- Sadreyev R., Grishin N. COMPASS: a tool for comparison of multiple protein alignments with assessment of statistical significance. J. Mol. Biol. 2003, N. 326, pp. 317−336
- Edgar R. C., Sjolander K. A comparison of scoring functions for protein sequence profile alignment. Bioinformatics. 2004, DOI: 10.1093/bioinformatics/bth090
- Holm L., Sander C. Touring protein fold space with Dali/FSSP. Nucleic Acids Res. 1998, vol. 26, pp. 316−319
- Shindyalov I. N., Bourne P. E. CE: a resource to compute and review 3-D protein structure alignments. Nucleic Acids Research. 2001, vol. 29, N. 1, pp. 228−229
- Astakhova T. V., Poverennaya I. V., Roytberg M. A., Yakovlev V. V. Verification of the PREFAB database of reference alignments. Biofizika. 2012, vol. 57, N. 2, pp. 205−211 (in Russian)
- Poverennaya I., Lobanov M., Yacovlev V., Roytberg M. Using of PREFAB for analysis of amino-acid sequence alignment algorithms. Proc. MCCMB'11. p. 327
- Date C. J. An Introduction to Database Systems. 7th ed. Moscow: Williams, 2001. 1072 p. (Russian edition)
- Waterman M. S. Mathematical methods for DNA sequences. Boca Raton, FL: CRC Press, 1989
- http://www.python.org/
- http://www.boost.org/
- Tanenbaum A. S. Operating Systems: Design and Implementation. Prentice Hall, 1987. 719 p.
- Robachevsky A. M. The UNIX Operating System. St. Petersburg: BHV, 2002. 526 p. (in Russian)
- http://fasta.bioch.Virginia.edu/fastawww2/fastalist2.shtml
- Pearson W. R., Lipman D. J. Improved tools for biological sequence comparison. Proc. Natl. Acad. Sci. USA. 1988, vol. 85, pp. 2444−2448
- Yakovlev V. V., Roytberg M. A. Improving the accuracy of global alignment of amino acid sequences by constructing a set of candidate alignments. Biofizika. 2010, vol. 55, N. 6, pp. 965−975 (in Russian)
- Poverennaya I. V., Roytberg M. A., Yakovlev V. V. Efficiency of the PARCA program for global alignment of amino acid sequences. Informatsionnye Protsessy. 2011, vol. 11, N. 4, pp. 510−519 (in Russian)