Protein sequence alignment analyses have become a crucial step in many bioinformatics studies over the past decades. Multiple sequence alignment (MSA) and pair-wise sequence alignment (PSA) are the two major approaches to sequence alignment. Earlier benchmark studies revealed drawbacks of MSA methods on nucleotide sequence alignments. To test whether similar drawbacks also affect protein sequence alignment analyses, we propose a new benchmark framework for protein clustering based on cluster validity. This framework directly reflects the biological ground truth of the application scenarios that rely on sequence alignments, and it evaluates alignment quality by how well the biological goal is achieved rather than by sequence-level comparison alone, which avoids the biases introduced by alignment scores or manual alignment templates. Unlike earlier studies, we calculate the cluster validity score from sequence distances instead of from clustering results. This strategy avoids the influence of the choice of clustering method and thus makes the results more dependable. The results showed that PSA methods performed better than MSA methods on most of the BAliBASE benchmark datasets. Analyses of the 80 re-sampled benchmark datasets, constructed by randomly choosing 90% of each dataset 10 times, showed similar results. These results confirm that the drawbacks of MSA methods revealed at the nucleotide level also exist in protein sequence alignment analyses and affect the accuracy of the results.

Protein sequence alignments, as an effective and intuitive way of identifying homologous regions among sequences, play a fundamental role in biomedical research, for example in database construction and query and in the prediction of protein structure and function. Protein sequence alignments can identify regions of similarity that may reflect biological relationships among the input sequences. As a result, protein sequence alignment analyses have become a crucial step in many bioinformatics studies over the past decades. Many protein databases that cover protein family information, such as PROSITE, Pfam, and ProDom, have been built on sequence alignments. In database query, protein sequence alignment algorithms such as BLAST, FASTA, dynamic programming methods, and other methods enable researchers to compare a query protein sequence against a database or library to retrieve sequences similar to the input. Sequence alignment can also detect motifs and important functional or structural residues such as binding sites. Information obtained from sequence alignment analyses can be mapped onto a protein's 3D structure to help deduce its potential function. Various kinds of methods have been proposed for creating an alignment, including pair-wise sequence alignment (PSA), multiple sequence alignment (MSA), profile-based methods, prediction-based methods, and structure-based methods. Of these, PSA and MSA are the most widely used. PSA aligns one pair of sequences at a time.
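The text describes scoring cluster validity directly from sequence distances rather than from the output of a clustering method, and re-sampling each benchmark dataset by drawing 90% of its sequences 10 times. The sketch below illustrates that idea under stated assumptions: the text does not name the validity index, so the mean silhouette width is used here as a stand-in, and the function names `silhouette_from_distances` and `resample_indices`, the toy distance matrix, and the family labels are all hypothetical.

```python
import random
from typing import Dict, List, Sequence


def silhouette_from_distances(
    dist: Sequence[Sequence[float]], labels: Sequence[str]
) -> float:
    """Mean silhouette width from a precomputed distance matrix.

    Assumption: `labels` are reference (ground-truth) family labels, so no
    clustering algorithm is run; only alignment-derived distances are scored.
    """
    n = len(labels)
    # Group sequence indices by their reference label.
    groups: Dict[str, List[int]] = {}
    for idx, lab in enumerate(labels):
        groups.setdefault(lab, []).append(idx)
    if len(groups) < 2:
        raise ValueError("need at least two reference groups")

    total = 0.0
    for i in range(n):
        own = groups[labels[i]]
        if len(own) == 1:
            # Convention: a singleton group contributes a silhouette of 0.
            continue
        # a: mean distance to the other members of the same group.
        a = sum(dist[i][j] for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other group.
        b = min(
            sum(dist[i][j] for j in members) / len(members)
            for lab, members in groups.items()
            if lab != labels[i]
        )
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / n


def resample_indices(
    n: int, fraction: float = 0.9, repeats: int = 10, seed: int = 0
) -> List[List[int]]:
    """Draw `repeats` random subsets, each holding `fraction` of n items."""
    rng = random.Random(seed)
    k = min(n, max(1, round(n * fraction)))
    return [sorted(rng.sample(range(n), k)) for _ in range(repeats)]


# Toy example: four sequences from two hypothetical families.
example_labels = ["A", "A", "B", "B"]
example_dist = [
    [0.0, 0.1, 0.9, 0.8],
    [0.1, 0.0, 0.8, 0.9],
    [0.9, 0.8, 0.0, 0.2],
    [0.8, 0.9, 0.2, 0.0],
]
example_score = silhouette_from_distances(example_dist, example_labels)
```

A higher score means the distances produced by an alignment method separate the reference families more cleanly. To compare methods in this spirit, one would compute the score once per method on the same dataset, and again on each subset returned by `resample_indices`, restricting the distance matrix and labels to the sampled indices.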