Chapter 3 Pairwise Alignment

Format: ppt
Size: 3.30 MB
Pages: 78


03-BLAST (bioinformatics tutorial from abroad, 2010 edition)

BLAST search output: top portion (labeled fields: database, query, program, taxonomy)
BLAST search output: taxonomy report summarizes species with matches
BLAST search output: graphical output
BLAST search output: tabular output
High scores correspond to low E values.
Cut-off: 0.05? 10^-10?
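High scores and low E values go together because of the Karlin-Altschul relationship E = K·m·n·e^(−λS), where S is the raw score, m the query length and n the database length; the bit score S′ = (λS − ln K)/ln 2 gives the equivalent form E = m·n·2^(−S′). The Python sketch below simply evaluates these two formulas. The values of λ and K, the query length and the database size are assumed, illustrative numbers, and the effective-length corrections that BLAST applies are ignored.

```python
import math


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Normalized score S' = (lambda * S - ln K) / ln 2, in bits."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def e_value(raw_score: float, m: int, n: int, lam: float, k: float) -> float:
    """Karlin-Altschul expect value E = K * m * n * exp(-lambda * S).

    m is the query length and n the total database length in residues;
    BLAST's effective-length (edge-effect) correction is ignored here.
    """
    return k * m * n * math.exp(-lam * raw_score)


# Assumed, illustrative constants (not taken from any real search).
LAMBDA, K = 0.267, 0.041
QUERY_LEN, DB_LEN = 110, 50_000_000  # hypothetical query and database sizes

for s in (40, 60, 100):
    bits = bit_score(s, LAMBDA, K)
    e = e_value(s, QUERY_LEN, DB_LEN, LAMBDA, K)
    # Equivalent form: E = m * n * 2 ** (-bits)
    print(f"raw={s:4d}  bits={bits:6.1f}  E={e:.2e}")
```

Each extra bit of score halves E, so choosing between a cut-off such as 0.05 and one such as 10^-10 is mainly a question of how many chance hits a search can tolerate.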
BLAST search output: alignment output
Outline of today’s lecture
Step 4: optional parameters
You can:
• choose the organism to search
• turn filtering on/off
• change the substitution matrix
• change the expect (E) value
• change the word size
• change the output format
(c) Query: human insulin (NP_000198); Program: blastp; Database: C. elegans RefSeq; Option: conditional compositional score matrix adjustment
Note that the bit score, Expect value, and percent identity all change with the compositional score matrix adjustment.

Fast Alignment Methods for Multiple Genomic Sequences (33)

Project no. NSC 91-2213-E-002-129 (1/3 - 3/3)
Kun-Mao Chao (email: kmchao@.tw)

Abstract

Owing to advances in genome sequencing technology, more and more genomic sequences have been determined, and the draft of the human genomic sequence will be finished in the near future. Worldwide sequencing capacity is ramping up to the level of one vertebrate genome per year, and after the human and mouse genomes are completed it will turn to chicken, fish, rat, and others. These data, which essentially encode all the genetic information in life, will soon need to be analyzed and classified. Multiple sequence comparison lets us locate the conserved regions in biological sequences; it can also be used to study gene regulation or even to infer evolutionary trees. However, genomic sequences are usually very long, and as they grow longer, time-efficient and space-saving strategies for multiple sequence alignment will become increasingly important. The purpose of this project is to design a software tool for aligning multiple genomic sequences, to be used to explore the structure and function of whole genome sequences. Our idea is based on a given (base) genomic sequence. We first use a very fast method to compare the other sequences with the base sequence, and then roughly determine their relative locations. By pasting these sequences together according to their relative positions, a simple multiple sequence alignment can be derived. We have implemented a simple multiple alignment program.
We have also implemented an efficient algorithm that accurately computes the score of a multiple sequence alignment, and we have adjusted for the bias of the base sequence by extending the segments that were aligned together in the crude alignment.

Keywords: sequence analysis, computational genomics, computational biology.

We have surveyed the literature relevant to the multiple sequence alignment problem; in particular, we are interested in alignment methods that deal with long sequences. In large-scale sequencing projects, converting experimental data into biologically relevant information requires a higher level of abstraction in sequence analysis. Therefore, we have also developed a prototype genomic sequence visualization tool, whose graphical interface lets the user zoom into any specific area of the resulting alignment. We first compare the selected genomic sequence with all the other given sequences, and then use a simple pasting program to convert these pairwise alignments into a tentative multiple sequence alignment. The pairwise alignments provide information about possible coherent multiple alignment columns, so the procedure is essentially a pile-up that aligns all sequences together. To improve the quality of the multiple sequence alignment, a round-robin iterative improvement of the multiple alignment will be initiated in the next year, and the improved alignment tool will be used to test real-world data. The tool will also include software dedicated to visualizing the resulting alignments so that more biologically meaningful information can be extracted.
It will provide a reliable data management system that allows the user to manipulate both the sequences and the resulting alignment, within a framework that lets several tools work together cooperatively under the user's control. Automatic annotation of the alignment will give users more valuable information.

To improve the quality of the multiple sequence alignment, a round-robin iterative improvement of the multiple alignment has been initiated. We start by pasting the alignments together, then repeatedly (1) delete an aligned fragment and (2) realign that fragment with the remainder of the multiple alignment, using a variant of our yama2 procedure optimized for the fact that one of the two alignments is a single sequence.

We continue to improve the alignment tool by other approaches. Specifically, we adjust for the bias of the base sequence by extending the segments that were aligned together in the crude alignment. This compensates for situations where the segments are more similar to each other (longer local alignments) than they are to the base genomic sequence. The local alignments found by iteratively improving the crude alignment, which was built from the pairwise alignments with the base genomic sequence, encompass these longer alignments in some way.
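The pasting step in this report (pairwise alignments against one base sequence piled up into a crude multiple alignment) can be illustrated with a generic star-style merge: wherever any pairwise alignment inserts residues relative to the base, every other row receives gap columns at that point. The Python sketch below is an illustration under stated assumptions, not the authors' program: the function names are hypothetical, each pairwise alignment is assumed to be given as two equal-length gapped strings, and placing shorter insertions left-justified inside an insertion block is an arbitrary choice.

```python
def split_by_base(base_aln: str, other_aln: str):
    """Split one pairwise alignment against the base sequence.

    Returns (inserts, aligned): inserts[k] holds the characters of the other
    sequence that sit in gap columns of the base just before base residue k
    (k == len(base) means after the last residue); aligned[k] is the character
    of the other sequence aligned to base residue k (possibly '-').
    """
    inserts, aligned, pending = [], [], []
    for b, o in zip(base_aln, other_aln):
        if b == "-":
            pending.append(o)
        else:
            inserts.append("".join(pending))
            aligned.append(o)
            pending = []
    inserts.append("".join(pending))
    return inserts, aligned


def merge_on_base(base: str, pairs):
    """Pile up pairwise alignments that share `base` into one multiple alignment."""
    n = len(base)
    parts = []
    for base_aln, other_aln in pairs:
        if base_aln.replace("-", "") != base or len(base_aln) != len(other_aln):
            raise ValueError("each pair must be a gapped copy of base plus an equal-length row")
        parts.append(split_by_base(base_aln, other_aln))
    # The widest insertion seen at each slot decides how many gap columns go there.
    width = [max((len(ins[k]) for ins, _ in parts), default=0) for k in range(n + 1)]

    def build(inserts, aligned):
        out = []
        for k in range(n + 1):
            out.append(inserts[k].ljust(width[k], "-"))  # pad shorter insertions with gaps
            if k < n:
                out.append(aligned[k])
        return "".join(out)

    base_row = build([""] * (n + 1), list(base))
    return [base_row] + [build(ins, al) for ins, al in parts]


for row in merge_on_base("ACGT", [("AC-GT", "ACTGT"), ("ACGT", "A-GT")]):
    print(row)
```

Merging ("AC-GT", "ACTGT") and ("ACGT", "A-GT") against the base ACGT yields the rows AC-GT, ACTGT and A--GT: the insertion of T in the first pair forces a gap column into the base row and the second row.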

Sequence alignment in R

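In R, sequence alignment can be performed with Bioconductor packages such as Biostrings and BSgenome.

Biostrings provides the sequence classes and the pairwise alignment functions, while BSgenome supplies whole-genome sequences that can serve as alignment targets.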

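First, install the packages with the following commands:

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("Biostrings")
BiocManager::install("BSgenome")
# On recent Bioconductor releases (3.19 and later), pairwiseAlignment() is provided by pwalign
BiocManager::install("pwalign")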

Once installation has finished, you can load the packages and start aligning sequences.

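Here is a simple example, assuming you have two DNA sequences that you want to align:

library(Biostrings)
# library(pwalign)  # uncomment on Bioconductor 3.19 or later, where pairwiseAlignment() lives
seq1 <- DNAString("ATCGATCGATCG")
seq2 <- DNAString("ATCGATAGCTAG")
# Global alignment with pairwiseAlignment() (type = "global" is the default)
alignment <- pairwiseAlignment(seq1, seq2)
# Print the alignment
alignment

In the code above, we first load the Biostrings package and then create two DNA sequences, seq1 and seq2.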

Next, we use pairwiseAlignment() to align the two sequences globally and store the result in the variable alignment.

Finally, we print the alignment.

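Besides global alignment, you can perform local alignment, use other alignment types, and change the scoring parameters: pairwiseAlignment() accepts type = "global", "local", "overlap", "global-local" or "local-global", and its substitutionMatrix, gapOpening and gapExtension arguments control how matches, mismatches and gaps are scored.

Biostrings offers a rich set of functions and methods for different alignment needs.

In summary, sequence alignment in R can be done with Bioconductor packages: Biostrings supplies the sequence classes and alignment functions, and BSgenome supplies whole-genome sequences to work with. Together they offer flexible parameter settings for different alignment tasks.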
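For readers who want to see what a local alignment computes, here is a minimal Smith-Waterman sketch in Python. It is a language-neutral illustration rather than the Biostrings implementation; the linear gap penalty and the scores (match +2, mismatch -1, gap -2) are assumptions chosen for the example.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2):
    """Best local alignment of a and b with a linear gap penalty.

    Returns (score, aligned_a, aligned_b). Every cell is floored at zero,
    so a local alignment may start and end anywhere.
    """
    h = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            h[i][j] = max(0, h[i - 1][j - 1] + s, h[i - 1][j] + gap, h[i][j - 1] + gap)
            if h[i][j] > best:
                best, best_pos = h[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero cell is reached.
    out_a, out_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and h[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if h[i][j] == h[i - 1][j - 1] + s:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif h[i][j] == h[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


print(smith_waterman("TTTACGTTT", "GGACGTGG"))
```

Here smith_waterman("TTTACGTTT", "GGACGTGG") returns (8, "ACGT", "ACGT"): only the shared core is reported, whereas a global alignment would also have to account for the unrelated flanking bases.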

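ClustalX Multiple Sequence Alignment: An Illustrated Tutorial

ClustalX multiple sequence alignment illustrated tutorial (by Raindy). This post was first published on Raindy's blog. Software overview: ClustalX is the windowed (graphical) version of the CLUSTAL multiple sequence alignment program.

ClustalX provides an integrated environment for performing multiple sequence and profile alignments and analyzing the results.

The sequences are displayed in a window on the screen.

A multicolor scheme highlights conserved features in the alignment.

Pull-down menus at the top of the window let you select all the options needed for traditional multiple alignment and profile alignment.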

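Main features: you can cut and paste sequences to change their order in the alignment; select a subset of sequences to align; select a sub-range of the alignment to realign and insert it back into the original alignment; and run an alignment quality analysis, in which low-scoring segments and outlier residues are highlighted.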

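Current version: 1.83.
P.S.: If you are a beginner or prefer a Chinese interface, I recommend my Chinese-localized ClustalX 1.81; link: ist&ID=7435 (please copy the full address).
Application: ClustalX alignment results are a prerequisite for building phylogenetic trees.
Example: outer capsid protein P8 (amino acid sequences) of the plant reovirus genus Phytoreovirus.
Workflow: load sequences -> edit sequences -> set parameters -> do complete alignment -> alignment results.
1. Load sequences: run ClustalX; the main window is shown in Figure 1. From the menu bar, choose "File" > "Load Sequence" to load the sequences to be aligned, as shown in Figure 2. If sequences are already loaded, the program asks whether to replace them (Replace existing sequences); choose as appropriate.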

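(Figure 1, Figure 2)
2. Edit sequences: edit the sequences shown above the ruler. The main operations are Cut Sequences, Paste Sequences, Select All Sequences, Clear Sequence Selection, Search for String, Remove All Gaps (removes all gaps from the selected sequences), and Remove Gap-Only Columns (removes alignment columns that contain only gaps). (Figure 3)
3. Set parameters: set the alignment parameters according to the needs of your analysis.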

Chapter 3 Sequence Analysis of Nucleic Acids

(II) Pairwise alignment
Alignment between two sequences.
1. Simple alignment
1) Without considering gaps: fix the match score and the mismatch score in advance, then judge the similarity of the two sequences from the total score.
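A minimal Python sketch of this gap-free scoring: slide one sequence along the other, score every overlap with fixed match and mismatch values, and keep the best offset. The function name and the default scores (match 1, mismatch 0) are illustrative assumptions.

```python
def best_ungapped_alignment(a: str, b: str, match: int = 1, mismatch: int = 0):
    """Slide b along a without gaps; return (best_score, offset).

    offset is the index in a where b[0] is placed (negative means b starts
    before a); only overlapping positions are scored.
    """
    if not a or not b:
        raise ValueError("both sequences must be non-empty")
    best = None
    for offset in range(-(len(b) - 1), len(a)):
        score = 0
        for j, ch in enumerate(b):
            i = offset + j
            if 0 <= i < len(a):
                score += match if a[i] == ch else mismatch
        if best is None or score > best[0]:
            best = (score, offset)
    return best


print(best_ungapped_alignment("GATTACA", "TTAC"))
```

best_ungapped_alignment("GATTACA", "TTAC") returns (4, 2), meaning TTAC lies under positions 2 to 5 (0-based) of GATTACA; passing match=5, mismatch=-4 applies the BLAST nucleotide scores instead.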
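This model ignores dependencies between neighboring nucleotides: for each of the 16 possible dinucleotides, the frequency with which the two bases occur next to each other equals the product of the two bases' frequencies in the sequence.
2. Markov model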
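In the Markov model, the probability of a base depends on the base that precedes it. If a DNA sequence made of the four bases were completely random, every base would be followed by each of the four bases equally often; for example, after an A, the pairs AA, AC, AG and AT would each occur with relative frequency 1/4.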
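The two models can be compared on real data: under the independence model the expected frequency of dinucleotide xy is f(x)·f(y), while a first-order Markov model estimates P(y | x) from counts of adjacent pairs. The Python sketch below computes both; the function name is hypothetical, and it assumes an unambiguous A/C/G/T sequence of length at least 2.

```python
from collections import Counter
from itertools import product


def dinucleotide_stats(seq: str):
    """Return (observed, expected, transition) dictionaries keyed by dinucleotide.

    observed[xy]   : frequency of xy among all adjacent pairs
    expected[xy]   : f(x) * f(y), the independence-model prediction
    transition[xy] : P(y | x), the first-order Markov estimate
    """
    seq = seq.upper()
    base_freq = {b: seq.count(b) / len(seq) for b in "ACGT"}
    pairs = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    n_pairs = len(seq) - 1
    observed, expected, transition = {}, {}, {}
    for x, y in product("ACGT", repeat=2):
        observed[x + y] = pairs[x + y] / n_pairs
        expected[x + y] = base_freq[x] * base_freq[y]
    for x in "ACGT":
        total = sum(pairs[x + y] for y in "ACGT")  # pairs starting with x
        for y in "ACGT":
            transition[x + y] = pairs[x + y] / total if total else 0.0
    return observed, expected, transition


obs, exp, trans = dinucleotide_stats("ACGTACGT")
print(obs["AC"], exp["AC"], trans["AC"])
```

For the periodic sequence ACGTACGT, every base has frequency 1/4, so the independence model predicts 1/16 for AC, yet AC makes up 2/7 of the pairs and A is always followed by C (P(C | A) = 1), a dependence only the Markov model captures.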
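Ortholog: homologous genes that originated from a common ancestor and were passed on by vertical descent (speciation); they are often highly conserved in structure and function.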
Paralogs: several homologous genes within the same genome that arose by duplication of an ancestral gene, that is, genes in one genome that are related to each other but not identical.
There are versions of BLAST for searching nucleic acid and protein databases; some of them (for example, blastx) translate DNA sequences before comparing them to protein sequence databases.
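3) Doolittle's empirical significance test. Doolittle proposed an empirical rule for protein sequences: (1) if both sequences are longer than 100 residues and, after appropriate gaps are inserted, their identity is 25% or more, the two sequences are considered related; if it is below 15%, they are unlikely to be related; between 15% and 25%, they may be related.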
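The rule needs a percent identity, so a small Python sketch of both pieces follows. The identity convention (identical columns divided by alignment length, gap columns included) is one common choice and is an assumption here; the function names are hypothetical. percent_identity itself works for any pair of aligned rows, so the usage line applies it to the 60-column nucleotide Query/Sbjct alignment that appears later in this chapter.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns with identical residues.

    Convention assumed here: gap columns count toward the alignment length.
    """
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("aligned sequences must be non-empty and of equal length")
    same = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * same / len(aligned_a)


def doolittle_rule(identity: float, len_a: int, len_b: int) -> str:
    """Doolittle's empirical rule of thumb for protein sequences."""
    if len_a <= 100 or len_b <= 100:
        return "rule not applicable"  # stated only for sequences longer than 100
    if identity >= 25:
        return "related"
    if identity < 15:
        return "unlikely to be related"
    return "possibly related"


query = "catcaactacaactccaaagacacccttacacccactaggatatcaacaaacctacccac"
sbjct = "catcaactgcaaccccaaagccacccct-cacccactaggatatcaacaaacctacccac"
print(round(percent_identity(query, sbjct), 1))  # 91.7
```

The nucleotide example has 55 identical columns out of 60; for proteins, the returned percentage would then be passed to doolittle_rule together with the two sequence lengths.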


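Chapter 3 Sequence Alignment [1]

Contents
3.1 Overview
3.2 Methods for aligning two sequences
3.3 Methods for aligning multiple sequences
3.1 Overview
3.1.1 The concept of sequence alignment
3.1.2 Relationships between biological sequences
3.1.1 The concept of sequence alignment
(1) Sequence alignment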
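Sequence alignment is a common method for analyzing sequence similarity. Aligning two or more nucleic acid or protein sequences reveals the similar regions (such as domains) they share, which is the basis for further similarity analysis. Comparing the identity or similarity of an unknown sequence with known sequences makes it possible to predict the function of the unknown sequence.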
Query: 181 catcaactacaactccaaagacacccttacacccactaggatatcaacaaacctacccac 240
           |||||||| |||| |||||| ||||| | |||||||||||||||||||||||||||||||
Sbjct: 189 catcaactgcaaccccaaagccacccct-cacccactaggatatcaacaaacctacccac 247
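Identity
Identity: The extent to which two (nucleotide or amino acid) sequences are invariant. When two sequences are homologous, their amino acid or nucleotide sequences usually show significant identity. Identity reflects the degree to which two amino acid (or nucleotide) sequences are the same. Homology, therefore, is a yes-or-no assertion that sequences do or do not share a common ancestor, whereas identity and similarity are quantities that describe how closely sequences are related.
(2) Homology, similarity, identity
Homology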
Homology: Similarity attributed to descent from a common ancestor.

Fundamentals of Bioinformatics: Chapter 3

Percent identity | PAM distance
50% | PAM80
60% | PAM60
14% to 27% | PAM250
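(v) BLOSUM matrices
BLOSUM (BLOcks SUbstitution Matrix) matrices are derived from the substitution frequencies observed among similar protein sequences. PAM matrices were derived from global alignments of closely related protein sequences, whereas BLOSUM matrices are derived from ungapped local alignments of conserved blocks of protein sequences.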
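The BLOSUM construction can be sketched as follows: count residue pairs within each column of ungapped blocks, turn the counts into observed pair frequencies q_ij, derive background frequencies p_i, and score each pair as a rounded log-odds ratio against the chance expectation e_ij (p_i·p_i for identical pairs, 2·p_i·p_j otherwise). BLOSUM62 is expressed in half-bit units, hence the default scale of 2. This is a simplified Python sketch on toy data: it omits the clustering of sequences above an identity threshold (the "62" in BLOSUM62), and the function name is hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def blosum_scores(columns, scale=2.0):
    """Log-odds scores from ungapped block columns, BLOSUM style.

    columns: list of strings, each string one aligned column of a block.
    Returns {(a, b): score} with score = round(scale * log2(q_ab / e_ab)).
    No sequence clustering or weighting is done (a simplification).
    """
    pair_counts = Counter()
    for col in columns:
        for x, y in combinations(col, 2):
            pair_counts[tuple(sorted((x, y)))] += 1
    total = sum(pair_counts.values())
    q = {pair: c / total for pair, c in pair_counts.items()}

    # p_i = q_ii + sum over j != i of q_ij / 2
    p = Counter()
    for (x, y), qxy in q.items():
        p[x] += qxy / 2
        p[y] += qxy / 2

    scores = {}
    for (x, y), qxy in q.items():
        expected = p[x] * p[y] if x == y else 2 * p[x] * p[y]
        scores[(x, y)] = round(scale * math.log2(qxy / expected))
    return scores


print(blosum_scores(["AAA", "AAS", "SSS"]))
```

For the three toy columns AAA, AAS and SSS, the scores are 1 for A/A, 2 for S/S and -2 for A/S: the rarer residue S earns more for being conserved, and the A/S substitution, observed less often than chance predicts, is penalized.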

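The goal of sequence alignment is to find the alignment with the highest score (or the lowest cost).
5. Scoring matrices (weight matrices) (p. 87)
(1) Nucleotide scoring matrices. Let the alphabet of DNA sequences be {A, C, G, T}.
a. Identity matrix: identical nucleotides score 1, and any substitution scores 0.
b. BLAST matrix: identical nucleotides score +5, and any substitution scores -4.
c. Transition/transversion matrix: substitutions within a class (purines: adenine A, guanine G; pyrimidines: cytosine C, thymine T) are transitions, substitutions between classes are transversions, and transitions are usually penalized less because they occur more often.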
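The three nucleotide scoring schemes can be written as small Python functions and used to score an ungapped alignment. The identity and BLAST (+5/-4) values come from the text; the transition/transversion scores (+1 match, -1 transition, -5 transversion) are hypothetical values chosen only to show the idea, and all function names are illustrative.

```python
PURINES, PYRIMIDINES = set("AG"), set("CT")


def identity_matrix(x: str, y: str) -> int:
    return 1 if x == y else 0


def blast_matrix(x: str, y: str) -> int:
    return 5 if x == y else -4


def transition_transversion_matrix(x: str, y: str, match: int = 1,
                                   transition: int = -1, transversion: int = -5) -> int:
    """Hypothetical scores: transitions (A<->G, C<->T) cost less than transversions."""
    if x == y:
        return match
    same_class = {x, y} <= PURINES or {x, y} <= PYRIMIDINES
    return transition if same_class else transversion


def ungapped_score(a: str, b: str, matrix) -> int:
    """Sum the column scores of two equal-length, gap-free sequences."""
    return sum(matrix(x, y) for x, y in zip(a, b))


for name, m in (("identity", identity_matrix), ("BLAST", blast_matrix),
                ("transition/transversion", transition_transversion_matrix)):
    print(name)
    for x in "ACGT":
        print(" ", x, [m(x, y) for y in "ACGT"])
```

Scoring ACGT against GCAT (two transitions, two matches) gives 2 with the identity matrix, 2 with the BLAST matrix, and 0 with the transition/transversion scores, whereas AC against CA (two transversions) drops to -10 under the last scheme.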
BLOSUM 62
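Section 2 Pairwise alignment algorithms
1. Basic algorithms for pairwise alignment. The direct method generates every possible alignment of the two sequences, evaluates the cost function for each, and picks the alignment with the lowest cost as the final result. Because the number of possible alignments grows exponentially with sequence length, this is impractical for all but very short sequences. The underlying problem is one of optimization, and the search strategy used to solve it is dynamic programming (Dynamic Programming algorithm, p. 93), which finds an optimal alignment in time proportional to the product of the two sequence lengths.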
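A minimal Needleman-Wunsch sketch in Python makes the dynamic programming idea concrete: fill a table F(i, j) with the best score for aligning the prefixes a[:i] and b[:j] using F(i, j) = max(F(i−1, j−1) + s(a_i, b_j), F(i−1, j) + g, F(i, j−1) + g), then trace back from the bottom-right cell. The scores (match +1, mismatch -1, linear gap -2) are assumptions, and the sketch maximizes a score rather than minimizing a cost, which is equivalent up to a change of sign.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Optimal global alignment by dynamic programming (linear gap penalty).

    f[i][j] holds the best score for aligning the prefixes a[:i] and b[:j].
    Returns (score, aligned_a, aligned_b).
    """
    n, m = len(a), len(b)
    f = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        f[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        f[0][j] = j * gap

    def s(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            f[i][j] = max(f[i - 1][j - 1] + s(i, j),  # align a_i with b_j
                          f[i - 1][j] + gap,          # gap in b
                          f[i][j - 1] + gap)          # gap in a

    # Trace back from f[n][m] to f[0][0], preferring the diagonal move on ties.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and f[i][j] == f[i - 1][j - 1] + s(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and f[i][j] == f[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return f[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(needleman_wunsch("GATTACA", "GATACA"))
```

needleman_wunsch("GATTACA", "GATACA") returns (4, "GATTACA", "GA-TACA"): six matches (+6) and one gap (-2). The table has (n+1)(m+1) cells, each filled in constant time, which is the source of the quadratic running time.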
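(iii) Hydrophobicity matrix: derived from the change in hydrophobicity when one amino acid residue is replaced by another. If replacing amino acid A with amino acid B changes hydrophobicity little, the substitution scores high; otherwise it scores low.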
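(iv) PAM matrix (Point Accepted Mutation): based on the observed rates at which amino acid residues replace one another in nature. If substitutions between two particular amino acids occur frequently, that pair receives a high score in the matrix. PAM matrices rest on an evolutionary model, the point accepted mutation model, and are obtained by tallying the frequencies of amino acid substitutions in closely related sequences.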
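A key step in Dayhoff's approach is extrapolation: the mutation probability matrix for n PAMs is the PAM1 matrix raised to the n-th power. The Python sketch below illustrates this with a hypothetical, uniform 20-letter PAM1-like matrix (each residue changes with probability 0.01 per PAM, equally often to any other residue); it is not Dayhoff's data, and the helper names are illustrative. Because real amino acids differ in frequency and mutability, the toy model's identity at 250 PAMs (roughly 12%) falls below the 14% to 27% quoted for PAM250 in this chapter; only the mechanism carries over.

```python
def mat_mult(x, y):
    """Product of two square matrices stored as lists of lists."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mat_power(m, power):
    """m ** power by repeated squaring."""
    n = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    base = m
    while power:
        if power & 1:
            result = mat_mult(result, base)
        base = mat_mult(base, base)
        power >>= 1
    return result


def toy_pam1(alphabet_size=20, change=0.01):
    """Hypothetical PAM1-like matrix: uniform change to any other residue."""
    off = change / (alphabet_size - 1)
    return [[1 - change if i == j else off for j in range(alphabet_size)]
            for i in range(alphabet_size)]


def expected_identity(m):
    """Fraction of positions still unchanged, assuming uniform residue frequencies."""
    return sum(m[i][i] for i in range(len(m))) / len(m)


for n in (1, 80, 250):
    print(n, round(expected_identity(mat_power(toy_pam1(), n)), 3))
```

Raising the matrix to higher powers models longer evolutionary distances, which is why a higher PAM number corresponds to lower expected identity.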

Pairwise Sequence Alignment

Alignment
The process of lining up two or more sequences to achieve maximal levels of identity (and conservation, in the case of amino acid sequences) for the purpose of assessing the degree of similarity and the possibility of homology.
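Outline
• Basic concepts of sequence alignment
• Basic methods of sequence alignment
• Principles of the dynamic programming algorithm
• Scoring matrices in sequence alignment
• Statistical analysis of sequence alignments
• Pairwise sequence comparison with NCBI's BLAST 2 Sequences (Blast2Seq)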
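Pairwise sequence alignment is the most fundamental computation in bioinformatics:
• It is used to determine whether two proteins (or genes) are related in structure or function
• It is used to identify conserved domains shared between proteins
• It is the basis of sequence database searching with BLAST (covered in the next section)
• It is used in genome analysis
• It is used in predicting protein three-dimensional structure
• …
Similarity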
The extent to which nucleotide or protein sequences are related. It is based upon identity plus conservation.
Pairwise alignment of retinol-binding protein and β-lactoglobulin
Some important concepts (3)
Two types of homologous sequences
RBP  MKWVWALLLLAAWAAAERDCRVSSFRVKENFDKARFSGTWYAMAKKDPEG  50
(figure labels: open gap, extension gap)
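The figure labels "open gap" and "extension gap" refer to affine gap penalties: opening a gap costs more than lengthening it. A minimal Python sketch, assuming the BLAST-style convention that a gap of length k costs gap_open + gap_extend·k, with values of 11 and 1 (a setting commonly paired with BLOSUM62) used only for illustration; the function name is hypothetical.

```python
import re


def affine_gap_cost(aligned: str, gap_open: int = 11, gap_extend: int = 1) -> int:
    """Total gap cost of one aligned row.

    Each maximal run of k gap characters costs gap_open + gap_extend * k.
    """
    return sum(gap_open + gap_extend * len(run) for run in re.findall(r"-+", aligned))


print(affine_gap_cost("A----B"))     # one gap of length 4
print(affine_gap_cost("A-B-C-D-E"))  # four separate gaps of length 1
```

With these values one gap of length 4 costs 15, whereas four separate single-residue gaps cost 48, which is why affine penalties favor fewer, longer gaps.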

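Introduction to Bioinformatics and Common Tools

Research directions of the center
Genome annotation; microarray data analysis
Research and support closely tied to the laboratories
Bioinformatics support for proteomics research
Applied medical bioinformatics
Ontology-based data warehouse system: genome, transcriptome, proteome, metabolome
Main topics
• Multiple sequence alignment and phylogenetic tree analysis
• Design of PCR primers and microarray probes
• Using software to search, collect, and organize the literature in databases
• Introduction to BLAST applications
• Assembly of sequence fragments
• Gene annotation: prediction of protein-coding regions
• NCBI databases
• Metabolic pathway database (KEGG)
• Protein analysis database (UniProt)
• Methods of comparative genomics
• Analysis workflow for target genes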
/outorder=order /tree /newtree=tree
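♦ Protein structure and function prediction
Selecting sequence data
1. Related gene or protein sequences obtained or collected in biological experiments
2. Sequences retrieved with NCBI Entrez or SRS (Sequence Retrieval System)
3. Sequences related to your own, found in public databases with the homology search tool BLAST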
▼ Jackknife
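Instead of refilling the removed half, it simply generates a new data set that is half as long (half of the sites are deleted at random).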
▼ Permute
Its purpose differs from that of the bootstrap and jackknife methods, and it is rarely used.
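The bootstrap and the delete-half jackknife both resample alignment columns; they differ in whether the data set is refilled to its original length. The Python sketch below illustrates both on a toy alignment. It is similar in spirit to what PHYLIP's SEQBOOT program does, but the function names and the toy data are hypothetical, and the permute option is not modeled.

```python
import random


def bootstrap_columns(alignment, rng):
    """Bootstrap: draw columns with replacement until the original length is reached."""
    length = len(alignment[0])
    picks = [rng.randrange(length) for _ in range(length)]
    return ["".join(row[i] for i in picks) for row in alignment]


def jackknife_columns(alignment, rng):
    """Delete-half jackknife: keep a random half of the columns, in their
    original order, without refilling, so the result is half as long."""
    length = len(alignment[0])
    keep = sorted(rng.sample(range(length), length // 2))
    return ["".join(row[i] for i in keep) for row in alignment]


aln = ["ACGTACGT", "ACGAACGA", "TCGTTCGT"]  # toy alignment, three taxa
rng = random.Random(1)  # fixed seed so the replicates are reproducible
print(bootstrap_columns(aln, rng))
print(jackknife_columns(aln, rng))
```

Each replicate would then be used to build a tree, and the fraction of replicate trees containing a given branch serves as its support value.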
Why do trees disagree?
1. Inadequate data selection
2. Choice of gene or protein sequences
3. Sequencing errors
4. Choice of analysis method
PHYLIP
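PHYLIP (Phylogeny Inference Package; Joseph Felsenstein et al., 1986-1995) was developed by the Department of Genetics at the University of Washington. First released in 1980 and distributed free of charge, it comprises 35 independent programs; the current version is 3.6. Download address: ftp:///pub/phylip/ It is written in standard C and is available for Windows, Macintosh, and Linux/UNIX. Windows files: phylipw3.6source.exe, phylipwx3.6executables.exe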


Pairwise alignment: protein sequences can be more informative than DNA
protein is more informative (20 vs. 4 characters); many amino acids share related biophysical properties
RBP:        26 RVKENFDKARFSGTWYAMAKKDPEGLFLQDNIVAEFSVDETGQMSATAKGRVRLLNNWD
               + K ++ + + + GTW++ MA + L + A V T + +L+ W+
Glycodelin: 23 QTKQDLELPKLAGTWHSMAMA-TNNISLMATLKAPLRVHITSLLPTPEDNLEI V LHRWEN

codons are degenerate: changes in the third position often do not alter the amino acid that is specified
protein sequences offer a longer “look-back” time
DNA sequences can be translated into protein, and then used in pairwise alignments
Query: catcaactacaactccaaagacacccttacacccactaggatatcaacaa
       |||||||| |||| |||||| ||||| | |||||||||||||||||||||
Sbjct: catcaactgcaaccccaaagccacccct-cacccactaggatatcaacaa
6. Conservation
Changes at a specific position of an amino acid or (less commonly, DNA) sequence that preserve the physico-chemical properties of the original residue.
Chapter 3
Pairwise Alignment
Pairwise alignment : outline
Ⅰ. Overview and examples
Ⅱ. Definitions: homologs, paralogs, orthologs
Ⅲ. How to score similarity: scoring matrices
Ⅳ. Assigning scores to aligned amino acids: Dayhoff’s PAM matrices
Ⅴ. Assigning scores to aligned amino acids: BLOSUM matrices
Ⅵ. Alignment algorithms: Needleman-Wunsch, Smith-Waterman
Ⅱ. Definitions: homologs, paralogs, orthologs
1. Homology
Similarity attributed to descent from a common ancestor.
2. Orthologs
Homologous sequences in different species that arose from a common ancestral gene during speciation; may or may not be responsible for a similar function.
4. Significance of sequence alignment
Sequence alignment is a central problem and the most fundamental operation of Bioinformatics:

• It is used to decide if two proteins (or genes) are related structurally or functionally (structure, function, evolution…)
• It is used to identify domains or motifs that are shared between proteins
• It is used in the analysis of genomes (coding regions, motifs, SNPs, genome assembly…)
• It is the basis of database searching tools (e.g., BLAST)
3. Paralogs
Homologous sequences within a single species that arose by gene duplication.
4. Identity
The extent to which two (nucleotide or amino acid) sequences are invariant.
Early alignments revealed:
• differences in amino acid sequences between species
• differences in amino acids responsible for distinct functions
3. Definition of sequence alignment

Multiple Sequence Alignment
~~~~~EIQDVSGTWYAMTVDREFPEMNLESVTPMTLTTL.GGNLEAKVTM
LSFTLEEEDITGTWYAMVVDKDFPEDRRRKVSPVKVTALGGGNLEATFTF
TKQDLELPKLAGTWHSMAMATNNISLMATLKAPLRVHITSEDNLEIVLHR
VQENFDVNKYLGRWYEIEKIPTTFENGRCIQANYSLMENGNQELRADGTV
VKENFDKARFSGTWYAMAKDPEGLFLQDNIVAEFSVDETGNWDVCADGTF
LQQNFQDNQFQGKWYVVGLAGNAI.LREDKDPQKMYATIDKSYNVTSVLF
VQPNFQQDKFLGRWFSAGLASNSSWLREKKAALSMCKSVDGGLNLTSTFL
VQENFNISRIYGKWYNLAIGSTCPWMDRMTVSTLVLGEGEAEISMTSTRW
PKANFDAQQFAGTWLLVAVGSACRFLQRAEATTLHVAPQGSTFRKLD...
1. Learning objectives

• Define homologs, paralogs, orthologs
• Perform pairwise alignments (NCBI BLAST)
• Understand how scores are assigned to aligned amino acids using Dayhoff’s PAM matrices
• Explain how the Needleman-Wunsch algorithm performs global pairwise alignments
Ⅰ. Overview and examples
1. Learning objectives
2. Early pairwise alignments
3. Definition of sequence alignment
4. Significance of sequence alignment

Pairwise alignment: protein sequences can be more informative than DNA
• Many times, DNA alignments are appropriate:
  -- to confirm the identity of a cDNA
  -- to study noncoding regions of DNA
  -- to study DNA polymorphisms
  -- example: Neanderthal vs. modern human DNA

2. Early pairwise alignments
β-corticotropin (sheep)    ala gly glu asp asp glu
Corticotropin A (pig)      asp gly ala glu asp glu
Oxytocin                   CYIQNCPLG
Vasopressin                CYFQNCPRG

Pair-wise Sequence Alignment