BWA Overview
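bwa parameters
BWA is a widely used alignment tool for mapping short sequencing reads to a reference genome.
Several parameters control the sensitivity and specificity of a BWA run.
This article describes the most common BWA parameters and what they do.
I. The three BWA algorithms
Before turning to the parameters, here is a brief introduction to the three BWA algorithms: BWA-MEM, BWA-SW and BWA-ALN.
1. BWA-MEM
BWA-MEM is currently the most commonly used of the three and suits Illumina sequencing data.
It finds seed matches in an FM-index (built from the Burrows-Wheeler transform together with a sampled suffix array) and extends them with gapped alignment that applies gap-open and gap-extension penalties, so it can handle fairly large indels.
2. BWA-SW
BWA-SW targets long reads (such as PacBio or Nanopore reads). It performs Smith-Waterman-style local alignment and can handle fairly large indels.
3. BWA-ALN
BWA-ALN (also called BWA-backtrack) was the first of the three to be developed. It targets short reads (such as Illumina reads of up to about 100 bp) and searches an index built with the Burrows-Wheeler Transform (BWT), a reversible transform that allows a compact index and fast lookups.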
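To make the BWT idea concrete, here is a minimal Python sketch. It is not BWA's implementation: it builds the transform by sorting all rotations (fine for toy inputs only) and counts exact matches with a naive backward search. The function names `bwt` and `count_occurrences` are made up for this illustration.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Return the Burrows-Wheeler transform of text using sorted rotations."""
    if sentinel in text:
        raise ValueError("text must not contain the sentinel character")
    s = text + sentinel
    # Sorting every rotation is O(n^2 log n); real tools use suffix-array
    # construction algorithms instead. This is only a toy illustration.
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def count_occurrences(bwt_text: str, pattern: str) -> int:
    """Count exact matches of pattern with FM-index style backward search."""
    # c_table[ch] = number of characters in the text that sort before ch.
    c_table = {}
    for i, ch in enumerate(sorted(bwt_text)):
        c_table.setdefault(ch, i)
    lo, hi = 0, len(bwt_text)  # current suffix-array interval [lo, hi)
    for ch in reversed(pattern):
        if ch not in c_table:
            return 0
        # Naive rank query: occurrences of ch in a prefix of the BWT string.
        lo = c_table[ch] + bwt_text[:lo].count(ch)
        hi = c_table[ch] + bwt_text[:hi].count(ch)
        if lo >= hi:
            return 0
    return hi - lo


if __name__ == "__main__":
    transformed = bwt("ACGTACGT")
    print(transformed, count_occurrences(transformed, "CGT"))
```

The backward search walks the pattern from its last character to its first, narrowing an interval of sorted suffixes at each step, which is why lookup time depends on the pattern length rather than on the genome size.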
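II. Common parameters
The options below belong to bwa mem, which accepts short options only.
1. -t <int>
Number of threads; the default is 1.
If several cores are available, set a higher number to speed up alignment.
2. -k <int>
Minimum seed length; the default is 19.
A k-mer is a run of k consecutive bases in a sequence. Matches shorter than the minimum seed length are not used as seeds, so a longer value makes alignment faster and more specific at some cost in sensitivity.
3. -M
Mark shorter split hits as secondary alignments.
When different parts of a read align to different locations, BWA-MEM reports one hit as the primary alignment; with -M the shorter split hits are flagged as secondary, which some downstream tools expect.
This option does not make BWA report every possible alignment position; that is what -a is for.
4. -R <string>
Read group header line, for example '@RG\tID:foo\tSM:bar'.
A read group is a set of reads that share the same sequencing conditions and sample origin; it is used to tell samples or sequencing batches apart.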
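The options above can be combined into a single command line. The sketch below only assembles the argument list and does not run BWA; the helper name `build_bwa_mem_command` and the file names are hypothetical, while `-t`, `-k`, `-M` and `-R` are the options described above.

```python
import shlex


def build_bwa_mem_command(reference, reads, threads=1, min_seed_len=19,
                          mark_secondary=False, read_group=None):
    """Assemble an argument list for `bwa mem` without executing it."""
    if threads < 1:
        raise ValueError("threads must be at least 1")
    if len(reads) not in (1, 2):
        raise ValueError("pass one FASTQ (single-end) or two (paired-end)")
    cmd = ["bwa", "mem", "-t", str(threads), "-k", str(min_seed_len)]
    if mark_secondary:
        cmd.append("-M")            # flag shorter split hits as secondary
    if read_group is not None:
        cmd += ["-R", read_group]   # e.g. '@RG\tID:foo\tSM:bar'
    cmd.append(reference)
    cmd.extend(reads)
    return cmd


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    cmd = build_bwa_mem_command("ref.fa", ["read1.fq", "read2.fq"],
                                threads=4, mark_secondary=True,
                                read_group="@RG\\tID:foo\\tSM:bar")
    print(shlex.join(cmd))
```

Keeping the command as a list rather than a string avoids shell-quoting problems with the tab escapes in the read group; the list can be handed to `subprocess.run` with stdout redirected to a SAM file.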
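Article title: An in-depth look at how bowtie and bwa work
In bioinformatics, bowtie and bwa are two commonly used sequence alignment tools that play a central role in genomics research and sequence analysis.
This article examines how bowtie and bwa work, to give readers a fuller understanding of the mechanisms inside the two tools.
1. Introduction
Before looking at how bowtie and bwa work, it helps to know what each one is for.
bowtie is a short-read aligner that maps DNA sequences to a reference efficiently; bwa is a widely used aligner for mapping sequencing reads to a reference genome that combines good accuracy with good speed.
Both tools matter for the analysis and interpretation of genome sequences.
2. How bowtie works
2.1 Building the index
bowtie starts by building an index.
Before any alignment, the reference genome must be indexed so that alignment can be fast and accurate.
Building the index involves choosing a suitable index type and parameters, and splitting and encoding the reference genome.
2.2 Alignment algorithm
bowtie combines a greedy search with backtracking.
During alignment it matches the query sequence against the reference genome to find the best alignment.
The core idea is to make alignment as fast as possible without giving up accuracy.
3. How bwa works
3.1 Index
Like bowtie, bwa searches a compressed full-text index of the reference genome (an FM-index built from the Burrows-Wheeler transform).
This index is a compact data structure that greatly speeds up alignment.
bwa uses it to locate candidate positions of the query sequence in the reference genome quickly, which makes fast and accurate alignment possible.
3.2 Smith-Waterman algorithm
bwa uses the Smith-Waterman algorithm for the dynamic-programming part of alignment, for example in BWA-SW and when rescuing unmapped reads in paired-end mode.
The algorithm scores matches, mismatches and gaps, so it takes both the similarities and the differences between sequences into account while preserving accuracy.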
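As a rough illustration of the Smith-Waterman step described in 3.2, the following sketch computes the best local alignment score between two sequences. It is a textbook version with a linear gap penalty; the scoring values are arbitrary assumptions chosen for the example and are not bwa's defaults, and bwa's own implementation is more elaborate.

```python
def smith_waterman(query, target, match=2, mismatch=-1, gap=-2):
    """Return (best_score, (query_end, target_end)) for a local alignment.

    Uses a linear gap penalty; end positions are 1-based and inclusive.
    """
    rows, cols = len(query) + 1, len(target) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            substitution = match if query[i - 1] == target[j - 1] else mismatch
            score[i][j] = max(
                0,                                  # start a new local alignment
                score[i - 1][j - 1] + substitution,  # match or mismatch
                score[i - 1][j] + gap,               # gap in the target
                score[i][j - 1] + gap,               # gap in the query
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)
    return best, best_pos


if __name__ == "__main__":
    print(smith_waterman("ACGT", "TTACGTTT"))
```

The clamp at zero is what makes the alignment local: a poorly matching prefix is dropped instead of dragging the score down.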
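4. Personal view
As sequence alignment tools, bowtie and bwa hold an important place in genomics and bioinformatics research.
How they work draws on index construction, alignment algorithms and data structures, which makes them one of the key techniques of the field.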
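WiFi technology: main contents
- The IEEE 802.11 standard
- IEEE 802.11 basic architecture and services
- IEEE 802.11 physical layer technology
- IEEE 802.11 MAC layer technology
- IEEE 802.11 enhancements
- WiFi networking
Zhang Tiankui (张天魁), zhangtiankui@, School of Information and Communication Engineering, Beijing University of Posts and Telecommunications

[Figures: the IEEE 802 family of standards; the 802.11 protocol stack]

The 802.11 protocol family
PHY:
- 802.11 (1/2 Mbps): FHSS, DSSS, IR
- 802.11b (5.5/11 Mbps): DSSS+CCK
- 802.11g (54 Mbps): OFDM, DSSS+CCK
- 802.11a (54 Mbps): OFDM
- 802.11n (300 Mbps): OFDM
MAC:
- 802.11/11a/11b/11g MAC
- 802.11e: QoS
- 802.11h: dynamic adjustment
- 802.11i: security enhancements
- 802.11f: roaming and handover
- 802.11s: mesh

Classification of 802.11 physical layer protocols
- 802.11: 2.4 GHz band (1/2 Mbps)
- 802.11b: 2.4 GHz band (1/2/5.5/11 Mbps)
- 802.11a: 5 GHz band (~54 Mbps)
- 802.11g: 2.4 GHz band (~54 Mbps)
- 802.11n: 2.4/5 GHz bands (~300+ Mbps)
Abbreviations:
- FHSS: Frequency Hopping Spread Spectrum
- DSSS: Direct Sequence Spread Spectrum
- IR: Infrared
- CCK: Complementary Code Keying
- OFDM: Orthogonal Frequency Division Multiplexing

Evolution of 802.11
[Figure: transmission rate by technology (narrowband, FHSS, DSSS, OFDM, OFDM+MIMO), from 19.2 Kbps up to 270+ Mbps for 802.11n]

The IEEE 802.11 standard
IEEE 802.11, 1999 Edition (ISO/IEC 8802-11: 1999), Information technology - Telecommunications and information exchange between systems - Local and metropolitan area networks - Specific requirements - Part 11: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications
Download: /getieee802/802.11.html
- MAC layer: based on CSMA/CA and a four-way handshake
- Physical layer: uses the 2.4 GHz band
  - Frequency Hopping Spread Spectrum (FHSS): rates 1, 2 Mbps; modulation 2-GFSK, 4-GFSK
  - Direct Sequence Spread Spectrum (DSSS): rates 1, 2 Mbps; modulation DBPSK, DQPSK
  - Infrared: rates 1, 2 Mbps; modulation 16-PPM, 4-PPM

The IEEE 802.11b standard
IEEE 802.11b-1999, Supplement to 802.11-1999, Wireless LAN MAC and PHY specifications: Higher speed Physical Layer extension in the 2.4 GHz band
- Operates in the 2.4 GHz band
- Modulation: DSSS with Complementary Code Keying (CCK)
- Rate: dynamic; the data rate adapts to noise conditions among 1, 2, 5.5 and 11 Mbps
- The WiFi technology that was widely deployed early on is 802.11b

The IEEE 802.11a standard
IEEE 802.11a-1999, Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) specifications: High-speed Physical Layer in the 5 GHz band
Extends the physical layer of the standard:
- 5 GHz band
- Modulation: Orthogonal Frequency Division Multiplexing (OFDM)
- Rates: 6, 9, 12, 18, 24, 36, 48 and 54 Mb/s (8 rates)
- Coverage of about 50 m

The IEEE 802.11g standard
IEEE 802.11g-2003, Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) specifications, Amendment 4: Further Higher Data Rate Extension in the 2.4 GHz Band
- Operates in the 2.4 GHz band
- Different rates use different modulation schemes:
  - DSSS/CCK
  - OFDM
  - PBCC (Packet Binary Convolutional Coding) (optional)
  - DSSS-OFDM (optional)
- Rates: supports all rates of IEEE 802.11b and 802.11a
- Compatible with IEEE 802.11b: 802.11g and 802.11b devices can interoperate under the same access point
- High rate: reaches the 54 Mbps of IEEE 802.11a
- Larger coverage than IEEE 802.11a
[Figure: 802.11g (20-50 Mbps) and 802.11b (11 Mbps) coverage]

Effects of the frequency band
- Coverage: the 5 GHz band covers less area, so covering the same area needs more access points
- Interference: many systems operate at 2.4 GHz (microwave ovens, cordless phones, Bluetooth and so on); the 5 GHz band sees less interference

The IEEE 802.11n standard
11n = Next Generation of 802.11
- Bands: 2.4 GHz and 5 GHz
- Goals: significantly higher throughput (300+ Mbps), larger coverage, backward compatibility
- Key technologies: OFDM, MIMO
- Status: approved in September 2009; the WiFi Alliance began interoperability certification of 802.11n devices in June 2007, and WiFi Alliance certified 802.11n products have long been shipping

Comparison of the IEEE 802.11 standards
- Ratified: 802.11b Sept. 1999; 802.11a Sept. 1999; 802.11g Jun. 2003; 802.11n Sept. 2009
- Band: 802.11b 2.4 GHz; 802.11a 5 GHz; 802.11g 2.4 GHz; 802.11n 2.4/5 GHz
- Non-overlapping channels: 802.11b 3; 802.11a 12; 802.11g 3; 802.11n 3/12
- Data rate: 802.11b 1-11 Mbps; 802.11a 6-54 Mbps; 802.11g 1-54 Mbps; 802.11n 1-300 Mbps
- Modulation: 802.11b DSSS, CCK; 802.11a OFDM; 802.11g DSSS, CCK, OFDM; 802.11n DSSS, CCK, OFDM, MIMO
- Compatibility: 802.11b none; 802.11a none; 802.11g compatible with 802.11b; 802.11n compatible with 802.11b/g

Other IEEE 802.11x standards
- 802.11c: Bridge Operation
- 802.11d: Specification for Operation in Additional Regulatory Domains
- 802.11e: Medium Access Control (MAC) Quality of Service Enhancements
- 802.11F: Inter-Access Point Protocol (withdrawn)
- 802.11h: Spectrum and Transmit Power Management Extensions in the 5 GHz band in Europe
- 802.11i: Medium Access Control (MAC) Security Enhancements
- 802.11j: 4.9 GHz-5 GHz Operation in Japan
Standardization completed:
- IEEE 802.11k-2008: Radio Resource Measurement of Wireless LANs
- IEEE 802.11n-2009: Enhancements for Higher Throughput
- IEEE 802.11p-2010: Wireless Access in Vehicular Environments
- IEEE 802.11r-2008: Fast Basic Service Set (BSS)
- IEEE 802.11w-2009: Protected Management Frames
- IEEE 802.11y-2008: 3650-3700 MHz Operation in USA
The published 802.11-2007 merges 802.11a, b, d, e, g, h, i and j into a single standard.
Standardization in progress:
- 802.11s: Mesh Networking
- 802.11T: Wireless Performance
- 802.11u: Inter-working with External Networks
- 802.11v: Wireless Network Management
- 802.11z: Direct Datalink Setup (DLS)
- 802.11aa: Robust Audio Video Stream Transport
- 802.11ac: Very High Throughput <6 GHz
- 802.11ad: Very High Throughput in 60 GHz
- 802.11ae: QoS Management
- 802.11af: TV Whitespace
Reference: /groups/802/11/QuickGuide_IEEE_802_ WG_and_Activities.htm

Basic architecture of 802.11
- Stations (STA) that can communicate with each other wirelessly form a Basic Service Set (BSS)
- The BSS is the basic unit of an IEEE 802.11 network
[Figure: a BSS with one AP and stations STA1-STA4]

BSS topologies
Centralized topology with infrastructure:
- A wireless access point (AP) provides the connection to the wired network and relays traffic between stations; it plays the role of a base station in a cellular network
- Stations cannot communicate directly; traffic is forwarded by the AP
Distributed peer-to-peer topology:
- Where no communication infrastructure exists beforehand, wireless nodes communicate with each other directly
- There is no AP in the network
- This forms an Independent BSS (IBSS)
- Also called ad hoc mode
[Figure: an IBSS with stations STA1-STA4]

The 802.11 Extended Service Set
- An Extended Service Set (ESS) consists of several BSSs interconnected through a Distribution System (DS)
- The DS can be wired or wireless
- Each BSS is assigned an identifier, the BSSID

802.11 networking modes
[Figure: ad hoc, single BSS, multiple BSSs]

802.11 logical services
Station Services (SS), provided by stations:
- Authentication
- Deauthentication
- Data confidentiality
- MSDU (MAC Service Data Unit) delivery
- DFS (Dynamic Frequency Selection)
- TPC (Transmit Power Control)
- Higher layer timer synchronization (QoS facility only)
- QoS traffic scheduling (QoS facility only)
Distribution System Services (DSS), provided by the distribution system:
- Association
- Disassociation
- Reassociation
- Distribution
- Integration
- QoS traffic scheduling (QoS facility only)

Data delivery services of the DS
- Distribution: exchanges MAC frames between nodes, from a node in one BSS to a node in another BSS
- Integration: used when one party is on an 802.11 LAN and the other is on a non-802.11 LAN; involves address translation, media conversion logic and frame format conversion

Association services
- Association: a node must associate with the AP of its BSS before the AP can announce this information to the other APs in the ESS for routing and frame delivery
- Reassociation: an association can move from one AP to another, which lets a node move from one BSS to another
- Disassociation: a station sends this notification before leaving the ESS or shutting down; MAC management mechanisms guard against nodes that disappear without notification

Security services
- Authentication: 802.11 supports several authentication modes and allows extensions; the standard does not mandate any particular mode
- De-authentication: a previously authenticated station must be de-authenticated when it leaves the network
- Privacy: prevents message content from being read by anyone other than the intended receiver; the standard supports optional encryption

IEEE 802.11 physical layer technology
- Overview of the IEEE 802.11 physical layer
- DSSS physical layer
- HR/DSSS physical layer
- FHSS physical layer
- IR physical layer
- OFDM physical layer

Classification of 802.11 physical layers
- 802.11 basic physical layer (2.4 GHz band): DSSS 1, 2 Mbps; FHSS 1, 2 Mbps; IR 1, 2 Mbps
- 802.11b physical layer (2.4 GHz band), HR/DSSS (High Rate DSSS): 1, 2 Mbps; 5.5, 11 Mbps with CCK
- 802.11a physical layer (5 GHz band), OFDM: 6, 9, 12, 18, 24, 36, 48, 54 Mbps
- 802.11g physical layer (2.4 GHz band), ERP (Extended Rate PHY):
  - ERP-DSSS/CCK: 1, 2, 5.5, 11 Mbps
  - ERP-OFDM: 6, 9, 12, 18, 24, 36, 48, 54 Mbps
  - ERP-PBCC (optional): 22, 33 Mbps
  - DSSS-OFDM (optional): 6, 9, 12, 18, 24, 36, 48, 54 Mbps
[Figure: the 802.11 protocol layering model, physical layer]

Layered structure of the 802.11 physical layer
- Physical Layer Management Entity (PLME): connected to MAC layer management; performs local physical layer management
- Physical Layer Convergence Procedure (PLCP) sublayer: specifies how MAC protocol data units (MPDUs) are mapped to a suitable physical layer frame format for sending and receiving user data and management information, and the reverse operation
- Physical Medium Dependent (PMD) sublayer: defines the characteristics and methods for sending and receiving data over the wireless medium between two or more stations

DSSS physical layer
Direct Sequence Spread Spectrum (DSSS):
- Transmitter: the information is encoded directly with a high-rate spreading code sequence and then modulates the carrier, which spreads the signal spectrum
- Receiver: despreads with the same spreading code sequence; the good autocorrelation of the spreading code restores the spread signal to the original signal
Characteristics:
- Strong resistance to interference; 802.11 requires a spreading gain of at least 10 dB
- The spread spectrum resembles noise, which gives strong resistance to interception, detection and eavesdropping
- Strong resistance to multipath interference

DSSS physical layer specification
- Spreading code: pseudo-noise (PN) code, the 11-chip Barker sequence +1, -1, +1, +1, -1, +1, +1, +1, -1, -1, -1, at 11 Mchip/s
- Modulation: 1 Mbps with DBPSK (differential binary phase shift keying); 2 Mbps with DQPSK (differential quadrature phase shift keying)
- Clear Channel Assessment (CCA):
  - Mode 1: report the channel busy when energy exceeds a threshold
  - Mode 2: report the channel busy when a DSSS signal is detected (carrier sense, CS)
  - Mode 3: a DSSS signal is detected and its energy exceeds the threshold

DSSS channel plan
- Operates in the 2.4 GHz ISM band, 2.400 GHz to 2.4835 GHz
- Divided into 14 channels, each 22 MHz wide, with adjacent channel centers 5 MHz apart
[Figure: 2.4 GHz channel layout, channels 1-14, channel 14 at 2.484 GHz]
[Figure: ISM bands licensed in each country]

DSSS physical channels
- Only 3 channels do not overlap
- In a multi-cell network, use channels without frequency overlap to avoid adjacent-channel interference, for example channels 1, 6, 11 (United States) or 1, 7, 13 (Europe)
- Adjust transmit power appropriately to avoid co-channel interference across areas

[Only fragments of the remaining slides survive:]
- The PLCP preamble and header use 1 Mbps DBPSK modulation
- Short PLCP PDU (optional)
- FHSS Using MFSK: frequency range
- FHSS physical layer (cont.): number of hopping channels
- OFDM physical layer (cont.): China has opened channels 149, 153, 157 and 161 of the upper U-NII band
- PLCP frame format: PLCP preamble
- MAC layer: immediate acknowledgment from the receiver; RTS/CTS notifies nodes near the sender and the receiver
- [Figure: two nodes transmit on a common bus at the same time; there has been a collision; nodes wait a random period before retransmitting]
- C is a hidden node; hidden nodes reduce channel utilization; B is an exposed node of C
- Implementing CSMA/CA: Carrier Sense Multiple Access (CSMA); ACK frames confirm correct transmission (because collisions cannot be detected)
- DCF data transfer example: Slot Time = 1, CW min = 5, DIFS = 3, PIFS = 2, SIFS = 1
- Management frames, PCF procedure (cont.): during the CFP, the SIFS inter-frame space is used to keep DCF stations from gaining access; the Beacon carries the CFP duration, and other stations record it in their NAV so that stations not being polled do not occupy the medium during the CFP; the PC signals the end of the CFP with CF_end
- Beacon frame
- Authentication: Open-system Authentication amounts to no authentication at all and offers no security protection.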
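The slides above state that DSSS relies on the good autocorrelation of its spreading code. The following short Python sketch checks that property for the 11-chip Barker sequence quoted in the DSSS physical layer specification; the function name `aperiodic_autocorrelation` is introduced here for illustration only.

```python
# 11-chip Barker sequence as listed in the DSSS physical layer specification.
BARKER_11 = [+1, -1, +1, +1, -1, +1, +1, +1, -1, -1, -1]


def aperiodic_autocorrelation(seq):
    """Return the aperiodic autocorrelation of seq for lags 0..len(seq)-1."""
    n = len(seq)
    return [sum(seq[i] * seq[i + lag] for i in range(n - lag))
            for lag in range(n)]


if __name__ == "__main__":
    values = aperiodic_autocorrelation(BARKER_11)
    print("peak:", values[0])
    print("sidelobes:", values[1:])
```

The peak at lag 0 equals the sequence length, while every other lag has magnitude at most 1, which is what lets a receiver find the chip alignment reliably in noise.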
bwa usage notes
Manual Reference Pages - bwa (1)

NAME
bwa - Burrows-Wheeler Alignment Tool

CONTENTS
Synopsis
Description
Commands And Options
Sam Alignment Format
Notes On Short-read Alignment
  Alignment Accuracy
  Estimating Insert Size Distribution
  Memory Requirement
  Speed
Changes In Bwa-0.6
See Also
Author
License And Citation
History

SYNOPSIS
bwa index ref.fa                                 (build the index)
bwa mem ref.fa reads.fq > aln-se.sam             (single-end reads)
bwa mem ref.fa read1.fq read2.fq > aln-pe.sam    (paired-end reads)
bwa aln ref.fa short_read.fq > aln_sa.sai
bwa samse ref.fa aln_sa.sai short_read.fq > aln-se.sam
bwa sampe ref.fa aln_sa1.sai aln_sa2.sai read1.fq read2.fq > aln-pe.sam
bwa bwasw ref.fa long_read.fq > aln.sam

DESCRIPTION
BWA is a software package for mapping low-divergent sequences against a large reference genome, such as the human genome. It consists of three algorithms: BWA-backtrack, BWA-SW and BWA-MEM. The first algorithm is designed for Illumina sequence reads up to 100bp, while the other two are for longer sequences ranging from 70bp to 1Mbp. BWA-MEM and BWA-SW share similar features such as long-read support and split alignment, but BWA-MEM, which is the latest, is generally recommended for high-quality queries as it is faster and more accurate. BWA-MEM also has better performance than BWA-backtrack for 70-100bp Illumina reads.

For all the algorithms, BWA first needs to construct the FM-index for the reference genome (the index command). Alignment algorithms are invoked with different sub-commands: aln/samse/sampe for BWA-backtrack, bwasw for BWA-SW and mem for the BWA-MEM algorithm.

COMMANDS AND OPTIONS

SAM ALIGNMENT FORMAT
The output of the 'aln' command is binary and designed for BWA use only. BWA outputs the final alignment in the SAM (Sequence Alignment/Map) format. Each line consists of:

Each bit in the FLAG field is defined as:

Please check the SAM format specification and the tools for post-processing the alignment. BWA generates the following optional fields.
Tags starting with 'X' are specific to BWA.

Note that XO and XG are generated by BWT search while the CIGAR string is generated by Smith-Waterman alignment. These two tags may be inconsistent with the CIGAR string. This is not a bug.

NOTES ON SHORT-READ ALIGNMENT

Alignment Accuracy
When seeding is disabled, BWA guarantees to find an alignment containing at most maxDiff differences including maxGapO gap opens which do not occur within nIndelEnd bp towards either end of the query. Longer gaps may be found if maxGapE is positive, but it is not guaranteed to find all hits. When seeding is enabled, BWA further requires that the first seedLen subsequence contains no more than maxSeedDiff differences.

When gapped alignment is disabled, BWA is expected to generate the same alignment as Eland version 1, the Illumina alignment program. However, as BWA changes 'N' in the database sequence to random nucleotides, hits to these random sequences will also be counted. As a consequence, BWA may mark a unique hit as a repeat, if the random sequences happen to be identical to the sequences which should be unique in the database.

By default, if the best hit is not highly repetitive (controlled by -R), BWA also finds all hits that contain one more mismatch; otherwise, BWA finds all equally best hits only. Base quality is NOT considered in evaluating hits. In the paired-end mode, BWA pairs all hits it found. It further performs Smith-Waterman alignment for unmapped reads to rescue reads with a high error rate, and for high-quality anomalous pairs to fix potential alignment errors.

Estimating Insert Size Distribution
BWA estimates the insert size distribution per 256*1024 read pairs. It first collects pairs of reads with both ends mapped with a single-end quality 20 or higher and then calculates median (Q2), lower and higher quartile (Q1 and Q3). It estimates the mean and the variance of the insert size distribution from pairs whose insert sizes are within interval [Q1-2(Q3-Q1), Q3+2(Q3-Q1)].
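The quartile-based filtering described in the manual can be sketched in a few lines of Python. This is an illustration under stated assumptions, not BWA's code: the quartiles come from `statistics.quantiles` (BWA's own quartile computation may differ), the variance is the population variance, and the function name `estimate_insert_size` is hypothetical.

```python
import statistics


def estimate_insert_size(insert_sizes):
    """Estimate mean and variance after discarding quartile-based outliers."""
    q1, q2, q3 = statistics.quantiles(insert_sizes, n=4)
    spread = q3 - q1
    # Keep only pairs inside [Q1 - 2(Q3 - Q1), Q3 + 2(Q3 - Q1)].
    low, high = q1 - 2 * spread, q3 + 2 * spread
    kept = [x for x in insert_sizes if low <= x <= high]
    mean = statistics.fmean(kept)
    variance = statistics.pvariance(kept, mu=mean)  # population variance (assumption)
    return {"q1": q1, "median": q2, "q3": q3, "low": low, "high": high,
            "n_used": len(kept), "mean": mean, "variance": variance}


if __name__ == "__main__":
    # Made-up insert sizes with one chimeric-looking outlier.
    sizes = [190, 195, 200, 205, 210] * 20 + [5000]
    print(estimate_insert_size(sizes))
```

Filtering on quartiles first keeps a handful of anomalous pairs from inflating the mean and variance.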
The maximum distance x for a pair considered to be properly paired (SAM flag 0x2) is calculated by solving equation Phi((x-mu)/sigma)=x/L*p0, where mu is the mean, sigma is the standard error of the insert size distribution, L is the length of the genome, p0 is the prior of an anomalous pair and Phi() is the standard cumulative distribution function. For mapping Illumina short-insert reads to the human genome, x is about 6-7 sigma away from the mean. Quartiles, mean, variance and x will be printed to the standard error output.

Memory Requirement
With the bwtsw algorithm, 5GB memory is required for indexing the complete human genome sequences. For short reads, the aln command uses ~3.2GB memory and the sampe command uses ~5.4GB.

Speed
Indexing the human genome sequences takes 3 hours with the bwtsw algorithm. Indexing smaller genomes with the IS algorithm is faster, but requires more memory.

The speed of alignment is largely determined by the error rate of the query sequences (r). Firstly, BWA runs much faster for near perfect hits than for hits with many differences, and it stops searching for a hit with l+2 differences if an l-difference hit is found. This means BWA will be very slow if r is high because in this case BWA has to visit hits with many differences and looking for these hits is expensive. Secondly, the alignment algorithm behind makes the speed sensitive to [k log(N)/m], where k is the maximum allowed differences, N the size of database and m the length of a query. In practice, we choose k w.r.t. r and therefore r is the leading factor. I would not recommend using BWA on data with r>0.02.

Pairing is slower for shorter reads. This is mainly because shorter reads have more spurious hits and converting SA coordinates to chromosomal coordinates is very costly.

CHANGES IN BWA-0.6
Since version 0.6, BWA has been able to work with a reference genome longer than 4GB. This feature makes it possible to integrate the forward and reverse complemented genome in one FM-index, which speeds up both BWA-short and BWA-SW. As a tradeoff, BWA uses more memory because it has to keep all positions and ranks in 64-bit integers, twice as large as the 32-bit integers used in the previous versions.

The latest BWA-SW also works for paired-end reads longer than 100bp. In comparison to BWA-short, BWA-SW tends to be more accurate for highly unique reads and more robust to relatively long INDELs and structural variants. Nonetheless, BWA-short usually has higher power to distinguish the optimal hit from many suboptimal hits. The choice of the mapping algorithm may depend on the application.

SEE ALSO
BWA website, Samtools website

AUTHOR
Heng Li at the Sanger Institute wrote the key source codes and integrated the following codes for BWT construction: bwtsw <http://i.cs.hku.hk/~ckwong3/bwtsw/>, implemented by Chi-Kwong Wong at the University of Hong Kong, and IS, originally proposed by Nong Ge at the Sun Yat-Sen University and implemented by Yuta Mori.

LICENSE AND CITATION
The full BWA package is distributed under GPLv3 as it uses source codes from BWT-SW which is covered by GPL. Sorting, hash table, BWT and IS libraries are distributed under the MIT license.

If you use the BWA-backtrack algorithm, please cite the following paper:
Li H. and Durbin R. (2009) Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics, 25, 1754-1760. [PMID: 19451168]

If you use the BWA-SW algorithm, please cite:
Li H. and Durbin R. (2010) Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics, 26, 589-595. [PMID: 20080505]

If you use the fastmap component of BWA, please cite:
Li H. (2012) Exploring single-sample SNP and INDEL calling with whole-genome de novo assembly. Bioinformatics, 28, 1838-1844. [PMID: 22569178]

The BWA-MEM algorithm has not been published yet.

HISTORY
BWA is largely influenced by BWT-SW. It uses source codes from BWT-SW and mimics its binary file formats; BWA-SW resembles BWT-SW in several ways. The initial idea about BWT-based alignment also came from the group who developed BWT-SW. At the same time, BWA is different enough from BWT-SW. The short-read alignment algorithm bears no similarity to the Smith-Waterman algorithm any more. While BWA-SW learns from BWT-SW, it introduces heuristics that can hardly be applied to the original algorithm. In all, BWA does not guarantee to find all local hits as BWT-SW is designed to do, but it is much faster than BWT-SW on both short and long query sequences.

I started to write the first piece of code on 24 May 2008 and got the initial stable version on 02 June 2008. During this period, I learned that Professor Tak-Wah Lam, the first author of the BWT-SW paper, was collaborating with Beijing Genomics Institute on SOAP2, the successor to SOAP (Short Oligonucleotide Analysis Package). SOAP2 came out in November 2008. According to the SourceForge download page, the third BWT-based short read aligner, bowtie, was first released in August 2008. At the time of writing this manual, at least three more BWT-based short-read aligners are being implemented.

The BWA-SW algorithm is a new component of BWA. It was conceived in November 2008 and implemented ten months later.

The BWA-MEM algorithm is based on an algorithm finding super-maximal exact matches (SMEMs), which was first published with the fermi assembler paper in 2012.
I first implemented the basic SMEM algorithm in the fastmap command for an experiment and then extended the basic algorithm and added the extension part in February 2013 to make BWA-MEM a fully featured mapper.

samtools - Utilities for the Sequence Alignment/Map (SAM) format

SYNOPSIS
samtools view -bt ref_list.txt -o aln.bam aln.sam.gz
samtools sort -T /tmp/aln.sorted -o aln.sorted.bam aln.bam
samtools index aln.sorted.bam
samtools idxstats aln.sorted.bam
samtools view aln.sorted.bam chr2:20,100,000-20,200,000
samtools merge out.bam in1.bam in2.bam in3.bam
samtools faidx ref.fasta
samtools fixmate sorted.sam out.bam
samtools mpileup -C50 -gf ref.fasta -r chr3:1,000-2,000 in1.bam in2.bam
samtools tview aln.sorted.bam ref.fasta
samtools flags PAIRED,UNMAP,MUNMAP
samtools bam2fq input.bam > output.fastq

DESCRIPTION
Samtools is a set of utilities that manipulate alignments in the BAM format. It imports from and exports to the SAM (Sequence Alignment/Map) format, does sorting, merging and indexing, and allows reads in any region to be retrieved swiftly.

Samtools is designed to work on a stream. It regards an input file `-' as the standard input (stdin) and an output file `-' as the standard output (stdout). Several commands can thus be combined with Unix pipes. Samtools always outputs warning and error messages to the standard error output (stderr).

Samtools is also able to open a BAM (not SAM) file on a remote FTP or HTTP server if the BAM file name starts with `ftp://' or `http://'. Samtools checks the current working directory for the index file and will download the index upon absence.
Samtools does not retrieve the entire alignment file unless it is asked to do so.

COMMANDS AND OPTIONS

view
samtools view [options] in.bam|in.sam|in.cram [region...]

With no options or regions specified, prints all alignments in the specified input alignment file (in SAM, BAM, or CRAM format) to standard output in SAM format (with no header).

You may specify one or more space-separated region specifications after the input filename to restrict output to only those alignments which overlap the specified region(s). Use of region specifications requires a coordinate-sorted and indexed input file (in BAM or CRAM format).

The -b, -C, -1, -u, -h, -H, and -c options change the output format from the default of headerless SAM, and the -o and -U options set the output file name(s).

The -t and -T options provide additional reference data. One of these two options is required when SAM input does not contain @SQ headers, and the -T option is required whenever writing CRAM output.

The -L, -r, -R, -q, -l, -m, -f, and -F options filter the alignments that will be included in the output to only those alignments that match certain criteria.

The -x, -B, and -s options modify the data which is contained in each alignment.

Finally, the -@ option can be used to allocate additional threads to be used for compression, and the -? option requests a long help message.

REGIONS:
Regions can be specified as: RNAME[:STARTPOS[-ENDPOS]] and all position coordinates are 1-based.

Important note: when multiple regions are given, some alignments may be output multiple times if they overlap more than one of the specified regions.

Examples of region specifications:
`chr1'
    Output all alignments mapped to the reference sequence named `chr1' (i.e. @SQ SN:chr1).
`chr2:1000000'
    The region on chr2 beginning at base position 1,000,000 and ending at the end of the chromosome.
`chr3:1000-2000'
    The 1001bp region on chr3 beginning at base position 1,000 and ending at base position 2,000 (including both end positions).

OPTIONS:
-b
    Output in the BAM format.
-C
    Output in the CRAM format (requires -T).
-1
    Enable fast BAM compression (implies -b).
-u
    Output uncompressed BAM. This option saves time spent on compression/decompression and is thus preferred when the output is piped to another samtools command.
-h
    Include the header in the output.
-H
    Output the header only.
-c
    Instead of printing the alignments, only count them and print the total number. All filter options, such as -f, -F, and -q, are taken into account.
-?
    Output long help and exit immediately.
-o FILE
    Output to FILE [stdout].
-U FILE
    Write alignments that are not selected by the various filter options to FILE. When this option is used, all alignments (or all alignments intersecting the regions specified) are written to either the output file or this file, but never both.
-t FILE
    A tab-delimited FILE. Each line must contain the reference name in the first column and the length of the reference in the second column, with one line for each distinct reference. Any additional fields beyond the second column are ignored. This file also defines the order of the reference sequences in sorting. If you run `samtools faidx' on the reference, the resulting .fai index file can be used as this FILE.
-T FILE
    A FASTA format reference FILE, optionally compressed by bgzip and ideally indexed by samtools faidx.
    If an index is not present, one will be generated for you.
-L FILE
    Only output alignments overlapping the input BED FILE [null].
-r STR
    Only output alignments in read group STR [null].
-R FILE
    Output alignments in read groups listed in FILE [null].
-q INT
    Skip alignments with MAPQ smaller than INT [0].
-l STR
    Only output alignments in library STR [null].
-m INT
    Only output alignments with number of CIGAR bases consuming query sequence ≥ INT [0].
-f INT
    Only output alignments with all bits set in INT present in the FLAG field. INT can be specified in hex by beginning with `0x' (i.e. /^0x[0-9A-F]+/) or in octal by beginning with `0' (i.e. /^0[0-7]+/) [0].
-F INT
    Do not output alignments with any bits set in INT present in the FLAG field. INT can be specified in hex by beginning with `0x' (i.e. /^0x[0-9A-F]+/) or in octal by beginning with `0' (i.e. /^0[0-7]+/) [0].
-x STR
    Read tag to exclude from output (repeatable) [null].
-B
    Collapse the backward CIGAR operation.
-s FLOAT
    Integer part is used to seed the random number generator [0]. Part after the decimal point sets the fraction of templates/pairs to subsample [no subsampling].
-@ INT
    Number of BAM compression threads to use in addition to main thread [0].
-S
    Ignored for compatibility with previous samtools versions. Previously this option was required if input was in SAM format, but now the correct format is automatically detected by examining the first few characters of input.

tview
samtools tview [-p chr:pos] [-s STR] [-d display] <in.sorted.bam> [ref.fasta]

Text alignment viewer (based on the ncurses library). In the viewer, press `?' for help and press `g' to check the alignment start from a region in the format like `chr10:10,000,000' or `=10,000,000' when viewing the same reference sequence.
Options:
-d display
    Output as (H)tml or (C)urses or (T)ext
-p chr:pos
    Go directly to this position
-s STR
    Display only alignments from this sample or read group

mpileup
samtools mpileup [-EBugp] [-C capQcoef] [-r reg] [-f in.fa] [-l list] [-Q minBaseQ] [-q minMapQ] in.bam [in2.bam [...]]

Generate VCF, BCF or pileup for one or multiple BAM files. Alignment records are grouped by sample (SM) identifiers in @RG header lines. If sample identifiers are absent, each input file is regarded as one sample.

In the pileup format (without -u or -g), each line represents a genomic position, consisting of chromosome name, 1-based coordinate, reference base, the number of reads covering the site, read bases, base qualities and alignment mapping qualities. Information on match, mismatch, indel, strand, mapping quality and start and end of a read are all encoded at the read base column. At this column, a dot stands for a match to the reference base on the forward strand, a comma for a match on the reverse strand, a '>' or '<' for a reference skip, `ACGTN' for a mismatch on the forward strand and `acgtn' for a mismatch on the reverse strand. A pattern `\\+[0-9]+[ACGTNacgtn]+' indicates there is an insertion between this reference position and the next reference position. The length of the insertion is given by the integer in the pattern, followed by the inserted sequence. Similarly, a pattern `-[0-9]+[ACGTNacgtn]+' represents a deletion from the reference. The deleted bases will be presented as `*' in the following lines. Also at the read base column, a symbol `^' marks the start of a read. The ASCII of the character following `^' minus 33 gives the mapping quality. A symbol `$' marks the end of a read segment.

Input Options:
-6, --illumina1.3+
    Assume the quality is in the Illumina 1.3+ encoding.
-A, --count-orphans
    Do not skip anomalous read pairs in variant calling.
-b, --bam-list FILE
    List of input BAM files, one file per line [null]
-B, --no-BAQ
    Disable probabilistic realignment for the computation of base alignment quality (BAQ). BAQ is the Phred-scaled probability of a read base being misaligned. Applying this option greatly helps to reduce false SNPs caused by misalignments.
-C, --adjust-MQ INT
    Coefficient for downgrading mapping quality for reads containing excessive mismatches. Given a read with a phred-scaled probability q of being generated from the mapped position, the new mapping quality is about sqrt((INT-q)/INT)*INT. A zero value disables this functionality; if enabled, the recommended value for BWA is 50. [0]
-d, --max-depth INT
    At a position, read maximally INT reads per input BAM. [250]
-E, --redo-BAQ
    Recalculate BAQ on the fly, ignore existing BQ tags
-f, --fasta-ref FILE
    The faidx-indexed reference file in the FASTA format. The file can be optionally compressed by bgzip. [null]
-G, --exclude-RG FILE
    Exclude reads from readgroups listed in FILE (one @RG-ID per line)
-l, --positions FILE
    BED or position list file containing a list of regions or sites where pileup or BCF should be generated. If BED, positions are 0-based half-open [null]
-q, --min-MQ INT
    Minimum mapping quality for an alignment to be used [0]
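The -f and -F filters of samtools view described above are plain bit tests on the FLAG field. The sketch below mirrors that logic in Python so the semantics are easy to check by hand; it is not samtools code, and the helper names are made up for this example. The bit names follow the ones accepted by `samtools flags`.

```python
# FLAG bits and the names used by `samtools flags`.
FLAG_BITS = {
    0x1: "PAIRED", 0x2: "PROPER_PAIR", 0x4: "UNMAP", 0x8: "MUNMAP",
    0x10: "REVERSE", 0x20: "MREVERSE", 0x40: "READ1", 0x80: "READ2",
    0x100: "SECONDARY", 0x200: "QCFAIL", 0x400: "DUP", 0x800: "SUPPLEMENTARY",
}


def parse_flag_value(text):
    """Parse INT as decimal, hex (leading 0x) or octal (leading 0)."""
    if text.lower().startswith("0x"):
        return int(text, 16)
    if len(text) > 1 and text.startswith("0"):
        return int(text, 8)
    return int(text)


def decode_flag(flag):
    """Return the names of all bits set in flag, lowest bit first."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


def passes_filter(flag, require=0, exclude=0):
    """-f semantics: all `require` bits set; -F semantics: no `exclude` bit set."""
    return (flag & require) == require and (flag & exclude) == 0


if __name__ == "__main__":
    print(decode_flag(99))
    print(passes_filter(99, require=parse_flag_value("0x2"),
                        exclude=parse_flag_value("0x4")))
```

For example, a FLAG of 99 decodes to a properly paired first read whose mate is on the reverse strand, so it passes `-f 0x2 -F 0x4`.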
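bwa index example
BWA is a sequence alignment tool commonly used for next-generation sequencing (NGS) data.
Before aligning with BWA, you first need to build an index of the reference genome.
Here is an example of building an index with BWA. Suppose the reference genome file is `{reference}.fa`; run the following commands in a terminal to build the index:
```bash
echo "bwa index starts $(date)" && \
cd ref && \
bwa index -a bwtsw {reference}.fa && \
echo "bwa index ends $(date)"
```
This command generates several index files next to the reference (BWA writes files with the suffixes `.amb`, `.ann`, `.bwt`, `.pac` and `.sa`). These index files are used in the alignment steps that follow.
You can then align reads with BWA's `mem` algorithm.
Here is an example of an alignment with `mem`:
```bash
echo "bwa mem starts $(date)" && \
cd ref && \
bwa mem -t 4 -M -R "@RG\tID:{library}\tLB:{library}\tPL:Illumina\tPU:{sample}\tSM:{sample}" /home/data/{reference}.fa {read1}.fastq {read2}.fastq > {output}.sam && \
echo "bwa mem ends $(date)"
```
This example aligns with 4 threads and sets the read group information, which later SAM/BAM processing steps rely on.
Note that you need to replace `{reference}`, `{library}`, `{sample}`, `{read1}.fastq`, `{read2}.fastq` and `{output}` with actual values.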
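Because a malformed read group string is an easy mistake to make, a small helper can build it from its parts. This is a hypothetical sketch: the function name `build_read_group` and its arguments are not part of BWA; only the `@RG` line layout with `ID`, `SM`, `LB`, `PL` and `PU` tags comes from the example above.

```python
def build_read_group(rg_id, sample, library, platform="ILLUMINA",
                     platform_unit=None):
    """Build a read group header line for `bwa mem -R`.

    Fields are joined with the two-character escape backslash + t, which
    is the form written on the command line in the example above.
    """
    fields = [("ID", rg_id), ("SM", sample), ("LB", library), ("PL", platform)]
    if platform_unit is not None:
        fields.append(("PU", platform_unit))
    for tag, value in fields:
        if not value or any(ch in value for ch in "\t\n\\"):
            raise ValueError(f"invalid value for {tag}: {value!r}")
    return "@RG" + "".join(f"\\t{tag}:{value}" for tag, value in fields)


if __name__ == "__main__":
    # Made-up sample and library names, for illustration only.
    print(build_read_group("lib1", "sampleA", "lib1", platform_unit="sampleA"))
```

The result can be passed as a single argument after `-R`; validating the values first avoids a stray tab or backslash silently corrupting the header.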
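ZTE technology: the ZTE unified BWA network management system
Yang Caijian (杨采坚)
[Journal] Telecommunications Science (《电信科学》)
[Year (volume), issue] 2004(20)4
[Abstract] 1 Introduction. Broadband wireless access (BWA) systems use point-to-multipoint transmission to give telecom operators a high-rate, high-capacity, highly reliable, full-duplex means of broadband access from user terminals to the backbone network. Practice has shown that, among the many available solutions, BWA systems have large advantages over other technologies in initial investment, deployment speed, time to market and resource reuse, along with broader market potential, and they are receiving growing attention from operators, especially new entrants.
[Total pages] 3 (P66-68)
[Author] Yang Caijian (杨采坚)
[Affiliation] ZTE Corporation, Shenzhen 518004
[Language of the text] Chinese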
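bwa usage
BWA is a commonly used sequencing read aligner based on the Burrows-Wheeler Transform.
Its basic usage is as follows:
1. Install BWA: download the BWA source code, unpack it, change into its directory and run make.
2. Build the reference genome index: use bwa index to generate the BWT index files for the reference genome. Command format: bwa index <reference.fa>
3. Align the sequencing data: align the reads against the reference genome. Command format: bwa mem <reference.fa> <reads.fq> > output.sam
Here <reference.fa> is the path to the reference genome file, <reads.fq> is the path to the sequencing data file, and > output.sam writes the alignments to the file output.sam.
4. Process the alignments: tools such as samtools can filter, sort and deduplicate the alignments.
For example, samtools view converts a SAM file to BAM: samtools view -bS output.sam > output.bam (recent samtools versions detect the input format automatically and ignore -S).
That covers the basic usage of BWA; pick other commands and options as needed for finer-grained control.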
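To show what the output.sam file from step 3 contains, here is a minimal Python sketch that reads SAM records. It relies only on the SAM layout of 11 mandatory tab-separated fields followed by optional tags; the names `SamRecord`, `parse_sam_line` and `iter_alignments` are introduced for this example, and real pipelines would normally use samtools or a dedicated library instead.

```python
from typing import Iterable, Iterator, List, NamedTuple


class SamRecord(NamedTuple):
    qname: str
    flag: int
    rname: str
    pos: int
    mapq: int
    cigar: str
    rnext: str
    pnext: int
    tlen: int
    seq: str
    qual: str
    tags: List[str]


def parse_sam_line(line: str) -> SamRecord:
    """Parse one alignment line (not a header line) of a SAM file."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line needs 11 mandatory fields")
    return SamRecord(fields[0], int(fields[1]), fields[2], int(fields[3]),
                     int(fields[4]), fields[5], fields[6], int(fields[7]),
                     int(fields[8]), fields[9], fields[10], fields[11:])


def iter_alignments(lines: Iterable[str]) -> Iterator[SamRecord]:
    """Yield records from SAM text, skipping header lines that start with '@'."""
    for line in lines:
        if line.strip() and not line.startswith("@"):
            yield parse_sam_line(line)


if __name__ == "__main__":
    # A made-up two-line SAM snippet: one header line and one alignment.
    sam_text = [
        "@SQ\tSN:chr1\tLN:1000\n",
        "read1\t99\tchr1\t100\t60\t4M\t=\t300\t204\tACGT\tIIII\tNM:i:0\n",
    ]
    for record in iter_alignments(sam_text):
        print(record.qname, record.rname, record.pos, record.mapq)
```

In practice the same loop can be run over `open("output.sam")`, for instance to count reads whose MAPQ is above a chosen threshold before converting to BAM.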