Speech Signal Processing: Chinese-English Translations
Appendix: Chinese-English Translation

15 Speech Signal Processing

15.3 Analysis and Synthesis

Jesse W. Fussell

After an acoustic speech signal is converted to an electrical signal by a microphone, it may be desirable to analyze the electrical signal to estimate some time-varying parameters which provide information about a model of the speech production mechanism. Speech analysis is the process of estimating such parameters. Similarly, given some parametric model of speech production and a sequence of parameters for that model, speech synthesis is the process of creating an electrical signal which approximates speech. While analysis and synthesis techniques may be done either on the continuous signal or on a sampled version of the signal, most modern analysis and synthesis methods are based on digital signal processing.

A typical speech production model is shown in Fig. 15.6. In this model the output of the excitation function is scaled by the gain parameter and then filtered to produce speech. All of these functions are time-varying.

FIGURE 15.6 A general speech production model.

FIGURE 15.7 Waveform of a spoken phoneme /i/ as in beet.

For many models, the parameters are varied at a periodic rate, typically 50 to 100 times per second. Most speech information is contained in the portion of the signal below about 4 kHz.

The excitation is usually modeled as either a mixture or a choice of random noise and periodic waveform. For human speech, voiced excitation occurs when the vocal folds in the larynx vibrate; unvoiced excitation occurs at constrictions in the vocal tract which create turbulent air flow [Flanagan, 1965]. The relative mix of these two types of excitation is termed "voicing." In addition, the periodic excitation is characterized by a fundamental frequency, termed pitch or F0. The excitation is scaled by a factor designed to produce the proper amplitude or level of the speech signal. The scaled excitation function is then filtered to produce the proper spectral characteristics. While the filter may be nonlinear, it is usually modeled as a linear function.
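To make the source-filter model of Fig. 15.6 concrete, the sketch below synthesizes a rough vowel-like sound: a periodic pulse train (voiced excitation) is scaled by a gain and passed through an all-pole filter whose resonances stand in for the formants. This is a minimal sketch; the specific gain and resonance bandwidths are illustrative assumptions, not values from the text (the 141 Hz pitch and the 300/2300 Hz formants echo the /i/ example discussed below).

```python
import numpy as np
from scipy.signal import lfilter

fs = 8000                     # sampling rate (Hz)
f0 = 141                      # pitch of the /i/ example in the text (~141 Hz)
gain = 0.5                    # time-varying in general; constant here

# Voiced excitation: impulse train at the pitch period (one 64-ms frame)
n = np.arange(int(0.064 * fs))
excitation = np.zeros(len(n))
excitation[::int(fs / f0)] = 1.0

# Vocal-tract filter: all-pole resonances at assumed formant frequencies
a = np.array([1.0])
for fc, bw in [(300, 60), (2300, 120)]:   # (center freq, bandwidth) in Hz
    r = np.exp(-np.pi * bw / fs)
    theta = 2 * np.pi * fc / fs
    a = np.convolve(a, [1, -2 * r * np.cos(theta), r * r])

# Filtered, scaled excitation approximates the speech waveform
speech = lfilter([1.0], a, gain * excitation)
```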
Analysis of Excitation

In a simplified form, the excitation function may be considered to be purely periodic, for voiced speech, or purely random, for unvoiced. These two states correspond to voiced phonetic classes such as vowels and nasals and unvoiced sounds such as unvoiced fricatives. This binary voicing model is an oversimplification for sounds such as voiced fricatives, which consist of a mixture of periodic and random components. Figure 15.7 is an example of a time waveform of a spoken /i/ phoneme, which is well modeled by only periodic excitation.

Both time domain and frequency domain analysis techniques have been used to estimate the degree of voicing for a short segment or frame of speech. One time domain feature, termed the zero crossing rate, is the number of times the signal changes sign in a short interval. As shown in Fig. 15.7, the zero crossing rate for voiced sounds is relatively low. Since unvoiced speech typically has a larger proportion of high-frequency energy than voiced speech, the ratio of high-frequency to low-frequency energy is a frequency domain technique that provides information on voicing.

Another measure used to estimate the degree of voicing is the autocorrelation function, which is defined for a sampled speech segment, S, as

$$\Phi(m) = \sum_{n=0}^{N-1-m} s(n)\,s(n+m)$$

where s(n) is the value of the nth sample within the segment of length N. Since the autocorrelation function of a periodic function is itself periodic, voicing can be estimated from the degree of periodicity of the autocorrelation function. Figure 15.8 is a graph of the nonnegative terms of the autocorrelation function for a 64-ms frame of the waveform of Fig. 15.7. Except for the decrease in amplitude with increasing lag, which results from the rectangular window function which delimits the segment, the autocorrelation function is seen to be quite periodic for this voiced utterance.

FIGURE 15.8 Autocorrelation function of one frame of /i/.

If an analysis of the voicing of the speech signal indicates a voiced or periodic component is present, another step in the analysis process may be to estimate the frequency (or period) of the voiced component. There are a number of ways in which this may be done. One is to measure the time lapse between peaks in the time domain signal. For example, in Fig. 15.7 the major peaks are separated by about 0.0071 s, for a fundamental frequency of about 141 Hz. Note, it would be quite possible to err in the estimate of fundamental frequency by mistaking the smaller peaks that occur between the major peaks for the major peaks. These smaller peaks are produced by resonance in the vocal tract which, in this example, happens to be at about twice the excitation frequency. This type of error would result in an estimate of pitch approximately twice the correct frequency.

The distance between major peaks of the autocorrelation function is a closely related feature that is frequently used to estimate the pitch period. In Fig. 15.8, the distance between the major peaks in the autocorrelation function is about 0.0071 s. Estimates of pitch from the autocorrelation function are also susceptible to mistaking the first vocal tract resonance for the glottal excitation frequency.

The absolute magnitude difference function (AMDF), defined as

$$\mathrm{AMDF}(m) = \frac{1}{N}\sum_{n=0}^{N-1-m} \left|\,s(n) - s(n+m)\,\right|$$

is another function which is often used in estimating the pitch of voiced speech. An example of the AMDF is shown in Fig. 15.9 for the same 64-ms frame of the /i/ phoneme. However, the minima of the AMDF are used as an indicator of the pitch period. The AMDF has been shown to be a good pitch period indicator [Ross et al., 1974] and does not require multiplications.

FIGURE 15.9 Absolute magnitude difference function of one frame of /i/.
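A minimal sketch of these three voicing/pitch features, following the definitions above. The peak search is deliberately naive, restricted to an assumed 50-400 Hz pitch range (a guard against the octave errors just discussed, not a value from the text):

```python
import numpy as np

def zero_crossing_rate(s):
    """Fraction of samples at which the signal changes sign."""
    return np.mean(np.abs(np.diff(np.sign(s))) > 0)

def autocorr_pitch(s, fs, fmin=50.0, fmax=400.0):
    """Pitch from the lag of the largest autocorrelation peak."""
    phi = np.correlate(s, s, mode="full")[len(s) - 1:]   # nonnegative lags
    lo, hi = int(fs / fmax), int(fs / fmin)
    lag = lo + np.argmax(phi[lo:hi])
    return fs / lag

def amdf_pitch(s, fs, fmin=50.0, fmax=400.0):
    """Pitch from the lag of the deepest AMDF minimum (no multiplies)."""
    N = len(s)
    lo, hi = int(fs / fmax), int(fs / fmin)
    d = [np.mean(np.abs(s[:N - m] - s[m:])) for m in range(lo, hi)]
    return fs / (lo + int(np.argmin(d)))
```

Applied to the 64-ms /i/ frame of Fig. 15.7, both pitch estimators should return roughly 141 Hz.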
Fourier Analysis

One of the more common processes for estimating the spectrum of a segment of speech is the Fourier transform [Oppenheim and Schafer, 1975]. The Fourier transform of a sequence is mathematically defined as

$$S(e^{j\omega}) = \sum_{n=-\infty}^{\infty} s(n)\,e^{-j\omega n}$$

where s(n) represents the terms of the sequence. The short-time Fourier transform of a sequence is a time-dependent function, defined as

$$S(e^{j\omega}, m) = \sum_{n=-\infty}^{\infty} s(n)\,w(n-m)\,e^{-j\omega n}$$

where the window function w(n) is usually zero except for some finite range, and the variable m is used to select the section of the sequence for analysis. The discrete Fourier transform (DFT) is obtained by uniformly sampling the short-time Fourier transform in the frequency dimension. Thus an N-point DFT is computed using Eq. (15.14),

$$S(k) = \sum_{n=0}^{N-1} s(n)\,e^{-j2\pi nk/N}, \qquad k = 0, 1, \ldots, N-1 \tag{15.14}$$

where the set of N samples, s(n), may have first been multiplied by a window function. An example of the magnitude of a 512-point DFT of the waveform of the /i/ from Fig. 15.7 is shown in Fig. 15.10. Note for this figure, the 512 points in the sequence have been multiplied by a Hamming window defined by

$$w(n) = 0.54 - 0.46\cos\!\left(\frac{2\pi n}{N-1}\right), \qquad 0 \le n \le N-1$$

FIGURE 15.10 Magnitude of 512-point FFT of Hamming windowed /i/.

Since the spectral characteristics of speech may change dramatically in a few milliseconds, the length, type, and location of the window function are important considerations. If the window is too long, changing spectral characteristics may cause a blurred result; if the window is too short, spectral inaccuracies result. A Hamming window of 16 to 32 ms duration is commonly used for speech analysis.

Several characteristics of a speech utterance may be determined by examination of the DFT magnitude. In Fig. 15.10, the DFT of a voiced utterance contains a series of sharp peaks in the frequency domain. These peaks, caused by the periodic sampling action of the glottal excitation, are separated by the fundamental frequency, which is about 141 Hz in this example. In addition, broader peaks can be seen, for example at about 300 Hz and at about 2300 Hz. These broad peaks, called formants, result from resonances in the vocal tract.
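A small sketch of the DFT magnitude analysis just described: a 512-point FFT of a Hamming-windowed frame, from which the harmonic spacing (the fundamental) can be read off, as in Fig. 15.10. The frame here is a synthetic stand-in, for illustration only:

```python
import numpy as np

fs = 8000
n = np.arange(512)
frame = np.sin(2 * np.pi * 141 * n / fs)          # stand-in for a voiced frame

w = 0.54 - 0.46 * np.cos(2 * np.pi * n / 511)     # Hamming window, N = 512
magnitude = np.abs(np.fft.fft(frame * w))

# Bin spacing is fs/512 ~ 15.6 Hz; the strongest bin lies near 141 Hz
k = np.argmax(magnitude[:256])
print(f"peak near {k * fs / 512:.0f} Hz")
```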
Linear Predictive Analysis

Given a sampled (discrete-time) signal s(n), a powerful and general parametric model for time series analysis is

$$s(n) = -\sum_{k=1}^{p} a(k)\,s(n-k) + G\sum_{l=0}^{q} b(l)\,u(n-l)$$

where s(n) is the output and u(n) is the input (perhaps unknown). The model parameters are a(k) for k = 1, ..., p, b(l) for l = 1, ..., q, and G; b(0) is assumed to be unity. This model, described as an autoregressive moving average (ARMA) or pole-zero model, forms the foundation for the analysis method termed linear prediction. An autoregressive (AR) or all-pole model, for which all of the "b" coefficients except b(0) are zero, is frequently used for speech analysis [Markel and Gray, 1976].

In the standard AR formulation of linear prediction, the model parameters are selected to minimize the mean-squared error between the model and the speech data. In one of the variants of linear prediction, the autocorrelation method, the minimization is carried out for a windowed segment of data. In the autocorrelation method, minimizing the mean-squared error of the time domain samples is equivalent to minimizing the integrated ratio of the signal spectrum to the spectrum of the all-pole model. Thus, linear predictive analysis is a good method for spectral analysis whenever the signal is produced by an all-pole system. Most speech sounds fit this model well.

One key consideration for linear predictive analysis is the order of the model, p. For speech, if the order is too small, the formant structure is not well represented; if the order is too large, pitch pulses as well as formants begin to be represented. Tenth- or twelfth-order analysis is typical for speech. Figures 15.11 and 15.12 provide examples of the spectrum produced by eighth-order and sixteenth-order linear predictive analysis of the /i/ waveform of Fig. 15.7. Figure 15.11 shows there to be three formants, at frequencies of about 300, 2300, and 3200 Hz, which are typical for an /i/.

FIGURE 15.11 Eighth-order linear predictive analysis of an "i".

FIGURE 15.12 Sixteenth-order linear predictive analysis of an "i".

Homomorphic (Cepstral) Analysis

For the speech model of Fig. 15.6, the excitation and filter impulse response are convolved to produce the speech. One of the problems of speech analysis is to separate or deconvolve the speech into these two components. One such technique is called homomorphic filtering [Oppenheim and Schafer, 1968]. The characteristic system for homomorphic deconvolution converts a convolution operation to an addition operation. The output of such a characteristic system is called the complex cepstrum. The complex cepstrum is defined as the inverse Fourier transform of the complex logarithm of the Fourier transform of the input. If the input sequence is minimum phase (i.e., the z-transform of the input sequence has no poles or zeros outside the unit circle), the sequence can be represented by the real portion of the transforms. Thus, the real cepstrum can be computed by calculating the inverse Fourier transform of the log-spectrum of the input.

Figure 15.13 shows an example of the cepstrum for the voiced /i/ utterance from Fig. 15.7. The cepstrum of such a voiced utterance is characterized by relatively large values in the first one or two milliseconds as well as by peaks at multiples of the pitch period.
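Following that definition, a minimal sketch of the real cepstrum, with pitch read off as the quefrency of the strongest peak beyond the low-quefrency (vocal-tract) region; the 2-ms cutoff is an assumption based on the discussion above:

```python
import numpy as np

def real_cepstrum(frame):
    """Inverse FFT of the log magnitude spectrum (real cepstrum)."""
    spectrum = np.fft.fft(frame * np.hamming(len(frame)))
    return np.fft.ifft(np.log(np.abs(spectrum) + 1e-12)).real

def cepstral_pitch(frame, fs):
    """Fundamental frequency from the cepstral peak past ~2 ms quefrency."""
    c = real_cepstrum(frame)
    lo = int(0.002 * fs)                  # skip the vocal-tract portion
    peak = lo + np.argmax(c[lo:len(c) // 2])
    return fs / peak                      # pitch estimate in Hz
```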
移动通信常用英汉小词典AA模拟AB地址总线AC交流电ACCESS接入ACCESSORIER配件ACCTIVE FITER有源滤波ACTIVA TE激活ADC模数转换ADDRESS地址线ADDRESS ENFEMA 地址信息ADI邻近AFC自动频率控制AFMS来自音频信号AFPCB音频电路板AGC自动增益控制AGND模拟地AID区域识别标志AIR TIME通话时间AIR TIME COUNTER通话计时器AIS ALC告警指示信息ALARM告警ALERT振铃AM调幅AM ADJ调幅调整AMP放大器ANACLK模拟13MHZ时钟ANODE阳极ANT天线ANTSW天线开关APC自动功率控制APCM自适应型脉冲编码调整ASIC专用应用集成电路ATMS到移动台音频信号AUC鉴权中心AUDIO音频AUDIO BIT RATE音频比特率AUDIO MUTE音频静音AUDIO PCM SIGAL音频脉冲编码信号AUTO TEST自动测试AUX辅助A VCC音频供电A/D INTERFACE模数接口A/L音频/逻辑板B BACKLIGHT背光BACKUP后备电源BAND频段BARRING限制BASE基极BASEBAND基带BIAS偏压BIT比特BLKCK块时钟BOOT屏蔽罩BS基站BSC基站控制器BSI电池尺寸BSIC基站识别码BSS基站子系统BTS基站收发信台BUFFER缓冲放大器BURST突发脉冲串BUS总线BW带宽C CE片使能、激活芯片CELL小区CELLULAR蜂窝CHANNEL信道CHSW充电开关CINVERTER整机CLONE复制COL列地址线COMP补充数据D DB分贝DET检测DEV偏移DFMS(来自手机)数据DIMS(来自基站)数据DISTORTION失真DIVERT转换DSP数字信号处理器DTMF双音多频DUPLEX双工器DUPLX双工间隔E EL发光ERROR AMP误差放大器ESD静电放大ESN电子串号F FACCH信道FBUS外接信号线FDMA频分多址FEED BACK反馈FH跳频FILTER滤波器FOCC全双工FREQUENCY DRIFT频率漂移FUSE熔丝G GAIN增益GREEN绿色GRID栅极GV AP电源模块H HARMONIC谐波滤波器HOOK挂机检测HPF高通滤波器I IF中频信号IFLO中频本振IFIUAD中频输出IFVCO中频VCOINDUCTANCE电感INFRARED RAY红外线INITIAL初始INT中断INTERFACE接口L LATOR温补晶体振荡器LBQ滤波器LCD EN显示屏使能LCDRSTX LCD复位信号LDE发光二极管LO本振LOCATION UPDA TE位置登记LOOP GAIN环路增益LOGIC逻辑LOOP FILTER环路滤波LOST失步LPC线性预测编码器LPF低通滤波LSB最低有效位M MAIN DIVIDER主分频器MASH多级噪声整形MCLK主时钟MCU微处理器MOD调制MODEM调制解调器MONITOR监视器MPU中央处理器MS移动台MSC移动电话交换中心MULTIPLEX多路复用MUTE静音N NPC网络参数控制O OFST偏置OMC操作维护中心ONE FRAME一帧OP AMP运算放大器OSC振荡器P PA功率放大器PAD衬底PARAMETER参数PARITY奇偶校验PAUSE暂停PCH寻呼信道PCM脉冲编码调制PD光敏二极管PE相位编码PEL像素PK峰值PKL相位跟踪环路PLL锁相环路PM调相POINT点PPM百万分之一PRE AMP前置放大器PURX复位PWM脉冲宽度调制PWRON开机信号线Q QUADRATURE正交调制QUALIFY认证QUALITY质量R R/W读写控制RACH随机接入信道RADIO射频、无线电RAM随机存储器RANDOM随机RD读RECALL重呼RECC反向控制信道RED红色REDA TA射频频率合成器数据REDUCE减少REED干簧管REF参考REF ADJ基准频率调整REFERENCE OCILLATOR参考振荡器RESET复位RESISTANCE电阻RF射频RFAENB射频频率合成器启动RFC逻辑时钟信号RFCLK射频时钟信号RFLO射频本振RMS均方根ROAM漫游ROM只读存储器ROW行地址RSSI接收信号强度指示RST复位信号RTC时钟控制RVC反向话音信道RX接收RX ON接收启动RX OUT接收输出RX/IQ接收解调信号RXEN接收使能RXIFN接收中频信号负RXIFP接收中频信号正RXPWR接收电源控制RXQN接收Q信号负RXQP接收Q信号正RXVCO接收VCOS SAD-DET饱和度检测SAMPLE取样SAT音频监测音SAW声表面滤波器SCLK频率合成时钟信号SDATA频率合成数据SECCH标准专用控制信道SECURTITY CODE保密码SENA频率合成启动控制SENSITIVITY灵敏度SENSOR传感器SHORTCUT短路SIC信令接口芯片SID系统识别SIGNAL信号SPI外接串行接口SRAM静态随机存储器STANDBY待机SW开关SYB DAT频率合成数据SYNC同步T TANK回路TCXO温度补偿晶体振荡器TP测试点TS时隙TX发射TXEN发射使能、启动TXI/Q发射数据TXIN发射I信号负TXIP发射I信号正TXON发射启动TXQN发射Q信号负TXQP发射Q信号正TXRF发射射频TXVCO发射压控振荡器U UHF超高频UPDATA升级UPLINK上行链接V VCH语音信道VCO压控振荡器VCTCXO温补压控振荡器VCXOCONT基准振荡器频率控制VHF甚高频VLCD液晶显示器电压VLR访位置登记VSYN频率合成电源W WAN广域网络WARNING警告WA VEFORM波形WDG看门狗WIRELESS无线X XVCC射频供电。
Speech Signal Processing Graduation Thesis: Translated Foreign Literature (Chinese-English)

Speech Recognition

In computer technology, speech recognition refers to the use of a computer to identify human speech, recovering what a speaker has said in a form the machine can work with. (Examples include transcribing speech into text, entering data items, operating electronic and mechanical equipment, and automated telephone handling.) It is an important element of computer speech technology realized through so-called natural language processing.

Human speech sounds are produced by the vocal apparatus, including the lungs, vocal cords, and tongue. Through exposure from infancy onward, children learn to recognize speech patterns despite variation among speakers, for example in pitch, tone, emphasis, and intonation when the same word or phrase is pronounced; it is the cognitive capacity of the brain that gives humans this remarkable ability. At the time of writing (2008), computers can reproduce this ability only to a limited degree, yet speech recognition technology is already useful in many other ways.

The challenge of speech recognition

Writing systems are ancient, going back to the Sumerians six thousand years ago. Analog recordings of speech, played back on the phonograph, did not become possible until 1877. Speech recognition, however, had to await the development of computers, because of the wide variety of problems the task presents.
First, speech is not simply spoken text, in the same way that a performance by Davis can hardly be captured note-for-note as sheet music. The words, phrases, and sentences that humans perceive as discrete units with clear boundaries are actually a continuous stream of signal rather than separate sounds: "I went to the store yesterday." Words can also run together, as in "Whaddayawa?", which stands for "What do you want to do?"
Second, there is no one-to-one correspondence between sounds and letters. In English there are slightly more than five vowel letters: a, e, i, o, u, and sometimes y and w. Yet there are more than twenty distinct vowel sounds, though the precise count depends on the speaker's accent. The reverse problem also occurs, where more than one letter can reproduce a particular sound: the letter C can have the same sound as the letter K, as in cake, or as the letter S, as in citrus. Moreover, speakers of the same language do not use identical sounds; that is, their languages differ in their phonemes, or in the organization of their sound patterns: they have different accents. For example, the word "water" may be pronounced wadder, watter, woader, wattah, and so on.
Digital Signal Processing in English

Digital Signal Processing (DSP) is an essential technology used in various fields such as communication, media, control systems and audio signal processing. This technology uses algorithms to transform digital signals (numbers) for specific applications. In this article, we will explore some common terminologies used in DSP in English.

1. Sampling

Sampling is the process of converting a continuous signal into a discrete signal. The sampled signal represents the original signal at specific intervals, known as the sampling frequency. The number of samples taken per unit time is called the sample rate. For example, in audio signal processing, the standard sample rate is 44.1 kilohertz (kHz), which means that the signal is sampled 44,100 times per second.

2. Quantization

Quantization is the process of assigning a discrete value to each sample. Each sample is rounded to the nearest value in a given set of discrete values. The interval between each value is known as the quantization step size. For example, in audio signal processing, the quantization step size is measured in bits. The most common quantization bit size is 16 bits, which means that each sample can be represented by a 16-bit binary number.

3. Filtering

Filtering is the process of removing or attenuating specific frequencies in a signal. The filter can be designed to pass only the desired frequency range or to eliminate unwanted frequencies. There are two types of filters: analog filters and digital filters. Analog filters use passive components such as capacitors and resistors, while digital filters use mathematical algorithms to process the signal.

4. Fast Fourier Transform (FFT)

The Fourier Transform is a mathematical technique used to analyze signals in the frequency domain. The FFT is a particular algorithm that efficiently calculates the Fourier Transform of a discrete signal. It is widely used in digital signal processing to analyze and process signals in the frequency domain.

5. Digital Signal Processors (DSPs)

Digital Signal Processors (DSPs) are specialized microprocessors used to perform DSP operations. DSPs are used in devices such as cellphones, wireless modems, televisions, and audio devices. They are optimized for performing the complex mathematical operations required in digital signal processing.

In conclusion, digital signal processing has become an essential technology in many fields, from communications to audio signal processing. Understanding the terminologies used in DSP is vital in learning and applying this technology. The terminologies above are some of the most common terms used in DSP, and a good understanding of these will help you get started in this exciting field.
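A small sketch tying the first four terms together: it samples a sine tone at the 44.1 kHz rate mentioned above, quantizes it to 16 bits, and inspects the spectrum with an FFT. The 440 Hz test tone is an arbitrary choice for illustration:

```python
import numpy as np

fs = 44_100                          # sample rate (samples per second)
t = np.arange(fs) / fs               # one second of sampling instants
x = np.sin(2 * np.pi * 440 * t)      # "continuous" signal, sampled at fs

# Quantization to 16 bits: round each sample to one of 2**16 levels
q = np.round(x * 32767).astype(np.int16)

# FFT: locate the dominant frequency in the quantized signal
spectrum = np.abs(np.fft.rfft(q))
freqs = np.fft.rfftfreq(len(q), d=1 / fs)
print(f"peak at {freqs[np.argmax(spectrum)]:.1f} Hz")   # ~440.0 Hz
```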
我收集到的最齐全的音频专业术语中英文对照表翻译交流AAAC automatic ampltiude control 自动幅度控制AB AB制立体声录音法Abeyancd 暂停,潜态A—B repeat A-B重复ABS absolute 绝对的,完全的,绝对时间ABS american bureau of standard 美国标准局ABSS auto blank secrion scanning 自动磁带空白部分扫描Abstime 绝对运行时间A.DEF audio defeat 音频降噪,噪声抑制,伴音静噪ADJ adjective 附属的,附件ADJ Adjust 调节ADJ acoustic delay line 声延迟线Admission 允许进入,供给ADP acoustic data processor 音响数据处理机ADP(T)adapter 延配器,转接器ADRES automatic dynamic range expansion system 动态范围扩展系统ADRM analog to digital remaster 模拟录音、数字处理数码唱盘ADS audio distribution system 音频分配系统A.DUB audio dubbing 配音,音频复制,后期录音ADV advance 送入,提升,前置量ADV adversum 对抗ADV advancer 相位超前补偿器Adventure 惊险效果AE audio erasing 音频(声音)擦除AE auxiliary equipment 辅助设备Aerial 天线AES audio engineering society 美国声频工程协会AF audio fidelity 音频保真度AF audio frequency 音频频率AFC active field control 自动频率控制AFC automatic frequency control 声场控制Affricate 塞擦音AFL aside fade listen 衰减后(推子后)监听A-fader 音频衰减AFM advance frequency modulation 高级调频AFS acoustic feedback speaker 声反馈扬声器AFT automatic fine tuning 自动微调AFTAAS advanced fast time acoustic analysis system 高级快速音响分析系统After 转移部分文件Afterglow 余辉,夕照时分音响效果Against 以……为背景AGC automatic gain control 自动增益控制AHD audio high density 音频高密度唱片系统AI advanced integrated 预汇流AI amplifier input 放大器输入AI artificial intelligence 人工智能AI azimuth indicator 方位指示器A—IN 音频输入A-INSEL audio input selection 音频输入选择Alarm 警报器ALC automatic level control 自动电平控制ALC automatic load control自动负载控制Alford loop 爱福特环形天线Algorithm 演示Aliasing 量化噪声,频谱混叠Aliasing distortion 折叠失真Align alignment 校正,补偿,微调,匹配Al-Si—Fe alloy head 铁硅铝合金磁头Allegretto 小快板,稍快地Allegro 快板,迅速地Allocation 配置,定位All rating 全(音)域ALM audio level meter 音频电平表ALT alternating 震荡,交替的ALT alternator 交流发电机ALT altertue 转路ALT—CH alternate channel 转换通道,交替声道Alter 转换,交流电,变换器AM amperemeter 安培计,电流表AM amplitude modulation 调幅(广播)AM auxiliary memory 辅助存储器Ambience 临场感,环绕感ABTD automatic bulk tape degausser 磁带自动整体去磁电路Ambient 环境的Ambiophonic system 环绕声系统Ambiophony 现场混响,环境立体声AMLS automatic music locate system 自动音乐定位系统AMP ampere 安培AMP amplifier 放大器AMPL amplification 放大AMP amplitude 幅度,距离Amorphous head 非晶态磁头Abort 终止,停止(录制或播放)A-B TEST AB比较试听Absorber 减震器Absorption 声音被物体吸收ABX acoustic bass extension 低音扩展AC accumulator 充电电池AC adjustment caliration 调节—校准AC alternating current 交流电,交流AC audio coding 数码声,音频编码AC audio center 音频中心AC azimuth comprator 方位比较器AC—3 杜比数码环绕声系统AC-3 RF 杜比数码环绕声数据流(接口)ACC Acceleration 加速Accel 渐快,加速Accent 重音,声调Accentuator 预加重电路Access 存取,进入,增加,通路Accessory 附件(接口),配件Acryl 丙基酰基Accompaniment 伴奏,合奏,伴随Accord 和谐,调和Accordion 手风琴ACD automatic call distributor 自动呼叫分配器ACE audio control erasing 音频控制消磁A—Channel A(左)声道Acoumeter 测听计Acoustical 声的,声音的Acoustic coloring 声染色Acoustic image 声像Across 交叉,并行,跨接Across frequency 交叉频率,分频频率ACST access time 存取时间Active 主动的,有源的,有效的,运行的Active crossover 主动分频,电子分频,有源分频Active loudsperker 有源音箱Armstrong MOD 阿姆斯特朗调制ARP azimuth reference pulse 方位基准脉冲Arpeggio 琶音Articulation 声音清晰度,发音Artificial 仿……的,人工的,手动(控制) AAD active acoustic devide 有源声学软件ABC auto base and chord 自动低音合弦Architectural acoustics 建筑声学Arm motor 唱臂唱机Arpeggio single 琶音和弦,分解和弦ARL aerial 天线ASC automatic sensitivity control 自动灵敏度控制ASGN Assign 分配,指定,设定ASP audio signal processing 音频信号处理ASS assembly 组件,装配,总成ASSEM assemble 汇编,剪辑ASSEM Assembly 组件,装配,总成Assign 指定,转发,分配Assist 辅助(装置)ASSY accessory 组件,附件AST active servo techonology 有源伺服技术A Tempo 回到原速Astigmatism methord 象散法BB band 频带B Bit 比特,存储单元B Button 按钮Babble 多路感应的复杂失真Back 返回Back clamping 反向钳位Back drop 交流哼声,干扰声Background noise 背景噪声,本底噪声Backing copy 副版Backoff 倒扣,补偿Back tracking 补录Back up 磁带备份,支持,预备Backward 快倒搜索Baffle box 音箱BAL balance 
平衡,立体声左右声道音量比例,平衡连接Balanced 已平衡的Balancing 调零装置,补偿,中和Balun 平衡=不平衡转换器Banana jack 香蕉插头Banana bin 香蕉插座Banana pin 香蕉插头Banana plug 香蕉插头Band 频段,Band pass 带通滤波器Bandwidth 频带宽,误差,范围Band 存储单元Bar 小节,拉杆BAR barye 微巴Bargraph 线条Barrier 绝缘(套)Base 低音Bass 低音,倍司(低音提琴)Bass tube 低音号,大号Bassy 低音加重BATT battery 电池Baud 波特(信息传输速率的单位)Bazooka 导线平衡转接器BB base band 基带BBD Bucket brigade device 戽链器件(效果器)B BAT Battery 电池BBE 特指BBE公司设计的改善较高次谐波校正程度的系统BC balanced current 平衡电流BC Broadcast control 广播控制BCH band chorus 分频段合唱BCST broadcast (无线电)广播BD board 仪表板Beat 拍,脉动信号Beat cancel switch 差拍干扰消除开关Bel 贝尔Below 下列,向下Bench 工作台Bend 弯曲,滑音Bender 滑音器BER bit error rate 信息差错率BF back feed 反馈BF Backfeed flanger 反馈镶边BF Band filter 带通滤波器BGM background music 背景音乐Bias 偏置,偏磁,偏压,既定程序Bidirectional 双向性的,8字型指向的Bifess Bi—feedback sound system 双反馈系统Big bottom 低音扩展,加重低音Bin 接收器,仓室BNG BNC连接器(插头、插座),卡口同轴电缆连接器Binaural effect 双耳效应,立体声Binaural synthesis 双耳合成法Bin go 意外现象Bit binary digit 字节,二进制数字,位Bitstream 数码流,比特流Bit yield 存储单元Bi—AMP 双(通道)功放系统Bi-wire 双线(传输、分音)Bi—Wring 双线BK break 停顿,间断BKR breaker 断电器Blamp 两路电子分音Blanking 关闭,消隐,断路Blaster 爆裂效果器Blend 融合(度)、调和、混合Block 分程序,联动,中断Block Repeat 分段重复Block up 阻塞Bloop (磁带的)接头噪声,消音贴片BNC bayonet connector 卡口电缆连接器Body mike 小型话筒Bond 接头,连接器Bongo 双鼓Boom 混响,轰鸣声Boomy 嗡嗡声(指低音过强)Boost 提升(一般指低音),放大,增强Booth 控制室,录音棚Bootstrap 辅助程序,自举电路Both sides play disc stereo system双面演奏式唱片立体声系统Bottoming 底部切除,末端切除Bounce 合并Bourclon 单调低音Bowl 碗状体育场效果BP bridge bypass 电桥旁路BY bypass 旁通BPC basic pulse generator 基准脉冲发生器。
Graduation Project (Thesis) Foreign Literature Translation
Chinese title: 数字信号处理 (Digital Signal Processing)
English title: Digital Signal Processing
Translation date: 2017.02.14

Digital Signal Processing

I. Introduction

Digital signal processing (DSP) is the processing of signals that are represented by sequences of numbers or symbols.
Digital signal processing and analog signal processing are both subfields of signal processing. DSP includes subfields such as audio and speech signal processing, radar and sonar signal processing, sensor array processing, spectral estimation, statistical signal processing, digital image processing, communications signal processing, biomedical signal processing, and seismic data processing.

Since the goal of DSP is usually to measure or filter continuous real-world analog signals, the first step is usually to convert the signal from analog to digital form using an analog-to-digital converter. Often the required output is another analog signal, which calls for a digital-to-analog converter. Even though this process is more complex than analog processing and works with discrete values, digital signal processing supports error detection and correction and is less susceptible to noise; this stability makes it preferable to analog signal processing for many applications (though not all).

DSP algorithms have long been run on standard computers, on specialized processors called digital signal processors (DSPs), or on dedicated hardware such as application-specific integrated circuits (ASICs). Additional technologies now used for digital signal processing include more powerful general-purpose microprocessors, field-programmable gate arrays (FPGAs), digital signal controllers (mostly for industrial applications such as motor control), stream processors, and other related technologies.

In digital signal processing, engineers usually study digital signals in one of the following domains: the time domain (one-dimensional signals), the spatial domain (multidimensional signals), the frequency domain, the autocorrelation domain, or the wavelet domain. They choose the domain in which to process a signal by making an informed guess (or by trying different possibilities) as to which domain best represents the essential characteristics of the signal. A sequence of samples from a measuring device yields a time- or spatial-domain representation, whereas the discrete Fourier transform produces frequency-domain information, the spectrum. Autocorrelation is defined as the cross-correlation of the signal with itself over varying intervals of time or space.
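A brief sketch of that last definition: autocorrelation computed as the cross-correlation of a signal with a shifted copy of itself. The 50 Hz sine test signal is an arbitrary illustration:

```python
import numpy as np

fs = 1000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 50 * t)              # 50 Hz test signal

# Cross-correlate the signal with itself; lag 0 sits at index len(x)-1
acf = np.correlate(x, x, mode="full")[len(x) - 1:]
acf /= acf[0]                               # normalize so acf[0] == 1

# For a periodic signal the ACF peaks again at the period (20 ms here)
period = 1 + np.argmax(acf[1:])
print(f"estimated period: {period / fs * 1000:.1f} ms")
```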
II. Signal Sampling

With the increasing use of computers, the need for digital signal processing has also grown.
Research and Application of English Speech Signal Processing Technology

In the information age, speech processing technology has attracted wide attention. Among its branches, the research and application of speech signal processing is an important field. As English is one of the world's common languages, research on English speech signal processing and its applications has likewise received much attention.

I. Current state of research

Research on English speech signal processing mainly involves audio recording, audio analysis, audio compression, and natural language processing. Among these, audio recording is the foundation of speech signal processing: improved audio quality is crucial to all subsequent analysis and processing. For audio analysis, common methods include frequency-domain analysis based on the short-time Fourier transform (STFT), pitch-synchronous overlap-add (PSOLA), and hidden Markov model (HMM) approaches. These methods convert the speech signal into digital form on which further processing is carried out.
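A minimal sketch of the STFT-based analysis mentioned above, using scipy. The 25-ms frame and 10-ms hop are typical speech-analysis choices assumed here, not values from the text, and the random input is a stand-in for recorded speech:

```python
import numpy as np
from scipy.signal import stft

fs = 16_000
x = np.random.randn(fs)                      # stand-in for one second of speech

# 25-ms Hann-windowed frames with a 10-ms hop
f, t, Z = stft(x, fs=fs, nperseg=400, noverlap=240)

# Z[k, m] is the complex spectrum of frame m; magnitude in dB:
log_mag = 20 * np.log10(np.abs(Z) + 1e-10)
print(log_mag.shape)                         # (frequency bins, frames)
```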
Audio compression is another core part of speech signal processing. Traditional audio compression methods include forward/inverse short-time Fourier transform compression (STFT-ICTF) and wavelet-transform methods. These methods reduce the size of audio files while preserving audio quality, which makes storage and transmission easier.

Natural language processing is one of the important application areas of speech signal processing. Its progress is very helpful for products such as intelligent voice assistants. Today, natural language processing methods based on deep learning have become a research hotspot. These methods analyze and process the speech signal to extract its features and semantic information, helping intelligent voice assistants better understand users' needs.

II. Application scenarios of English speech signal processing technology

1. English education

Speech signal processing technology is widely used in English education. Speech recognition can help students learn English pronunciation; audio compression allows high-quality English teaching resources to be compressed and delivered around the world; and natural language processing can help students better understand the meaning of English and hold spoken conversations online.

2. Smart homes

With the development of the smart home, intelligent voice assistants have become the choice of more and more users. Speech signal processing plays an important role in the smart home: it helps the voice assistant recognize the user's spoken commands, enabling remote control of household devices.
鼻道,Nasal tract鼻音,Nasals谱分析法,Spectrum analysis methods谱平整,Spectrum flattening边界条件,Boundary condition声门处的,at glottis嘴唇端的,at lips并联式实现,Parallel form implementation爆破音,Plosive sounds比特率,Bit-rate半元音,Semivowels窗,Windows种类,classes of海明窗,Hamming window矩形窗,rectangular window窗宽,Window length差分量化,Differential quantization差分脉码调制,Differential PCM自适应差分脉码调制,Adaptive differential PCM冲激响应,Impulse response有限冲激响应,Finite impulse response(FIR)无限冲激响应,Infinite duration impulse response(IIR)传输线类比,Transmission line analogy清音,Unvoiced sounds抽样,Sampling抽样定理,sampling theorem抽样率,sampling rate抽取与插值,Decimation and interpolation叠加原理,Principles of superposition带宽,Bandwidth倒频谱,Cepstrum复倒频谱,Complex cepstrum对数面积比,log area ratio短时自相关函数,Short-time autocorrelation function 短时平均幅度,Short-time average magnitude短对平均幅度差函数,short-time average magnitude difference function短时平均过零率,short-time average zero-crossing rate短时傅里叶变换,Short-time Fourier transform短时能量,Short-time energy单位抽样响应,Unit sample response单位抽样(冲激)序列,Unit sample(impulse) sequence 单位阶跃序列,Unit step sequence分析-综合系统,Analysis-synthesis systems傅里叶变换,Fourier transform辐射,Radiation反射系数,Reflection coefficient识别,Recognition谈话人识别系绕,speaker recognition systems语音识别系统,speech recognition systems孤立数字识别,isolated digit recognition连续数字识别,continuous digit recognition说话人辨认,Speaker identification说话人确认,Speaker verification时间弯折,Time warping时间依赖傅里叶变换,Time-dependent Fourier transform 声道中的损耗,Losses in the vocal tract由于热传导,due to thermal conduction由于粘滞摩擦,due to viscous frictions由于屈服性管壁,due to yielding walls声门,Glottis声道,Vocal tract声导纳, Acoustic admittance声阻抗,Acoustic impedance声码器,Vocoders通道式声码器,channel vocoder共振峰声码器,formant vocoder同态声码器,homomorphic vocoder线性预测声码器,linear predictive vocoder相位声码器,phase vocoder声激励声码器,voice excited vocoder线性预测编码方程组的解,Solution of LPC equations 杜宾法,Durbin's method乔里斯基分解,Cholesky decomposition格型解,lattice solution线性预测编码,Linear predictive coding(LPC)线性预测分析,Linear predictive analysis自相关法,auto correlation method协方差法,covariance method线性预测谱,Linear predictive spectrum线性预测器,Liner predictor线性移不变系统,Linear shift-invariant systems信息率,Information rate信噪比,SNR双元音,Diphthongs塞音,Stops数字滤波器,Digital filters数字编码,Digital coding倒频谱的,of the cepstrum共振峰的,of formantLPC参数的,of LPC parameter时间依赖博里叶变换的,of the time-dependent Fourier transform利用自适应增量调制的,using adaptive delta modulation利用PCM的,using PCM滤波器组相加法, Filter bank summation method量化,Quantization瞬时量化,Instantaneous quantization均匀量化,uniform quantization反馈量化,Feedback quantization自适应量化,Adaptive quantization对数量化,Logarithmic quantization最佳量化,Optimum quantization上升-中点型量化器,Mid-riser quantizer水平一中点型最化器,Mid-tread quantizer离散傅里叶变换,Discrete Fourier transform。
频率范围 frequency range灵敏度 sensitivity线路输入 line input电压输出 line output信噪比 S/N signal/noise (ratio)谐波失真 harmonic distortion指向特性 directivity额定功率 rated power监听耳机 monitor earphone供电电源 power supply输入阻抗 input impedance录音输出 record output最大声压级 SPL ( supreme pressure level) 接收距离 receive distance测量话筒 measure microphone无线话筒 wireless microphone驻极话筒 electret microphone立体声话筒 stereo microphone传声器 microphone调音台 audio mixing control均衡器 EQ equalizer效果器 effector反馈抑制器 feedback control压限器 compress control (压缩器) limitr(限幅器)等效噪声级 equivalent noise level消声室 dead room数字技术 digital technic智能技术 Intelligence technic白噪声 white noise粉红噪声 pink noise计权网络 weighting network输出电平 output level幻像供电 phantom power极座标图 polar pattern防风罩 windscreen话筒底座 microphone table stand减振架 absorber话筒夹架 microphone clamp话筒线缆 microphone cable动圈话筒 dynamic microphone抗噪话筒 anti-noise microphone净重量 net weight外形尺寸 dimension电压放大器 voltage amplifier小膜片电容传声器 small diaphragm microphone 大膜片电容传声器 large diaphragm microphone 长枪式电容传声器 long shotgun microphone短枪式电容传声器 short shotgun电子管电容传声器 vacuum tube condenser microphone 调整率 adjustable radio恒定电压 constant voltage低音用扩音单元 woofer unit覆盖角度(水平×垂直)高音用扩声单元 tweeter unit体积 volume尺寸 dimension重量 weight串音衰减 attenuation阻尼系数 damping总谐波失真 harmonic distortion遥控功能 remote control调音台面版常用词汇英汉对照:out/in 输出/输入Line 线路Mic 传声器输入GAIN 增益调节TREBLE 高音调节MID 中音调节BASS 低音调节MONITOR 监听音量调节EFFECT 效果信号调节PAN 声象调节PEAK 峰值指示EFFECT SEND 效果送出EFFECT RTN 效果返回LEFT 左声道RIGHT 右声道SUM 混合OUT LEVEL 输出电平MUSIC 音乐STEREO 立体声CLIP 削波RECORD 录音BRIDGED是单声道桥接,BAL:左右声道RPT:话筒回声的重复次数。
Design and Implementation of an Intelligent Speech Translation System Based on Artificial Intelligence

An intelligent speech translation system is an application that uses artificial intelligence to perform speech recognition and translation. As AI technology continues to develop and mature, intelligent speech translation systems can play an important role in business negotiations, tourism, cross-cultural communication, and similar settings. This article introduces the system from two aspects: design and implementation.

In the design of an intelligent speech translation system, the first thing to consider is speech recognition technology. Speech recognition is the process of converting a speech signal into text; commonly used techniques include hidden Markov models (HMM), deep neural networks (DNN), and long short-term memory (LSTM) networks. An appropriate speech recognition technique can be chosen for the system according to the application scenario and requirements.
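As a toy illustration of the HMM machinery named above, the sketch below scores a one-dimensional feature sequence against a small HMM with the forward algorithm. The two-state model and all of its parameters are invented for the example, not part of any described system:

```python
import numpy as np

# Invented 2-state HMM: transitions A, initial distribution pi,
# and per-state Gaussian emission parameters for a 1-D feature
A = np.array([[0.9, 0.1],
              [0.2, 0.8]])
pi = np.array([0.5, 0.5])
means, stds = np.array([0.0, 3.0]), np.array([1.0, 1.0])

def gaussian(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def log_likelihood(obs):
    """Forward algorithm with per-step rescaling, in the log domain."""
    alpha = pi * gaussian(obs[0], means, stds)
    log_p = np.log(alpha.sum())
    alpha = alpha / alpha.sum()
    for x in obs[1:]:
        alpha = (alpha @ A) * gaussian(x, means, stds)   # predict, then emit
        log_p += np.log(alpha.sum())
        alpha = alpha / alpha.sum()                      # rescale for stability
    return log_p

print(log_likelihood(np.array([0.1, -0.2, 2.9, 3.1])))
```

In a real recognizer, one such model is trained per phonetic unit and the decoder picks the unit sequence with the highest likelihood.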
The second consideration is translation technology. Translation converts the recognized text into the target language; commonly used techniques include statistical machine translation (SMT), neural machine translation (NMT), and hybrid machine translation (HMT). Different translation techniques have different strengths and ranges of applicability and can be selected according to actual needs.

In addition, the design of an intelligent speech translation system must consider the user interface and user experience. The interface should be simple and clear, convenient to operate, and should offer several input modes such as voice input, handwriting input, and keyboard input. The system should also provide both real-time and offline translation to meet the needs of different users. As for user experience, optimizing the accuracy and fluency of the translation output improves the usability of the system and user satisfaction.

On the implementation side, the key technologies of an intelligent speech translation system are speech signal processing, feature extraction, speech recognition, and machine translation. Speech signal processing preprocesses the input speech signal, including removing noise, reducing interference, and enhancing the signal. Feature extraction converts the preprocessed signal into feature vectors; commonly used feature extraction techniques include MFCC, PLP, and LPCC.
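A compact sketch of MFCC extraction for a single frame, assuming the usual pipeline (pre-emphasis, windowed FFT, mel filterbank, log, DCT). The filterbank size, 13 retained coefficients, and pre-emphasis constant are conventional choices assumed here, not values from the text:

```python
import numpy as np
from scipy.fftpack import dct

def hz_to_mel(f):  return 2595 * np.log10(1 + f / 700)
def mel_to_hz(m):  return 700 * (10 ** (m / 2595) - 1)

def mfcc_frame(frame, fs, n_filt=26, n_ceps=13, nfft=512):
    """MFCCs of one speech frame (pre-emphasis + Hamming window assumed)."""
    frame = np.append(frame[0], frame[1:] - 0.97 * frame[:-1])  # pre-emphasis
    power = np.abs(np.fft.rfft(frame * np.hamming(len(frame)), nfft)) ** 2

    # Triangular filters spaced evenly on the mel scale
    mels = np.linspace(hz_to_mel(0), hz_to_mel(fs / 2), n_filt + 2)
    bins = np.floor((nfft + 1) * mel_to_hz(mels) / fs).astype(int)
    fbank = np.zeros((n_filt, nfft // 2 + 1))
    for i in range(1, n_filt + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        fbank[i - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)  # rising edge
        fbank[i - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)  # falling edge

    log_energy = np.log(fbank @ power + 1e-10)     # log mel-band energies
    return dct(log_energy, type=2, norm="ortho")[:n_ceps]
```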
Speech recognition then converts the feature vectors into text and can be implemented with the recognition techniques mentioned earlier. Machine translation converts the recognized text into the target language and can be implemented with the translation techniques mentioned earlier.

To improve the accuracy and fluency of the translation results, an attention mechanism and contextual information can be introduced. The attention mechanism helps the model attend to the correspondence between input and output, improving translation accuracy. Processing contextual information, by using the linguistic context before and after the current utterance, improves the coherence and fluency of the translation.
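A minimal numpy sketch of the attention idea referred to above, in its common scaled dot-product form: each output position forms a weighted average of the input (encoder) states, with weights given by a softmax over query-key similarity. The shapes and random values are purely illustrative:

```python
import numpy as np

def softmax(z, axis=-1):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = Q.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(d))   # alignment of outputs to inputs
    return weights @ V, weights

# 4 encoder states and 2 decoder queries of dimension 8 (random for the demo)
rng = np.random.default_rng(0)
K = V = rng.normal(size=(4, 8))               # encoder states
Q = rng.normal(size=(2, 8))                   # decoder queries
context, align = attention(Q, K, V)
print(align.round(2))                         # each row sums to 1
```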
AI technology is now developing rapidly, and intelligent speech translators perform voice-activated, simultaneous speech recognition: speak a sentence into the microphone on the translation screen, and it is promptly translated into a language the other party understands, with the text displayed as well.

Tool used: download 【录音转文字助手】 (Recording-to-Text Assistant) from an app market.

Steps:

Step 1: Search for 【录音转文字助手】 in Baidu Mobile Assistant or an app market, then download and install it.

Step 2: Open the app; you will see four functions: recording recognition, file recognition, speech translation, and voice recorder. Here we take speech translation as the example.

Step 3: Tap the orange 【中文】 (Chinese) button and speak Chinese; the English translation appears below.

Step 4: Tap the blue 【English】 button and speak English to have it converted into Chinese.

Those are the steps for speech translation. We hope this method is helpful.
Efficient voice activity detection algorithms using long-term speech information

Javier Ramírez, José C. Segura, Carmen Benítez, Ángel de la Torre, Antonio Rubio

Dpto. Electrónica y Tecnología de Computadores, Universidad de Granada, Campus Universitario Fuentenueva, 18071 Granada, Spain

Received 5 May 2003; received in revised form 8 October 2003; accepted 8 October 2003

Speech Communication 42 (2004) 271-287

Abstract

Currently, there are technology barriers inhibiting speech processing systems working under extreme noisy conditions. The emerging applications of speech technology, especially in the fields of wireless communications, digital hearing aids or speech recognition, are examples of such systems and often require a noise reduction technique operating in combination with a precise voice activity detector (VAD). This paper presents a new VAD algorithm for improving speech detection robustness in noisy environments and the performance of speech recognition systems. The algorithm measures the long-term spectral divergence (LTSD) between speech and noise and formulates the speech/non-speech decision rule by comparing the long-term spectral envelope to the average noise spectrum, thus yielding a highly discriminating decision rule and minimizing the average number of decision errors. The decision threshold is adapted to the measured noise energy while a controlled hang-over is activated only when the observed signal-to-noise ratio is low. It is shown by conducting an analysis of the speech/non-speech LTSD distributions that using long-term information about speech signals is beneficial for VAD. The proposed algorithm is compared to the most commonly used VADs in the field, in terms of speech/non-speech discrimination and in terms of recognition performance when the VAD is used for an automatic speech recognition system. Experimental results demonstrate a sustained advantage over standard VADs such as G.729 and adaptive multi-rate (AMR), which were used as a reference, and over the VADs of the advanced front-end for distributed speech recognition.
© 2003 Elsevier B.V. All rights reserved.

Keywords: Speech/non-speech detection; Speech enhancement; Speech recognition; Long-term spectral envelope; Long-term spectral divergence

1. Introduction

An important problem in many areas of speech processing is the determination of the presence of speech periods in a given signal. This task can be identified as a statistical hypothesis problem and its purpose is the determination of the category or class to which a given signal belongs. The decision is made based on an observation vector, frequently called a feature vector, which serves as the input to a decision rule that assigns a sample vector to one of the given classes. The classification task is often not as trivial as it appears, since the increasing level of background noise degrades the classifier effectiveness, thus leading to numerous detection errors.

The emerging applications of speech technologies (particularly in mobile communications, robust speech recognition or digital hearing aid devices) often require a noise reduction scheme working in combination with a precise voice activity
detector (VAD) (Bouquin-Jeannes and Faucon, 1994, 1995). During the last decade numerous researchers have studied different strategies for detecting speech in noise and the influence of the VAD decision on speech processing systems (Freeman et al., 1989; ITU, 1996; Sohn and Sung, 1998; ETSI, 1999; Marzinzik and Kollmeier, 2002; Sangwan et al., 2002; Karray and Martin, 2003). Most authors reporting on noise reduction refer to speech pause detection when dealing with the problem of noise estimation. The non-speech detection algorithm is an important and sensitive part of most of the existing single-microphone noise reduction schemes. There exist well known noise suppression algorithms (Berouti et al., 1979; Boll, 1979), such as Wiener filtering (WF) or spectral subtraction, that are widely used for robust speech recognition, and for which the VAD is critical in attaining a high level of performance. These techniques estimate the noise spectrum during non-speech periods in order to compensate its harmful effect on the speech signal. Thus, the VAD is more critical for non-stationary noise environments, since it is needed to update the constantly varying noise statistics and a misclassification error strongly affects the system performance. In order to palliate the importance of the VAD in noise suppression systems, Martin proposed an algorithm (Martin, 1993) that continually updated the noise spectrum in order to prevent a misclassification of the speech signal from causing a degradation of the enhanced signal. These techniques are faster in updating the noise but usually capture signal energy during speech periods, thus degrading the quality of the compensated speech signal. In this way, it is clearly better to use an efficient VAD in most noise suppression systems and applications.

VADs are employed in many areas of speech processing. Recently, various voice activity detection procedures have been described in the literature for several applications including mobile communication services (Freeman et al., 1989), real-time speech transmission on the Internet (Sangwan et al., 2002) or noise reduction for digital hearing aid devices (Itoh and Mizushima, 1997). Interest of research has focused on the development of robust algorithms, with special attention being paid to the study and derivation of noise robust features and decision rules. Sohn and Sung (1998) presented an algorithm that uses a novel noise spectrum adaptation employing soft decision techniques. The decision rule was derived from the generalized likelihood ratio test by assuming that the noise statistics are known a priori. An enhanced version (Sohn et al., 1999) of the original VAD was derived with the addition of a hang-over scheme which considers the previous observations of a first-order Markov process modeling speech occurrences. The algorithm outperformed or at least was comparable to the G.729B VAD (ITU, 1996) in terms of speech detection and false-alarm probabilities. Other researchers presented improvements over the algorithm proposed by Sohn et al. (1999). Cho et al. (2001a) and Cho and Kondoz (2001) presented a smoothed likelihood ratio test to alleviate the detection errors, yielding better results than G.729B and comparable performance to adaptive multi-rate (AMR) option 2. Cho et al. (2001b) also proposed a mixed decision-based noise adaptation yielding better results than the soft decision noise adaptation technique reported by Sohn and Sung (1998). Recently, a new standard incorporating noise suppression methods has been
approved by the European Telecommunications Standards Institute (ETSI) for feature extraction and distributed speech recognition (DSR). The so-called advanced front-end (AFE) (ETSI, 2002) incorporates an energy-based VAD (WF AFE VAD) for estimating the noise spectrum in Wiener filtering speech enhancement, and a different VAD for non-speech frame dropping (FD AFE VAD).

On the other hand, a VAD achieves silence compression in modern mobile telecommunication systems, reducing the average bit rate by using the discontinuous transmission (DTX) mode. Many practical applications, such as the global system for mobile communications (GSM) telephony, use silence detection and comfort noise injection for higher coding efficiency. The International Telecommunication Union (ITU) adopted a toll-quality speech coding algorithm known as G.729 to work in combination with a VAD module in DTX mode. The recommendation G.729 Annex B (ITU, 1996) uses a feature vector consisting of the linear prediction (LP) spectrum, the full-band energy, the low-band (0-1 kHz) energy and the zero-crossing rate (ZCR). The standard was developed with the collaboration of researchers from France Telecom, the University of Sherbrooke, NTT and AT&T Bell Labs, and the effectiveness of the VAD was evaluated in terms of subjective speech quality and bit rate savings (Benyassine et al., 1997). Objective performance tests were also conducted by hand-labelling a large speech database and assessing the correct identification of voiced, unvoiced, silence and transition periods. Another standard for DTX is the ETSI adaptive multi-rate speech coder (ETSI, 1999) developed by the special mobile group for the GSM system. The standard specifies two options for the VAD to be used within the digital cellular telecommunications system. In option 1, the signal is passed through a filterbank and the level of signal in each band is calculated. A measure of the signal-to-noise ratio (SNR) is used to make the VAD decision together with the output of a pitch detector, a tone detector and the correlated complex signal analysis module.
An enhanced version of the original VAD is the AMR option 2 VAD. It uses parameters of the speech encoder and is more robust against environmental noise than AMR1 and G.729.

These VADs have been used extensively in the open literature as a reference for assessing the performance of new algorithms. Marzinzik and Kollmeier (2002) proposed a new VAD algorithm for noise spectrum estimation based on tracking the power envelope dynamics. The algorithm was compared to the G.729 VAD by means of receiver operating characteristic (ROC) curves, showing a reduction in the non-speech false alarm rate together with an increase of the non-speech hit rate for a representative set of noises and conditions. Beritelli et al. (1998) proposed a fuzzy VAD with a pattern matching block consisting of a set of six fuzzy rules. The comparison was made using objective, psychoacoustic, and subjective parameters, with the G.729 and AMR VADs used as a reference (Beritelli et al., 2002). Nemer et al. (2001)
speech signal remains on the time-varying signal spectrum magnitude. It uses a long-term speech window instead of instantaneous values of the spectrum to track the spectral envelope and is based on the estimation of the so-called long-term spectral envelope(LTSE). The decision rule is then formulated in terms of the long-term spectral divergence(LTSD)between speech and noise.The motivations for the pro-posed strategy will be clarified by studying the distributions of the LTSD as a function of the long-term window length and the misclassification errors of speech and non-speech segments.2.1.Definitions of the LTSE and LTSDLet xðnÞbe a noisy speech signal that is seg-mented into overlapped frames and,Xðk;lÞits amplitude spectrum for the k band at frame l.The N-order long-term spectral envelope is defined as LTSE Nðk;lÞ¼max f Xðk;lþjÞg j¼þNj¼ÀNð1ÞThe N-order long-term spectral divergence be-tween speech and noise is defined as the deviation of the LTSE respect to the average noise spec-trum magnitude NðkÞfor the k band,k¼0;1;...;NFFTÀ1,and is given byLTSD NðlÞ¼10log101NFFTXNFFTÀ1k¼0LTSE2ðk;lÞN2ðkÞ!ð2ÞIt will be shown in the rest of the paper that the LTSD is a robust feature defined as a long-term spectral distance measure between speech and noise.It will also be demonstrated that using long-term speech information increases the speech detection robustness in adverse environments and,274J.Ram ırez et al./Speech Communication42(2004)271–287when compared to VAD algorithms based on instantaneous measures of the SNR level,it will enable formulating noise robust decision rules with improved speech/non-speech discrimination.2.2.LTSD distributions of speech and silenceIn this section we study the distributions of the LTSD as a function of the long-term window length(N)in order to clarify the motivations for the algorithm proposed.A hand-labelled version of the Spanish SDC database was used in the analysis.This database contains recordings from close-talking and distant microphones at different driving conditions:(a)stopped car,motor run-ning,(b)town traffic,low speed,rough road and (c)high speed,good road.The most unfavourable noise environment(i.e.high speed,good road)was selected and recordings from the distant micro-phone were considered.Thus,the N-order LTSD was measured during speech and non-speech periods,and the histogram and probability distri-butions were built.The8kHz input signal was decomposed into overlapping frames with a10-ms window shift.Fig.1shows the LTSD distributions of speech and noise for N¼0,3,6and9.It is derived from Fig.1that speech and noise distri-butions are better separated when increasing the order of the long-term window.The noise is highly confined and exhibits a reduced variance,thus leading to high non-speech hit rates.This fact can be corroborated by calculating the classification error of speech and noise for an optimal Bayes classifier.Fig.2shows the classification errors as a function of the window length N.The speech classification error is approximately reduced by half from22%to9%when the order of the VAD is increased from0to6frames.This is motivated by the separation of the LTSD distributions that takes place when N is increased as shown in Fig.1. 
On the other hand,the increased speech detection robustness is only prejudiced by a moderate in-crease in the speech detection error.According to Fig.2,the optimal value of the order of the VADJ.Ram ırez et al./Speech Communication42(2004)271–287275would be N¼6.As a conclusion,the use of long-term spectral divergence is beneficial for VAD since it reduces importantly misclassification er-rors.2.3.Definition of the LTSD VAD algorithmAflowchart diagram of the proposed VADalgorithm is shown in Fig.3.The algorithm can be described as follows.During a short initialization period,the mean noise spectrum NðkÞ(k¼0;1;...;NFFTÀ1)is estimated by averaging the noise spectrum magnitude.After the initialization period,the LTSE VAD algorithm decomposes the input utterance into overlapped frames being their spectrum,namely Xðk;lÞ,processed by means of a ð2Nþ1Þ-frame window.The LTSD is obtained by computing the LTSE by means of Eq.(1).The VAD decision rule is based on the LTSD calcu-lated using Eq.(2)as the deviation of the LTSE with respect to the noise spectrum.Thus,the algorithm has an N-frame delay since it makes a decision for the l-th frame using a(2Nþ1)-frame window around the l-th frame.On the other hand, thefirst N frames of each utterance are assumed to be non-speech periods being used for the initiali-zation of the algorithm.The LTSD defined by Eq.(2)is a biased mag-nitude and needs to be compensated by a given offset.This value depends on the noise spectral variance and the order of the VAD and can be estimated during the initialization period or as-sumed to take afixed value.The VAD makes the SND by comparing the unbiased LTSD to an adaptive threshold c.The detection threshold is adapted to the observed noise energy E.It is as-sumed that the system will work at different noisy conditions characterized by the energy of the background noise.Optimal thresholds c0and c1 can be determined for the system working in the cleanest and noisiest conditions.These thresholds define a linear VAD calibration curve that is used during the initialization period for selecting an adequate threshold c as a function of the noise energy E:c¼cE6E0c0Àc1E0ÀE1Eþc0Àc0Àc11ÀE1=E0E0<E<E1c1E P E18><>:ð3Þwhere E0and E1are the energies of the back-ground noise for the cleanest and noisiest condi-276J.Ram ırez et al./Speech Communication42(2004)271–287tions that can be determined examining the speech databases being used.A high speech/non-speech discrimination is ensured with this model since silence detection is improved at high and medium SNR levels while maintaining a high precision detecting speech periods under high noise condi-tions.The VAD is defined to be adaptive to time-varying noise environments with the following algorithm for updating the noise spectrum NðkÞduring non-speech periods being used:Nðk;lÞ¼a Nðk;lÀ1Þþð1ÀaÞN KðkÞif speech pause is detectedNðk;lÀ1Þotherwise8>><>>:ð4Þwhere N K is the average spectrum magnitude over a K-frame neighbourhood:N KðkÞ¼12Kþ1X Kj¼ÀKXðk;lþjÞð5ÞFinally,a hangover was found to be beneficial to maintain a high accuracy detecting speech periods at low SNR levels.Thus,the VAD delays the speech to non-speech transition in order to prevent low-energy word endings being misclassified as silence.On the other hand,if the LTSD achieves a given threshold LTSD0the hangover mechanism is turned offto improve non-speech detection when the noise level is low.Thus,the LTSE VAD yields an excellent classification of speech and pause periods.Examples of the operation of the LTSE VAD on an utterance of 
Examples of the operation of the LTSE VAD on an utterance of the Spanish SDC database are shown in Fig. 4a (N = 6) and Fig. 4b (N = 0). The use of a long-term window for formulating the decision rule reports quantifiable benefits in speech/non-speech detection. It can be seen that using a 6-frame window reduces the variability of the LTSD in the absence of speech, thus yielding reduced noise variance and better speech/non-speech discrimination. Speech detection is not affected by the smoothing process involved in the long-term spectral estimation algorithm and maintains good margins that correctly separate speech and pauses. On the other hand, the inherent anticipation of the VAD decision contributes to reducing speech clipping errors.

3. Experimental framework

Several experiments are commonly conducted to evaluate the performance of VAD algorithms. The analysis is normally focused on the determination of misclassification errors at different SNR levels (Beritelli et al., 2002; Marzinzik and Kollmeier, 2002), and on the influence of the VAD decision on speech processing systems (Bouquin-Jeannes and Faucon, 1995; Karray and Martin, 2003). The experimental framework and the objective performance tests conducted to evaluate the proposed algorithm are described in this section.

3.1. Speech/non-speech discrimination analysis

First, the proposed VAD was evaluated in terms of the ability to discriminate between speech and pause periods at different SNR levels. The original AURORA-2 database (Hirsch and Pearce, 2000) was used in this analysis since it uses the clean TIdigits database, consisting of sequences of up to seven connected digits spoken by American English talkers, as source speech, and a selection of eight different real-world noises that have been artificially added to the speech at SNRs of 20, 15, 10, 5, 0 and -5 dB. These noisy signals have been recorded at different places (suburban train, crowd of people (babble), car, exhibition hall, restaurant, street, airport and train station) and were selected to represent the most probable application scenarios for telecommunication terminals. In the discrimination analysis, the clean TIdigits database was used to manually label each utterance as speech or non-speech frames for reference. Detection performance as a function of the SNR was assessed in terms of the non-speech hit-rate (HR0) and the speech hit-rate (HR1), defined as the fraction of all actual pause or speech frames that are correctly detected as pause or speech frames, respectively:

$$\mathrm{HR0} = \frac{N_{0,0}}{N_0^{\mathrm{ref}}} \qquad \mathrm{HR1} = \frac{N_{1,1}}{N_1^{\mathrm{ref}}} \tag{6}$$

where $N_0^{\mathrm{ref}}$ and $N_1^{\mathrm{ref}}$ are the number of real non-speech and speech frames in the whole database, respectively, while $N_{0,0}$ and $N_{1,1}$ are the number of non-speech and speech frames correctly classified.

The LTSE VAD decomposes the input signal, sampled at 8 kHz, into overlapping frames with a 10-ms shift. A 13-frame long-term window and NFFT = 256 were found to be good choices for the noise conditions being studied. Optimal detection thresholds γ0 = 6 dB and γ1 = 2.5 dB were determined for clean and noisy conditions, respectively, while the threshold calibration curve was defined between E0 = 30 dB (low noise energy) and E1 = 50 dB (high noise energy). The hangover mechanism delays the speech to non-speech VAD transition during 8 frames, while it is deactivated when the LTSD exceeds 25 dB. The offset is fixed and equal to 5 dB. Finally, a forgetting factor α = 0.95 and a 3-frame neighbourhood (K = 3) are used for the noise update algorithm.
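A short sketch of the evaluation metrics of Eq. (6), computing HR0 and HR1 from boolean frame decisions and hand-labelled references:

```python
import numpy as np

def hit_rates(decisions, labels):
    """Eq. (6): non-speech (HR0) and speech (HR1) hit rates."""
    decisions = np.asarray(decisions, bool)   # True where the VAD says speech
    labels = np.asarray(labels, bool)         # True where the reference is speech
    hr0 = np.sum(~decisions & ~labels) / max(np.sum(~labels), 1)
    hr1 = np.sum(decisions & labels) / max(np.sum(labels), 1)
    return hr0, hr1
```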
Fig. 5 provides the results of this analysis and compares the proposed LTSE VAD algorithm to the standard G.729, AMR and AFE VADs in terms of non-speech hit-rate (Fig. 5a) and speech hit-rate (Fig. 5b) for clean conditions and SNR levels ranging from 20 to -5 dB. Note that results are provided for the two VADs defined in the AFE DSR standard (ETSI, 2002), for estimating the noise spectrum in the Wiener filtering stage and for non-speech frame-dropping. Note that the results shown in Fig. 5 are averaged values for the entire set of noises. Thus, the following conclusions can be derived from Fig. 5 about the behaviour of the different VADs analysed:

(i) G.729 VAD suffers poor speech detection accuracy with increasing noise level, while non-speech detection is good in clean conditions (85%) and poor (20%) in noisy conditions.

(ii) AMR1 yields an extremely conservative behaviour, with high speech detection accuracy for the whole range of SNR levels but very poor non-speech detection results at increasing noise levels. Although AMR1 seems to be well suited for speech detection at unfavourable noise conditions, its extremely conservative behaviour degrades its non-speech detection accuracy, with HR0 less than 10% below 10 dB, making it less useful in a practical speech processing system.

(iii) AMR2 leads to considerable improvements over G.729 and AMR1, yielding better non-speech detection accuracy while still suffering fast degradation of the speech detection ability at unfavourable noisy conditions.

(iv) The VAD used in the AFE standard for estimating the noise spectrum in the Wiener filtering stage is based on the full energy band and yields a poor speech detection performance, with a fast decay of the speech hit-rate at low SNR values. On the other hand, the VAD used in the AFE for frame-dropping achieves a high accuracy in speech detection but moderate results in non-speech detection.

(v) LTSE achieves the best compromise among the different VADs tested. It obtains a good behaviour in detecting non-speech periods and exhibits a slow decay in speech detection performance at unfavourable noise conditions.

Table 1 summarizes the advantages provided by the LTSE-based VAD over the different VAD methods being evaluated by comparing them in terms of the average speech/non-speech hit-rates. LTSE yields a 47.28% HR0 average value, while the G.729, AMR1, AMR2, WF and FD AFE VADs yield 31.77%, 31.31%, 42.77%, 57.68% and 28.74%, respectively. On the other hand, LTSE attains a 98.15% HR1 average value in speech detection, while G.729, AMR1, AMR2, WF and FD AFE VADs provide 93.00%, 98.18%, 93.76%, 88.72% and 97.70%, respectively. Frequently, VADs avoid losing speech periods, leading to an extremely conservative behaviour in detecting speech pauses (for instance, the AMR1 VAD). Thus, in order to correctly describe the VAD performance, both parameters have to be considered. Considering speech and non-speech hit-rates together, the proposed VAD yielded the best results when compared to the most representative VADs analysed.

Table 1
Average speech/non-speech hit rates for SNR levels ranging from clean conditions to -5 dB

           G.729   AMR1    AMR2    AFE (WF)   AFE (FD)   LTSE
HR0 (%)    31.77   31.31   42.77   57.68      28.74      47.28
HR1 (%)    93.00   98.18   93.76   88.72      97.70      98.15

3.2. Receiver operating characteristic curves

An additional test was conducted to compare speech detection performance by means of the ROC curves (Madisetti and Williams, 1999), a frequently used methodology in communications based on the hit and error detection probabilities (Marzinzik and Kollmeier, 2002), which completely describes the VAD error rate. The AURORA subset of the original Spanish SDC database (Moreno et al., 2000) was used in this analysis.
This database contains 4914 recordings using close-talking and distant microphones from more than 160 speakers. As in the whole SDC database, the files are categorized into three noisy conditions: quiet, low noisy and highly noisy conditions, which represent different driving conditions and average SNR values of 12, 9 and 5 dB.

The non-speech hit rate (HR0) and the false alarm rate (FAR0 = 100 - HR1) were determined in each noise condition for the proposed LTSE VAD and for the G.729, AMR1, AMR2, and AFE VADs, which were used as a reference. For the calculation of the false-alarm rate as well as the hit rate, the "real" speech frames and "real" speech pauses were determined by hand-labelling the database on the close-talking microphone.

The non-speech hit rate (HR0) as a function of the false alarm rate (FAR0 = 100 - HR1) for 0 < γ ≤ 10 dB is shown in Fig. 6 for recordings from the distant microphone in quiet, low and high noisy conditions. The working points of the adaptive LTSE, G.729, AMR and the recently approved AFE VADs (ETSI, 2002) are also included. It can be derived from these plots that:

(i) The working point of the G.729 VAD shifts to the right in the ROC space with decreasing SNR, while the proposed algorithm is less affected by the increasing level of background noise.

(ii) AMR1 VAD works on a low false alarm rate point of the ROC space but exhibits a poor non-speech hit rate.

(iii) AMR2 VAD yields clear advantages over G.729 and AMR1, exhibiting an important reduction in the false alarm rate when compared to G.729 and an increase in the non-speech hit rate over AMR1.

(iv) WF AFE VAD yields good non-speech detection accuracy but works on a high false alarm rate point of the ROC space. It suffers rapid performance degradation when the driving conditions get noisier. On the other hand, FD AFE VAD has been planned to be conservative since it is only used in the DSR standard for frame-dropping. Thus, it exhibits poor non-speech detection accuracy, working on a low false alarm rate point of the ROC space.

(v) LTSE VAD yields the lowest false alarm rate for a fixed non-speech hit rate and also the highest non-speech hit rate for a given false alarm rate. The ability of the adaptive LTSE VAD to tune the detection threshold by means of the algorithm described in Eq. (3) enables working on the optimal point of the ROC curve for different noisy conditions. Thus, the algorithm automatically selects the optimal working point as the noise conditions change.
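A sketch of how the ROC points of Section 3.2 could be generated: sweep the detection threshold γ over the 0-10 dB range quoted above and record (FAR0, HR0) pairs from the frame decisions. The function names and the 50-point grid are our assumptions, for illustration only:

```python
import numpy as np

def roc_points(ltsd_track, labels, gammas=np.linspace(0.1, 10, 50)):
    """HR0 versus FAR0 = 100 - HR1 as the threshold gamma is swept."""
    ltsd_track = np.asarray(ltsd_track)
    labels = np.asarray(labels, bool)           # True where reference is speech
    points = []
    for g in gammas:
        speech = ltsd_track > g                 # frame-level decisions
        hr0 = 100 * np.sum(~speech & ~labels) / max(np.sum(~labels), 1)
        hr1 = 100 * np.sum(speech & labels) / max(np.sum(labels), 1)
        points.append((100 - hr1, hr0))         # (FAR0, HR0) in percent
    return np.array(points)
```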
The Fourier transform of a sequence is mathematically defined as

$$S(e^{j\omega}) = \sum_{n=-\infty}^{\infty} s(n)\, e^{-j\omega n}$$

where s(n) represents the terms of the sequence.
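Because a speech frame is finite in length, the defining sum can be evaluated directly, and at the uniformly spaced frequencies ω = 2πk/N it coincides with what a standard FFT routine computes. A small numerical check of this (the test sequence is arbitrary):

```python
import numpy as np

# Any finite-length frame will do for the check.
s = np.array([1.0, 2.0, 0.5, -1.0, 0.25, -0.5])
N = len(s)
n = np.arange(N)

# Direct evaluation of S(e^{jw}) = sum_n s(n) e^{-jwn}
# at the frequencies w = 2*pi*k/N, k = 0..N-1.
S_direct = np.array([np.sum(s * np.exp(-2j * np.pi * k * n / N))
                     for k in range(N)])

# np.fft.fft evaluates exactly the same sums.
assert np.allclose(S_direct, np.fft.fft(s))
print(np.round(np.abs(S_direct), 4))
```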
The short-time Fourier transform of a sequence is a time-dependent function, defined as

$$S_m(e^{j\omega}) = \sum_{n=-\infty}^{\infty} s(n)\, w(m-n)\, e^{-j\omega n}$$

where the window function w(n) is usually zero except for some finite range, and the variable m is used to select the section of the sequence for analysis. The discrete Fourier transform (DFT) is obtained by uniformly sampling the short-time Fourier transform in the frequency dimension. Thus an N-point DFT is computed using Eq. (15.14),

$$S(k) = \sum_{n=0}^{N-1} s(n)\, e^{-j 2\pi nk/N}, \qquad k = 0, 1, \ldots, N-1$$

where the set of N samples, s(n), may first have been multiplied by a window function. An example of the magnitude of a 512-point DFT of the /i/ waveform from Fig. 15.7 is shown in Fig. 15.10. Note that for this figure, the 512 points in the sequence have been multiplied by a Hamming window defined by

$$w(n) = 0.54 - 0.46\cos\!\left(\frac{2\pi n}{N-1}\right), \qquad 0 \le n \le N-1$$

FIGURE 15.9 Absolute magnitude difference function of one frame of /i/.

FIGURE 15.10 Magnitude of 512-point FFT of Hamming-windowed /i/.

Since the spectral characteristics of speech may change dramatically in a few milliseconds, the length, type, and location of the window function are important considerations. If the window is too long, changing spectral characteristics may cause a blurred result; if the window is too short, spectral inaccuracies result. A Hamming window of 16 to 32 ms duration is commonly used for speech analysis.

Several characteristics of a speech utterance may be determined by examination of the DFT magnitude. In Fig. 15.10, the DFT of a voiced utterance contains a series of sharp peaks in the frequency domain. These peaks, caused by the periodic sampling action of the glottal excitation, are separated by the fundamental frequency, which is about 141 Hz in this example. In addition, broader peaks can be seen, for example at about 300 Hz and at about 2300 Hz. These broad peaks, called formants, result from resonances in the vocal tract.

Linear Predictive Analysis

Given a sampled (discrete-time) signal s(n), a powerful and general parametric model for time series analysis is

$$s(n) = \sum_{k=1}^{p} a(k)\, s(n-k) + G \sum_{l=0}^{q} b(l)\, u(n-l)$$

where s(n) is the output and u(n) is the input (perhaps unknown). The model parameters are a(k) for k = 1, …, p, b(l) for l = 1, …, q, and G; b(0) is assumed to be unity. This model, described as an autoregressive moving average (ARMA) or pole-zero model, forms the foundation for the analysis method termed linear prediction. An autoregressive (AR) or all-pole model, for which all of the "b" coefficients except b(0) are zero, is frequently used for speech analysis [Markel and Gray, 1976].

In the standard AR formulation of linear prediction, the model parameters are selected to minimize the mean-squared error between the model and the speech data. In one of the variants of linear prediction, the autocorrelation method, the minimization is carried out for a windowed segment of data. In the autocorrelation method, minimizing the mean-square error of the time domain samples is equivalent to minimizing the integrated ratio of the signal spectrum to the spectrum of the all-pole model. Thus, linear predictive analysis is a good method for spectral analysis whenever the signal is produced by an all-pole system. Most speech sounds fit this model well.
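As a concrete sketch of the autocorrelation method, the function below windows a frame, computes the autocorrelation sequence, and solves the resulting normal equations with the Levinson-Durbin recursion. The helper name, the 8-kHz sampling rate, and the synthetic test frame are our own illustration, not code from the text; note also that the returned coefficients follow the A(z) = 1 + Σ a(k) z^{-k} convention, i.e., they are the negatives of the a(k) in the model above.

```python
import numpy as np

def lpc_autocorrelation(frame, order):
    """Autocorrelation method of linear prediction: Hamming-window the
    frame, form autocorrelations r(0..p), and run Levinson-Durbin."""
    x = frame * np.hamming(len(frame))
    r = np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]                                    # prediction error energy
    for i in range(1, order + 1):
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err  # reflection coeff.
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]       # update lower-order coeffs
        a[i] = k
        err *= 1.0 - k * k                        # error shrinks each step
    return a, err

# Tenth-order analysis of a synthetic voiced-like frame (hypothetical data:
# a 141-Hz fundamental plus two harmonics, sampled at 8 kHz).
t = np.arange(512) / 8000.0
frame = sum(np.sin(2 * np.pi * f * t) for f in (141.0, 282.0, 423.0))
a, err = lpc_autocorrelation(frame, order=10)
print(np.round(a, 3), err)
```

The roots of A(z) that lie close to the unit circle correspond to the spectral peaks (formants) that the all-pole model captures.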
One key consideration for linear predictive analysis is the order of the model, p. For speech, if the order is too small, the formant structure is not well represented. If the order is too large, pitch pulses as well as formants begin to be represented. Tenth- or twelfth-order analysis is typical for speech. Figures 15.11 and 15.12 provide examples of the spectrum produced by eighth-order and sixteenth-order linear predictive analysis of the /i/ waveform of Fig. 15.7. Figure 15.11 shows there to be three formants, at frequencies of about 300, 2300, and 3200 Hz, which are typical for an /i/.

Homomorphic (Cepstral) Analysis

For the speech model of Fig. 15.6, the excitation and filter impulse response are convolved to produce the speech. One of the problems of speech analysis is to separate, or deconvolve, the speech into these two components. One such technique is called homomorphic filtering [Oppenheim and Schafer, 1968]. The characteristic system for homomorphic deconvolution converts a convolution operation into an addition operation. The output of such a characteristic system is called the complex cepstrum. The complex cepstrum is defined as the inverse Fourier transform of the complex logarithm of the Fourier transform of the input. If the input sequence is minimum phase (i.e., the z-transform of the input sequence has no poles or zeros outside the unit circle), the sequence can be represented by the real portion of the transforms. Thus, the real cepstrum can be computed by calculating the inverse Fourier transform of the log spectrum of the input.

FIGURE 15.11 Eighth-order linear predictive analysis of an /i/.

FIGURE 15.12 Sixteenth-order linear predictive analysis of an /i/.

Figure 15.13 shows an example of the cepstrum for the voiced /i/ utterance from Fig. 15.7. The cepstrum of such a voiced utterance is characterized by relatively large values in the first one or two milliseconds as well as by pulses of decaying amplitude at multiples of the pitch period. The first two of these pulses can clearly be seen in Fig. 15.13 at time lags of 7.1 and 14.2 ms. The locations and amplitudes of these pulses may be used to estimate pitch and voicing [Rabiner and Schafer, 1978].

In addition to pitch and voicing estimation, a smooth log magnitude function may be obtained by windowing or "liftering" the cepstrum to eliminate the terms which contain the pitch information. Figure 15.14 is one such smoothed spectrum. It was obtained from the DFT of the cepstrum of Fig. 15.13 after first setting all terms of the cepstrum to zero except for the first 16.

FIGURE 15.13 Real cepstrum of /i/.

FIGURE 15.14 Smoothed spectrum of /i/ from 16 points of cepstrum.

Speech Synthesis

Speech synthesis is the creation of speech-like waveforms from textual words or symbols. In general, the speech synthesis process may be divided into three levels of processing [Klatt, 1982]. The first level transforms the text into a series of acoustic phonetic symbols, the second transforms those symbols to smoothed synthesis parameters, and the third level generates the speech waveform from those parameters.
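To connect the cepstral discussion above to code: the real cepstrum of a frame is the inverse FFT of the log magnitude spectrum, and the pitch can be read off from the dominant cepstral peak at lags near the pitch period. A minimal sketch under our own assumptions (8-kHz sampling, a synthetic harmonic frame, and a hypothetical `cepstral_pitch` helper):

```python
import numpy as np

def cepstral_pitch(frame, fs, fmin=60.0, fmax=400.0):
    """Real cepstrum = IFFT of the log magnitude spectrum; the pitch
    period shows up as a peak at the corresponding lag (quefrency)."""
    spectrum = np.fft.fft(frame * np.hamming(len(frame)))
    cepstrum = np.fft.ifft(np.log(np.abs(spectrum) + 1e-12)).real
    lo, hi = int(fs / fmax), int(fs / fmin)   # plausible pitch-lag range
    lag = lo + np.argmax(cepstrum[lo:hi])
    return fs / lag

# Synthetic voiced frame with a 141-Hz fundamental (hypothetical data).
fs = 8000
t = np.arange(512) / fs
frame = sum(np.sin(2 * np.pi * k * 141.0 * t) for k in (1, 2, 3, 4))
print("estimated pitch: %.1f Hz" % cepstral_pitch(frame, fs))
```

For the /i/ frame of Fig. 15.13, the same idea would locate the cepstral pulse at a 7.1-ms lag, i.e., a pitch of about 141 Hz.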