Study on Acoustic Modeling in a Mandarin Continuous Speech Recognition
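Center for the Study of Language and Cognition: Notice of a Lecture by Researcher Li Aijun of the Institute of Linguistics, Chinese Academy of Social Sciences (CASS)
Title: Phonetics and Applied Research in Phonetics
Speaker: Li Aijun, researcher and doctoral supervisor, Director of the Phonetics Laboratory, Institute of Linguistics, CASS
Time: 2:00 p.m., 8 September 2008
Venue: Conference room of the Center for the Study of Language and Cognition, Zhejiang University (Room 259, Main Teaching Building, Xixi Campus)
All are welcome!
Center for the Study of Language and Cognition, 2008-9-1
Profile of Researcher Li Aijun, Director of the Phonetics Laboratory, Institute of Linguistics, CASS
Positions and titles: laboratory director; researcher and doctoral supervisor; Secretary-General of the Phonetics Division of the Chinese Linguistic Society; member of the Speech, Hearing and Music Acoustics Division of the Acoustical Society of China.
Specialization and research areas: computer speech processing (speech synthesis, speech databases, analysis of speech characteristics).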
Publications:
Li, Aijun (1991). The development of a real-time speech recognition system based on an advanced VQ method. MA thesis, Tianjin, China.
Li, Aijun (1992). The development of a real-time, small vocabulary speech recognition system on an advanced VQ method. In Proceedings of the 4th Signal Processing National Conference, pp. 651-654, Chengdu, China.
Li, Aijun (1994). Duration characteristics of stress and its synthesis rules on Standard Chinese. Report of Phonetic Research, CASS.
Li, Aijun (1995). The development of a waveform synthesis system based on phonetic rules. Report of Phonetic Research, CASS.
J.F. Cao & A.J. Li (1995). The study of retroflexed and neutralized syllables and their synthesis in Standard Chinese. Proceedings of the 7th National Academic Conf. on Signal Processing of Speech, Image and Communication, Xi'an: Xi'an Electrical University of Science and Technology.
Li, Aijun & Yang, Sun'an (1995). Speech Synthesis. In "Digital Signal Processings of Speech", Publishing House of Electronics Industry.
Li, Aijun (1997). Pausing in News Broadcasting in Standard Chinese. CYCA'97, Ha'erbin Engineering University Press, pp. 262-266.
Li, Aijun (1997). Perceptual study of intersyllabic formant transitions in synthesized V1-V2 in Standard Chinese. In Proceedings of 5th European Conference on Speech Communication and Technology, Vol. 4: 2143-2146.
Li, Aijun & Zu, Yiqing (1997). Database design for speech synthesis and speech recognition. In "The interface of IT computer and its progress in applications": Proceedings of the 3rd national conference of computer IT interface and its applications, pp. 174-178, eds. Wu Quanyuan & Qian Yueliang. Publishing House of Electronics Industry.
Li, Aijun (1998). Durational Characteristics of the Prosodic Phrase in Standard Chinese. In Proceedings of the Conference on Phonetics of the Languages in China, pp. 65-68. Edited by Eric Zee & Lin, Maocan.
Li, Aijun (1999). A national database design for speech synthesis and prosodic labeling of Standard Chinese. In Proceedings of Oriental COCOSDA'99, Taipei, Taiwan.
Li Aijun (1999). Acoustic analysis on prosodic phrase and the sentence stress. In Proceedings of the 4th National Modern Phonetics Conference, edited by Lv Shinan, Jingcheng Publishing House.
Li Aijun, Zheng Fang, William Byrne, et al. (2000). "CASS: A Phonetically Transcribed Corpus of Mandarin Spontaneous Speech", ICSLP'2000.
Li Aijun, Lin Maocan, Chen Xiaoxia, et al. (2000). "Speech corpus of Chinese discourse and the phonetic research", ICSLP'2000.
Li Aijun, Chen Xiaoxia, et al. (2000). "The phonetic labeling on read and spontaneous discourse corpora", ICSLP'2000.
Li Aijun, Chen Xiaoxia, et al. (2000). Speech corpus collection and annotation, ISCSLP'2000.
Li Aijun (2000). "Perspectives of Basic Research Highlighted in ICSLP'2000", in the summary session of ICSLP'2000.
Li Aijun, Xu Bo, et al. (2001). A spontaneous Conversation Corpus CADCC, Oriental COCOSDA'2001, Korea.
Li Aijun (2002). Chinese Prosody and Prosodic Labeling of Spontaneous Speech, Speech Prosody 2002, Aix-en-Provence, France.
Li Aijun, Chen Xiaoxia, Sun Guohua, Hua Wu, Yin Zhigang (2002). CASS: a phonetically annotated corpus of spoken Chinese. Contemporary Linguistics (《当代语言学》).
Li Aijun (2003). Prosodic Boundary Perception in Spontaneous Speech of Standard Chinese, Proceedings of ICPhS 2003.
Liu Yabin and Li Aijun (2003). Cues of Prosodic Boundaries in Chinese Spontaneous Speech, Proceedings of ICPhS 2003, Barcelona.
Li Aijun, Xia Wang (2003). A Contrastive Investigation of Standard Mandarin and Accented Mandarin, Proceedings of Eurospeech 2003, Geneva.
Yu Jue, Li Aijun, Wang Xia (2003). A contrastive study of the acoustic features of retroflex vowels in Shanghai-accented Mandarin and Standard Mandarin. Proceedings of the 6th National Conference on Modern Phonetics.
Yu Jue, Li Aijun, Wang Xia (2003). A contrastive study of the acoustic features of the vowel systems of Shanghai-accented Mandarin and Standard Mandarin. 7th National Conference on Man-Machine Speech Communication.
Chen Juanwen, Li Aijun, Wang Xia (2003). Differences in word stress between Shanghai-accented Mandarin and Standard Mandarin. Proceedings of the 6th National Conference on Modern Phonetics.
Chen Juanwen, Li Aijun, Wang Xia (2003). Differences in the tone sandhi of disyllabic words between Shanghai-accented Mandarin and Standard Mandarin. Proceedings of the 7th National Conference on Man-Machine Speech Communication.
Liu Yabin, Li Aijun (2003). Boundary cues in spontaneous spoken dialogue. Proceedings of the 6th National Conference on Modern Phonetics.
Wang Haibo, Li Aijun (2003). Construction of a Mandarin emotional speech corpus and a listening experiment. Proceedings of the 6th National Conference on Modern Phonetics.
Fang Qiang, Li Aijun (2003). A study of nasalized vowels in Mandarin. Proceedings of the 6th National Conference on Modern Phonetics.
Wang Tianqing, Li Aijun (2003). Design of a corpus for continuous Chinese speech recognition. Proceedings of the 6th National Conference on Modern Phonetics.
Li Aijun, Wang Tianqing, Yin Zhigang (2003). RASC863, the 863 speech recognition corpus: a Mandarin speech corpus covering four major dialect regions. 7th National Conference on Man-Machine Speech Communication.
Aijun Li, Fangxin Chen, Haibo Wang, Tianqing Wang (2004). Perception on Synthesized Friendly Speech in Standard Chinese, TAL2004, Beijing.
Fangxin Chen, Aijun Li, Haibo Wang, Tianqing Wang, Qiang Fang (2004). Acoustic Analysis of Friendly Speech, ICASSP2004-5, Montreal, Canada.
Aijun Li, Jue Yu, Juanwen Chen, Xia Wang (2004). A Contrastive Investigation of Standard Mandarin and Shanghai-Accented Mandarin. In H. Fujisaki, G. Fant, J. Cao and Y. Xu (eds.), "From Traditional Phonology to Modern Speech Processing". Foreign Language Teaching and Research Press, 2004.
Aijun Li, Haibo Wang (2004). Friendly Speech Analysis and Perception in Standard Chinese, ICSLP2004, Jeju, Korea.
Aijun Li, Zhigang Yin, Tianqing Wang, Qiang Fang, Fang Hu (2004). RASC863 - A Chinese Speech Corpus with Four Regional Accents, ICSLT-o-COCOSDA, New Delhi, India.
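Second Language Acquisition Research: Questions and Answers

1. How do we distinguish "mother tongue" from "first language", and "second language" from "second language acquisition setting"?
Answer: The mother tongue is the language used by the ethnic group or community to which the learner belongs. The first language is the language a child is first exposed to and acquires in early childhood. The mother tongue is usually also the first language, but there are exceptions. For example, a Han Chinese child born in the United States who is first exposed to and acquires English has English as a first language, while the child's mother tongue is still Chinese.
A second language is a language the learner acquires after the first language. A second language acquisition setting is one in which the language being learned serves as the language of communication in the environment where acquisition takes place. "Second language" is defined relative to "first language" by the order of acquisition. "Second language acquisition setting" has nothing to do with order; it concerns where the learning happens.

2. How do we distinguish "naturalistic second language acquisition" from "instructed second language acquisition"?
Answer: By the mode of acquisition and by the setting. Naturalistic SLA takes place through communication in a natural social environment; instructed SLA takes place through teaching and guidance in a classroom environment.

3. What is competence? What is performance?
Answer: Competence is a mental grammar that reflects the linguistic knowledge of the participants in communication. Performance is the use those participants make of their internalized grammar when comprehending and producing language. Competence is knowledge about language; performance is knowledge about language use.

4. How does second language acquisition research differ from linguistic research in its object, aims and methods?

5. How should we view the relationship between second language acquisition research and psychology and psycholinguistics?
Answer: (1) The relationship between SLA research and psychology: (2) The relationship between SLA research and psycholinguistics: some regard SLA research as a branch of psycholinguistics, but in fact the two differ in many respects:

6. What are the disciplinary nature and characteristics of second language acquisition research?
Answer: It is interdisciplinary.

7. Why do scholars in the field take the articles published by Corder and Selinker as the starting point of second language acquisition research?
Answer: Because Corder's "The Significance of Learners' Errors" (1967) and Selinker's "Interlanguage" (1972) successively made clear the object of SLA research, put forward similar theoretical hypotheses, pointed out the direction of the field, and laid a solid theoretical foundation for later SLA research.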
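"Acoustic Modeling for Khalkha Mongolian Speech Recognition Based on Transfer Learning", Part 1

I. Introduction
With the continuing development of artificial intelligence, speech recognition has been widely applied in many fields. Khalkha Mongolian is one of the major languages of the Mongolian people, and research on recognizing it has real practical value. However, because Khalkha Mongolian speech datasets are relatively small, traditional methods often suffer from data sparsity and poor generalization during modeling. To address these problems, this paper proposes a transfer-learning-based acoustic modeling method for Khalkha Mongolian speech recognition.

II. Overview of Transfer Learning
Transfer learning applies knowledge learned on one task to another, related task. In speech recognition, it can improve a model's generalization and recognition accuracy by exploiting large amounts of labeled or unlabeled data. For Khalkha Mongolian, data from other languages or domains can be used, through transfer learning, to improve model performance.

III. Acoustic Modeling Based on Transfer Learning
1. Data preprocessing
Before acoustic modeling, the Khalkha Mongolian speech data must be preprocessed. This includes steps such as data cleaning and feature extraction. Feature extraction is the key step: it converts the speech signal into numerical features that a computer can process.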
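The feature-extraction step above can be illustrated with a small sketch. The text does not say which features were used, so the two per-frame features here (log energy and zero-crossing rate), the frame length, the hop size, and the synthetic test tone are all assumptions chosen only to show how a waveform becomes a sequence of numerical feature vectors.

```python
import math

def frame_signal(samples, frame_len, hop):
    """Split a list of samples into overlapping frames; an incomplete tail is dropped."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def frame_features(frame):
    """Return (log energy, zero-crossing rate) for one frame."""
    energy = sum(s * s for s in frame)
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return math.log(energy + 1e-10), crossings / (len(frame) - 1)

# Hypothetical input: 50 ms of a 200 Hz tone sampled at 8 kHz.
rate = 8000
signal = [math.sin(2 * math.pi * 200 * n / rate) for n in range(400)]

frames = frame_signal(signal, frame_len=200, hop=80)
features = [frame_features(f) for f in frames]  # one feature pair per frame
```

A production front end would more likely compute filter-bank or cepstral features, but the framing logic is the same.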
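2. Model selection
Choosing a suitable acoustic model is key to improving recognition accuracy. In this paper we chose a deep neural network (DNN) as the acoustic model. A DNN can extract features from audio data automatically and is trained and optimized through its multi-layer structure. It also generalizes fairly well and can cope with differences in accent and speaking rate across speakers.
3. Applying transfer learning
We adopted the idea of transfer learning during training. First, we pretrained the model on a large amount of labeled general-language data so that it acquired some generality. Then we transferred the pretrained parameters to the Khalkha Mongolian data and fine-tuned them, improving the model's adaptation to Khalkha Mongolian.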
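The pretrain-then-fine-tune procedure described above can be sketched in miniature. This is not the paper's DNN: a single-layer logistic model stands in for it, and the toy "source" and "target" datasets, learning rates, and epoch counts are all hypothetical. The sketch only shows the mechanics of initializing training on a small dataset from parameters learned on a larger one.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss(weights, data):
    """Mean cross-entropy of a logistic model on (features, label) pairs."""
    total = 0.0
    for x, y in data:
        p = sigmoid(sum(w * xi for w, xi in zip(weights, x)))
        p = min(max(p, 1e-12), 1.0 - 1e-12)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(data)

def train(weights, data, lr, epochs):
    """Full-batch gradient descent starting from `weights`; returns new weights."""
    w = list(weights)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in data:
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - y
            for j, xj in enumerate(x):
                grad[j] += err * xj / len(data)
        w = [wi - lr * g for wi, g in zip(w, grad)]
    return w

# Hypothetical data: each sample is ([bias, f1, f2], label).
# The large "source" set plays the role of the general-language data,
# the small "target" set the role of the low-resource language.
source_data = [([1.0, a / 10, b / 10], 1 if a + b > 0 else 0)
               for a in range(-10, 11, 2) for b in range(-10, 11, 2)]
target_data = [([1.0, 0.9, -0.4], 1), ([1.0, -0.8, 0.5], 0),
               ([1.0, 0.3, 0.2], 1), ([1.0, -0.2, -0.6], 0)]

# Step 1: pretrain on the large source set.
pretrained = train([0.0, 0.0, 0.0], source_data, lr=0.5, epochs=200)
# Step 2: fine-tune on the small target set, starting from the pretrained weights.
fine_tuned = train(pretrained, target_data, lr=0.1, epochs=50)
# Baseline for comparison: same target training budget, no pretraining.
from_scratch = train([0.0, 0.0, 0.0], target_data, lr=0.1, epochs=50)
```

Comparing `loss(fine_tuned, target_data)` with `loss(from_scratch, target_data)` shows the benefit of the pretrained starting point on this toy problem; with a real DNN one would typically also decide which layers to freeze, a choice the text does not specify.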
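IV. Experiments and Analysis
To verify the effectiveness of the transfer-learning-based acoustic modeling method, we ran experiments and analyzed the results. The results show that transfer learning improves the model's generalization and recognition accuracy. We also compared model performance under different parameter settings to find the best configuration.

V. Conclusion
This paper proposed a transfer-learning-based acoustic modeling method for Khalkha Mongolian speech recognition.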
Natural Language Processing (NLP) Techniques

Natural Language Processing (NLP) is a subfield of artificial intelligence that focuses on the interaction between computers and humans using natural language. In recent years, NLP techniques have made significant advancements in various applications such as sentiment analysis, chatbots, machine translation, and speech recognition. In this article, we will explore some of the most commonly used NLP techniques and their applications.

1. Tokenization: Tokenization is the process of breaking down a text into individual words, phrases, or symbols known as tokens. This technique is essential for many NLP tasks as it helps to convert unstructured text data into a structured format that can be easily processed by machines. Tokenization can be done at different levels, such as word level, sentence level, or character level.
2. Part-of-Speech (POS) tagging: POS tagging is the process of assigning a grammatical category (noun, verb, adjective, etc.) to each word in a sentence. This technique helps in understanding the syntactic structure of a sentence and is crucial for tasks like named entity recognition, sentiment analysis, and machine translation.
3. Named Entity Recognition (NER): Named Entity Recognition is the task of identifying and classifying named entities (such as names of people, organizations, locations, etc.) in a text. NER is widely used in information extraction, question answering systems, and social media analysis.
4. Sentiment Analysis: Sentiment analysis is the process of determining the sentiment expressed in a piece of text, whether it is positive, negative, or neutral. This technique is commonly used in social media monitoring, customer feedback analysis, and brand reputation management.
5. Machine Translation: Machine translation is the task of translating text from one language to another automatically. NLP techniques such as neural machine translation have significantly improved the accuracy and fluency of machine translation systems.
6. Text Classification: Text classification is the process of categorizing text data into predefined categories or classes. This technique is widely used in spam detection, topic categorization, and sentiment analysis.
7. Information Extraction: Information extraction is the process of automatically extracting structured information from unstructured text data. This technique is used in various domains such as web scraping, document summarization, and question answering systems.
8. Summarization: Text summarization is the task of generating a concise and coherent summary of a longer text. NLP techniques such as extractive and abstractive summarization have been widely used in news summarization, document summarization, and keyword extraction.
9. Word Embeddings: Word embeddings are vector representations of words in a continuous vector space. This technique allows us to capture semantic relationships between words and is crucial for tasks like named entity recognition, sentiment analysis, and machine translation.
10. Speech Recognition: Speech recognition is the task of automatically converting spoken language into text. NLP techniques such as acoustic modeling and language modeling have significantly improved the accuracy and performance of speech recognition systems.

In conclusion, natural language processing techniques have revolutionized the way we interact with machines and have enabled a wide range of applications in various domains. As NLP continues to evolve and innovate, we can expect even more advanced applications and capabilities in the future.
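As a concrete illustration of the first technique in the list, the three tokenization levels it mentions (sentence, word, character) can be sketched with regular expressions from the Python standard library. The function names and the splitting rules are assumptions made for this example; real tokenizers handle abbreviations, numbers, and non-English scripts far more carefully.

```python
import re

def sentence_tokenize(text):
    """Split after '.', '!' or '?' when followed by whitespace (naive rule)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def word_tokenize(text):
    """Words (keeping an internal apostrophe) and single punctuation marks."""
    return re.findall(r"\w+(?:'\w+)?|[^\w\s]", text)

def char_tokenize(text):
    """Every non-whitespace character is a token."""
    return [c for c in text if not c.isspace()]

print(sentence_tokenize("NLP is fun. Is it hard? No!"))
print(word_tokenize("Don't panic, it's fine."))
print(char_tokenize("a b"))
```

The structured token lists produced this way are what downstream steps such as POS tagging or text classification consume.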
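Speaker recognition by special phonemes
Chinese Journal of Scientific Instrument, Vol. 28, No. 10, Oct. 2007
Wang Changlong, Zhou Fucai, Ling Yuping, Yu Feng
(1. College of Mechanical Engineering, Yangzhou University, Yangzhou 225009, China; 2. College of Agriculture, Yangzhou University, Yangzhou 225009, China)

Abstract: A speaker recognition method based on special phonemes is proposed and implemented in a low-cost voice door-management system. Voiced and unvoiced phonemes in the time-domain speech signal are first separated using their very different numerical characteristics. The characteristic frequencies and relative intensities of several separated voiced phonemes are then used as feature parameters to form a 30-element feature vector. Both high-order spectrum analysis and the FFT were run on a PC, and the two methods were compared for how well they separate voiced from unvoiced phonemes. Recognition experiments were then carried out with a neural network and with template matching. The main application target is a single-household voice door-management system. The system has a self-learning function that keeps adjusting the feature-vector template as household members age and change physiologically. The method has been implemented in a low-cost single-chip microcomputer system.
Keywords: speech signal processing; speaker recognition; feature extraction; spectrum analysis
CLC number: TP312. Document code: A. National standard subject classification code: 510.4010
Received date: 2007-02. Supported by the Scientific Research Fund of the Jiangsu Provincial Education Commission (06KJB510135).

1 Introduction
There are now many locksmith companies, and their staff vary widely in quality. After nothing more than a phone call, many service personnel will open a lock for a fee without asking to see any identification, which poses a serious threat to home security. This project designed a multi-information intelligent lock, installed near the existing lock, that can recognize several kinds of information such as voice and passwords. The electronic lock sits indoors, which effectively prevents illegal unlocking. The system is built around a single-chip microcomputer and has a sound acquisition function and an electromagnetic actuator. This paper mainly describes its speaker recognition part.
Speaker recognition is an important branch of speech recognition and is widely used in police investigation, voice-controlled devices, and even medical diagnosis. The main difference between speaker recognition and speech recognition is that speaker recognition does not attend to the meaning of the speech signal; it only extracts individual characteristics, such as the vocal tract and pronunciation habits, from the signal. Speaker recognition therefore digs out the individual factors contained in the speech signal, whereas speech recognition looks for the factors common to different people's speech.

2 Basic principles of speaker recognition
2.1 How speech is produced
Human speech is produced when contraction of the lungs forces air from the bronchi through the glottis and the vocal tract, causing audio-frequency oscillation. There are three modes of excitation: (1) as air passes through the glottis, the vocal cords oscillate at a relatively low frequency and form quasi-periodic air pulses; the pulses excite the vocal tract and produce voiced sound, and the pulse period is the pitch period; (2) if the smallest cross-section of the vocal tract is made very narrow, air rushes through at high speed and produces unvoiced sound; (3) if the vocal tract is completely closed at some point, the air bursts out suddenly and produces a plosive.

2.2 Separating unvoiced and voiced sounds
The spectrum of unvoiced sound is close to white noise. Although it matters to a listener trying to make out meaning, it provides little useful information for speaker recognition. Voiced sound is a superposition of sinusoids at a few frequencies; it constitutes a person's "voiceprint" data and is of great value for speaker recognition. In this paper a high-order spectrum estimate and an FFT are computed for every frame of samples. The voicing degree is defined as the ratio of the summed energy at the few peak frequencies to the total energy:

$$V = \frac{\sum_{i=1}^{N} s^2(i)}{\sum_{k=1}^{M} s^2(k)} \qquad (1)$$

where V is the voicing degree, that is, the ratio of the summed energy of the characteristic frequencies to the total spectral energy. V = 0 indicates unvoiced sound, equivalent to white noise; V = 1 indicates voiced sound; 0 < V < 1 indicates a frame that mixes voiced and unvoiced sound. N is the number of characteristic frequencies (formants), M is half the number of samples per frame (512), and s²(i) is the energy of the i-th characteristic frequency component; the denominator sums the energy over all M spectral components.

2.3 Extracting voiced phoneme features
For a voiced frame, a bank of narrow-band filters with adjustable center frequency band-pass filters the sampled signal x(t), and each frequency component A_f(t) is then squared to obtain the power spectral density. The printed equation is illegible in this copy; the time-average form below follows the description:

$$G(f) = \frac{1}{T}\int_{0}^{T} A_f^{2}(t)\,dt \qquad (2)$$

For a discrete-time signal x(n), the power spectral density is obtained by applying the fast Fourier transform to its autocorrelation function R(r). The cosine argument is illegible in this copy; the standard form is shown:

$$G(k) = 2h \sum_{r=-m}^{m-1} R(r)\cos\frac{\pi k r}{m}$$

where r = -m, ..., -1, 0, 1, ..., m-1 is the lag index, h is the sampling interval, and N is the number of samples per frame.
After voiced and unvoiced sounds are separated, the voiced phonemes are identified. Because of the way people pronounce a given voiced sound, the ratios between the energies in the bands where energy is concentrated differ greatly among finals such as "a", "o" and "e", so the specific voiced phoneme can be identified (see Figs. 2-5).

[Fig. 1. Time-domain waveform recording of the Chinese syllable "hu" (胡), the voiced waveform after removing the unvoiced frames, and the spectrum of the voiced frames.]

2.4 Composition of the feature vector
The difficulty in speaker recognition is that no simple and reliable speech feature parameters have yet been found, nor any simple acoustic parameters that reliably identify a speaker. The time-varying nature of the speech signal and the influence of emotion, environment and health make feature extraction harder still, so researchers continue to look for better recognition methods. Speaker-related feature parameters fall roughly into two classes. The first consists of inherent parameters determined by physiology, such as pitch and formants; these are hard to imitate but are affected by health and age. The second consists of dynamic parameters determined by pronunciation habits; these are fairly stable but easy to imitate.
Each of Figs. 2-4 contains the spectral analysis results for more than 30 frames of samples of one person's voiced sounds "a", "e" and "u". The spectra are fairly stable, but the amplitude of each frequency component varies noticeably over time. Fig. 5 compares the amplitudes of the characteristic frequency components at different volumes. As the volume changes, the amplitudes of all frequencies follow a similar trend: as the volume increases, every frequency component grows by a similar degree (the DC component excepted). This provides a basis for …

[Fig. 2. Spectrum of 30 samples of the voiced sound "a".]
[Fig. 3. Spectrum of 30 samples of the voiced sound "e".]
[Fig. 4. Spectrum of 30 samples of the voiced sound "u".]
[Fig. 5. Spectral amplitudes rise and fall together as the volume changes.]

To improve reliability and to prevent a single phoneme from being imitated, and because, given the detailed syllable structure of Mandarin Chinese (see Fig. 6), each person's pronunciation habits and transitions between sounds are different, this paper selects several characteristic phonemes to form a high-dimensional feature vector. For additional security, the characteristic phonemes can be changed periodically.

[Fig. 6. Syllable frame of Chinese Putonghua.]

3 Implementation of the speaker recognition system
An electret microphone and an audio amplifier circuit acquire the speech signal, which is A/D sampled at 8 kHz. The acquired signal is split into frames of 1024 points. Voiced syllables such as [a], [o] and [u] are extracted from the signal, and recognition experiments are run with two algorithms, a neural network and template matching, by comparing the feature vector extracted on the spot with the spectra and relative amplitudes stored in the feature library. The notion of relative amplitude is introduced because people speak louder or softer, so the amplitudes of the different frequency components are meaningful only relative to one another; absolute values do not matter, and components are simply ranked by amplitude.
For each voiced sound, only the 5 frequencies with the largest amplitudes are taken as features. The strongest component has relative amplitude 1, and the relative amplitude of each other frequency is the ratio of its amplitude to the maximum. Three voiced sounds are selected to build a 30-dimensional voiced feature vector; each voiced sound contributes 5 frequencies and their relative amplitudes. Because the same person saying the same thing at different times also varies, especially during the voice change of growing children, the system includes a learning function: after every successful recognition, the feature template stored in the 24C02 serial EEPROM is corrected by 10% of the deviation.

3.1 System hardware and software
The speaker recognition system consists mainly of a speech sensor (an electret microphone), an audio amplification and filtering circuit, an A/D conversion circuit, a single-chip microcomputer control unit, and an actuator unit. The software consists of data acquisition and speaker recognition routines, as shown in Figs. 7 and 8.

[Fig. 7. Block diagram of system hardware structure.]
[Fig. 8. Flowchart of software system.]

Early in development, experiments were first run on a PC, which gave first-hand data on signal acquisition and feature extraction and laid the groundwork for the single-chip implementation. The system software mainly covers triggering A/D conversion, data storage, and pattern recognition; neural network training is done on the PC. The network weights are stored in the microcontroller's Flash ROM, and the household members' speech feature vectors are stored in the serial EEPROM. After every successful recognition, the stored vector is corrected by 1/10 of the difference between the new feature vector and the stored one. This lets the system adjust the feature vectors gradually as children and teenagers grow.

3.2 Comparison of recognition algorithms
The pattern matching and model training techniques currently used in speech recognition are mainly dynamic time warping (DTW), hidden Markov models (HMM) and artificial neural networks (ANN). Text-dependent speaker recognition is easy to imitate and forge, and is especially prone to misrecognition when the number of samples is large. Only by extracting the speaker-related biological information in the speech signal can imitation and forgery be made difficult. This paper extracts feature information related to the speaker's oral cavity, vocal tract structure and pronunciation habits from the speech signal, builds a feature library, and runs recognition experiments. The results are fairly accurate: the correct acceptance rate reaches 97%. Because the speech signal varies over time, several attempts are sometimes needed to reach this rate; this is a consequence of the small tolerance set to meet the requirements of a door-access system. In multi-person workplaces with lower security requirements, the tolerance can be increased to speed up passage.

4 Conclusion
Starting from the basic principles of speech recognition and the general methods for extracting common information from speech signals, this paper extracts specific phonemes on the basis of the common features of human speech production. In this respect the task is much simpler than speech recognition, which has to understand meaning. Apart from a small number of zero-initial characters, Chinese characters are pronounced with an initial-final structure. The very different spectral features of initials and finals make them easy to separate, and the feature parameters of several separated finals are then combined into a high-dimensional feature vector. The method is easy to implement in a low-cost single-chip system, and its main application target is a single-household door-access system. The user does not need to remember a special password: saying a few sentences that contain common finals is enough.

References
[1] AVCI E. A new optimum feature extraction and classification for speaker recognition: GWPNN [J]. Expert Systems with Applications, 2007, 32(4): 485-498.
[2] YU ZH ZH. Intelligent instrument embedded voice recognition technology [J]. Chinese Journal of Scientific Instrument, 2004, 25(5): 447-450.
[3] WANG CH R. MCE-based PNN training algorithm for speaker identification [J]. Chinese Journal of Scientific Instrument, 2002, 23(8): 154-156.
[4] YANG X J, CHI H SH. Speech signal data process [M]. Beijing: Electronic Industrial Press, 1995: 4-26.
[5] MA J F. A new speech segment method by wavelet parameter filtering [J]. Journal of Electronic Measurement and Instrument, 2001(2): 36-39.
[6] YING N, ZHAO X H. A speech unvoiced/voiced classification and voiced harmonic extraction algorithm: using third-order cumulant based on sinusoidal speech model [J]. Computer Engineering and Applications, 2006(1): 64-68.
[7] ZEKERIYA T. Applied mel-frequency discrete wavelet coefficients and parallel model compensation for noise-robust speech recognition [J]. Speech Communication, 2006, 48(1): 1294-1307.
[8] MAHADEVA S R. Extraction of speaker-specific excitation information from linear prediction residual of speech [J]. Speech Communication, 2006, 48(8): 1243-1261.
[9] DIMITRIOS V. Emotional speech recognition: resources, features and methods [J]. Speech Communication, 2006, 48(7): 1162-1181.
[10] ZHEN Y X. A tree based kernel selection approach to efficient Gaussian mixture model-universal background model based speaker identification [J]. Speech Communication, 2006, 48(9): 1273-1282.

About the authors
Wang Changlong, male, born in 1963, received his BSc from Zhejiang University in 1984, his MSc from Jiangsu University in 1991, and his PhD from Southeast University in 2003. He is now an associate professor at Yangzhou University. His main research fields are sensors, signal processing and pattern recognition. Address: College of Mechanical Engineering, Yangzhou University, Yangzhou 225009, Jiangsu, China.
Zhou Fucai, male, born in 1964, received his master's degree from Yangzhou University in 1993 and his PhD from South China Agricultural University in 2002. He is now an associate professor at Yangzhou University. His main research field is insect olfactory sensing. Address: College of Agriculture, Yangzhou University, Yangzhou 225009, Jiangsu, China.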
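The voicing degree defined in Section 2.2 of the paper above, the energy in a few spectral peaks divided by the total spectral energy, can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the naive DFT, the choice of the 5 strongest bins as the "characteristic frequencies" (borrowed from the paper's choice of 5 frequencies per voiced sound), and the 64-sample test frames are all made up for the example.

```python
import cmath
import math

def power_spectrum(frame):
    """Naive DFT power spectrum for bins 0 .. len(frame)//2 - 1."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum

def voicing_degree(frame, num_peaks=5):
    """Energy in the num_peaks strongest bins divided by total spectral energy."""
    spectrum = power_spectrum(frame)
    total = sum(spectrum)
    if total == 0.0:
        return 0.0  # silent frame: treat as unvoiced
    return sum(sorted(spectrum, reverse=True)[:num_peaks]) / total

# Hypothetical frames: a pure tone (voiced-like) and an impulse (flat, noise-like spectrum).
n = 64
tone = [math.sin(2 * math.pi * 4 * t / n) for t in range(n)]
impulse = [1.0] + [0.0] * (n - 1)

print(voicing_degree(tone))     # close to 1
print(voicing_degree(impulse))  # 5 of 32 equal bins, so 5/32
```

Note that with this definition a perfectly flat spectrum yields N/M rather than exactly 0, so a practical system would compare V against a threshold instead of testing for V = 0.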
1. The ___D___ study of language studies the historical development of language over a period of time; it is a historical study.
A. synchronic B. descriptive C. prescriptive D. diachronic
2. The process of linguistic study can be summarized as ___BCDE___
A. choosing a particular language B. observing the way language is actually used C. formulating some hypotheses on the basis of the linguistic facts observed D. testing the hypotheses on the basis of the linguistic facts E. constructing a linguistic theory
3. All languages are surprisingly ___C___ in their basic structure, whether they are found in South America, Australia or near the North Pole.
A. different B. identical C. similar D. relevant
4. The study of language at one time is a ___A___ study.
A. synchronic B. historic C. diachronic D. descriptive
5. Saussure's crucial contribution was his explicit and reiterated statement that all language items are essentially ___C___
A. interdependent B. interact C. interrelated D. interlinked
6. If a linguistic study describes and analyzes the language people actually use, it is said to be ___C___
A. prescriptive B. analytic C. descriptive D. linguistic
7. Modern linguistics regards the written language as ___C___
A. primary B. correct C. secondary D. stable
8. Language can be used to refer to contexts removed from the immediate situations of the speaker. This feature is called ___A___
A. displacement B. duality C. flexibility D. cultural transmission
9. According to F. de Saussure, ___C___ refers to the abstract linguistic system shared by all the members of a speech community.
A. parole B. performance C. langue D. language

True or false
1. The distinction between competence and performance was proposed by F. de Saussure. F
2. Modern linguistics regards the spoken language as primary, not the written language. T

10. General linguistics deals with ___ABCDE___ which are applicable in any linguistic study.
A. the basic concepts B. the theories C. the descriptions D. the models E. the methods
11. ___BCDE___ are the interdisciplinary branches of linguistic study.
A. pragmatics B. applied linguistics C. psycholinguistics and neurological linguistics D. sociolinguistics and anthropological linguistics E. mathematical linguistics and computational linguistics
12. ___ACE___ are the design features of language specified by C. Hockett.
A. arbitrariness B. performance C. duality D. langue E. productivity

3. A scientific study of language is based on the systematic investigation of language facts. T
4. Langue is relatively stable and systematic while parole is subject to personal and situational constraints. T
5. A scientific study of language is based on what the linguist thinks. F
6. General linguistics, which relates itself to the research of other areas, studies the basic concepts, theories, descriptions, models and methods applicable in any linguistic study. T
7. General linguistics is generally the study of language as a whole. T
8. Modern linguistics is mostly prescriptive, but sometimes descriptive. F
9. "A rose by any other name would smell as sweet", a famous quotation from Shakespeare's play Romeo and Juliet, illustrates the arbitrary nature of language. T
10. Language is productive in that it makes possible the construction and interpretation of new signals by its users, which is why they can produce and understand a large number of sentences. T
11. Language can be taken as a conventional coding system to express thought. T

1. A linguistic study is ___descriptive___ if it describes and analyzes facts observed. It is ___prescriptive___ if it tries to lay down rules for "correct" behavior.
2. One general principle of linguistic analysis is the primacy of ___speech___ over writing.
3. ___phonology___, ___syntax___ and ______ are the three main components of linguistics.
4. In Saussure's opinion, what linguists should do is to abstract ___langue___ from ___parole___, since ___parole___ is simply a mass of linguistic facts, too varied and confusing for systematic study.
5. The study of language development over a period of time is generally termed ___diachronic___ linguistics.
6. Linguistics is a branch of science which takes ___language___ as its object of investigation.

1. A linguist is interested in ___A___ primarily.
A. speech sounds only B. all sound C. written language D. general theory
2. If two sounds are in complementary distribution, they are ___B___ of the same phoneme.
A. symbols B. allophones C. phones D. signs
3. ___C___ deals with the production and classification of speech sounds.
A. Acoustic phonetics B. Auditory phonetics C. Articulatory phonetics D. Phonetics
4. Distinctive features are mainly based on ___A___
A. the place of articulation and manner of articulation B. the manner of articulation and tongue position C. the place of articulation and tongue position D. the manner of articulation
5. A sound produced when airflow through the oral cavity is completely blocked and then released is called ___D___
A. a liquid B. a nasal C. an affricate D. a stop
6. ___BC___ are the branches of linguistic study which are related to sounds.
A. morphology B. phonetics C. phonology D. syntax E. semantics
7. The sounds produced without the vocal cords vibrating are ___A___ sounds.
A. voiceless B. voiced C. vowel D. consonantal

1. ___Phonetics___ provides the means for describing speech sounds while ___phonology___ studies the ways in which speech sounds form systems and patterns in human language.
2. A stop with a fricative release is called a(n) ___affricate___.
3. Suprasegmental features include ___stress___, ___tone___ and ___intonation___.
4. ___Acoustic___ phonetics studies the physical properties of the speech sounds and the sound ___waves___ through the use of such machines as a spectrograph.
5. Speech sounds can be transcribed in two ways. ___Broad___ transcription is the transcription with letter-symbols only, and ___narrow___ transcription is the transcription with letter-symbols together with the diacritics.

1. Phonetics is different from phonology in that the latter studies the combinations of the sounds to convey meaning in communication. T
2. Vibration of the vocal cords results in a quality of speech sounds called voicing, which is a feature of all vowels and some consonants. T
3. Phonetics studies all the sounds in the world. F

1. There are ___B___ morphemes in the word "international".
A. two B. three C. four D. five
2. Inflectional affixes convey ___B___ meaning.
A. lexical B. grammatical C. morphological D. morphemic
3. ___ABCD___ are open categories because new words are constantly added.
A. N. B. V. C. Adj. D. Adv. E. determiner
4. ___ABDE___ are closed class words since new words are not usually added to them.
A. Determiners B. Pronouns C. Adverbs D. Conjunctions E. Prepositions
5. Words are formed by combining a number of distinct units of meaning which are called morphemes. The word "disadvantages" is made up of the morphemes ___AD___
A. dis- B. disadvantage C. advantage D. -s E. advantages

1. The meaningful components at the lowest level of a word are called ________.
2. In English ___nouns___, ___verbs___ and ___adjectives___ make up the largest part of the vocabulary. They are called open classes.
3. Morphology is generally divided into two fields: the study of ___inflection___ and ___word formation___.

True or false
1. Only one kind of prefix can be used to make the meaning of one adjective negative. F
2. Morphology studies how words can be formed to produce meaningful sentences. F
3. The study of the ways in which morphemes can be combined to form words is called morphology. T
4. A free morpheme cannot occur as an independent word. F
5. Bound morphemes can be affixes, infixes and bases. F
6. Bound morphemes serve to derive different words from existing morphemes. F
7. Mandarin Chinese is a highly inflectional language. F
8. A word may be incompatible or even opposite to several words in different aspects. T

9. According to the representation rules, phonemes should be represented ___A___, and a phonetic symbol should be shown ______
A. between slashes / in square brackets B. in round brackets / in square brackets C. in square brackets / between slashes D. between slashes / in round brackets

1. Phrase structure rules provide the explanation of ___ACDE___
A. how words and phrases form sentences B. how constituents move from their normal position to the initial position in a sentence C. how syntactic categories are formed D. how people produce grammatical sentences E. how people recognize possible sentences
2. ___C___ are closed categories because no new words are allowed for.
A. Syntactic categories B. Major lexical categories C. Minor lexical categories D. Phrasal categories

1. Many linguists nowadays believe that sentences, like other phrases, also have their own heads. They take ___inflection___ as their head, which indicates the sentence's tense and agreement.
2. The hierarchical nature of sentence structure is that sentences are organized with words of the same ___category___, such as noun phrase or verb phrase, grouped together.
3. In English there are two major types of syntactic movement, one involving the movement of a ___word___, and the other involving the movement of a ___categories___.
4. In addition to revealing a ________ order, a constituent structure tree has a ________ structure that groups words into structural constituents and shows the ________ of each structural constituent.

4. "We shall know a word by the company it keeps." This statement represents ___B___
A. the conceptualist view B. contextualism C. the naming theory D. behaviorism
5. ___A___ deals with the relationship between the linguistic element and the non-linguistic world of experience.
A. Reference B. Concept C. Semantics D. Sense
6. The grammaticality of a sentence is governed by ___A___
A. grammatical rules B. selectional restrictions C. semantic rules D. semantic features
7. ___D___ is regarded as the background knowledge and raw material, not as one of the components of linguistics.
A. Writing system B. Phonology C. Morphology D. Phonetics
8. The relation of the two clauses in a coordinate sentence is ___C___
A. one is subordinate to the other B. they hold unequal status C. they are structurally equal parts of the sentence D. they are incorporated
9. In semantic analysis of a sentence, a(n) ______ is a logical participant in a predication, largely identical with the nominal element(s) in a sentence.
A. subject B. attribute C. argument D. predicate
10. Which of the following is a correct description of reference?
A. a relationship between an expression and other expressions which have the same meaning B. the set of all objects which can potentially be referred to by an expression C. a relationship between a particular object in the world and an expression used in an utterance to pick out that object D. an intra-linguistic relationship between lexical items
11. ______ are the essential components of a sentence, comprising the two major syntactic categories of a sentence.
A. subject B. predicate C. noun phrase D. verb phrase E. adjective phrase
12. When ______ are tied to the sentence rather than the word in isolation, they are collectively known as intonation.
A. stress B. tone C. pitch D. sound E. sound length
13. We call the relation between "animal" and "horse" ________
A. synonymy B. polysemy C. homonymy D. hyponymy
14. "Alive" and "dead" are ________
A. gradable antonyms B. relational opposites C. complementary antonyms D. none of the above
15. ________ refers to the phenomenon that words having different meanings have the same form.
A. Polysemy B. Synonymy C. Homonymy D. Hyponymy
16. Sense and reference are two related but different aspects of meaning. The features of sense are ________
A. It is concerned with the inherent meaning of the linguistic form B. It is the collection of all the features of the linguistic form C. It is abstract and de-contextualized D. It is the aspect of meaning that dictionary compilers are interested in E. It deals with the relationship between the word and the non-linguistic world
17. Which of the following is a correct statement about sense?
A. All words in a language may be used to refer, but only some words have sense B. The sense of an expression is its relationship to semantically equivalent or semantically related expressions in the same language C. If two expressions have the same reference, they always have the same sense D. The sense of an expression is the interrelation between a word and an object
18. A word with several meanings is a ________
A. synonymy B. polysemic word C. co-hyponym D. complete homonym
19. Predication analysis is a way to analyze sentence meaning, which was proposed by the British linguist ________
A. John Firth B. Bloomfield C. G. Leech D. Wittgenstein
20. Predication analysis is a way to analyze sentence meaning. So "Tom is smoking a cigar" can be analyzed as ________
A. TOM (SMOKE) CIGAR B. TOM, CIGAR (SMOKE) C. (SMOKE) TOM, CIGAR D. CIGAR (SMOKE) TOM

1. Well-arranged sentences are considered grammatical sentences that are formed by a set of syntactic rules.
2. The sequential order of words in a sentence, which is only in written form, suggests that the structure of the sentence is linear.
3. The problem which the conceptualist view encounters is what precisely the link between the form and the referent is.
4. Linguistic forms having the same sense must have different references in the same situation.
5. All the words in a language can be used to refer but only some have senses.
6. Sense is concerned with the relationship between the linguistic element and the non-linguistic world of experience, while reference deals with the inherent meaning of the linguistic form.
7. Conceptualism is based on the presumption that one can derive meaning from or reduce meaning to observable contexts.
8. In grammatical analysis, the sentence is taken to be the basic unit, but in semantic analysis of a sentence, the basic unit is predication, which is the abstraction of the meaning of a sentence.

1. The grammaticality of a sentence is governed by the ________ of the language, and whether a sentence is semantically meaningful is governed by the ________.
2. In the semantic triangle, the ________ or ________ refers to the intuitive element (words, phrases), the ________ refers to the object in the world of experience, and ________ or ________ refers to concept.
3. "Can I borrow your bike?" ________ "You have a bike"
A. is synonymous with B. is inconsistent with C. entails D. presupposes

1. When one utters "It is stuffy here", according to Austin's later theory, he might be performing some acts simultaneously: ________
A. constative act B. performative act C. locutionary act D. illocutionary act E. perlocutionary act
2. What cooperative maxim is violated in the following dialogue?
A: Can you answer the telephone? B: I'm in the bath.
The maxim of ________
A. relation B. quality C. quantity D. manner
3. Conversational implicature arises when the maxims of the cooperative principle are flouted. Indicate which maxim has been flouted when the following exchange happens: ________
A: Can you answer the telephone? B: I am in the bath.
A. The maxim of quantity B. The maxim of quality C. The maxim of relation D. The maxim of manner
4. The sentence "I will buy a book tomorrow" has the illocutionary point of ________
A. representatives B. directives C. commissives D. declarations
5. Martin Joos, an American linguist, distinguishes the stages of formality, namely ________
A. intimate B. casual C. consultative D. formal E. frozen
6. Every utterance occurs in a particular situation. The main components of the situation include ________
A. the place and time of the utterance B. the speaker and the hearer C. the action they are performing at the time D. the various objects and events existent in the situation E. all the background knowledge
7. Pragmatics is different from semantics in that pragmatics studies meaning not in isolation, but in context.
8. The distinction between pragmatics and semantics is that if we study meaning with the ________ considered, it is called a pragmatic study, otherwise a semantic study.
9. Sentence meaning is abstract, while utterance meaning is concrete. The meaning of an utterance is based on ________ meaning.
10. The distinction between pragmatics and semantics is that if we study meaning with the context considered, it is called a pragmatic study, otherwise a semantic study.
11. A sentence is a grammatical unit which is structurally independent and complete.

1. The ________ provided great philosophical insight into the nature of linguistic communication.
A. speech act theory B. CP theory C. communicative competence D. linguistic competence
2. Words such as "hi-tech" and "zoo" are ________
A. acronyms B. clipped words C. formed by blending D. coined by back formation
3. In L2 learning ________ would facilitate target language learning.
A. fossilization B. interlanguage C. positive transfer D. negative transfer
4. In "sign" and "paradigm" there is no phonetic [g], but in "signature" and "paradigmatic" [g] occurs. This process is governed by the phonological rule called ________
A. sequential rule B. assimilation rule C. coordination rule D. deletion rule

1. Assimilation of neighboring sounds is, for the most part, caused by articulatory or psychological processes.
2. Linguistic competence refers to the internalized, unconscious set of rules which allows a native speaker to construct an infinite number of sentences.
3. As the process of communication is essentially a process of conveying meaning in a certain context, pragmatics can also be regarded as a kind of meaning study.
4. Language change is universal, continuous and, to a considerable degree, regular and systematic.
5. In general, linguistic change in grammar is more noticeable than that in the sound system and vocabulary of a language.
6. Language change is a gradual and constant process, therefore often indiscernible to speakers of the same generation.
7. The sound changes include changes in vowel sounds, and in the loss, gain and movement of sounds.
8. One of the tasks of historical linguists is to explore methods to reconstruct linguistic history and establish the relationship between languages.

1. Historical linguistics looks into the ________ of language change and the ________ that lead to language change.
2. Rule simplification and regulation are a type of spontaneous ________ rule change that involves exceptional plural forms of nouns.

1. It is the ________ differences that have often been used to illustrate the "illogic" of Black English.
A. phonological B. morphological C. syntactic D. all of the above
2. In a narrower sense, an individual speaker's idiolect is made up of such factors as ________
A. voice quality and pitch B. pitch and rhythm C. voice and rhythm D. voice, pitch and rhythm
3. "Expensive, valuable, precious" are a group of words bearing the same meaning, but indicating the different attitudes of the user towards what he is talking about. They are ________ synonyms.
A. dialectal B. stylistic C. emotive D. semantic
4. In sociolinguistics, social groups may be defined in a number of ways, such as by ________
A. geographical regions B. the distinct ethnic affiliation C. the characteristics of food D. the characteristic uses of vocabulary E. the nonlinguistic behaviors
5. The distinctive characteristics of a speech variety may include ________ features.
A. syntactic B. phonological C. lexical D. morphological E. a combination of linguistic
6. Idiolect is a personal dialect that combines, in a broad sense, aspects of all the elements regarding ________ variation in one form or another.
A. voiced B. rhythmical C. regional D. social E. stylistic
7. 
In sociolinguistic studies, three types of speech variety are of special interest. They are ____
A. regional dialects B. sociolects C. registers D. language and gender E. language and age
1. The most distinguishable linguistic feature of a regional dialect is its ___________
2. Stylistic variation in a person's speech or writing usually ranges on a continuum from casual or ________ to ___________ or polite according to the type of communicative situation.
3. A __________ is a special language variety that mixes or blends languages and is used by people who speak different languages for restricted purposes such as trading; when it has become the primary language of a speech community and is acquired by the children, it is said to have become a ______________
True or false
1. Dialectal synonyms can often be found in different regional dialects such as British English and American English but cannot be found within the variety itself, for example, within British English or American English.
2. A lingua franca may, but does not need to, be a native language currently spoken by a particular people.
3. Saussure took a sociological view of language and Chomsky looks at language from a psychological point of view.
4. An ethnic dialect is a linguistic variety used by people living in the same geographical region.
1. __________ is regarded as constituted by all kinds of knowledge shared by the speaker and hearer.
2. ________________ communication is communication between people whose cultural perceptions and symbol systems are distinct enough to alter the communication event.
3. Through communication some elements of culture A enter culture B and become part of culture B, thus bringing about cultural ____________
1. In children's utterances, "two feets", "goes", "comed", etc. occur although some children are aware of the irregular forms of them. These inflectional errors are the result of ___
A. children's carelessness B. improper instruction C. the wrong input D.
children’s over generalizing a constructed rule2. Since the 1960s, many studies have been conducted to examine the underlying progress of children’s language acquisition. The main stages of children’s language development include_______A. phonological developmentB. vocabulary developmentC. grammatical developmentD. semantic developmentE. pragmatic development1. SLA research is mainly concerned with__________A. the development of a native languageB. the development process of the native language acquisitionC. the extent to which SLA and FLA are similar or different processesD. the cause of the difficulties that adult learners encounter in their acquisition of a second languageE. the methods that may be used to facilitate the acquisition of a second language2. In L2 acquisition, mother tongue interference was found at all levels of the grammar:_________A. pronunciationB. morphologyC. syntaxD. methodologyE. Semantics3 Studies on the effects of formal instruction on SLA show that formal instruction may help learners perform some types of tasks except______A. planned speechB. writingC. career-oriented examD. casual and spontaneous conversation4. The concept of interlanguage was establishe d as leaner’s independent system of L2 which is neither the native language nor L2. The important features of it are _____A. systematicityB. permeabilityC. fossilizationD. self-corrigibilityE. defossilization1. In the disciplines of SLA, the ter m “ second” language is opposite to the term “ foreign” language.2. The lexical differences are all the most striking differences which are found to correlate with different generations of speakers.3. According to the acculturation hypothesis, SLA involves, and depends upon, the acquisition of culture of the target language community.4. Different from contrastive analysis, error analysis gave less consideration to learner’s native language1. 
Perfect bilingualism, however, is __________, as it is rare for an individual to be a perfect user of two ___________ in a full range of situations.
2. Different from contrastive analysis, ________ analysis gave less consideration to the learner's native language.
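Acoustic Analysis of Consonants in Hearing-Impaired Patients
GUO Jing; LIU Yongzhi
[Abstract] One of the most adverse effects of deafness on hearing-impaired patients is that it hinders the formation and development of their speech. Much research addresses how these patients' speech ability develops and changes, but comparatively little addresses the acoustic analysis of their voice production. Many dialects of Chinese differ considerably from Mandarin in manner and place of articulation. Many approaches to the acoustic analysis of consonants exist; this article gives a preliminary summary of methods for the acoustic analysis of Mandarin consonants produced by hearing-impaired patients.
[Journal] 中华耳科学杂志 (Chinese Journal of Otology)
[Year (volume), issue] 2015(000)004
[Pages] 4 (P623-626)
[Key words] consonants; acoustic analysis; speech rehabilitation
[Authors] GUO Jing; LIU Yongzhi
[Affiliations] Inner Mongolia Medical University, Inner Mongolia 10059; Department of Otorhinolaryngology, Inner Mongolia Autonomous Region People's Hospital, Inner Mongolia 010010
[Language of text] Chinese
[CLC number] R764.44
Auditory speech disorder refers to the condition in which, because of impaired hearing function, the patient lacks auditory feedback on phonation, which leads to inaccurate place of articulation, airflow direction and so on, and to uncoordinated articulatory movements.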
Mar. 2007 Journal of China University of Mining & Technology Vol.17 No.1
Available online at www.sciencedirect.com
J China Univ Mining & Technol 2007, 17(1): 0143-0146
Study on Acoustic Modeling in a Mandarin Continuous Speech Recognition
PENG Di, LIU Gang, GUO Jun
School of Information Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China
Abstract: The design of acoustic models is of vital importance in building a reliable connection between the acoustic waveform and linguistic messages in terms of individual speech units. According to the characteristics of Chinese phonemes, the base acoustic phoneme unit set is decided and refined, and a decision tree based state tying approach is explored. Since one of the advantages of the top-down tying method is its flexibility in maintaining a balance between model accuracy and complexity, relevant adjustments are made, such as to the stopping criterion for decision tree node splitting, during which optimal thresholds are found. Better results are achieved in improving acoustic modeling accuracy as well as in reducing the scale of the model to a trainable extent.
Key words: acoustic model; base acoustic phoneme units; decision tree
CLC number: TB 42
1 Introduction
Acoustic models are constructed to capture the relationship between feature vector sequences and the corresponding recognition units, and building them is one of the crucial steps in large vocabulary continuous speech recognition. State-of-the-art automatic speech recognition is based on standard pattern recognition algorithms, i.e., hidden Markov models (HMMs), which are composed of states with emission probability densities and transitions between states; see Fig.1.
Fig.1 An example of a 3-state hidden Markov phoneme model
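To make the structure in Fig.1 concrete, the following is a minimal sketch of a 3-state left-to-right HMM scored with the forward algorithm. The transition probabilities, the one-dimensional Gaussian emission densities and the names `TRANS`, `EMIT` and `forward_likelihood` are illustrative assumptions, not values or code from the paper.

```python
import math

# Hypothetical 3-state left-to-right phoneme HMM (all numbers are illustrative).
# TRANS[i][j] is the probability of moving from emitting state i to state j;
# TRANS[i][3] is the probability of leaving the model from state i.
TRANS = [
    [0.6, 0.4, 0.0, 0.0],
    [0.0, 0.7, 0.3, 0.0],
    [0.0, 0.0, 0.5, 0.5],
]

# One-dimensional Gaussian emission density per state: (mean, variance).
EMIT = [(0.0, 1.0), (2.0, 1.0), (4.0, 1.0)]


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def forward_likelihood(observations):
    """Return p(observations | model), entering at state 0 and exiting after state 2."""
    n_states = len(EMIT)
    # alpha[j]: probability of the observations seen so far, ending in state j.
    alpha = [0.0] * n_states
    alpha[0] = gaussian_pdf(observations[0], *EMIT[0])
    for x in observations[1:]:
        alpha = [
            sum(alpha[i] * TRANS[i][j] for i in range(n_states))
            * gaussian_pdf(x, *EMIT[j])
            for j in range(n_states)
        ]
    # Leave the model through the exit transition of each state.
    return sum(alpha[i] * TRANS[i][n_states] for i in range(n_states))
```

In this toy model a sequence that moves through the state means in order, such as `[0.0, 2.0, 4.0]`, scores higher than the reversed sequence, and a sequence shorter than three frames gets zero likelihood because the last state cannot be reached.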
The general principle is to construct an N-state model for each base phoneme unit and to estimate the model parameters from transcribed speech data. Several key issues have to be tackled in acoustic modeling. First of all, we have to decide on the base modeling unit, also called the base phoneme unit. Chinese is a monosyllabic language in which each syllable is the concatenation of an initial and a final. As illustrated in reference [1], context independent modeling with 61 units, including 22 initials (plus zero initials), 38 finals and silence, failed to provide satisfactory accuracy, mainly because of the co-articulation caused by context, mostly at syllable boundaries.
As a consequence, CD (context dependent) HMM models are widely adopted. Among the several base phoneme unit candidates (words, syllables, semi-syllables, initials and finals, or phonemes), initials and finals have survived the comparison, owing to their advantages in computational complexity as well as in modeling accuracy. Therefore, IF (Initial and Final) triphonemes are generally used in modeling Mandarin speech units.
The other problem is the overwhelming complexity of context dependent modeling, which can be addressed by a decision tree based state tying method. This method was originally presented by Young and has a clear advantage in balancing model complexity against the amount of training data required.
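The trade-off between model complexity and training data can be illustrated with a small sketch of one node-splitting decision. It assumes the commonly used likelihood-gain criterion with a single shared Gaussian per node and one-dimensional features; the paper gives no formula at this point, so the criterion, the class `StateStats`, the function names and the question sets are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class StateStats:
    """Sufficient statistics of one triphone state (1-D features for brevity)."""
    left: str        # left-context unit
    count: float     # state occupancy (number of frames)
    total: float     # sum of feature values
    total_sq: float  # sum of squared feature values


def pooled_log_likelihood(states):
    """Log-likelihood of the frames in `states` under one shared Gaussian."""
    n = sum(s.count for s in states)
    mean = sum(s.total for s in states) / n
    var = max(sum(s.total_sq for s in states) / n - mean * mean, 1e-6)  # variance floor
    return -0.5 * n * (math.log(2.0 * math.pi * var) + 1.0)


def best_split(states, questions, gain_threshold, min_occupancy):
    """Return (question name, gain) of the best admissible split, or None to stop."""
    base = pooled_log_likelihood(states)
    best = None
    for name, contexts in questions.items():
        yes = [s for s in states if s.left in contexts]
        no = [s for s in states if s.left not in contexts]
        if not yes or not no:
            continue  # the question does not separate anything
        if min(sum(s.count for s in yes), sum(s.count for s in no)) < min_occupancy:
            continue  # a child would have too little training data
        gain = pooled_log_likelihood(yes) + pooled_log_likelihood(no) - base
        if gain > gain_threshold and (best is None or gain > best[1]):
            best = (name, gain)
    return best


# Toy pool: states with left contexts b/p cluster near 0, a/o near 5 (unit variance).
states = [
    StateStats("b", 100, 0.0, 100.0),
    StateStats("p", 100, 0.0, 100.0),
    StateStats("a", 100, 500.0, 2600.0),
    StateStats("o", 100, 500.0, 2600.0),
]
questions = {"L_Stop": {"b", "p"}, "L_b": {"b"}}
print(best_split(states, questions, gain_threshold=50.0, min_occupancy=50.0))
```

Raising `gain_threshold` or `min_occupancy` stops splitting earlier and yields fewer tied states; lowering them grows the tree. These two parameters are the kind of stopping thresholds that have to be tuned.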
Received 16 May 2006; accepted 08 July 2006
Project 60475007 supported by the National Natural Science Foundation of China
Corresponding author. Tel: +86-10-622850|9-1003; E-mail address: pengdi@pris.edu.cn or bdyjade@gmail.com
Of the two tying methods, phonetic decision tree clustering and the data-driven approach, the former is far superior in providing a mapping for unseen triphonemes, although the two obtain similar recognition performance. In his later work, Young also emphasized that the stopping criteria, used to determine appropriate tree sizes, ought to be chosen properly. A series of attempts have been made to this end, such as the threshold tuning presented in our experiment sections.
The paper is arranged as follows. In section 2, the optimization of base phoneme unit selection is described in detail. In section 3, recent progress in the tree based state tying method is presented together with our own attempts. Experimental results and comparisons with previous work are given in section 4. The last section presents conclusions and future topics.
2 Base Acoustic Unit Selection
Base unit selection plays a very significant part in acoustic modeling, since an appropriate base modeling unit characterizes phoneme variations sufficiently while keeping the HMM model size trainable. As suggested in reference [4], context independent monophonemes proved to be the most suitable base units in an embedded recognition system if the vocabulary size is below 50, due to its limitations in memory and computational complexity.
However, in large vocabulary continuous speech recognition systems, more precise modeling units are needed, since co-articulation effects should be properly modeled; otherwise a slight confusion between similar phoneme units might lead to complete errors. Typically there are three types of basic acoustic units for Mandarin Chinese context dependent acoustic modeling: syllables, initials/finals or phonemes. Since Mandarin Chinese is a tonal language, each character is pronounced as a monosyllable with a tonal association. In total, there are about 410 toneless syllables and 1250 tonal syllables. Choosing syllables as basic acoustic units would greatly increase the computation and storage complexities. Compared with syllables, the phoneme unit is rather small and there are only a small number of phonemes, but phonemes vary considerably in pronunciation. There are often phoneme deletions, insertions and changes in continuous speech. Initials/finals, however, are comparatively steady. Therefore, our experiments are based on initial/final units.
If we make acoustic units context dependent, we can significantly improve the recognition accuracy, provided there are enough training data to estimate the parameters of these context-dependent models. Context-dependent phonemes have been widely used for large-vocabulary continuous speech recognition. A context usually refers to the immediately left and/or right neighboring phonemes. A triphoneme model is a phonetic model that takes into consideration both the left and the right neighboring phonemes. Triphoneme models are powerful because they capture the most important co-articulation effects. Besides, subdivision of zero initials has proved to be effective in our experiments. The detailed base phoneme selections are provided in Table 1.
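As a small illustration of what a triphoneme model refers to, the sketch below expands a sequence of initial/final units into context-dependent labels. The `left-centre+right` notation, the `sil` boundary symbol and the function name `to_triphones` are assumptions borrowed from common toolkit conventions, not taken from the paper.

```python
def to_triphones(units, boundary="sil"):
    """Expand a sequence of initial/final units into 'left-centre+right' labels."""
    padded = [boundary] + list(units) + [boundary]
    return [
        f"{padded[i - 1]}-{padded[i]}+{padded[i + 1]}"
        for i in range(1, len(padded) - 1)
    ]


# "zhong guo" written as initials and finals (toneless pinyin, illustrative).
labels = to_triphones(["zh", "ong", "g", "uo"])
print(labels)  # ['sil-zh+ong', 'zh-ong+g', 'ong-g+uo', 'g-uo+sil']

# Upper bound on distinct triphones for the 61-unit context independent
# inventory mentioned in the introduction (the refined set may differ).
print(61 ** 3)  # 226981
```

The cubic growth of the label space is why most triphonemes are seen rarely or never in training data, which is the motivation for tying their states.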