Speech Recognition Systems: Chinese-English Foreign Literature Translation

Format: doc
Size: 136.83 KB
Pages: 14

Communications: Chinese-English Translations of Foreign Literature

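Improvements to the University of Colorado Large-Vocabulary Continuous Speech Recognition System in Noisy Environments: The Speech in Noisy Environments Task

Abstract

In this paper we report recent improvements to the University of Colorado system for the Naval Research Laboratory Speech in Noisy Environments (SPINE) task.

In particular, we describe our efforts to improve acoustic and language modeling for the task, and we investigate methods for unsupervised speaker and environment adaptation from limited speech data.

We study the MAPLR adaptation method in a large-vocabulary continuous speech recognition system, alongside single and multiple regression class maximum likelihood linear regression (MLLR).

Our current SPINE system uses a large-vocabulary speech recognition engine that has been under rapid development at the University of Colorado.

The system achieves a word error rate of 30.5% on the SPINE-2 evaluation data, a 16% relative reduction in word error rate compared with our 2001 SPINE-2 system.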

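1. Introduction

The SPINE task aims to measure and improve the state of the art in robust continuous speech recognition in noisy environments. The task is difficult in several respects: only limited data are available for training; a variety of military noises are present in both training and test data; and in each recognition session the audio stream is unpredictable and only a limited amount of speech is available for adaptation.

The SPINE-1 evaluation in November 2000 and the SPINE-2 evaluation in November 2001 were run by the Naval Research Laboratory with support from DARPA.

Participants in the 2001 evaluation included SRI, IBM, the University of Washington, the University of Colorado, AT&T, the Oregon Graduate Institute, and Carnegie Mellon University.

Many of these sites have previously reported results on the SPINE-1 and SPINE-2 tasks.

The best-performing systems on this task have used adaptation in both the feature and model domains, together with multiple parallel recognizers trained on different feature types (e.g. MFCC, PLP).

The outputs of the individual recognizers are generally combined through a hypothesis fusion method.

This produces a single result whose error rate is lower than that of any individual recognizer.

The University of Colorado took part in both the SPINE-1 and SPINE-2 evaluations.

Our November 2001 SPINE-2 system was the first to be built on the University of Colorado recognizer named SONIC, a large-vocabulary continuous speech recognition system.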

Foreign Literature Translation: Speaker Recognition

Appendix A: English Text

Speaker Recognition

By Judith A. Markowitz, J. Markowitz Consultants

Speaker recognition uses features of a person's voice to identify or verify that person. It is a well-established biometric with commercial systems that are more than 10 years old and deployed non-commercial systems that are more than 20 years old. This paper describes how speaker recognition systems work and how they are used in applications.

1. Introduction

Speaker recognition (also called voice ID and voice biometrics) is the only human-biometric technology in commercial use today that extracts information from sound patterns. It is also one of the most well-established biometrics, with deployed commercial applications that are more than 10 years old and non-commercial systems that are more than 20 years old.

2. How Do Speaker-Recognition Systems Work?

Speaker-recognition systems use features of a person's voice and speaking style to:
● attach an identity to the voice of an unknown speaker
● verify that a person is who she/he claims to be
● separate one person's voice from other voices in a multi-speaker environment

The first operation is called speaker identification or speaker recognition; the second has many names, including speaker verification, speaker authentication, voice verification, and voice recognition; the third is speaker separation or, in some situations, speaker classification. This paper focuses on speaker verification, the most highly commercialized of these technologies.

2.1 Overview of the Process

Speaker verification is a biometric technology used for determining whether a person is who she or he claims to be. It should not be confused with speech recognition, a non-biometric technology used for identifying what a person is saying. Speech recognition products are not designed to determine who is speaking.

Speaker verification begins with a claim of identity (see Figure A1).
Usually, the claim entails manual entry of a personal identification number (PIN), but a growing number of products allow spoken entry of the PIN and use speech recognition to identify the numeric code. Some applications replace manual or spoken PIN entry with bank cards, smartcards, or the number of the telephone being used. PINs are also eliminated when a speaker-verification system contacts the user, an approach typical of systems used to monitor home-incarcerated criminals.

Figure A1.

Once the identity claim has been made, the system retrieves the stored voice sample (called a voiceprint) for the claimed identity and requests spoken input from the person making the claim. Usually, the requested input is a password. The newly input speech is compared with the stored voiceprint and the results of that comparison are measured against an acceptance/rejection threshold. Finally, the system accepts the speaker as the authorized user, rejects the speaker as an impostor, or takes another action determined by the application. Some systems report a confidence level or other score indicating how confident the system is about its decision.

If the verification is successful, the system may update the acoustic information in the stored voiceprint. This process is called adaptation. Adaptation is an unobtrusive solution for keeping voiceprints current and is used by many commercial speaker verification systems.

2.2 The Speech Sample

As with all biometrics, before verification (or identification) can be performed the person must provide a sample of speech (called enrolment). The sample is used to create the stored voiceprint.

Systems differ in the type and amount of speech needed for enrolment and verification. The basic divisions among these systems are:
● text dependent
● text independent
● text prompted

2.2.1 Text Dependent

Most commercial systems are text dependent. Text-dependent systems expect the speaker to say a pre-determined phrase, password, or ID.
By controlling the words that are spoken, the system can look for a close match with the stored voiceprint. Typically, each person selects a private password, although some administrators prefer to assign passwords. Passwords offer extra security, requiring an impostor to know the correct PIN and password and to have a matching voice. Some systems further enhance security by not storing a human-readable representation of the password.

A global phrase may also be used. In its 1996 pilot of speaker verification, Chase Manhattan Bank used 'Verification by Chemical Bank'. Global phrases avoid the problem of forgotten passwords, but lack the added protection offered by private passwords.

2.2.2 Text Independent

Text-independent systems ask the person to talk. What the person says is different every time. It is extremely difficult to accurately compare utterances that are totally different from each other, particularly in noisy environments or over poor telephone connections. Consequently, commercial deployment of text-independent verification has been limited.

2.2.3 Text Prompted

Text-prompted systems (also called challenge response) ask speakers to repeat one or more randomly selected numbers or words (e.g. "43516", "27, 46", or "Friday, computer"). Text prompting adds time to enrolment and verification, but it enhances security against tape recordings. Since the items to be repeated cannot be predicted, it is extremely difficult to play a recording. Furthermore, there is no problem of forgetting a password, even though the PIN, if used, may still be forgotten.

2.3 Anti-speaker Modelling

Most systems compare the new speech sample with the stored voiceprint for the claimed identity. Other systems also compare the newly input speech with the voices of other people. Such techniques are called anti-speaker modelling.
The underlying philosophy of anti-speaker modelling is that under any conditions a voice sample from a particular speaker will be more like other samples from that person than voice samples from other speakers. If, for example, the speaker is using a bad telephone connection and the match with the speaker's voiceprint is poor, it is likely that the scores for the cohorts (or world model) will be even worse.

The most common anti-speaker techniques are:
● discriminate training
● cohort modelling
● world models

Discriminate training builds the comparisons into the voiceprint of the new speaker using the voices of the other speakers in the system. Cohort modelling selects a small set of speakers whose voices are similar to that of the person being enrolled. Cohorts are, for example, always the same sex as the speaker. When the speaker attempts verification, the incoming speech is compared with his/her stored voiceprint and with the voiceprints of each of the cohort speakers. World models (also called background models or composite models) contain a cross-section of voices. The same world model is used for all speakers.

2.4 Physical and Behavioural Biometrics

Speaker recognition is often characterized as a behavioural biometric. This description is set in contrast with physical biometrics, such as fingerprinting and iris scanning. Unfortunately, its classification as a behavioural biometric promotes the misunderstanding that speaker recognition is entirely (or almost entirely) behavioural. If that were the case, good mimics would have no difficulty defeating speaker-recognition systems. Early studies determined this was not the case and identified mimic-resistant factors. Those factors reflect the size and shape of a speaker's speaking mechanism (called the vocal tract).

The physical/behavioural classification also implies that performance of physical biometrics is not heavily influenced by behaviour.
This misconception has led to the design of biometric systems that are unnecessarily vulnerable to careless and resistant users. This is unfortunate because it has delayed good human-factors design for those biometrics.

3. How is Speaker Verification Used?

Speaker verification is well-established as a means of providing biometric-based security for:
● telephone networks
● site access
● data and data networks

and monitoring of:
● criminal offenders in community release programmes
● outbound calls by incarcerated felons
● time and attendance

3.1 Telephone Networks

Toll fraud (theft of long-distance telephone services) is a growing problem that costs telecommunications services providers, government, and private industry US$3-5 billion annually in the United States alone. The major types of toll fraud include the following:
● Hacking CPE
● Calling card fraud
● Call forwarding
● Prisoner toll fraud
● Hacking 800 numbers
● Call sell operations
● 900 number fraud
● Switch/network hits
● Social engineering
● Subscriber fraud
● Cloning wireless telephones

Among the most damaging are theft of services from customer premises equipment (CPE), such as PBXs, and cloning of wireless telephones. Cloning involves stealing the ID of a telephone and programming other phones with it. Subscriber fraud, a growing problem in Europe, involves enrolling for services, usually under an alias, with no intention of paying for them.

Speaker verification has two features that make it ideal for telephone and telephone network security: it uses voice input and it is not bound to proprietary hardware. Unlike most other biometrics that need specialized input devices, speaker verification operates with standard wireline and/or wireless telephones over existing telephone networks. Reliance on input devices created by other manufacturers for a purpose other than speaker verification also means that speaker verification cannot expect the consistency and quality offered by a proprietary input device.
Speaker verification must overcome differences in input quality and in the way speech frequencies are processed. This variability is produced by differences in network type (e.g. wireline vs. wireless), unpredictable noise levels on the line and in the background, transmission inconsistency, and differences in the microphones in telephone handsets. Sensitivity to such variability is reduced through techniques such as speech enhancement and noise modelling, but products still need to be tested under expected conditions of use.

Applications of speaker verification on wireline networks include secure calling cards, interactive voice response (IVR) systems, and integration with security for proprietary network systems. Such applications have been deployed by organizations as diverse as the University of Maryland, the Department of Foreign Affairs and International Trade Canada, and AMOCO. Wireless applications focus on preventing cloning but are being extended to subscriber fraud. The European Union is also actively applying speaker verification to telephony in various projects, including Caller Verification in Banking and Telecommunications, COST250, and Picasso.

3.2 Site Access

The first deployment of speaker verification more than 20 years ago was for site access control. Since then, speaker verification has been used to control access to office buildings, factories, laboratories, bank vaults, homes, pharmacy departments in hospitals, and even access to the US and Canada. Since April 1997, the US Department of Immigration and Naturalization (INS) and other US and Canadian agencies have been using speaker verification to control after-hours border crossings at the Scobey, Montana port-of-entry.
The INS is now testing a combination of speaker verification and face recognition in the commuter lane of other ports-of-entry.

3.3 Data and Data Networks

Growing threats of unauthorized penetration of computing networks, concerns about security of the Internet, and increases in off-site employees with data access needs have produced an upsurge in the application of speaker verification to data and network security.

The financial services industry has been a leader in using speaker verification to protect proprietary data networks, electronic funds transfer between banks, access to customer accounts for telephone banking, and employee access to sensitive financial information. The Illinois Department of Revenue, for example, uses speaker verification to allow secure access to tax data by its off-site auditors.

3.4 Corrections

In 1993, there were 4.8 million adults under correctional supervision in the United States and that number continues to increase. Community release programmes, such as parole and home detention, are the fastest growing segments of this industry. It is no longer possible for corrections officers to provide adequate monitoring of those people.

In the US, corrections agencies have turned to electronic monitoring systems. Since the late 1980s speaker verification has been one of those electronic monitoring tools. Today, several products are used by corrections agencies, including an alcohol breathalyzer with speaker verification for people convicted of driving while intoxicated and a system that calls offenders on home detention at random times during the day.

Speaker verification also controls telephone calls made by incarcerated felons. Inmates place a lot of calls. In 1994, US telecommunications services providers made $1.5 billion on outbound calls from inmates. Most inmates have restrictions on whom they can call.
Speaker verification ensures that an inmate is not using another inmate's PIN to make a forbidden contact.

3.5 Time and Attendance

Time and attendance applications are a small but growing segment of the speaker-verification market. SOC Credit Union in Michigan has used speaker verification for time and attendance monitoring of part-time employees for several years. Like many others, SOC Credit Union first deployed speaker verification for security and later extended it to time and attendance monitoring for part-time employees.

4. Standards

This paper concludes with a short discussion of application programming interface (API) standards. An API contains the function calls that enable programmers to use speaker verification to create a product or application. Until April 1997, when the Speaker Verification API (SV API) standard was introduced, all available APIs for biometric products were proprietary. SV API remains the only API standard covering a specific biometric. It is now being incorporated into proposed generic biometric API standards. SV API was developed by a cross-section of speaker-recognition vendors, consultants, and end-user organizations to address a spectrum of needs and to support a broad range of product features. Because it supports both high-level functions (e.g. calls to enrol) and low-level functions (e.g. choices of audio input features), it facilitates development of different types of applications by both novice and experienced developers.

Why is it important to support API standards? Developers using a product with a proprietary API face difficult choices if the vendor of that product goes out of business, fails to support its product, or does not keep pace with technological advances. One of those choices is to rebuild the application from scratch using a different product. Given the same events, developers using an SV API-compliant product can select another compliant vendor and need perform far fewer modifications.
Consequently, SV API makes development with speaker verification less risky and less costly. The advent of generic biometric API standards further facilitates integration of speaker verification with other biometrics. All of this helps speaker-verification vendors because it fosters growth in the marketplace. In the final analysis, active support of API standards by developers and vendors benefits everyone.

Appendix B: Chinese Translation

Speaker Recognition

Author: Judith A. Markowitz, J. Markowitz Consultants

Speaker recognition uses features of a person's voice to identify or verify that person.
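The verification flow of Section 2.1 (compare new speech with the stored voiceprint, then test the result against an acceptance/rejection threshold) and the cohort idea of Section 2.3 can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: voiceprints are treated as plain feature vectors, cosine similarity stands in for whatever comparison a real product uses, and the names `cosine_similarity`, `verify` and the threshold value are hypothetical rather than part of SV API or any vendor's interface.

```python
import math
from typing import Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Similarity between two feature vectors (assumed stand-in for a real matcher)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def verify(
    sample: Sequence[float],
    voiceprint: Sequence[float],
    cohort: Sequence[Sequence[float]],
    threshold: float = 0.1,  # hypothetical acceptance/rejection threshold
) -> Tuple[str, float]:
    """Accept the claim only if the claimed voiceprint beats the best cohort match."""
    claimed_score = cosine_similarity(sample, voiceprint)
    # Anti-speaker step: score the same sample against the cohort voiceprints.
    cohort_score = max((cosine_similarity(sample, c) for c in cohort), default=0.0)
    score = claimed_score - cohort_score
    decision = "accept" if score >= threshold else "reject"
    return decision, score


if __name__ == "__main__":
    stored = [1.0, 0.0]
    cohort = [[0.0, 1.0], [0.5, 0.5]]
    print(verify([1.0, 0.1], stored, cohort))  # sample close to the stored voiceprint
    print(verify([0.0, 1.0], stored, cohort))  # sample closer to a cohort speaker
```

Scoring relative to the cohort, rather than against a fixed absolute score, reflects the point made in Section 2.3: on a poor telephone line the match with the claimed voiceprint drops, but the cohort scores are expected to drop as well.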

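Design of a Speech Translation System Based on Intelligent Speech Recognition Technology

I. Overview

As international trade, tourism and cultural exchange continue to grow, more and more people need to communicate across languages.

Traditional translation tools usually require human involvement, which is tedious and time-consuming and hinders the rapid transfer of information. This creates a need for a system that can recognize speech automatically and translate it quickly.

Speech translation systems based on intelligent speech recognition technology have emerged to meet this need.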

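II. System Architecture

A speech translation system based on speech recognition technology consists mainly of the following modules:

1. Speech input module: receives the user's speech and converts the speech signal into a digital signal.

2. Speech recognition module: converts the digital signal into text.

3. Machine translation module: translates the recognized text and produces text in the target language.

4. Speech synthesis (text-to-speech) module: converts the translated target-language text into a speech signal.

5. Speech output module: outputs the synthesized speech signal.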
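The five modules above form a simple chain, which can be sketched as follows. This is only an illustration of the data flow: the class name `SpeechTranslationPipeline` and the stub recognizer, translator and synthesizer are hypothetical placeholders, not components of any particular product, and a real system would plug actual engines into the three callables.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class SpeechTranslationPipeline:
    """Chains recognition, translation and synthesis (modules 2 to 4)."""
    recognize: Callable[[List[int]], str]   # digital signal -> source text
    translate: Callable[[str], str]         # source text -> target text
    synthesize: Callable[[str], List[int]]  # target text -> speech samples

    def run(self, samples: List[int]) -> Tuple[str, str, List[int]]:
        # `samples` is what module 1 (speech input) would deliver.
        text = self.recognize(samples)
        translated = self.translate(text)
        audio = self.synthesize(translated)
        # `audio` is what module 5 (speech output) would play back.
        return text, translated, audio


def demo_pipeline() -> SpeechTranslationPipeline:
    """Builds a pipeline from toy stubs so the flow can be run end to end."""
    phrasebook = {"你好": "hello"}  # hypothetical one-entry translation table
    return SpeechTranslationPipeline(
        recognize=lambda samples: "你好",                     # stub: fixed result
        translate=lambda text: phrasebook.get(text, text),    # stub: table lookup
        synthesize=lambda text: [ord(ch) for ch in text],     # stub: fake samples
    )


if __name__ == "__main__":
    print(demo_pipeline().run([0, 12, -7, 3]))
```

Keeping each stage behind a plain callable means one engine can be swapped for another without touching the rest of the chain.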

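III. System Design

1. Speech input module

The speech input module is the entry point of the speech translation system and receives the user's spoken input.

In this module, a microphone captures the user's speech signal, which is then converted into a digital signal.

The sampling rate and quantization bit depth of the digital signal strongly affect recognition accuracy; a sampling rate of 16 kHz or higher and 16-bit quantization are typically used.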
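To make the two parameters concrete, the sketch below samples a test tone at 16 kHz and quantizes it to 16-bit integers. The function names and the use of a synthetic sine wave in place of a microphone signal are assumptions for illustration only.

```python
import math
from typing import List

SAMPLE_RATE = 16_000  # samples per second (16 kHz)
BITS = 16             # quantization bit depth


def quantize(x: float) -> int:
    """Map an amplitude in [-1.0, 1.0] to a signed 16-bit integer level."""
    x = max(-1.0, min(1.0, x))  # clip out-of-range amplitudes
    return int(round(x * (2 ** (BITS - 1) - 1)))


def sine_pcm(freq_hz: float, duration_s: float) -> List[int]:
    """Sample and quantize a sine tone (stand-in for a microphone signal)."""
    n = int(SAMPLE_RATE * duration_s)
    return [quantize(math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)) for i in range(n)]


def bytes_per_second() -> int:
    """Raw mono data rate implied by the sampling rate and bit depth."""
    return SAMPLE_RATE * BITS // 8


if __name__ == "__main__":
    pcm = sine_pcm(440.0, 0.5)
    print(len(pcm), "samples,", bytes_per_second(), "bytes per second")
```

At these settings one second of mono speech occupies 32,000 bytes before any compression.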

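2. Speech recognition module

The speech recognition module is the core of the speech translation system and converts the user's speech signal into recognizable text.

Common speech recognition techniques include hidden Markov models, recurrent neural networks and convolutional neural networks, of which the hidden Markov model is the most widely used.

The module builds models of all the speech it is able to recognize, so that the system can determine by comparison which text the user's input speech corresponds to.

3. Machine translation module

The machine translation module is the translation core of the system and converts the input text into text in the target language.

Common machine translation approaches include rule-based machine translation, statistical machine translation and neural machine translation; neural machine translation is currently the most widely used.

The module calls a front-end program to preprocess the input text, for example by word segmentation, in order to improve translation accuracy.

4. Speech synthesis module

The speech synthesis (text-to-speech) module is the core module that converts the translated target-language text into a speech signal.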

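Audio Signal Processing Doctoral Thesis: Chinese-English Foreign Literature Translation

Audio signal processing is a widely studied field that covers the acquisition, analysis, transmission and processing of audio signals.

This document translates the following two foreign-language papers as reference material for writing a doctoral thesis on audio signal processing.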

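Paper 1: Title of Paper One
Authors:
Abstract:
This paper proposes a new audio signal processing algorithm intended to improve the quality of audio signals and enhance the listener's experience of music.

By extracting and analyzing features of the audio signal, the algorithm effectively removes noise and distortion and delivers a clearer, richer listening experience.

The paper describes the principle and implementation of the algorithm and verifies its effectiveness experimentally on different audio datasets.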

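Paper 2: Title of Paper Two
Authors:
Abstract:
This paper examines an important problem in audio signal processing: the accuracy and robustness of speech recognition.

By analyzing existing speech recognition algorithms, the paper identifies several problems with current approaches and proposes an improved method.

The method is based on deep learning and convolutional neural networks, and improves the accuracy and robustness of speech recognition through multi-level feature learning and representation learning on the audio signal.

The paper also reports the experimental results of the method and compares it with other algorithms.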

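Summary
These two foreign-language papers present important research advances and algorithms in audio signal processing.

They offer valuable reference material and can guide the writing of a doctoral thesis on audio signal processing.

By combining these research results, audio signal processing algorithms can be further improved, raising both audio quality and user experience.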

Intelligent Transportation Systems: Chinese-English Foreign Literature Translation

(The document contains the English original and the Chinese translation.)

Original text: Traffic Assignment Forecast Model Research in ITS

Introduction

The intelligent transportation system (ITS) has developed rapidly along with sustainable urban development, digital city construction and the development of transportation. One of the main functions of ITS is to improve the transportation environment and alleviate traffic jams. The most effective way to achieve this is to forecast the traffic volume of the local network and its important nodes accurately, using the path-analysis functions of GIS and related mathematical methods; this leads to better planning of the traffic network. Traffic assignment forecast is an important phase of traffic volume forecast: it assigns the forecast traffic to every road in the traffic sector. If the traffic volume of a certain road is so large that it would cause a traffic jam, planners must consider building new roads or improving existing ones to alleviate congestion. This study presents an improved traffic assignment forecast model, MPCC, based on an analysis of the advantages and disadvantages of classic traffic assignment forecast models, and tests the validity of the improved model in practice.

1 Analysis of classic models

1.1 Shortcut traffic assignment

Shortcut traffic assignment is a static traffic assignment method. In this method, the impact of traffic load on vehicle travel is not considered, and the traffic impedance (travel time) is a constant. The traffic volume of every origin-destination (OD) pair is assigned to the shortcut between the origin and the destination, while the traffic volume of the other roads in the sector is zero. The method has the advantage of simple calculation; its obvious shortcoming is the uneven distribution of traffic volume. With this method, the assigned traffic volume is concentrated on the shortcut, which is obviously not realistic.
However, shortcut traffic assignment is the basis of all the other traffic assignment methods.

1.2 Multi-ways probability assignment

In reality, travelers always want to choose the shortcut to the destination, which is called the shortcut factor; however, because of the complexity of the traffic network, the path chosen is not necessarily the shortcut, which is called the random factor. Although every traveler hopes to follow the shortcut, some in fact do not. The shorter the path, the greater the probability that it is chosen; the longer the path, the smaller that probability. The multi-ways probability assignment model is therefore based on the LOGIT model:

$$p_i = \frac{\exp(-\theta F_i)}{\sum_{j=1}^{n} \exp(-\theta F_j)} \qquad (1)$$

where $p_i$ is the probability of path section $i$; $F_i$ is the travel time of path section $i$; and $\theta$ is the transport decision parameter, which is calibrated as follows: first calculate $p_i$ for different values of $\theta$ (from 0 to 1), then choose the $\theta$ that brings the calculated $p_i$ closest to the observed $p_i$.

Multi-ways probability assignment takes both the shortcut factor and the random factor into account, so its results are more reasonable. However, it does not consider the relationship between traffic impedance, traffic load and road capacity, which makes the assignment imprecise in more crowded traffic networks. We attempt to improve accuracy by integrating these elements in one model, MPCC.

2 Multi-ways probability and capacity constraint model

2.1 Rational path aggregate

To make the improved model more reasonable in application, the concept of the rational path aggregate is introduced. The rational path aggregate, which is the foundation of the MPCC model, constrains the scope of the calculation.
The rational path aggregate is the set of paths between the starts and ends of the traffic sector, defined by inner nodes that satisfy the following rules: the distance between the next inner node and the start cannot be shorter than the distance between the current node and the start; at the same time, the distance between the next inner node and the end cannot be longer than the distance between the current node and the end. The multi-ways probability assignment model is applied only within the rational path aggregate to assign the forecast traffic volume, which greatly enhances the applicability of the model.

2.2 Model assumption

1) Traffic impedance is not a constant. It is determined by the vehicle characteristics and the current traffic situation.
2) The traffic impedance that travelers estimate is random and imprecise.
3) Every traveler chooses a path from the respective rational path aggregate.

Based on the assumptions above, we can use the MPCC model to assign the traffic volume in the sector of origin-destination pairs.

2.3 Calculation of path traffic impedance

Travelers understand path traffic impedance differently, but in general the travel cost, which consists mainly of forecast travel time, travel length and forecast travel outlay, is taken as the traffic impedance. Eq. (2) expresses this relationship:

$$C_a = \alpha T_a + \beta L_a + \gamma F_a \qquad (2)$$

where $C_a$ is the traffic impedance of path section $a$; $T_a$ is the forecast travel time of path section $a$; $L_a$ is the travel length of path section $a$; $F_a$ is the forecast travel outlay of path section $a$; and $\alpha$, $\beta$, $\gamma$ are the weights of the three elements that affect the traffic impedance. For a given path section, different vehicles have different values of $\alpha$, $\beta$ and $\gamma$.
The weighted averages of $\alpha$, $\beta$ and $\gamma$ for each path section can be obtained from the statistical share of each vehicle type on that section.

2.4 Chosen probability in MPCC

Travelers always want to follow the best path (the shortcut in a broad sense), but because of the random factor they can only choose the path with the smallest traffic impedance as they themselves estimate it. This is the key point of MPCC. According to the random utility theory of economics, if traffic impedance is treated as negative utility, the chosen probability $p_{rs}$ for the origin-destination pair $(r, s)$ should follow the LOGIT model:

$$p_{rs} = \frac{\exp(-b C_{rs})}{\sum_{j=1}^{n} \exp(-b C_j)} \qquad (3)$$

where $p_{rs}$ is the chosen probability of the path section $(r, s)$; $C_{rs}$ is the traffic impedance of the path section $(r, s)$; $C_j$ is the traffic impedance of each path section in the forecast traffic sector; and $b$ reflects the travelers' perception of the traffic impedance of paths in the traffic sector and is inversely related to the deviation of that perception. If $b \to \infty$, the deviation in travelers' understanding of traffic impedance approaches 0. In this case all travelers follow the path with the smallest traffic impedance, which gives the same result as shortcut traffic assignment. Conversely, if $b \to 0$, the travelers' error in understanding approaches infinity, and the paths they choose are scattered. A drawback of Eq. (3) is that $b$ is a dimensional quantity. Because the deviation associated with $b$ would have to be known in advance, the value of $b$ is difficult to determine. Eq. (3) is therefore improved as follows:

$$p_{rs} = \frac{\exp(-b C_{rs} / C_{OD})}{\sum_{j=1}^{n} \exp(-b C_j / C_{OD})}, \qquad C_{OD} = \frac{1}{n} \sum_{j=1}^{n} C_j \qquad (4)$$

where $C_{OD}$ is the average traffic impedance of all the assigned paths, and $b$, now dimensionless, depends only on the rational path aggregate rather than on the traffic impedance. According to actual observation, $b$ is an empirical value that generally lies between 3.00 and 4.00.
For the more crowded roads inside cities, $b$ is normally between 3.00 and 3.50.

2.5 Flow of MPCC

The MPCC model combines the ideas of multi-ways probability assignment and iterative capacity constraint traffic assignment.

Firstly, the geometric information of the road network and the OD traffic volume are obtained from related data. The rational path aggregate is then determined with the method explained in Section 2.1.

Secondly, the traffic impedance of each path section is calculated with Eq. (2), as explained in Section 2.3.

Thirdly, on the basis of the traffic impedance of each path section, the forecast traffic volume of every path section is calculated with the improved LOGIT model (Eq. (4)) of Section 2.4, which is the key point of MPCC.

Fourthly, the calculation above yields the chosen probability and forecast traffic volume of each path section, but this is not the end. The traffic impedance must be recalculated for the new traffic volume situation. As shown in Fig. 1, because the relationship between traffic impedance and traffic load is taken into account, the traffic impedance and the forecast assigned traffic volume of every path are amended continually. Using the relationship model between average speed and traffic volume, the travel time and traffic impedance of a path section can be calculated under different traffic volumes.

Fig. 1 Flowchart of MPCC

For roads of different technical levels, the relationship models between average speed and traffic volume are as follows:

1) Highway: $V = 179.49\, N_A^{-0.1082}$ (5)
2) Level 1 roads: $V = 155.84\, N_A^{-0.11433}$ (6)
3) Level 2 roads: $V$ as a function of $N_A$ with constants 112.57, 0.91 and 0.66 (7)
4) Level 3 roads: $V$ as a function of $N_A$ with constants 99.1, 0.32 and 1.3 (8)
5) Level 4 roads: $V = 70.5\, N_A^{-0.0988}$ (9)

where $V$ is the average speed of the path section and $N_A$ is the traffic volume of the path section.

Finally, the assignment of traffic volume to path sections is repeated with the method of the previous step, which is the idea of iterative capacity constraint assignment, until the traffic volume of every path section is stable.

Translation

Traffic Assignment Forecast Model in Intelligent Transportation

Introduction

With the sustainable development of cities, the construction of digital cities and the development of the transportation industry, intelligent transportation systems (ITS) are developing ever more rapidly.

Speech Recognition: Chinese-English Foreign Literature Translation

Chinese-English foreign literature translation (the document contains the English original and the Chinese translation)

Speech Recognition

1 Defining the Problem

Speech recognition is the process of converting an acoustic signal, captured by a microphone or a telephone, to a set of words. The recognized words can be the final results, as for applications such as commands & control, data entry, and document preparation. They can also serve as the input to further linguistic processing in order to achieve speech understanding, a subject covered in section.

Speech recognition systems can be characterized by many parameters, some of the more important of which are shown in Figure. An isolated-word speech recognition system requires that the speaker pause briefly between words, whereas a continuous speech recognition system does not. Spontaneous, or extemporaneously generated, speech contains disfluencies, and is much more difficult to recognize than speech read from script. Some systems require speaker enrollment (a user must provide samples of his or her speech before using them), whereas other systems are said to be speaker-independent, in that no enrollment is necessary. Some of the other parameters depend on the specific task. Recognition is generally more difficult when vocabularies are large or have many similar-sounding words. When speech is produced in a sequence of words, language models or artificial grammars are used to restrict the combination of words.

The simplest language model can be specified as a finite-state network, where the permissible words following each word are given explicitly. More general language models approximating natural language are specified in terms of a context-sensitive grammar.

One popular measure of the difficulty of the task, combining the vocabulary size and the language model, is perplexity, loosely defined as the geometric mean of the number of words that can follow a word after the language model has been applied (see section for a discussion of language modeling in general and perplexity in particular).
Finally, there are some external parameters that can affect speech recognition system performance, including the characteristics of the environmental noise and the type and the placement of the microphone.

Table: Typical parameters used to characterize the capability of speech recognition systems

Speech recognition is a difficult problem, largely because of the many sources of variability associated with the signal. First, the acoustic realizations of phonemes, the smallest sound units of which words are composed, are highly dependent on the context in which they appear. These phonetic variabilities are exemplified by the acoustic differences of the phoneme. At word boundaries, contextual variations can be quite dramatic, making gas shortage sound like gash shortage in American English, and devo andare sound like devandare in Italian.

Second, acoustic variabilities can result from changes in the environment as well as in the position and characteristics of the transducer. Third, within-speaker variabilities can result from changes in the speaker's physical and emotional state, speaking rate, or voice quality. Finally, differences in sociolinguistic background, dialect, and vocal tract size and shape can contribute to across-speaker variabilities.

Figure shows the major components of a typical speech recognition system. The digitized speech signal is first transformed into a set of useful measurements or features at a fixed rate, typically once every 10 to 20 msec (see sections and 11.3 for signal representation and digital signal processing, respectively). These measurements are then used to search for the most likely word candidate, making use of constraints imposed by the acoustic, lexical, and language models. Throughout this process, training data are used to determine the values of the model parameters.

Figure: Components of a typical speech recognition system.

Speech recognition systems attempt to model the sources of variability described above in several ways.
At the level of signal representation, researchers have developed representations that emphasize perceptually important speaker-independent features of the signal, and de-emphasize speaker-dependent characteristics. At the acoustic phonetic level, speaker variability is typically modeled using statistical techniques applied to large amounts of data. Speaker adaptation algorithms have also been developed that adapt speaker-independent acoustic models to those of the current speaker during system use (see section). Effects of linguistic context at the acoustic phonetic level are typically handled by training separate models for phonemes in different contexts; this is called context dependent acoustic modeling.

Word level variability can be handled by allowing alternate pronunciations of words in representations known as pronunciation networks. Common alternate pronunciations of words, as well as effects of dialect and accent, are handled by allowing search algorithms to find alternate paths of phonemes through these networks. Statistical language models, based on estimates of the frequency of occurrence of word sequences, are often used to guide the search through the most probable sequence of words.

The dominant recognition paradigm in the past fifteen years is known as hidden Markov models (HMM). An HMM is a doubly stochastic model, in which the generation of the underlying phoneme string and the frame-by-frame, surface acoustic realizations are both represented probabilistically as Markov processes, as discussed in sections, and 11.2. Neural networks have also been used to estimate the frame based scores; these scores are then integrated into HMM-based system architectures, in what has come to be known as hybrid systems, as described in section 11.5.

An interesting feature of frame-based HMM systems is that speech segments are identified during the search process, rather than explicitly.
An alternate approach is to first identify speech segments, then classify the segments and use the segment scores to recognize words. This approach has produced competitive recognition performance in several tasks.

2 State of the Art

Comments about the state of the art need to be made in the context of specific applications, which reflect the constraints on the task. Moreover, different technologies are sometimes appropriate for different tasks. For example, when the vocabulary is small, the entire word can be modeled as a single unit. Such an approach is not practical for large vocabularies, where word models must be built up from subword units.

Performance of speech recognition systems is typically described in terms of word error rate E, defined as:

$$E = \frac{S + I + D}{N} \times 100\%$$

where N is the total number of words in the test set, and S, I, and D are the total numbers of substitutions, insertions, and deletions, respectively.

The past decade has witnessed significant progress in speech recognition technology. Word error rates continue to drop by a factor of 2 every two years. Substantial progress has been made in the basic technology, lowering the barriers to speaker independence, continuous speech, and large vocabularies. Several factors have contributed to this rapid progress. First, there is the coming of age of the HMM. The HMM is powerful in that, given training data, the parameters of the model can be trained automatically to give optimal performance.

Second, much effort has gone into the development of large speech corpora for system development, training, and testing. Some of these corpora are designed for acoustic phonetic research, while others are highly task specific. Nowadays, it is not uncommon to have tens of thousands of sentences available for system training and testing.
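
As an aside on the word error rate E defined earlier in this section, the following Python sketch shows one common way to obtain S, I, and D: align the recognizer output with the reference transcript by minimum edit distance. The function name and the example sentences are assumptions made for illustration; evaluation tools used in practice may break ties between equally good alignments differently.

```python
def word_error_rate(reference, hypothesis):
    """Return (E in percent, S, I, D) for two whitespace-tokenized strings."""
    ref, hyp = reference.split(), hypothesis.split()
    # cell[i][j] = (errors, S, I, D) for the best alignment of ref[:i], hyp[:j]
    cell = [[None] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    cell[0][0] = (0, 0, 0, 0)
    for i in range(1, len(ref) + 1):
        cell[i][0] = (i, 0, 0, i)          # only deletions
    for j in range(1, len(hyp) + 1):
        cell[0][j] = (j, 0, j, 0)          # only insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            e, s, ins, dele = cell[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diagonal = (e, s, ins, dele)            # match, no error
            else:
                diagonal = (e + 1, s + 1, ins, dele)    # substitution
            e, s, ins, dele = cell[i][j - 1]
            insertion = (e + 1, s, ins + 1, dele)
            e, s, ins, dele = cell[i - 1][j]
            deletion = (e + 1, s, ins, dele + 1)
            cell[i][j] = min(diagonal, insertion, deletion, key=lambda c: c[0])
    errors, subs, inss, dels = cell[len(ref)][len(hyp)]
    return 100.0 * errors / len(ref), subs, inss, dels

rate, subs, inss, dels = word_error_rate(
    "i want to charge it to my calling card",
    "i want to charge to my calling car",
)
print(f"E = {rate:.1f}%  S={subs} I={inss} D={dels}")
```

Here one word is dropped and one is misrecognized out of N = 9 reference words, so E = (1 + 0 + 1) / 9, or about 22.2%.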
These corpora permit researchers to quantify the acoustic cues important for phonetic contrasts and to determine parameters of the recognizers in a statistically meaningful way. While many of these corpora (e.g., TIMIT, RM, ATIS, and WSJ; see section 12.3) were originally collected under the sponsorship of the U.S. Defense Advanced Research Projects Agency (ARPA) to spur human language technology development among its contractors, they have nevertheless gained world-wide acceptance (e.g., in Canada, France, Germany, Japan, and the U.K.) as standards on which to evaluate speech recognition.

Third, progress has been brought about by the establishment of standards for performance evaluation. Only a decade ago, researchers trained and tested their systems using locally collected data, and were not very careful in delineating training and testing sets. As a result, it was very difficult to compare performance across systems, and a system's performance typically degraded when it was presented with previously unseen data. The recent availability of a large body of data in the public domain, coupled with the specification of evaluation standards, has resulted in uniform documentation of test results, thus contributing to greater reliability in monitoring progress (corpus development activities and evaluation methodologies are summarized in chapters 12 and 13, respectively).

Finally, advances in computer technology have also indirectly influenced our progress. The availability of fast computers with inexpensive mass storage capabilities has enabled researchers to run many large-scale experiments in a short amount of time. This means that the elapsed time between an idea and its implementation and evaluation is greatly reduced.
In fact, speech recognition systems with reasonable performance can now run in real time on high-end workstations without additional hardware, a feat unimaginable only a few years ago.

One of the most popular, and potentially most useful, tasks with low perplexity (PP=11) is the recognition of digits. For American English, speaker-independent recognition of digit strings spoken continuously and restricted to telephone bandwidth can achieve an error rate of 0.3% when the string length is known.

One of the best-known moderate-perplexity tasks is the 1,000-word Resource Management (RM) task, in which inquiries can be made concerning various naval vessels in the Pacific Ocean. The best speaker-independent performance on the RM task is a word error rate of less than 4%, using a word-pair language model that constrains the possible words following a given word (PP=60). More recently, researchers have begun to address the issue of recognizing spontaneously generated speech. For example, in the Air Travel Information Service (ATIS) domain, word error rates of less than 3% have been reported for a vocabulary of nearly 2,000 words and a bigram language model with a perplexity of around 15.

High-perplexity tasks with a vocabulary of thousands of words are intended primarily for the dictation application. After working on isolated-word, speaker-dependent systems for many years, the community has since 1992 moved towards very-large-vocabulary (20,000 words and more), high-perplexity (PP≈200), speaker-independent, continuous speech recognition. The best system in 1994 achieved an error rate of 7.2% on read sentences drawn from North American business news.

With the steady improvements in speech recognition performance, systems are now being deployed within telephone and cellular networks in many countries. Within the next few years, speech recognition will be pervasive in telephone networks around the world.
There are tremendous forces driving the development of the technology; in many countries, touch-tone penetration is low, and voice is the only option for controlling automated services. In voice dialing, for example, users can dial 10 to 20 telephone numbers by voice (e.g., "call home") after having enrolled their voices by saying the words associated with the telephone numbers. AT&T, on the other hand, has installed a call routing system using speaker-independent word-spotting technology that can detect a few key phrases (e.g., "person to person", "calling card") in sentences such as: "I want to charge it to my calling card."

At present, several very large vocabulary dictation systems are available for document generation. These systems generally require speakers to pause between words. Their performance can be further enhanced if one can apply the constraints of a specific domain, such as dictating medical reports.

Even though much progress is being made, machines are a long way from recognizing conversational speech. Word recognition rates on telephone conversations in the Switchboard corpus are around 50%. It will be many years before unlimited-vocabulary, speaker-independent continuous dictation capability is realized.

3 Future Directions

In 1992, the U.S. National Science Foundation sponsored a workshop to identify the key research challenges in the area of human language technology, and the infrastructure needed to support the work. The workshop identified the following research areas for speech recognition:

Robustness: In a robust system, performance degrades gracefully (rather than catastrophically) as conditions become more different from those under which it was trained. Differences in channel characteristics and acoustic environment should receive particular attention.

Portability: Portability refers to the goal of rapidly designing, developing, and deploying systems for new applications.
At present, systems tend to suffer significant degradation when moved to a new task. In order to return to peak performance, they must be trained on examples specific to the new task, which is time-consuming and expensive.

Adaptation: How can systems continuously adapt to changing conditions (new speakers, microphone, task, etc.) and improve through use? Such adaptation can occur at many levels in systems: subword models, word pronunciations, language models, and so on.

Language Modeling: Current systems use statistical language models to help reduce the search space and resolve acoustic ambiguity. As vocabulary size grows and other constraints are relaxed to create more habitable systems, it will be increasingly important to get as much constraint as possible from language models, perhaps by incorporating syntactic and semantic constraints that cannot be captured by purely statistical models.

Confidence Measures: Most speech recognition systems assign scores to hypotheses for the purpose of rank ordering them. These scores do not provide a good indication of whether a hypothesis is correct or not, just that it is better than the other hypotheses. As we move to tasks that require actions, we need better methods to evaluate the absolute correctness of hypotheses.

Out-of-Vocabulary Words: Systems are designed for use with a particular set of words, but system users may not know exactly which words are in the system vocabulary. This leads to a certain percentage of out-of-vocabulary words in natural conditions. Systems must have some method of detecting such out-of-vocabulary words, or they will end up mapping a word from the vocabulary onto the unknown word, causing an error.

Spontaneous Speech: Systems that are deployed for real use must deal with a variety of spontaneous speech phenomena, such as filled pauses, false starts, hesitations, ungrammatical constructions, and other common behaviors not found in read speech.
Development on the ATIS task has resulted in progress in this area, but much work remains to be done.

Prosody: Prosody refers to acoustic structure that extends over several segments or words. Stress, intonation, and rhythm convey important information for word recognition and about the user's intentions (e.g., sarcasm, anger). Current systems do not capture prosodic structure. How to integrate prosodic information into the recognition architecture is a critical question that has not yet been answered.

Modeling Dynamics: Systems assume a sequence of input frames which are treated as if they were independent. But it is known that perceptual cues for words and phonemes require the integration of features that reflect the movements of the articulators, which are dynamic in nature. How to model dynamics and incorporate this information into recognition systems is an unsolved problem.

Speech Recognition (translation)

1 Defining the Problem

Speech recognition is the process of converting an acoustic signal, captured by a telephone or a microphone, into a sequence of words.

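Foreign Literature Translation: Unified Messaging Systems

Appendix A Translation: Unified Messaging Systems

Not all users are created equal. Your salespeople spend all day on the phone to stay productive. If they are stuck in voice mail and miss a call, it can mean lost revenue. Knowledge workers, on the other hand, suffer when they spend too much time on the phone; for them, productivity is inversely proportional to the hours spent on calls. To meet the needs of both groups, and of the users who fall in between, we look to unified messaging (UM) systems.

UM once simply meant managing and manipulating e-mail, fax, and voice mail from a single mailbox. Today it integrates more real-time communications, including instant messaging (IM), presence management, and call-routing rules, driven by the user's actions through a telephone user interface. That may sound cumbersome, but vendors sell these functions separately, so your organization can choose the pieces it needs.

According to the Radicati Group, unified communications (UC) is set for healthy growth in the enterprise. Support for converged technologies such as VoIP and SIP (Session Initiation Protocol) will help drive revenue from US$469 million in 2005 to US$939 million in 2009, spread across PBX and telephony vendors, messaging software, voice systems, and even business process system vendors.

To catch this wave, we asked vendors to send us their SIP-compliant products. We wanted to test each product working with an open-source IP PBX in SIP signaling mode. Our rationale: with PBXs purchased for Y2K upgrades nearing the end of their life cycle, enterprises are looking to upgrade their phone systems with IP PBXs that can make use of existing resources. That makes a SIP-capable IP PBX a hot ticket. SIP puts more intelligence in the endpoints, such as IP phones and PCs, and allows rapid integration with business applications, e-mail servers, directories, and even real-time UC applications. In addition, products based on the open IP and SIP standards shorten development time.

We also required that products support a universal mailbox for e-mail, voice mail, and fax; TTS (text-to-speech) for managing the universal mailbox; a variety of e-mail protocols, such as IMAP, MAPI, POP3, and SMTP; and Active Directory or LDAP.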

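Foreign Literature Translation: Draft and Original

Translated Draft 1

A typical application of the Kalman filter is predicting the coordinates and velocity of an object from a finite sequence of noisy (and possibly biased) observations of its position. It appears in many engineering applications, such as radar and computer vision. Kalman filtering is also an important topic in control theory and control systems engineering.

In radar, for example, the interest lies in tracking a target, but measurements of the target's position, velocity, and acceleration are noisy at all times. The Kalman filter uses the target's dynamics to remove the effect of the noise and obtain a good estimate of the target's position. This estimate can be of the current position (filtering), of a future position (prediction), or of a past position (interpolation or smoothing).

Naming

The filter is named after its inventor, Rudolf E. Kalman, although according to the literature Peter Swerling had proposed a similar algorithm earlier. Stanley Schmidt produced the first implementation of a Kalman filter. While visiting the NASA Ames Research Center, Kalman saw that his method was useful for trajectory estimation in the Apollo program, and the filter was later used in the Apollo spacecraft's navigation computer. Papers on the filter were published by Swerling (1958), Kalman (1960), and Kalman and Bucy (1961).

Many different implementations of the Kalman filter now exist. The form Kalman originally proposed is now generally called the simple Kalman filter. Others include Schmidt's extended filter, the information filter, and many variants of the square-root filter developed by Bierman and Thornton. Perhaps the most common Kalman filter is the phase-locked loop, which is found in radios, computers, and nearly every kind of video or communications equipment.

The following discussion requires a general knowledge of linear algebra and probability theory. The Kalman filter is built on linear algebra and the hidden Markov model. Its underlying dynamic system can be represented as a Markov chain built on a linear operator perturbed by Gaussian (that is, normally distributed) noise. The state of the system can be represented as a vector of real numbers.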
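
The predict-and-update cycle behind the filtering described above can be sketched in a few lines of Python. This is a deliberately simplified, hypothetical example: the state is a single real number (a nearly constant position) rather than a vector of position and velocity, and the noise variances and measurements are made-up values chosen only for illustration.

```python
def kalman_1d(measurements, process_var, meas_var, x0=0.0, p0=1.0):
    """Scalar Kalman filter for a nearly constant quantity.

    x is the state estimate and p its variance; both are updated once
    per noisy measurement z.
    """
    x, p = x0, p0
    estimates = []
    for z in measurements:
        # Predict: the state is assumed unchanged, but uncertainty grows
        p = p + process_var
        # Update: blend prediction and measurement using the Kalman gain
        gain = p / (p + meas_var)
        x = x + gain * (z - x)
        p = (1.0 - gain) * p
        estimates.append(x)
    return estimates

# Hypothetical data: true position 5.0, measurement error alternating +/-0.5
true_position = 5.0
measurements = [true_position + (0.5 if i % 2 == 0 else -0.5) for i in range(20)]

# A large p0 expresses that the initial guess x0 = 0.0 is not trusted
estimates = kalman_1d(measurements, process_var=1e-5, meas_var=0.25, p0=1000.0)
print(f"last measurement: {measurements[-1]:.2f}, estimate: {estimates[-1]:.3f}")
```

After twenty measurements the estimate is much closer to the true value than any single measurement. Tracking position and velocity together would replace the scalars with vectors and matrices but follow the same two steps.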

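Speech Recognition References

Speech recognition is a technology widely used in human-computer interaction, speech translation, intelligent assistants, and other fields. Its goal is to convert spoken input into text that can be understood and processed. With the development of artificial intelligence and machine learning, speech recognition technology has improved greatly and found wide application. The field has many classic references and research results. The following are worth consulting:

1. Xiong, W., Droppo, J., Huang, X., Seide, F., Seltzer, M., Stolcke, A., & Yu, D. (2016). Achieving human parity in conversational speech recognition. arXiv preprint arXiv:1610.05256. This paper presents the Microsoft team's work on speech recognition, which achieved accuracy comparable to that of humans on conversational speech.

2. Hinton, G., Deng, L., Yu, D., Dahl, G. E., Mohamed, A. R., Jaitly, N., ... & Kingsbury, B. (2012). Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Processing Magazine, 29(6), 82-97. This paper describes the application of deep neural networks to speech recognition and the progress of that research, and is very helpful for understanding today's mainstream speech recognition technology.

3. Hinton, G., Deng, L., Li, D., & Dahl, G. E. (2012). Deep neural networks for speech recognition. IEEE Signal Processing Magazine, 29(6), 82-97. This paper is described as a classic of the field, covering the application and advantages of deep neural networks in speech recognition.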

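Chinese and English Speech Datasets

The following are some examples of Chinese and English speech datasets:

1. LibriSpeech: a widely used English speech dataset containing about 1,000 hours of read speech from audiobooks. It is divided into subsets, including training, development, and test sets. It provides recordings from many speakers, and the content covers a wide range of topics.

2. Common Voice: an open-source speech dataset developed by Mozilla, covering many languages and accents. It consists of speech clips recorded by Internet users and can be used to develop and train speech recognition systems.

3. AISHELL: a Mandarin speech dataset from China containing about 170 hours of recordings. It was released by Beijing Shell Shell Technology and can be used to develop Mandarin speech recognition and synthesis technology.

4. TIMIT: a commonly used English speech dataset containing English speech from speakers across different regions of the United States. It is very useful for research on speech recognition and speech synthesis.

5. UrbanSound: a dataset of sounds from urban environments, including vehicle noise, human voices, and music. It can be used to develop and study environmental audio recognition and classification.

These are only a few examples; many other datasets are available for developing and researching speech technology.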

Foreign Literature Original and Translation: Automatic Bus Stop Announcement System

Automatic Bus Stop Announcement System

1. Background and significance of automatic bus stop announcement

Buses provide a convenient service for people travelling around the city, and the way stops are announced directly affects the quality of that service. Traditionally, stops were announced by the crew; because the results were poor and the workload heavy, this practice has been abandoned in many large cities. In recent years, advances in science and technology have brought microcomputers into wide use in many fields. In acoustics, combining voice chips with microcomputer and speech synthesis technology has made an automatic stop-announcement controller feasible, allowing a more personalized service for passengers. To address the shortcomings of traditional announcement systems, and taking into account the characteristics of public transport vehicles and their actual operating environment, this work designs an automatic bus stop announcement system controlled by a single-chip microcomputer. The main aim is to remove the drawback of traditional voice announcers, which work only under the driver's control, by announcing stops and service messages automatically, giving the public a more user-friendly and complete service.

2. Design of the system components

The hardware uses an AT89C51 as the controller. Speech is recorded on an ISD4004 voice chip, stop information and prompts are played through an amplifier, and an LED digital display shows the stop count. When the bus arrives at a stop, the system is operated from the keyboard; the voice circuit plays the stop information and prompts, and the stop number is shown on the LED digital tube. The hardware comprises a keyboard circuit, reset circuit, display driver circuit, display circuit, and memory expansion circuit.

To announce stops automatically by voice, together with voice prompts and service messages, while displaying Chinese characters on an LED dot-matrix circuit, the design uses the AT89C51 as the main control chip. The auxiliary circuits include the voice circuit, the Chinese dot-matrix display circuit, and the power supply circuit. Stops are detected from the wheels: the pulses generated as the wheel turns are counted, and the count is compared with preset values to decide when to announce, so that stops are announced automatically and accurately. The AT89C51 combines external pulse counting with voice output from the ISD4004. The system consists of pulse detection, pulse counting, CPU control and control signals, the voice chip, and output indicator lights.

About the AT89C51: the chip consists mainly of one 8-bit central processing unit (CPU), on-chip Flash memory, on-chip RAM, four 8-bit bidirectional addressable I/O ports, one full-duplex UART (universal asynchronous receiver-transmitter) serial interface, two 16-bit timer/counters, a multi-priority nested interrupt structure, and an on-chip oscillator and clock circuit. Its most notable feature is the on-chip Flash memory; in other respects its structure differs little from that of Intel's 8051.

Main performance:
1. Compatible with MCS-51
2. 4K bytes of programmable Flash memory; endurance: 1,000 write/erase cycles; data retention: 10 years
3. Fully static operation: 0 Hz to 24 MHz
4. Three-level program memory lock
5. 128 × 8-bit internal RAM
6. 32 programmable I/O lines
7. Two 16-bit timer/counters
8. Five interrupt sources
9. Programmable serial channel
10. On-chip oscillator and clock circuit

Design of the pulse detection circuit: the key to the design is counting wheel revolutions. Because the vehicle operates in a harsh environment, and Hall elements withstand vibration and are unaffected by contamination or corrosion from dust, grease, water vapor, and salt fog, a reliable DN6848 Hall element is used as the signal acquisition device. Its output is fed to the microcontroller through a 4N25 optocoupler, which has a current transfer ratio of 10% to 25% and a response time of less than 10 µs.

Design of the speech output circuit: chips in this series are addressed and controlled by a microprocessor or microcontroller through a serial peripheral interface. Recorded data is stored using ISD's patented multilevel storage technology, which writes voice and audio signals directly into solid-state memory in their natural form and so gives high-fidelity playback. The ISD4004 uses a sampling frequency of 6.4 kHz and comes in single-chip recording durations of 8, 10, 12, and 16 minutes. Its on-chip nonvolatile Flash memory retains data without power. Stored messages are typically retained for 100 years, and the same storage cell can be re-recorded 100,000 times.

The ISD4004's audio output pin can drive a 5 kΩ load; after power-up, the output pin sits at 1.2 V. The amplifier chosen for this design is the LM386, an audio amplifier designed for low-voltage applications, with a working voltage of 6 V, a maximum distortion of 0.2%, and a frequency response of 20 Hz to 100 kHz.

Design of the LED display output: the circuit uses a 16 × 256 dot matrix to display sixteen 16 × 16 Chinese characters, with the display memory U14 holding the character bitmaps. The screen is divided into 32 pages, each consisting of 16 rows by 8 columns of LEDs. Rows are decoded by a 4-to-16 decoder (74LS154), which decodes address lines A0 to A3 into row-select signals. Two 74LS154 4-to-16 decoders form a 5-to-32 decoder for page decoding, which decodes address lines A4 to A8 into page-select signals; each page-select signal enables one 74LS244, through which the display data is sent to the eight LEDs in one row of the selected page.

3. Characteristics and advantages of the system

The system greatly improves the accuracy and reliability of stop announcements and the service quality of the bus system, and supports the city's economic development and the harmonious development of its traffic. It removes the drawback of traditional voice announcers, which work only under the driver's control, by announcing stops and service messages automatically, giving the public a more user-friendly and complete service.

Automatic Bus Stop Announcement System (translation)

1. Background and significance of the automatic bus stop announcer

Buses provide convenient and fast service for people on the move, and stop announcements directly affect the quality of that service.
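
The pulse-counting scheme described in section 2 of the bus system (count wheel pulses, compare the count with preset values, announce when a preset is reached) can be simulated with a short Python sketch. It is a hypothetical illustration only: the actual design runs on an AT89C51, and the class name, station names, and pulse thresholds below are invented for the example.

```python
class StopAnnouncer:
    """Simulates announcing stops by comparing a pulse count with presets."""

    def __init__(self, presets):
        # presets: (pulse_threshold, station_name) pairs; sorted by threshold
        self.presets = sorted(presets)
        self.count = 0
        self.next_index = 0

    def on_pulse(self):
        """Call once per wheel-sensor pulse; return a station name when due."""
        self.count += 1
        if (self.next_index < len(self.presets)
                and self.count >= self.presets[self.next_index][0]):
            name = self.presets[self.next_index][1]
            self.next_index += 1
            return name
        return None

# Hypothetical route: announce after 3 pulses and again after 5 pulses
announcer = StopAnnouncer([(3, "Station A"), (5, "Station B")])
events = [announcer.on_pulse() for _ in range(6)]
print(events)
```

In a real vehicle the thresholds would be derived from the wheel circumference and the distance between stops, and each returned name would trigger playback of the matching recorded message.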

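Natural Language Processing: Foreign Literature Translation

This document introduces the basic concepts and applications of natural language processing (NLP) and its importance in modern society. NLP is the discipline that studies how to enable computers to understand and process human language. It covers language recognition, text understanding, semantic analysis, and other areas.

NLP is widely applied in many fields, including machine translation, speech recognition, sentiment analysis, and information retrieval. In machine translation, for example, NLP techniques let computers translate one language into another automatically, making cross-language communication easier. In sentiment analysis, NLP helps identify the sentiment expressed in a text and analyze users' feelings.

As artificial intelligence develops, NLP is becoming increasingly important in society. Advances in NLP not only improve communication between computers and people but also bring innovation and progress to many industries. In the future, NLP is expected to play a larger role in healthcare, finance, intelligent customer service, and other fields.

In short, NLP is a frontier technical discipline that matters both for improving communication between computers and people and for advancing society. It is expected to have a still greater impact and to be widely applied across many fields.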

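Multilingual Text Recognition and Translation App Based on the Android Platform

Zhang Lin; Tang Ruihan
Journal: Journal of Xiamen University of Technology
Year (Volume), Issue: 2017 (25) 5
Abstract: Integrating six major world languages (Chinese, English, Japanese, Korean, French, and Spanish), the authors generate their own training character libraries with the jTessBoxEditor OCR tool, recognize text with the Tesseract engine, and send the recognized text to a third-party translation interface for translation into the target language. The result is an Android app that recognizes and translates photographed text in multiple languages. Tests on samples collected at scenic areas show a text recognition rate of up to 93% with effective translation, which meets the level required for market entry.
Pages: 6 (P61-66)
Authors: Zhang Lin; Tang Ruihan
Affiliation: School of Optoelectronic and Communication Engineering, Xiamen University of Technology, Xiamen, Fujian 361024 (both authors)
Language: Chinese
CLC number: TP391.43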

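Trends in Robotics: Chinese-English Foreign Literature Translation

Trends in Robotics

When it comes to robots, reality still lags behind science fiction. But just because robots have not lived up to their promise over the past few decades does not mean that their time will not come, sooner or later. In fact, the combined effect of several advanced technologies has brought the age of robots closer: smaller, cheaper, more practical, and more cost-effective.

Muscle, Bone, and Brain

Every robot has three aspects:

· Muscle: the effective strength, in relation to physical payload, that lets the robot move.
· Bone: a robot's physical structure depends on the work it does; its size and weight depend on its physical payload.
· Brain: robot intelligence; what it can think and do independently, and how much human interaction it needs.

Because of the way robots have been portrayed in science fiction, many people expect robots to look like humans. In fact, a robot's appearance depends much more on the work it does or the functions it performs. Many machines that look nothing like people are clearly classed as robots. Equally, many machines that look human are still no more than mechanical structures and toys.

Many early robots were large machines with great strength and little else. Old hydraulically powered robots were used for "3-D" tasks: dull, dirty, and dangerous. Advances in core industrial technology have thoroughly improved the capability, performance, and strategic benefits of robots. In the 1980s, for example, robots began to move from hydraulic power to electric drive units. Precision and performance improved as well.

Industrial Robots Already at Work

Today, the number of robots worldwide is approaching 1 million, more than half of them in Japan and only 15% in the United States. A few decades ago, 90% of robots served the automotive industry, usually performing large amounts of repetitive work. Now only 50% of robots are used in car manufacturing, with the other half spread across factories, laboratories, warehouses, power stations, hospitals, and other industries.

Robots are used for product assembly, handling hazardous materials, spray painting, polishing, and product inspection. The number of robots used for tasks as varied as cleaning sewers, detecting bombs, and performing complex surgery is rising steadily and will continue to grow in the coming years.

Robot Intelligence

Even with primitive intelligence, robots have proven able to deliver good returns in productivity, efficiency, and quality. Beyond that, some of the "smartest" robots are not used in manufacturing; they are used for space exploration, remote surgery, and even as pets, such as Sony's AIBO robotic dog.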

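Graduation Thesis: Design and Implementation of a Speech Recognition System

Contents

Abstract (Chinese)
Abstract (English)
Preface
Chapter 1 Introduction
1.1 Purpose and significance of the research
1.2 History and current state of research in China and abroad
1.3 Open problems in speech recognition
1.4 Main research content and organization of the thesis
Chapter 2 Speech Recognition Systems
2.1 Overview of speech recognition systems
2.1.1 Structure of a speech recognition system
2.1.2 Types of speech recognition systems
2.1.3 Choice of recognition units
2.2 Applications of speech recognition systems
2.2.1 Classification of applications
2.2.2 Characteristics of applications
2.2.3 Problems facing applications
2.3 Overview of speech recognition algorithms
2.3.1 Methods based on phonetics and acoustics
2.3.2 Template matching methods
2.3.3 Neural network methods
Chapter 3 Theoretical Foundations of Speech Recognition Systems
3.1 Basic components of a speech recognition system
3.2 Speech preprocessing
3.2.1 Pre-emphasis
3.2.2 Windowing and framing
3.2.3 Endpoint detection
3.2.4 Extraction of speech feature parameters
3.2.5 Speech training and recognition
Chapter 4 Design of a Speaker-Dependent Isolated-Word Speech Recognition System
4.1 Model design of a VQ-based speech recognition system
4.2 Feature parameter extraction
4.2.1 Feature parameter extraction procedure
4.2.2 MATLAB implementation of feature extraction
4.3 VQ training and recognition
4.3.1 Generating a codebook by vector quantization
4.3.2 VQ-based speaker recognition
4.4 Analysis of the design results
Summary and Reflections
Acknowledgements
References

Abstract

This thesis introduces the fundamentals of speech recognition systems, including their applications, structure, and algorithms. It focuses on the principles of speech recognition and the related algorithms and, drawing on reference material and using MATLAB, designs VQ codebook training and recognition programs that recognize the speech of a specific speaker. The system comprises two main stages: training and recognition.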
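
The two stages named in the thesis abstract, VQ codebook training and recognition, can be sketched as follows. This is a hypothetical illustration in Python rather than the thesis's MATLAB programs: it uses plain k-means with a fixed initialization, two-dimensional toy vectors in place of real speech features, and made-up speaker names.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def train_codebook(vectors, k, iterations=10):
    """Training stage: build a k-entry codebook with plain k-means."""
    codebook = [list(v) for v in vectors[:k]]   # deterministic start
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(range(k), key=lambda i: sq_dist(v, codebook[i]))
            clusters[nearest].append(v)
        for i, members in enumerate(clusters):
            if members:                          # keep old entry if empty
                codebook[i] = [sum(c) / len(members) for c in zip(*members)]
    return codebook

def average_distortion(vectors, codebook):
    """Mean distance from each vector to its nearest codebook entry."""
    return sum(min(sq_dist(v, c) for c in codebook) for v in vectors) / len(vectors)

def identify(vectors, codebooks):
    """Recognition stage: pick the codebook with the lowest distortion."""
    return min(codebooks, key=lambda name: average_distortion(vectors, codebooks[name]))

# Toy 2-D "features" for two hypothetical speakers
alice_train = [(0.0, 0.1), (1.0, 0.9), (0.1, 0.0), (0.9, 1.0)]
bob_train = [(5.0, 5.1), (6.0, 5.9), (5.1, 5.0), (5.9, 6.0)]
codebooks = {
    "alice": train_codebook(alice_train, k=2),
    "bob": train_codebook(bob_train, k=2),
}
print(identify([(0.05, 0.05), (0.95, 1.0)], codebooks))
```

The same lowest-distortion rule applies whether the codebooks stand for speakers or for isolated words; only the training data changes.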

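Foreign Literature on Artificial Intelligence

Artificial intelligence (AI) is an interdisciplinary field that spans computer science, mathematics, and related disciplines. It aims to research, develop, and apply intelligent computer systems, giving computers the ability to imitate and reproduce human intelligence and to exercise intelligent control.

The history of AI research goes back to the 1950s, but AI technology in the true sense has advanced by leaps and bounds only in recent years. AI technology mainly comprises machine learning, natural language processing, and robotics. Of these, machine learning is the fastest-growing and most widely applied branch. Machine learning refers to algorithms that let computers learn and improve automatically: by training models on large amounts of data, discovering patterns, and optimizing the algorithms, they accomplish a specific task intelligently.

Natural language processing is another extremely important branch of AI, aiming to let computers understand and process human language naturally. It is widely used in machine translation, speech recognition, intelligent customer service, and other fields. Robotics applies AI to the construction and intelligent control of real robots, enabling human-machine interaction, autonomous perception, and autonomous navigation.

The wide application of AI can raise productivity and support scientific management, and it can also help solve a range of pressing problems in healthcare, education, finance, and public welfare, improving people's lives.

However, the application of AI also raises problems and challenges. Examples include misjudgment, where a computer makes a wrong decision because the data it processes is insufficient; privacy, where people's information may be collected and exploited by AI systems; and unemployment, where some traditional industries may be replaced by AI.

We therefore need to strengthen oversight and governance while advancing AI. Relying on the technology alone to solve these problems is not enough; laws and regulations are also needed to control strictly the scope and standards of AI use and to safeguard the long-term development of human society.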

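Research on a Speech Translation System Based on Speech Recognition Technology

1. Introduction

As globalization accelerates and information technology develops, language has increasingly become a major barrier to communication. Speech translation, an important research direction in information technology, can help remove the barriers created by language differences and make cross-language communication more convenient and efficient. This article studies a speech translation system based on speech recognition technology, with the aim of better supporting cross-language communication.

2. Speech recognition technology

Speech recognition is an important component of speech translation. It refers to the process of converting human speech into text, usually by means of natural language processing, machine learning, and related techniques. Modern speech recognition is fairly mature; its accuracy has reached more than 90%, and it has made important progress in many practical applications. In a speech translation system, speech recognition identifies the content of the input speech and provides the basic data for the subsequent translation step.

3. Speech translation technology

Speech translation converts spoken language in one language into spoken language in another. It is very difficult, because different languages have different grammatical rules, and intonation, speaking rate, and pitch also vary greatly. Current speech translation is based mainly on machine learning and neural networks: large amounts of data in different languages are fed to the algorithms for analysis, and translation accuracy is improved through continued training and tuning. One important application is at international conferences, where it helps speakers of different languages communicate.

4. Speech translation system

A speech translation system combines speech recognition and speech translation. Its main function is to convert the user's spoken input into spoken output in another language and to display the translation on screen. It usually comprises modules for speech capture, speech recognition, speech translation, and speech synthesis.

The speech capture module collects the user's speech data and performs preprocessing, noise removal, and similar operations. The speech recognition module, based on the speech recognition technology described above, converts the processed speech into text and performs some checking and correction. The speech translation module, based on the speech translation technology described above, converts the text into text in another language and then uses the corresponding acoustic model to render it in the required accent. The speech synthesis module passes the converted data through a synthesis engine to produce the final speech signal for the user to listen to and save.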
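
The four-module flow described above (capture and preprocessing, recognition, translation, synthesis) can be sketched as a simple chain of function calls. Everything below is a hypothetical stand-in: the stub functions and the two-word lexicon only show how data passes between modules and do not perform real recognition, translation, or synthesis.

```python
def run_pipeline(raw_audio, preprocess, recognize, translate, synthesize):
    """Pass data through the four modules in order and collect the results."""
    cleaned = preprocess(raw_audio)          # capture module: denoise
    source_text = recognize(cleaned)         # recognition module: speech to text
    target_text = translate(source_text)     # translation module: text to text
    speech = synthesize(target_text)         # synthesis module: text to speech
    return {"source_text": source_text,
            "target_text": target_text,
            "speech": speech}

# Toy stand-ins for the real modules (assumptions, for illustration only)
def toy_preprocess(samples):
    return [s for s in samples if s != "<noise>"]

def toy_recognize(samples):
    return " ".join(samples)

TOY_LEXICON = {"你好": "hello", "世界": "world"}

def toy_translate(text):
    return " ".join(TOY_LEXICON.get(word, word) for word in text.split())

def toy_synthesize(text):
    return f"<audio:{text}>"

result = run_pipeline(["你好", "<noise>", "世界"],
                      toy_preprocess, toy_recognize,
                      toy_translate, toy_synthesize)
print(result["target_text"])
```

Passing the modules in as functions keeps each stage replaceable, which matches the modular structure the text describes.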


中英文资料对照外文翻译Speech Recognition1 Defining the ProblemSpeech recognition is the process of converting an acoustic signal, captured by a microphone or a telephone, to a set of words. The recognized words can be the final results, as for applications such as commands & control, data entry, and document preparation. They can also serve as the input to further linguistic processing in order to achieve speech understanding, a subject covered in section.Speech recognition systems can be characterized by many parameters, some of the more important of which are shown in Figure. An isolated-word speech recognition system requires that the speaker pause briefly between words, whereas a continuous speech recognition system does not. Spontaneous, or extemporaneously generated, speech contains disfluencies, and is much more difficult to recognize than speech read from script. Some systems require speaker enrollment---a user must provide samples of his or her speech before using them, whereas other systems are said to be speaker-independent, in that no enrollment is necessary. Some of the other parameters depend on the specific task. Recognition is generally more difficult when vocabularies are large or have many similar-sounding words. When speech is produced in a sequence of words, language models or artificial grammars are used to restrict the combination of words.1The simplest language model can be specified as a finite-state network, where the permissible words following each word are given explicitly. More general language models approximating natural language are specified in terms of a context-sensitive grammar.One popular measure of the difficulty of the task, combining the vocabulary size and the language model, is perplexity, loosely defined as the geometric mean of the number of words that can follow a word after the language model has been applied (see section for a discussion of language modeling in general and perplexity in particular). 
Finally, there are some external parameters that can affect speech recognition system performance, including the characteristics of the environmental noise and the type and the placement of the microphone.Table: Typical parameters used to characterize the capability of speech recognition systems Speech recognition is a difficult problem, largely because of the many sources of variability associated with the signal. First, the acoustic realizations of phonemes, the smallest sound units of which words are composed, are highly dependent on the context in which they appear. These phonetic variabilities are exemplified by the acoustic differences of the phoneme，At word boundaries, contextual variations can be quite dramatic---making gas shortage sound like gash shortage in American English, and devo andare sound like devandare in Italian.Second, acoustic variabilities can result from changes in the environment as well as in the position and characteristics of the transducer. Third, within-speaker variabilities can result from changes in the speaker's physical and emotional state, speaking rate, or voice quality. Finally, differences in sociolinguistic background, dialect, and vocal tract size and shape can contributeto across-speaker variabilities.Figure shows the major components of a typical speech recognition system. The digitized speech signal is first transformed into a set of useful measurements or features at a fixed rate, typically once every 10--20 msec (see sectionsand 11.3 for signal representation and digital signal processing, respectively). These measurements are then used to search for the most likely word candidate, making use of constraints imposed by the acoustic, lexical, and language models. Throughout this process, training data are used to determine the values of the model parameters.Figure: Components of a typical speech recognition system.Speech recognition systems attempt to model the sources of variability described above in several ways. 
At the level of signal representation, researchers have developed representations that emphasize perceptually important speaker-independent features of the signal, and de-emphasize speaker-dependent characteristics. At the acoustic phonetic level, speaker variability is typically modeled using statistical techniques applied to large amounts of data. Speaker adaptation algorithms have also been developed that adapt speaker-independent acoustic models to those of the current speaker during system use, (see section). Effects of linguistic context at the acoustic phonetic level are typically handled by training separate models for phonemes in different contexts; this is called context dependent acoustic modeling.Word level variability can be handled by allowing alternate pronunciations of words in representations known as pronunciation networks. Common alternate pronunciations of words, as well as effects of dialect and accent are handled by allowing search algorithms to find alternate paths of phonemes through these networks. Statistical language models, based onestimates of the frequency of occurrence of word sequences, are often used to guide the search through the most probable sequence of words.The dominant recognition paradigm in the past fifteen years is known as hidden Markov models (HMM). An HMM is a doubly stochastic model, in which the generation of the underlying phoneme string and the frame-by-frame, surface acoustic realizations are both represented probabilistically as Markov processes, as discussed in sections,and 11.2. Neural networks have also been used to estimate the frame based scores; these scores are then integrated into HMM-based system architectures, in what has come to be known as hybrid systems, as described in section 11.5.An interesting feature of frame-based HMM systems is that speech segments are identified during the search process, rather than explicitly. 
An alternate approach is to first identify speech segments, then classify the segments and use the segment scores to recognize words. This approach has produced competitive recognition performance in several tasks.2 State of the ArtComments about the state-of-the-art need to be made in the context of specific applications which reflect the constraints on the task. Moreover, different technologies are sometimes appropriate for different tasks. For example, when the vocabulary is small, the entire word can be modeled as a single unit. Such an approach is not practical for large vocabularies, where word models must be built up from subword units.Performance of speech recognition systems is typically described in terms of word error rate E, defined as:where N is the total number of words in the test set, and S, I, and D are the total number of substitutions, insertions, and deletions, respectively.The past decade has witnessed significant progress in speech recognition technology. Word error rates continue to drop by a factor of 2 every two years. Substantial progress has been made in the basic technology, leading to the lowering of barriers to speaker independence, continuous speech, and large vocabularies. There are several factors that have contributed to this rapid progress. First, there is the coming of age of the HMM. HMM is powerful in that, with theavailability of training data, the parameters of the model can be trained automatically to give optimal performance.Second, much effort has gone into the development of large speech corpora for system development, training, and testing. Some of these corpora are designed for acoustic phonetic research, while others are highly task specific. Nowadays, it is not uncommon to have tens of thousands of sentences available for system training and testing. 
These corpora permit researchers to quantify the acoustic cues important for phonetic contrasts and to determine parameters of the recognizers in a statistically meaningful way. While many of these corpora (e.g., TIMIT, RM, ATIS, and WSJ; see section 12.3) were originally collected under the sponsorship of the U.S. Defense Advanced Research Projects Agency (ARPA) to spur human language technology development among its contractors, they have nevertheless gained world-wide acceptance (e.g., in Canada, France, Germany, Japan, and the U.K.) as standards on which to evaluate speech recognition.

Third, progress has been brought about by the establishment of standards for performance evaluation. Only a decade ago, researchers trained and tested their systems using locally collected data, and were not always careful in delineating training and testing sets. As a result, it was very difficult to compare performance across systems, and a system's performance typically degraded when it was presented with previously unseen data. The recent availability of a large body of data in the public domain, coupled with the specification of evaluation standards, has resulted in uniform documentation of test results, thus contributing to greater reliability in monitoring progress (corpus development activities and evaluation methodologies are summarized in chapters 12 and 13 respectively).

Finally, advances in computer technology have also indirectly influenced our progress. The availability of fast computers with inexpensive mass storage capabilities has enabled researchers to run many large-scale experiments in a short amount of time. This means that the elapsed time between an idea and its implementation and evaluation is greatly reduced.
In fact, speech recognition systems with reasonable performance can now run in real time using high-end workstations without additional hardware, a feat unimaginable only a few years ago.

One of the most popular, and potentially most useful, tasks with low perplexity (PP=11) is the recognition of digits. For American English, speaker-independent recognition of digit strings spoken continuously and restricted to telephone bandwidth can achieve an error rate of 0.3% when the string length is known.

One of the best known moderate-perplexity tasks is the 1,000-word so-called Resource Management (RM) task, in which inquiries can be made concerning various naval vessels in the Pacific Ocean. The best speaker-independent performance on the RM task is a word error rate of less than 4%, using a word-pair language model that constrains the possible words following a given word (PP=60). More recently, researchers have begun to address the issue of recognizing spontaneously generated speech. For example, in the Air Travel Information Service (ATIS) domain, word error rates of less than 3% have been reported for a vocabulary of nearly 2,000 words and a bigram language model with a perplexity of around 15.

High-perplexity tasks with a vocabulary of thousands of words are intended primarily for the dictation application. After working on isolated-word, speaker-dependent systems for many years, the community has since 1992 moved towards very-large-vocabulary (20,000 words and more), high-perplexity (PP≈200), speaker-independent, continuous speech recognition. The best system in 1994 achieved an error rate of 7.2% on read sentences drawn from North American business news.

With the steady improvements in speech recognition performance, systems are now being deployed within telephone and cellular networks in many countries. Within the next few years, speech recognition will be pervasive in telephone networks around the world.
There are tremendous forces driving the development of the technology; in many countries, touch-tone penetration is low, and voice is the only option for controlling automated services. In voice dialing, for example, users can dial 10 to 20 telephone numbers by voice (e.g., "call home") after having enrolled their voices by saying the words associated with telephone numbers. AT&T, on the other hand, has installed a call routing system using speaker-independent word-spotting technology that can detect a few key phrases (e.g., "person to person", "calling card") in sentences such as: "I want to charge it to my calling card."

At present, several very large vocabulary dictation systems are available for document generation. These systems generally require speakers to pause between words. Their performance can be further enhanced if one can apply constraints of the specific domain, such as dictating medical reports.

Even though much progress is being made, machines are a long way from recognizing conversational speech. Word recognition rates on telephone conversations in the Switchboard corpus are around 50%. It will be many years before unlimited-vocabulary, speaker-independent continuous dictation capability is realized.

3 Future Directions

In 1992, the U.S. National Science Foundation sponsored a workshop to identify the key research challenges in the area of human language technology, and the infrastructure needed to support the work. The key research challenges are summarized in. The following research areas for speech recognition were identified:

Robustness: In a robust system, performance degrades gracefully (rather than catastrophically) as conditions become more different from those under which it was trained. Differences in channel characteristics and acoustic environment should receive particular attention.

Portability: Portability refers to the goal of rapidly designing, developing and deploying systems for new applications.
At present, systems tend to suffer significant degradation when moved to a new task. In order to return to peak performance, they must be trained on examples specific to the new task, which is time consuming and expensive.

Adaptation: How can systems continuously adapt to changing conditions (new speakers, microphone, task, etc.) and improve through use? Such adaptation can occur at many levels in systems: subword models, word pronunciations, language models, etc.

Language Modeling: Current systems use statistical language models to help reduce the search space and resolve acoustic ambiguity. As vocabulary size grows and other constraints are relaxed to create more habitable systems, it will be increasingly important to get as much constraint as possible from language models, perhaps incorporating syntactic and semantic constraints that cannot be captured by purely statistical models.

Confidence Measures: Most speech recognition systems assign scores to hypotheses for the purpose of rank ordering them. These scores do not provide a good indication of whether a hypothesis is correct or not, just that it is better than the other hypotheses. As we move to tasks that require actions, we need better methods to evaluate the absolute correctness of hypotheses.

Out-of-Vocabulary Words: Systems are designed for use with a particular set of words, but system users may not know exactly which words are in the system vocabulary. This leads to a certain percentage of out-of-vocabulary words in natural conditions. Systems must have some method of detecting such out-of-vocabulary words, or they will end up mapping a word from the vocabulary onto the unknown word, causing an error.

Spontaneous Speech: Systems that are deployed for real use must deal with a variety of spontaneous speech phenomena, such as filled pauses, false starts, hesitations, ungrammatical constructions and other common behaviors not found in read speech.
Development on the ATIS task has resulted in progress in this area, but much work remains to be done.

Prosody: Prosody refers to acoustic structure that extends over several segments or words. Stress, intonation, and rhythm convey important information for word recognition and the user's intentions (e.g., sarcasm, anger). Current systems do not capture prosodic structure. How to integrate prosodic information into the recognition architecture is a critical question that has not yet been answered.

Modeling Dynamics: Systems assume a sequence of input frames which are treated as if they were independent. But it is known that perceptual cues for words and phonemes require the integration of features that reflect the movements of the articulators, which are dynamic in nature. How to model dynamics and incorporate this information into recognition systems is an unsolved problem.

Speech Recognition

1 Defining the Problem

Speech recognition is the process of converting an acoustic signal, captured by a telephone or a microphone, into a series of messages.