Foreign Literature Translation: Adaptive Wiener Filtering Approach for Speech Enhancement
A Speech Enhancement Algorithm Based on MMSE-LSA and VAD
Yan Guanghua
Abstract: After introducing the characteristics of speech enhancement, the minimum mean-square error log-spectral amplitude (MMSE-LSA) estimation algorithm is analyzed in detail, and a voice activity detection (VAD) algorithm matched to the MMSE-LSA algorithm is proposed. The scheme is computationally simple, easy to implement, and gives good enhancement, and it can dynamically track changes in the background noise. Finally, simulations compare the enhancement achieved by MMSE-LSA with that of several other speech enhancement algorithms.
Journal: Mobile Communications, 2014(000)010, 5 pages (P59-62, 66)
Keywords: MMSE-LSA; VAD; speech enhancement
Author affiliation: Informatization Department, Navy Headquarters, Beijing 100036
CLC: TN912.3
1 Introduction
In speech communication, and in military speech communication in particular, noise interference of all kinds is common: radios on tanks, aircraft, and ships are often subject to strong background noise that severely degrades the quality and effectiveness of speech communication.
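The MMSE-LSA estimator discussed above applies, for each frequency bin, the Ephraim-Malah log-spectral amplitude gain. A minimal sketch in Python, assuming the a-priori SNR xi and a-posteriori SNR gamma have already been estimated (in practice they come from a decision-directed estimator and a noise tracker, which are not shown); the exponential integral is approximated by a crude trapezoid rule:

```python
import math

def expint_e1(x, steps=2000, upper=50.0):
    """Crude trapezoid approximation of E1(x) = integral_x^inf exp(-t)/t dt.
    Adequate for the moderate x values used by the gain below."""
    if x >= upper:
        return 0.0
    h = (upper - x) / steps
    total = 0.5 * (math.exp(-x) / x + math.exp(-upper) / upper)
    for i in range(1, steps):
        t = x + i * h
        total += math.exp(-t) / t
    return total * h

def mmse_lsa_gain(xi, gamma):
    """Ephraim-Malah MMSE log-spectral amplitude gain for one frequency bin.
    xi: a-priori SNR, gamma: a-posteriori SNR (both assumed given)."""
    v = max(xi / (1.0 + xi) * gamma, 1e-3)  # floor avoids the E1 singularity at 0
    return xi / (1.0 + xi) * math.exp(0.5 * expint_e1(v))
```

The enhanced amplitude of a bin is the gain times the noisy amplitude; higher a-priori SNR yields a gain closer to one, so speech-dominated bins pass through while noise-dominated bins are attenuated.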
Speech Enhancement Method Combining Subspace and Wiener Filtering
Zhang Xueying; Jia Hairong; Jin Chensheng
Journal: Computer Engineering and Applications, 2011(047)014, 3 pages (P146-148)
Abstract: In view of the musical noise left after enhancing speech corrupted by complicated additive noise, a speech enhancement method based on the combination of the subspace approach and the Wiener filter is proposed. The noisy speech is transformed into the subspace domain by the KL transform, and the eigenvalues of the clean speech are estimated. A Wiener filter is formed using the signal-to-noise ratio (SNR) formula in the subspace domain, and the estimated eigenvalues are passed through this filter to obtain new clean-speech eigenvalues; the clean speech is then recovered by the inverse KL transform. Simulation results show that under white-noise and train-noise backgrounds the SNR of this method clearly exceeds that of the traditional subspace method, while the musical noise produced by the enhancement is effectively suppressed.
Author affiliation: College of Information Engineering, Taiyuan University of Technology, Taiyuan 030024
An Improved Parameter-Adaptive Wiener Filtering Speech Enhancement Algorithm
Meng Xin; Ma Jianfen; Zhang Xueying
Journal: Computer Engineering and Design, 2017(038)003, 5 pages (P714-718)
Abstract: To explore the different effects that different types of noise have on the performance of speech enhancement algorithms, a parameter-adaptive Wiener filtering speech enhancement algorithm is proposed that sets different initial parameters and performs a different noise power spectrum estimation for each noise type. A deep neural network is used to classify the noise, yielding accurate classification results. For each noise type, the optimal coefficient combination is obtained for the Wiener filtering algorithm combined with a voice-activity-detection (VAD) noise power estimator. A series of experiments was carried out; the objective evaluation shows that under Babble noise at 5 dB SNR the proposed algorithm raises the PESQ score by 0.25, with corresponding PESQ gains for other noises and SNRs.
Keywords: deep neural network; noise classification; speech enhancement; Wiener filtering; voice activity detection
Author affiliation: College of Computer Science and Technology, Taiyuan University of Technology, Yuci, Shanxi 030600; College of Information Engineering, Taiyuan University of Technology
Speech enhancement algorithms divide into single-channel and multi-channel approaches; single-channel enhancement, being simple and broadly applicable, has been widely studied [1-5].
Literature information:
Title: Enhanced VQ-based Algorithms for Speech Independent Speaker Identification
Authors: Ningping Fan, Justinian Rosca
Source: Audio- and Video-based Biometric Person Authentication, International Conference, AVBPA, Guildford, UK, June 2003, 2688: 470-477

Enhanced VQ-based Algorithms for Speech Independent Speaker Identification
Abstract. Weighted distance measure and discriminative training are two different approaches to enhance VQ-based solutions for speaker identification. To account for the varying importance of the LPC coefficients in SV, the so-called partition normalized distance measure successfully used normalized feature components. This paper introduces an alternative, called heuristic weighted distance, to lift up higher order MFCC feature vector components using a linear formula. It then proposes two new algorithms combining the heuristic weighting and the partition normalized distance measure with group vector quantization discriminative training, to take advantage of both approaches. Experiments using the TIMIT corpus suggest that the new combined approach is superior to current VQ-based solutions (50% error reduction). It also outperforms the Gaussian Mixture Model using the wavelet features tested in a similar setting.
1. Introduction
Vector quantization (VQ) based classification algorithms play an important role in speech independent speaker identification (SI) systems. Although in baseline form the VQ-based solution is less accurate than the Gaussian Mixture Model (GMM), it offers simplicity in computation. For a large database of hundreds or thousands of speakers, both accuracy and speed are important issues. Here we discuss VQ enhancements aimed at accuracy and fast computation.
1.1 VQ Based Speaker Identification System
Fig. 1 shows the VQ based speaker identification system.
It contains an offline training sub-system to produce VQ codebooks and an online testing sub-system to generate the identification decision. Both sub-systems contain a preprocessing or feature extraction module to convert an audio utterance into a set of feature vectors. Features of interest in the recent literature include the Mel-frequency cepstral coefficients (MFCC), the line spectral pairs (LSP), the wavelet packet parameters (WPP), and PCA and ICA features. Although the WPP and ICA have been shown to offer advantages, we used MFCC in this paper to focus our attention on other modules of the system.
Fig. 1. A VQ-based speaker identification system features an online sub-system for identifying a testing audio utterance, and an offline training sub-system, which uses training audio utterances to generate a codebook for each speaker in the database.
A VQ codebook normally consists of centroids of partitions over a speaker's feature vector space. The effects on SI of different partition clustering algorithms, such as the LBG and the RLS, have been studied. The average error, or distortion, of the feature vectors $\{X_t, 1 \le t \le T\}$ of length $T$ against the codebook of speaker $k$ is given by

$$e_k = \frac{1}{T}\sum_{t=1}^{T}\min_{1\le j\le S} d(X_t, C_{k,j}), \qquad 1 \le k \le L \quad (1)$$

where $d(\cdot,\cdot)$ is a distance function between two vectors, $C_{k,j} = (c_{k,j,1},\ldots,c_{k,j,D})^T$ is the $j$-th code of dimension $D$, $S$ is the codebook size, and $L$ is the total number of speakers in the database. The baseline VQ algorithm for SI simply uses the LBG to generate codebooks and the square of the Euclidean distance as $d(\cdot,\cdot)$. Many improvements to the baseline VQ algorithm have been published.
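The scoring rule of Eq. (1) reduces to a few lines of code. A minimal sketch in Python with toy two-dimensional codebooks (the data here are hypothetical; a real system would train codebooks with LBG over MFCC vectors):

```python
def sq_euclidean(x, c):
    """Square of the Euclidean distance, the baseline choice for d(.,.)."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))

def avg_distortion(vectors, codebook):
    """e_k of Eq. (1): average over frames of the distance to the nearest code."""
    return sum(min(sq_euclidean(x, c) for c in codebook) for x in vectors) / len(vectors)

def identify(vectors, codebooks):
    """Pick the speaker whose codebook gives the smallest average distortion."""
    scores = [avg_distortion(vectors, cb) for cb in codebooks]
    return scores.index(min(scores))

# Toy example: speaker 0 clusters near the origin, speaker 1 near (5, 5).
codebooks = [[(0.0, 0.0), (1.0, 1.0)], [(5.0, 5.0), (6.0, 6.0)]]
```

With real MFCC frames the same functions apply unchanged; only the codebook training and the distance function (replaced by the weighted variants below) differ between the algorithms compared in the paper.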
Among them, there are two independent approaches: (1) choose a weighted distance function, such as the F-ratio and IHM weights, the partition normalized distance measure (PNDM), and the Bhattacharyya distance; (2) explore the discrimination power of inter-speaker characteristics using the entire set of speakers, such as the group vector quantization (GVQ) discriminative training and the speaker discriminative weighting. Experimentally we have found that PNDM and GVQ are two very effective methods in their respective groups.
1.2 Review of Partition Normalized Distance Measure
The partition normalized distance measure is defined as the square of a weighted Euclidean distance:

$$d_p(X, C_{k,j}) = \sum_{i=1}^{D} w_{k,j,i}\,(x_i - c_{k,j,i})^2 \quad (2)$$

The weighting coefficients are determined by minimizing the average error of the training utterances of all speakers, subject to the constraint that the geometric mean of the weights for each partition equals 1. Let $X_{k,j} = (x_{k,j,1},\ldots,x_{k,j,D})^T$ be a random training feature vector of speaker $k$, assigned to partition $j$ via the minimization process in Equation (1). It has mean and variance vectors

$$C_{k,j} = E[X_{k,j}], \qquad V_{k,j} = E[(X_{k,j}-C_{k,j})^T(X_{k,j}-C_{k,j})] \quad (3)$$

The constrained optimization criterion to be minimized in order to derive the weights is

$$\xi = \frac{1}{LS}\sum_{k=1}^{L}\sum_{j=1}^{S}\Big\{E[d_p(X_{k,j}, C_{k,j})] + \lambda_{k,j}\Big(1-\prod_{i=1}^{D} w_{k,j,i}\Big)\Big\} = \frac{1}{LS}\sum_{k=1}^{L}\sum_{j=1}^{S}\Big\{\sum_{i=1}^{D} w_{k,j,i}\,v_{k,j,i} + \lambda_{k,j}\Big(1-\prod_{i=1}^{D} w_{k,j,i}\Big)\Big\} \quad (4)$$

where $L$ is the number of speakers and $S$ is the codebook size. Letting

$$\frac{\partial \xi}{\partial w_{k,j,i}} = 0 \quad \text{and} \quad \frac{\partial \xi}{\partial \lambda_{k,j}} = 0 \quad (5)$$

we have

$$\lambda_{k,j} = \Big(\prod_{i=1}^{D} v_{k,j,i}\Big)^{1/D} \quad \text{and} \quad w_{k,j,i} = \frac{\lambda_{k,j}}{v_{k,j,i}} \quad (6)$$

where subscript $i$ is the feature vector component index, and $k$ and $j$ are the speaker and partition indices respectively.
Because $k$ and $j$ appear on both sides of the equations, the weights depend only on the data from one partition of one speaker.
1.3 Review of Group Vector Quantization
Discriminative training uses the data of all speakers to train the codebooks, so that more accurate identification can be achieved by exploiting inter-speaker differences. The GVQ training algorithm is described as follows.
Group Vector Quantization Algorithm:
(1) Randomly choose a speaker $j$.
(2) Select $N$ vectors $\{X_{j,t}, 1 \le t \le N\}$.
(3) Calculate the error for all codebooks. If the following conditions are satisfied, go to (4):
a) $e_i = \min_{\forall k}\{e_k\}$, but $i \ne j$;
b) $e_j - e_i < W$, where $W$ is a window size;
else go to (5).
(4) For each $X_{j,t}$:
$C_{j,m} \Leftarrow (1-\alpha)\,C_{j,m} + \alpha\,X_{j,t}$, where $C_{j,m} = \arg\min_{C_{j,l}} d(X_{j,t}, C_{j,l})$;
$C_{i,n} \Leftarrow (1+\alpha)\,C_{i,n} - \alpha\,X_{j,t}$, where $C_{i,n} = \arg\min_{C_{i,l}} d(X_{j,t}, C_{i,l})$.
(5) For each $X_{j,t}$:
$C_{j,m} \Leftarrow (1-\varepsilon\alpha)\,C_{j,m} + \varepsilon\alpha\,X_{j,t}$, where $C_{j,m} = \arg\min_{C_{j,l}} d(X_{j,t}, C_{j,l})$.
2. Enhancements
We propose the following steps to further enhance the VQ-based solution: (1) a heuristic weighted distance (HWD), (2) the combination of HWD and GVQ, and (3) the combination of PNDM and GVQ.
2.1 Heuristic Weighted Distance
The PNDM weights are inversely proportional to the partition variances of the feature components, as shown in Equation (6). It has been shown that the variances of the cepstral coefficients decrease with increasing index: $v_i > v_{i+1}$, $1 \le i \le D-1$, where $i$ is the vector element index, which reflects the frequency band. The higher the index, the smaller the feature value and its variance.
We considered a heuristic weighted distance

$$d_h(X, C_{k,j}) = \sum_{i=1}^{D} w_i(S, D)\,(x_i - c_{k,j,i})^2 \quad (7)$$

with weights calculated by

$$w_i(S, D) = 1 + c(S, D)\,(i-1) \quad (8)$$

where $c(S, D)$ is a function of both the codebook size $S$ and the feature vector dimension $D$. For a given codebook, $S$ and $D$ are fixed, and thus $c(S, D)$ is a constant.
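Equations (7)-(8) translate directly into code. A small sketch, assuming a caller-supplied constant c(S, D) (the paper finds this constant by exhaustive search, which is not shown):

```python
def hwd_weights(c, D):
    """Eq. (8): w_i(S, D) = 1 + c(S, D) * (i - 1) for i = 1..D."""
    return [1.0 + c * i for i in range(D)]

def hwd_distance(x, code, w):
    """Eq. (7): weighted squared Euclidean distance that lifts up
    higher-order feature vector components."""
    return sum(wi * (xi - ci) ** 2 for wi, xi, ci in zip(w, x, code))
```

Since the weights grow linearly with the component index, differences in the small-variance higher-order cepstral components contribute more to the distance than under the plain Euclidean metric.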
The value of $c(S, D)$ is estimated experimentally by performing an exhaustive search to achieve the maximum identification rate on a given sample test dataset.
2.2 Combination of HWD and GVQ
The combination of the HWD and the GVQ is achieved by simply replacing the original square of the Euclidean distance with the HWD of Equation (7), and adjusting the GVQ updating parameter $\alpha$ whenever needed.
2.3 Combination of PNDM and GVQ
Combining PNDM with the GVQ requires slightly more work, because the GVQ alters the partitions and thus their component variances. We have used the following algorithm to overcome this problem.
Algorithm to Combine PNDM with the GVQ Discriminative Training:
(1) Use the LBG algorithm to generate initial LBG codebooks;
(2) Calculate PNDM weights using the LBG codebooks, and produce PNDM-weighted LBG codebooks, which are LBG codebooks appended with the PNDM weights;
(3) Perform GVQ training with the PNDM distance function, and generate the initial PNDM+GVQ codebooks by replacing the LBG codes with the GVQ codes;
(4) Recalculate the PNDM weights using the PNDM+GVQ codebooks, and produce the final PNDM+GVQ codebooks by replacing the old PNDM weights with the new ones.
3. Experimental Comparison of VQ-based Algorithms
3.1 Testing Data and Procedures
168 speakers in the TEST section of the TIMIT corpus are used for the SI experiment, and 190 speakers from DR1, DR2, DR3 of the TRAIN section are used for estimating the $c(S, D)$ parameter. Each speaker has 10 good-quality recordings at 16 kHz, 16 bits/sample, stored as WAVE files in NIST format. Two of them, SA1.WAV and SA2.WAV, are used for testing, and the rest for training codebooks. We did not perform silence removal on the WAVE files, so that others could reproduce the environment with no additional complication of VAD algorithms and their parameters. An MFCC program converts all the WAVE files in a directory into one feature vector file, in which all the feature vectors are indexed by speaker and recording.
For each value of the feature vector dimension, D = 30, 40, 50, 60, 70, 80, 90, one training file and one testing file are created. They are used by all the algorithms to train codebooks of size S = 16, 32, 64, and to perform the identification test, respectively.
The MFCC feature vectors are calculated as follows: 1) divide the entire utterance into blocks of 512 samples with 256-sample overlap; 2) perform pre-emphasis filtering with coefficient 0.97; 3) multiply by a Hamming window and perform a short-time FFT; 4) apply the standard mel-frequency triangular filter banks to the squared magnitude of the FFT; 5) apply the logarithm to the sum of the outputs of each individual filter; 6) apply the DCT to the entire set of data resulting from all filters; 7) drop the zeroth coefficient to produce the cepstral coefficients; 8) after all blocks are processed, calculate the mean over the entire time duration and subtract it from the cepstral coefficients; 9) calculate the first-order time derivatives of the cepstral coefficients and concatenate them after the cepstral coefficients to form a feature vector. For example, a filter bank of size 16 will produce 30-dimensional feature vectors.
Due to project time constraints, the HWD parameter c(S, D) was estimated at S = 16, 32, 64 and D = 40, 80, so that it achieves the highest identification rate on the 190-speaker dataset of the TRAIN section. For other values of S and D, it was interpolated or extrapolated from the optimized samples. The results are shown in the bottom section of Table 1. The identification experiment was then performed using the 168-speaker dataset from the TEST section. We used different datasets for c(S, D) estimation, codebook training, and identification-rate testing, to produce objective results.
3.2 Testing Results
Table 1 shows the identification rates for the various algorithms. The value of the learning parameter α is displayed after the GVQ title, and the parameter c(S, D) is displayed in the bottom section.
Combinations of the algorithms are indicated by a "+" sign between their name abbreviations.
Table 1. Identification rates (%) and parameters for the various VQ-based algorithms tested, where the first row is the feature vector dimension D and the first column is the codebook size S.
The baseline algorithm performs poorest, as expected. The plain HWD, PNDM, and GVQ all show enhancements over the baseline. The combination methods further improve on the plain methods. PNDM+GVQ performs best at codebook sizes 16 and 32, while HWD+GVQ is better at codebook size 64. The highest score of the test is 99.7%, corresponding to a single miss in 336 utterances from 168 speakers. It outperforms the reported rate of 98.4% obtained using the GMM with WPP features.
4. Conclusion
A new approach combining the weighted distance measure and discriminative training is proposed to enhance VQ-based solutions for speech independent speaker identification. An alternative heuristic weighted distance measure was explored, which lifts up higher-order MFCC feature vector components using a linear formula. Two new algorithms combining the heuristic weighted distance and the partition normalized distance with group vector quantization discriminative training were developed, gathering the power of both the weighted distance measure and the discriminative training. Experiments showed that the proposed methods outperform the corresponding single-approach VQ-based algorithms, and even more powerful GMM-based solutions. Further research on the heuristic weighted distance is being conducted, particularly for small codebook sizes.
Research on Speech Signal Enhancement Algorithms Based on Adaptive Filtering
Adaptive filtering is a signal processing method that can be used for speech signal enhancement. Speech signal enhancement refers to techniques that improve speech communication quality by removing noise and increasing intelligibility, and it has broad practical value. This article examines adaptive-filtering-based speech enhancement algorithms and introduces their principles, applications, and strengths and weaknesses.
I. Principle of Adaptive Filtering Algorithms
Adaptive filtering automatically adjusts the filter parameters according to the statistical properties of the input signal. The main idea is to adapt the filter response so that it automatically fits the signal characteristics of different environments. In speech enhancement, the most common adaptive filtering algorithms are the least mean squares (LMS) algorithm and the recursive least squares (RLS) algorithm.
The LMS algorithm updates the filter coefficients by stochastic gradient descent on the instantaneous squared error; it is simple and robust but converges relatively slowly. The RLS algorithm minimizes an exponentially weighted least-squares cost and converges much faster, at the price of higher computational complexity.
II. Applications of Adaptive Filtering in Speech Signal Enhancement
Adaptive filtering is widely applied in speech enhancement, chiefly for noise suppression, echo cancellation, and speech enhancement itself.
1. Noise suppression. Noise is one of the main factors degrading speech communication quality. Traditional denoising methods usually use fixed filter parameters and achieve limited results, whereas adaptive filtering algorithms adjust dynamically to the statistics of the noise, and can therefore suppress it more effectively and improve intelligibility.
2. Echo cancellation. In speech communication, echoes arise as the audio signal propagates, distorting and blurring the speech. An adaptive filter can model and estimate the echo signal and subtract it from the original speech, improving the quality and clarity of the call.
3. Speech enhancement. Speech enhancement improves audibility and recognition rates by filtering out background noise and improving speech quality. An adaptive filter adjusts its parameters so as to separate the speech signal from the background noise, thereby enhancing the speech.
III. Strengths and Weaknesses of Adaptive Filtering Algorithms
Adaptive filtering algorithms have both advantages and limitations in speech enhancement.
Advantages:
1. Adaptivity: the filter parameters adjust automatically to the signal characteristics of the environment, so the method copes with different noise conditions.
2. Real-time operation: adaptive algorithms usually converge quickly and can perform efficient speech enhancement in real-time systems.
3. Effectiveness: compared with fixed filters, adaptive algorithms enhance and suppress signal components more precisely, improving speech communication quality.
Research on Speech Enhancement Algorithms Based on Adaptive Filtering
Chapter 1: Introduction
Speech signal enhancement has long been a research focus in speech signal processing. This article studies speech enhancement algorithms based on adaptive filtering. An adaptive filter adjusts its coefficients automatically according to the statistical properties of the signal, so that the filtering result is insensitive to environmental noise and to changes in signal characteristics. Adaptive filtering has been widely applied to speech enhancement, denoising, accompaniment separation, and other fields. Taking speech enhancement as the example, this article studies adaptive filtering algorithms to improve speech quality and raise speech recognition accuracy, and, combined with experimental analysis, explores the application of adaptive filtering to speech enhancement.
Chapter 2: Overview of Speech Enhancement
2.1 Definition. Speech signal enhancement recovers and repairs a degraded speech signal through a series of signal processing methods, so that it becomes clear and intelligible.
2.2 Goals. The goal of speech enhancement is to make the speech signal natural, clear, stable, and easy to recognize through various signal processing techniques, chiefly by reducing noise, improving the signal-to-noise ratio, compensating for signal loss, and raising overall quality.
2.3 Applications. Speech enhancement is widely used in speech recognition, speech synthesis, telephony, communications, and broadcasting. Its most representative application is speech recognition: in noisy environments, enhancement can markedly improve recognition accuracy.
2.4 Methods. Speech enhancement methods include time-domain enhancement, frequency-domain enhancement, wavelet-domain enhancement, and adaptive filtering enhancement, of which adaptive filtering is the most commonly used.
Chapter 3: Adaptive Filtering Technology
3.1 Definition. Adaptive filtering automatically adjusts the filter coefficients according to the statistical properties of the signal so as to achieve effective filtering.
3.2 Classification. Adaptive filters divide into linear and nonlinear adaptive filters, of which linear adaptive filtering is the most common.
3.3 Principle. An adaptive filter adjusts its coefficients automatically according to the statistics of the input signal (such as autocorrelation and cross-correlation coefficients), so that the filtering result is insensitive to environmental noise and to changes in signal characteristics. Adaptive filtering is widely applied to signal enhancement, denoising, accompaniment separation, and other fields. In speech enhancement, an adaptive filter can reduce noise, reinforce the speech content, and improve speech recognition accuracy.
Chapter 4: Speech Enhancement Algorithms Based on Adaptive Filtering
4.1 Principle. A speech enhancement algorithm based on adaptive filtering consists of three main steps: preprocessing, adaptive filtering, and post-processing.
Research on Speech Enhancement Technology Based on Adaptive Filtering
Chapter 1: Preface
Speech enhancement is an important technology in speech processing: it improves the quality of the speech signal, giving higher accuracy and reliability in telephony, speech recognition, audio transcription, and related tasks. Adaptive filtering, one of the commonly used speech enhancement techniques, has been applied widely in audio signal processing. This article discusses the application of adaptive filtering to speech enhancement from two angles: its principle and its implementation methods.
Chapter 2: Principle of Adaptive Filtering
Adaptive filtering continually adjusts the filter coefficients so that the mean square error (MSE) between the filter output and the desired signal is minimized, thereby filtering out noise and enhancing the signal. Adaptive filters are chiefly based on the least mean squares (LMS) algorithm or the recursive least squares (RLS) algorithm. The LMS algorithm is simple and widely used, and is often employed to reduce the influence of excitation noise in a system on the speech signal.
Chapter 3: Implementation Methods of Adaptive Filtering
Implementations of adaptive filtering include forward filtering, backward filtering, and bidirectional filtering. Forward filtering operates in the time domain, and its result is strongly affected by the system delay. Backward filtering, based on the z-transform, can filter more precisely but is limited by its computational complexity, so its applications are relatively restricted. Bidirectional filtering combines the advantages of forward and backward filtering and achieves still more precise results, but its higher computational cost makes it unsuitable for real-time scenarios.
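The bidirectional idea can be illustrated with a toy zero-phase smoother in Python: the same causal one-pole low-pass filter is run forward and then backward over the block, cancelling the phase delay of a single forward pass. This is a sketch of the concept, not any specific published algorithm:

```python
def onepole(x, a=0.7):
    """Causal one-pole low-pass: y[n] = a*y[n-1] + (1-a)*x[n]."""
    y, prev = [], 0.0
    for v in x:
        prev = a * prev + (1 - a) * v
        y.append(prev)
    return y

def filtfilt_like(x, a=0.7):
    """Bidirectional filtering: apply the causal filter forward, then
    backward over the reversed signal, so the phase delays cancel."""
    return onepole(onepole(x, a)[::-1], a)[::-1]
```

Because the whole block must be available before the backward pass, the method is inherently offline, which matches the text's point that bidirectional filtering is unsuitable for real-time use.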
Chapter 4: Applications of Adaptive Filtering in Speech Enhancement
As a common speech enhancement technique, adaptive filtering is used widely in speech recognition, call quality control, and audio transcription. In speech recognition, adaptive filtering reduces the noise in speech and improves recognition accuracy. In call quality control, it effectively reduces call noise and improves clarity and reliability. In audio transcription, it filters out noise and interference from the audio and improves the accuracy and stability of the conversion.
Chapter 5: Conclusion
Adaptive filtering is an important speech enhancement technique with broad application prospects in speech processing. Studying its principles and implementation methods leads to a better understanding of its application scenarios and advantages in speech enhancement, and to higher efficiency and accuracy in practical use.
2010. Speech Enhancement Using the Bionic Wavelet Transform and an Adaptive Threshold Function
Yang Xi, Liu Bing-wu, Yan Fang
School of Information, Beijing Wuzi University, Beijing, China
Abstract: Using the bionic wavelet transform and an adaptive threshold, this paper presents an improved wavelet-based speech enhancement method: adaptive bionic wavelet speech enhancement. Because a model of the human auditory system is built into the wavelet transform, the method's main advantage is that it avoids over-thresholding speech segments, a problem that frequently occurs in conventional wavelet-based speech enhancement schemes. Moreover, it can track variations in the noisy speech without requiring prior estimation of the SNR. As a result, the quality of the enhanced speech is substantially better than with conventional methods.
Introduction: In practice, speech signals inevitably suffer interference during reception and transmission from the surrounding environment, such as the transmission medium, communication equipment, and other speakers' voices. The corrupted signal is the noisy signal. The main purpose of a speech enhancement scheme is to recover clean speech from the noisy signal, reducing listener fatigue while improving the perceptual quality of the speech. Many speech enhancement methods have been proposed over the past decade, but none is perfect, owing to the complexity and non-stationarity of speech signals.
The wavelet transform provides multi-resolution analysis in both the time and frequency domains, so it can analyze non-stationary signals. Recently, the wavelet transform has been applied successfully to signal processing tasks such as speech enhancement. By choosing the wavelet-coefficient threshold appropriately and subtracting it from the noisy wavelet coefficients, white Gaussian noise can be removed effectively.
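The thresholding operation described here can be sketched with a one-level Haar transform in plain Python (a real system would use a deeper decomposition and a noise-derived threshold; both are simplified assumptions in this sketch):

```python
import math

def haar_forward(x):
    """One-level Haar DWT: returns (approximation, detail) coefficients."""
    a = [(x[2 * i] + x[2 * i + 1]) / math.sqrt(2) for i in range(len(x) // 2)]
    d = [(x[2 * i] - x[2 * i + 1]) / math.sqrt(2) for i in range(len(x) // 2)]
    return a, d

def haar_inverse(a, d):
    """Invert the one-level Haar DWT."""
    x = []
    for ai, di in zip(a, d):
        x.append((ai + di) / math.sqrt(2))
        x.append((ai - di) / math.sqrt(2))
    return x

def soft_threshold(coeffs, thr):
    """Shrink each coefficient toward zero by thr (soft thresholding)."""
    return [math.copysign(max(abs(c) - thr, 0.0), c) for c in coeffs]

def denoise(x, thr):
    """Threshold only the detail band, where small coefficients are mostly noise."""
    a, d = haar_forward(x)
    return haar_inverse(a, soft_threshold(d, thr))
```

Detail coefficients below the threshold, which in white noise are mostly noise, are zeroed, while large speech-driven coefficients survive with a small bias; choosing the threshold adaptively is exactly what the paper's method addresses.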
However, the method suffers from severe residual noise and speech distortion. Pinter and Istvan [1] proposed an improved scheme that combines the wavelet transform with the critical bands of auditory perception; they decompose the noisy signal according to the critical bands to reduce speech distortion. Mohammed Bahoura and Jean Rouat [2] proposed wavelet speech enhancement based on the Teager energy operator, which greatly reduces noise by adapting the threshold in the time domain; however, the over-thresholding problem remains when the speech signal is only lightly contaminated by noise. Hu Yi [3] proposed wavelet-thresholded multitaper-spectrum speech enhancement using low-variance spectral estimators, which suppresses residual noise and yields better quality. In this paper, we propose a speech enhancement method using the bionic wavelet transform and an adaptive threshold function.
CHINA SCIENCE & TECHNOLOGY
Speech Enhancement Based on Wiener Filtering under Different Background Noises
Wang Zhenghuan, Wang Junfang, School of Electronic Information, Wuhan University
Introduction
Interference from various noises during transmission degrades speech quality. The purpose of speech enhancement is to recover the original speech signal from the noisy speech. It is applied very widely and is an indispensable preprocessing step for many speech signal processing tasks such as speech recognition and speech coding.
There are many speech enhancement methods, such as spectral subtraction, Wiener filtering, Kalman filtering, and MMSE. The Wiener filter is constructed under the minimum mean square error criterion. Starting from the Wiener filtering method, this article analyzes its performance by examining the enhancement achieved when it is applied to speech under different noise backgrounds.
1. Basic Principle of Wiener Filtering
Speech is short-time stationary, so the signal is usually split into frames and windowed before processing. For one frame, let the original clean speech be $x(m)$ and the noisy speech be $y(m)$, with FFT $Y(f)$. Passing the noisy speech through a Wiener filter $W(f)$ gives an estimate of the clean speech spectrum:

$$\hat{X}(f) = W(f)\,Y(f)$$

The estimation error $E(f)$ is defined as the difference between the clean spectrum $X(f)$ and $\hat{X}(f)$, and the frequency-domain mean square error is $E[|X(f)-\hat{X}(f)|^2]$. To obtain the minimum mean square error filter, this expression is differentiated with respect to $W(f)$ and set to zero. With $P_{yy}(f)$ the auto power spectrum of $Y(f)$ and $P_{xy}(f)$ the cross power spectrum of $Y(f)$ and $X(f)$, the frequency-domain MMSE Wiener filter is

$$W(f) = \frac{P_{xy}(f)}{P_{yy}(f)}$$

For a speech signal with additive noise, the Wiener filter becomes

$$W(f) = \frac{P_x(f)}{P_x(f) + P_n(f)}$$

and substituting $SNR(f) = P_x(f)/P_n(f)$ gives

$$W(f) = \frac{SNR(f)}{SNR(f) + 1}$$

Thus the Wiener filter can be expressed simply in terms of the signal-to-noise ratio: once an estimate of the SNR is available, the Wiener filter can be realized.
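The gain expression W(f) = SNR(f)/(SNR(f)+1) derived above can be sketched as follows; the per-bin SNR estimate from the noisy and noise power spectra is a simple placeholder assumption (real systems use decision-directed smoothing):

```python
def wiener_gain(snr):
    """W(f) = SNR(f) / (SNR(f) + 1): near 1 for strong speech, near 0 in noise."""
    return snr / (snr + 1.0)

def apply_wiener(noisy_power, noise_power):
    """Apply the Wiener gain per frequency bin to the noisy power spectrum.
    The SNR is crudely estimated as max(|Y|^2 / Pn - 1, 0); applying the
    amplitude gain to a power spectrum squares it."""
    out = []
    for py, pn in zip(noisy_power, noise_power):
        snr = max(py / pn - 1.0, 0.0)
        out.append((wiener_gain(snr) ** 2) * py)
    return out
```

Bins whose measured power does not exceed the noise estimate are zeroed, while high-SNR bins pass almost unchanged, which is the per-frequency attenuation behavior described in the text.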
2. Implementation of Wiener Filtering for Speech Enhancement
2.1 The key to Wiener filtering is obtaining the SNR; once the SNR has been estimated, the Wiener filter can be constructed. We now use this Wiener filter to enhance speech under different noise backgrounds and assess the robustness of the Wiener filter by analyzing the enhancement results.
Appendix
ADAPTIVE WIENER FILTERING APPROACH FOR SPEECH ENHANCEMENT
M. A. Abd El-Fattah, M. I. Dessouky, S. M. Diab and F. E. Abd El-Samie
Department of Electronics and Electrical Communications, Faculty of Electronic Engineering, Menoufia University, Menouf, Egypt
E-mails: ************************, *********************
ABSTRACT
This paper proposes the application of the Wiener filter in an adaptive manner in speech enhancement. The proposed adaptive Wiener filter depends on the adaptation of the filter transfer function from sample to sample based on the speech signal statistics (mean and variance). The adaptive Wiener filter is implemented in the time domain rather than in the frequency domain to accommodate the varying nature of the speech signal. The proposed method is compared to the traditional Wiener filter and spectral subtraction methods, and the results reveal its superiority.
Keywords: Speech Enhancement, Spectral Subtraction, Adaptive Wiener Filter
1 INTRODUCTION
Speech enhancement is one of the most important topics in speech signal processing. Several techniques have been proposed for this purpose, such as the spectral subtraction approach, the signal subspace approach, adaptive noise canceling, and the iterative Wiener filter [1-5]. The performance of these techniques depends on the quality and intelligibility of the processed speech signal. The improvement of the speech signal-to-noise ratio (SNR) is the target of most techniques.
Spectral subtraction is the earliest method for enhancing speech degraded by additive noise [1]. This technique estimates the spectrum of the clean (noise-free) signal by subtracting the estimated noise magnitude spectrum from the noisy signal magnitude spectrum while keeping the phase spectrum of the noisy signal. The drawback of this technique is the residual noise.
Another technique is the signal subspace approach [3]. It is used for enhancing a speech signal degraded by uncorrelated additive noise or colored noise [6,7].
The idea of this algorithm is based on the fact that the vector space of the noisy signal can be decomposed into a signal-plus-noise subspace and an orthogonal noise subspace. Processing is performed only on the vectors in the signal-plus-noise subspace, while the noise subspace is removed first. The decomposition of the vector space of the noisy signal is performed by applying an eigenvalue or singular value decomposition, or by applying the Karhunen-Loeve transform (KLT) [8]. Mi et al. have proposed the signal/noise KLT-based approach for colored noise removal [9]. In this approach, noisy speech frames are classified into speech-dominated and noise-dominated frames; the signal KLT matrix is used for the former and the noise KLT matrix for the latter.
In this paper, we present a new technique to improve the signal-to-noise ratio of the enhanced speech by using an adaptive implementation of the Wiener filter. The implementation is performed in the time domain to accommodate the varying nature of the signal.
The paper is organized as follows: Section II reviews the spectral subtraction technique. Section III revisits the traditional Wiener filter in the frequency domain. Section IV proposes the adaptive Wiener filtering approach for speech enhancement. Section V presents a comparative study between the proposed adaptive Wiener filter, the frequency-domain Wiener filter, and the spectral subtraction approach.
2 SPECTRAL SUBTRACTION
Spectral subtraction can be categorized as a non-parametric approach, which simply needs an estimate of the noise spectrum, typically obtained during periods of speaker silence. Let $x(n)$ be a noisy speech signal:

$$x(n) = s(n) + v(n) \quad (1)$$

where $s(n)$ is the clean (noise-free) signal and $v(n)$ is white Gaussian noise.
Assume that the noise and the clean signal are uncorrelated. The spectral subtraction approach estimates the short-term magnitude spectrum of the noise-free signal, $|\hat{S}(\omega)|$, by subtracting the estimated noise magnitude spectrum $|\hat{V}(\omega)|$ from the noisy magnitude spectrum $|X(\omega)|$; it is sufficient to use the noisy phase spectrum as an estimate of the clean speech phase spectrum [10]:

$$\hat{S}(\omega) = \big[\,|X(\omega)| - |\hat{V}(\omega)|\,\big]\exp\big(j\angle X(\omega)\big) \quad (2)$$

The estimated time-domain speech signal is obtained as the inverse Fourier transform of $\hat{S}(\omega)$. Alternatively, a clean signal $s(n)$ can be recovered from the noisy signal $x(n)$ by assuming an estimate of the noise power spectrum $\hat{P}_v(\omega)$, obtained by averaging over multiple frames of a known noise segment. An estimate of the clean short-time squared magnitude spectrum is then [8]:

$$|\hat{S}(\omega)|^2 = \begin{cases} |X(\omega)|^2 - \hat{P}_v(\omega), & \text{if } |X(\omega)|^2 - \hat{P}_v(\omega) \ge 0 \\ 0, & \text{otherwise} \end{cases} \quad (3)$$

It is possible to combine this magnitude spectrum estimate with the measured phase and obtain the short-time Fourier transform (STFT) estimate

$$\hat{S}(\omega) = |\hat{S}(\omega)|\,e^{\,j\angle X(\omega)} \quad (4)$$

A noise-free signal estimate can then be obtained with the inverse Fourier transform. This noise reduction method is a specific case of the general technique given by Weiss et al. and extended by Berouti et al. [2,12].
The spectral subtraction approach can be viewed as a filtering operation in which high-SNR regions of the measured spectrum are attenuated less than low-SNR regions. This formulation can be given in terms of the SNR defined as

$$SNR = \frac{|X(\omega)|^2}{\hat{P}_v(\omega)} \quad (5)$$

Thus, equation (3) can be rewritten as

$$|\hat{S}(\omega)|^2 = |X(\omega)|^2 - \hat{P}_v(\omega) \approx |X(\omega)|^2\left[1 + \frac{1}{SNR}\right]^{-1} \quad (6)$$

An important property of noise suppression using spectral subtraction is that the attenuation characteristics change with the length of the analysis window.
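The subtraction rule of Eqs. (3)-(4) can be sketched per frame in plain Python; a naive O(N^2) DFT keeps the example dependency-free (a real implementation would use an FFT with frame overlap-add):

```python
import cmath

def dft(x):
    """Naive discrete Fourier transform (O(N^2), for illustration only)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft(X):
    """Inverse DFT, returning the real part of the reconstruction."""
    N = len(X)
    return [(sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)) / N).real
            for n in range(N)]

def spectral_subtract(frame, noise_power):
    """Eqs. (3)-(4): subtract the noise power spectrum per bin, floor the
    result at zero, and reuse the noisy phase."""
    X = dft(frame)
    S = []
    for k, Xk in enumerate(X):
        mag2 = max(abs(Xk) ** 2 - noise_power[k], 0.0)
        S.append(cmath.rect(mag2 ** 0.5, cmath.phase(Xk)))
    return idft(S)
```

The half-wave rectification in Eq. (3) is what produces the isolated surviving peaks responsible for the musical noise discussed next.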
A common problem with spectral subtraction is the musicality that results from the rapid appearance and disappearance of spectral peaks over successive frames [13].
3 WIENER FILTER IN FREQUENCY DOMAIN
The Wiener filter is a popular technique that has been used in many signal enhancement methods. Its basic principle is to obtain a clean signal from one corrupted by additive noise. It is required to estimate an optimal filter for the noisy input speech by minimizing the mean square error (MSE) between the desired signal $s(n)$ and the estimated signal $\hat{s}(n)$. The frequency-domain solution to this optimization problem is given by [13]:

$$H(\omega) = \frac{P_s(\omega)}{P_s(\omega) + P_v(\omega)} \quad (7)$$

where $P_s(\omega)$ and $P_v(\omega)$ are the power spectral densities of the clean signal and the noise, respectively. This formula is derived by treating the signal $s$ and the noise $v$ as uncorrelated stationary signals. The signal-to-noise ratio is defined by [13]:

$$SNR = \frac{P_s(\omega)}{\hat{P}_v(\omega)} \quad (8)$$

Incorporating this definition into the Wiener filter equation gives

$$H(\omega) = \left[1 + \frac{1}{SNR}\right]^{-1} \quad (9)$$

The drawback of the Wiener filter is its fixed frequency response at all frequencies and the requirement to estimate the power spectral densities of the clean signal and the noise prior to filtering.
4 THE PROPOSED ADAPTIVE WIENER FILTER
This section presents an adaptive implementation of the Wiener filter that benefits from the varying local statistics of the speech signal. A block diagram of the proposed approach is illustrated in Fig. (1).
In this approach, the estimated local mean $m_x$ and variance $\sigma_x^2$ of the speech signal are exploited.
Figure 1: Typical adaptive speech enhancement system for additive noise reduction.
It is assumed that the additive noise $v(n)$ is zero-mean white noise with variance $\sigma_v^2$. Thus, the power spectrum $P_v(\omega)$ can be approximated by

$$P_v(\omega) = \sigma_v^2 \quad (10)$$

Consider a small segment of the speech signal within which $x(n)$ is assumed stationary. The signal $x(n)$ can be modeled by

$$x(n) = m_x + \sigma_x\, w(n) \quad (11)$$

where $m_x$ and $\sigma_x$ are the local mean and standard deviation of $x(n)$, and $w(n)$ is unit-variance noise. Within this small segment of speech, the Wiener filter transfer function can be approximated by

$$H(\omega) = \frac{P_s(\omega)}{P_s(\omega) + P_v(\omega)} = \frac{\sigma_s^2}{\sigma_s^2 + \sigma_v^2} \quad (12)$$

From Eq. (12), because $H(\omega)$ is constant over the small segment, the impulse response of the Wiener filter is

$$h(n) = \frac{\sigma_s^2}{\sigma_s^2 + \sigma_v^2}\,\delta(n) \quad (13)$$

From Eq. (13), the enhanced speech $\hat{s}(n)$ within this local segment can be expressed as

$$\hat{s}(n) = m_x + \big(x(n) - m_x\big) * \frac{\sigma_s^2}{\sigma_s^2 + \sigma_v^2}\,\delta(n) = m_x + \frac{\sigma_s^2}{\sigma_s^2 + \sigma_v^2}\big(x(n) - m_x\big) \quad (14)$$

If it is assumed that $m_x$ and $\sigma_s$ are updated at each sample, we can write

$$\hat{s}(n) = m_x(n) + \frac{\sigma_s^2(n)}{\sigma_s^2(n) + \sigma_v^2}\big(x(n) - m_x(n)\big) \quad (15)$$

In Eq. (15), the local mean $m_x(n)$ and the term $x(n) - m_x(n)$ are modified separately from segment to segment, and the results are then combined. If $\sigma_s^2$ is much larger than $\sigma_v^2$, the output $\hat{s}(n)$ is primarily due to $x(n)$ and the input signal is not attenuated; if $\sigma_s^2$ is smaller than $\sigma_v^2$, the filtering effect is performed. Note that $m_x$ is identical to $m_s$ when $m_v$ is zero, so $m_x(n)$ in Eq. (15) can be estimated from $x(n)$ by

$$\hat{m}_s(n) = \hat{m}_x(n) = \frac{1}{2M+1}\sum_{k=n-M}^{n+M} x(k) \quad (16)$$

where $2M+1$ is the number of samples in the short segment used in the estimation. To measure the local signal statistics in the system of Figure 1, the algorithm uses the signal variance $\sigma_s^2$.
The specific method used to design the space-variant $h(n)$ is given by (17b). Since $\sigma_x^2 = \sigma_s^2 + \sigma_v^2$, the signal variance may be estimated from $x(n)$ by

$$\hat{\sigma}_s^2(n) = \begin{cases} \hat{\sigma}_x^2(n) - \hat{\sigma}_v^2, & \text{if } \hat{\sigma}_x^2(n) > \hat{\sigma}_v^2 \\ 0, & \text{otherwise} \end{cases} \quad (17a)$$

where

$$\hat{\sigma}_x^2(n) = \frac{1}{2M+1}\sum_{k=n-M}^{n+M}\big(x(k) - \hat{m}_x(n)\big)^2 \quad (17b)$$

With this proposed method, we guarantee that the filter transfer function is adapted from sample to sample based on the speech signal statistics.
5 EXPERIMENTAL RESULTS
For evaluation purposes, we use different speech signals such as the Handel, laughter, and gong signals. White Gaussian noise is added to each signal at different SNRs. The different speech enhancement algorithms, namely the spectral subtraction method, the Wiener filter in the frequency domain, and the proposed adaptive Wiener filter, are applied to the noisy speech signals, and the peak signal-to-noise ratio (PSNR) results of each enhancement algorithm are compared.
In the first experiment, all the above-mentioned algorithms are applied to the Handel signal at different SNRs; the output PSNR results are shown in Fig. (2). The same experiment is repeated for the laughter and gong signals, with results shown in Figs. (3) and (4), respectively. These figures make clear that the proposed adaptive Wiener filter has the best performance across SNRs, giving about 3-5 dB improvement at different SNR values. The nonlinearity between input SNR and output PSNR is due to the adaptive nature of the filter.
Figure 2: PSNR results for the white noise case at -10 dB to +35 dB SNR levels for the Handel signal.
Figure 3: PSNR results for the white noise case at -10 dB to +35 dB SNR levels for the laughter signal.
Figure 4: PSNR results for the white noise case at -10 dB to +35 dB SNR levels for the gong signal.
The results of the different enhancement algorithms for the Handel signal at SNRs of 5, 10, 15, and 20 dB, in both the time and frequency domains, are given in Figs. (5) to (12).
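The sample-by-sample rule of Eqs. (15)-(17) can be sketched in a few lines of Python; the window length and the (assumed known) noise variance are illustrative values:

```python
def adaptive_wiener(x, noise_var, M=3):
    """Time-domain adaptive Wiener filter, Eqs. (15)-(17):
    estimate a local mean and variance over a (2M+1)-sample window, take
    sigma_s^2 = max(sigma_x^2 - sigma_v^2, 0), and apply
    s_hat(n) = m_x(n) + sigma_s^2/(sigma_s^2 + sigma_v^2) * (x(n) - m_x(n)).
    noise_var is assumed known and strictly positive."""
    N = len(x)
    out = []
    for n in range(N):
        lo, hi = max(0, n - M), min(N, n + M + 1)  # window, clipped at edges
        win = x[lo:hi]
        m = sum(win) / len(win)                     # Eq. (16)
        var_x = sum((v - m) ** 2 for v in win) / len(win)   # Eq. (17b)
        var_s = max(var_x - noise_var, 0.0)         # Eq. (17a)
        g = var_s / (var_s + noise_var)
        out.append(m + g * (x[n] - m))              # Eq. (15)
    return out
```

In low-variance (noise-only) regions the gain collapses to zero and the output falls back to the local mean, while high-variance speech regions pass through almost unchanged, which is the behavior Eq. (15) describes.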
These results reveal that the best performance is that of the proposed adaptive Wiener filter.
Figures 5, 7, 9, and 11: time-domain results of the Handel signal at SNR = +5, 10, 15, and 20 dB respectively; each shows (a) the original signal, (b) the noisy signal, (c) spectral subtraction, (d) Wiener filtering, and (e) adaptive Wiener filtering. Figures 6, 8, 10, and 12: the corresponding spectra of the signals in Figs. 5, 7, 9, and 11.
6 CONCLUSION
An adaptive Wiener filter approach for speech enhancement is proposed in this paper. The approach adapts the filter transfer function from sample to sample based on the speech signal statistics (mean and variance). The results indicate that the proposed approach provides the best SNR improvement among the spectral subtraction approach and the traditional frequency-domain Wiener filter.
The results also indicate that the proposed approach treats musical noise better than the spectral subtraction approach and avoids the drawbacks of the frequency-domain Wiener filter.