Foreign Literature Translation: Adaptive Wiener Filtering Approach for Speech Enhancement
A Speech Enhancement Algorithm Based on MMSE-LSA and VAD
Yan Guanghua
Abstract: After introducing the characteristics of speech enhancement, the minimum mean-square error log-spectral amplitude (MMSE-LSA) estimation algorithm is analyzed in detail, and a voice activity detection (VAD) algorithm matched to the MMSE-LSA algorithm is proposed. The scheme is computationally simple, easy to implement, and gives good enhancement, and it can dynamically track changes in the background noise. Finally, simulations compare the enhancement achieved by MMSE-LSA with that of several other speech enhancement algorithms.
Journal: Mobile Communications, 2014(000)010, 5 pages (P59-62, 66)
Keywords: MMSE-LSA; VAD; speech enhancement
Author affiliation: Informatization Department, Navy Headquarters, Beijing 100036
CLC: TN912.3
1 Introduction
In speech communication, and in military speech communication in particular, noise interference of all kinds is common: radios on tanks, aircraft, and ships are often subject to strong background noise that severely degrades the quality and effectiveness of speech communication.
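The MMSE-LSA estimator discussed above applies, for each frequency bin, the Ephraim-Malah log-spectral amplitude gain. A minimal sketch in Python, assuming the a-priori SNR xi and a-posteriori SNR gamma have already been estimated (in practice they come from a decision-directed estimator and a noise tracker, which are not shown); the exponential integral is approximated by a crude trapezoid rule:

```python
import math

def expint_e1(x, steps=2000, upper=50.0):
    """Crude trapezoid approximation of E1(x) = integral_x^inf exp(-t)/t dt.
    Adequate for the moderate x values used by the gain below."""
    if x >= upper:
        return 0.0
    h = (upper - x) / steps
    total = 0.5 * (math.exp(-x) / x + math.exp(-upper) / upper)
    for i in range(1, steps):
        t = x + i * h
        total += math.exp(-t) / t
    return total * h

def mmse_lsa_gain(xi, gamma):
    """Ephraim-Malah MMSE log-spectral amplitude gain for one frequency bin.
    xi: a-priori SNR, gamma: a-posteriori SNR (both assumed given)."""
    v = max(xi / (1.0 + xi) * gamma, 1e-3)  # floor avoids the E1 singularity at 0
    return xi / (1.0 + xi) * math.exp(0.5 * expint_e1(v))
```

The enhanced amplitude of a bin is the gain times the noisy amplitude; higher a-priori SNR yields a gain closer to one, so speech-dominated bins pass through while noise-dominated bins are attenuated.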
Speech Enhancement Method Combining Subspace and Wiener Filtering
Zhang Xueying; Jia Hairong; Jin Chensheng
Journal: Computer Engineering and Applications, 2011(047)014, 3 pages (P146-148)
Abstract: In view of the musical noise left after enhancing speech corrupted by complicated additive noise, a speech enhancement method based on the combination of the subspace approach and the Wiener filter is proposed. The noisy speech is transformed into the subspace domain by the KL transform, and the eigenvalues of the clean speech are estimated. A Wiener filter is formed using the signal-to-noise ratio (SNR) formula in the subspace domain, and the estimated eigenvalues are passed through this filter to obtain new clean-speech eigenvalues; the clean speech is then recovered by the inverse KL transform. Simulation results show that under white-noise and train-noise backgrounds the SNR of this method clearly exceeds that of the traditional subspace method, while the musical noise produced by the enhancement is effectively suppressed.
Author affiliation: College of Information Engineering, Taiyuan University of Technology, Taiyuan 030024
An Improved Parameter-Adaptive Wiener Filtering Speech Enhancement Algorithm
Meng Xin; Ma Jianfen; Zhang Xueying
Journal: Computer Engineering and Design, 2017(038)003, 5 pages (P714-718)
Abstract: To explore the different effects that different types of noise have on the performance of speech enhancement algorithms, a parameter-adaptive Wiener filtering speech enhancement algorithm is proposed that sets different initial parameters and performs a different noise power spectrum estimation for each noise type. A deep neural network is used to classify the noise, yielding accurate classification results. For each noise type, the optimal coefficient combination is obtained for the Wiener filtering algorithm combined with a voice-activity-detection (VAD) noise power estimator. A series of experiments was carried out; the objective evaluation shows that under Babble noise at 5 dB SNR the proposed algorithm raises the PESQ score by 0.25, with corresponding PESQ gains for other noises and SNRs.
Keywords: deep neural network; noise classification; speech enhancement; Wiener filtering; voice activity detection
Author affiliation: College of Computer Science and Technology, Taiyuan University of Technology, Yuci, Shanxi 030600; College of Information Engineering, Taiyuan University of Technology
Speech enhancement algorithms divide into single-channel and multi-channel approaches; single-channel enhancement, being simple and broadly applicable, has been widely studied [1-5].
Literature information:
Title: Enhanced VQ-based Algorithms for Speech Independent Speaker Identification
Authors: Ningping Fan, Justinian Rosca
Source: Audio- and Video-based Biometric Person Authentication, International Conference, AVBPA, Guildford, UK, June 2003, 2688: 470-477

Enhanced VQ-based Algorithms for Speech Independent Speaker Identification
Abstract. Weighted distance measure and discriminative training are two different approaches to enhance VQ-based solutions for speaker identification. To account for the varying importance of the LPC coefficients in SV, the so-called partition normalized distance measure successfully used normalized feature components. This paper introduces an alternative, called heuristic weighted distance, to lift up higher order MFCC feature vector components using a linear formula. It then proposes two new algorithms combining the heuristic weighting and the partition normalized distance measure with group vector quantization discriminative training, to take advantage of both approaches. Experiments using the TIMIT corpus suggest that the new combined approach is superior to current VQ-based solutions (50% error reduction). It also outperforms the Gaussian Mixture Model using the wavelet features tested in a similar setting.
1. Introduction
Vector quantization (VQ) based classification algorithms play an important role in speech independent speaker identification (SI) systems. Although in baseline form the VQ-based solution is less accurate than the Gaussian Mixture Model (GMM), it offers simplicity in computation. For a large database of hundreds or thousands of speakers, both accuracy and speed are important issues. Here we discuss VQ enhancements aimed at accuracy and fast computation.
1.1 VQ Based Speaker Identification System
Fig. 1 shows the VQ based speaker identification system.
It contains an offline training sub-system to produce VQ codebooks and an online testing sub-system to generate the identification decision. Both sub-systems contain a preprocessing or feature extraction module to convert an audio utterance into a set of feature vectors. Features of interest in the recent literature include the Mel-frequency cepstral coefficients (MFCC), the line spectral pairs (LSP), the wavelet packet parameters (WPP), and PCA and ICA features. Although the WPP and ICA have been shown to offer advantages, we used MFCC in this paper to focus our attention on other modules of the system.
Fig. 1. A VQ-based speaker identification system features an online sub-system for identifying a testing audio utterance, and an offline training sub-system, which uses training audio utterances to generate a codebook for each speaker in the database.
A VQ codebook normally consists of centroids of partitions over a speaker's feature vector space. The effects on SI of different partition clustering algorithms, such as the LBG and the RLS, have been studied. The average error, or distortion, of the feature vectors $\{X_t, 1 \le t \le T\}$ of length $T$ against the codebook of speaker $k$ is given by

$$e_k = \frac{1}{T}\sum_{t=1}^{T}\min_{1\le j\le S} d(X_t, C_{k,j}), \qquad 1 \le k \le L \quad (1)$$

where $d(\cdot,\cdot)$ is a distance function between two vectors, $C_{k,j} = (c_{k,j,1},\ldots,c_{k,j,D})^T$ is the $j$-th code of dimension $D$, $S$ is the codebook size, and $L$ is the total number of speakers in the database. The baseline VQ algorithm for SI simply uses the LBG to generate codebooks and the square of the Euclidean distance as $d(\cdot,\cdot)$. Many improvements to the baseline VQ algorithm have been published.
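The scoring rule of Eq. (1) reduces to a few lines of code. A minimal sketch in Python with toy two-dimensional codebooks (the data here are hypothetical; a real system would train codebooks with LBG over MFCC vectors):

```python
def sq_euclidean(x, c):
    """Square of the Euclidean distance, the baseline choice for d(.,.)."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))

def avg_distortion(vectors, codebook):
    """e_k of Eq. (1): average over frames of the distance to the nearest code."""
    return sum(min(sq_euclidean(x, c) for c in codebook) for x in vectors) / len(vectors)

def identify(vectors, codebooks):
    """Pick the speaker whose codebook gives the smallest average distortion."""
    scores = [avg_distortion(vectors, cb) for cb in codebooks]
    return scores.index(min(scores))

# Toy example: speaker 0 clusters near the origin, speaker 1 near (5, 5).
codebooks = [[(0.0, 0.0), (1.0, 1.0)], [(5.0, 5.0), (6.0, 6.0)]]
```

With real MFCC frames the same functions apply unchanged; only the codebook training and the distance function (replaced by the weighted variants below) differ between the algorithms compared in the paper.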
Among them, there are two independent approaches: (1) choose a weighted distance function, such as the F-ratio and IHM weights, the partition normalized distance measure (PNDM), and the Bhattacharyya distance; (2) explore the discrimination power of inter-speaker characteristics using the entire set of speakers, such as the group vector quantization (GVQ) discriminative training and the speaker discriminative weighting. Experimentally we have found that PNDM and GVQ are two very effective methods in their respective groups.
1.2 Review of Partition Normalized Distance Measure
The partition normalized distance measure is defined as the square of a weighted Euclidean distance:

$$d_p(X, C_{k,j}) = \sum_{i=1}^{D} w_{k,j,i}\,(x_i - c_{k,j,i})^2 \quad (2)$$

The weighting coefficients are determined by minimizing the average error of the training utterances of all speakers, subject to the constraint that the geometric mean of the weights for each partition equals 1. Let $X_{k,j} = (x_{k,j,1},\ldots,x_{k,j,D})^T$ be a random training feature vector of speaker $k$, assigned to partition $j$ via the minimization process in Equation (1). It has mean and variance vectors

$$C_{k,j} = E[X_{k,j}], \qquad V_{k,j} = E[(X_{k,j}-C_{k,j})^T(X_{k,j}-C_{k,j})] \quad (3)$$

The constrained optimization criterion to be minimized in order to derive the weights is

$$\xi = \frac{1}{LS}\sum_{k=1}^{L}\sum_{j=1}^{S}\Big\{E[d_p(X_{k,j}, C_{k,j})] + \lambda_{k,j}\Big(1-\prod_{i=1}^{D} w_{k,j,i}\Big)\Big\} = \frac{1}{LS}\sum_{k=1}^{L}\sum_{j=1}^{S}\Big\{\sum_{i=1}^{D} w_{k,j,i}\,v_{k,j,i} + \lambda_{k,j}\Big(1-\prod_{i=1}^{D} w_{k,j,i}\Big)\Big\} \quad (4)$$

where $L$ is the number of speakers and $S$ is the codebook size. Letting

$$\frac{\partial \xi}{\partial w_{k,j,i}} = 0 \quad \text{and} \quad \frac{\partial \xi}{\partial \lambda_{k,j}} = 0 \quad (5)$$

we have

$$\lambda_{k,j} = \Big(\prod_{i=1}^{D} v_{k,j,i}\Big)^{1/D} \quad \text{and} \quad w_{k,j,i} = \frac{\lambda_{k,j}}{v_{k,j,i}} \quad (6)$$

where subscript $i$ is the feature vector component index, and $k$ and $j$ are the speaker and partition indices respectively.
Because $k$ and $j$ appear on both sides of the equations, the weights depend only on the data from one partition of one speaker.
1.3 Review of Group Vector Quantization
Discriminative training uses the data of all speakers to train the codebooks, so that more accurate identification can be achieved by exploiting inter-speaker differences. The GVQ training algorithm is described as follows.
Group Vector Quantization Algorithm:
(1) Randomly choose a speaker $j$.
(2) Select $N$ vectors $\{X_{j,t}, 1 \le t \le N\}$.
(3) Calculate the error for all codebooks. If the following conditions are satisfied, go to (4):
a) $e_i = \min_{\forall k}\{e_k\}$, but $i \ne j$;
b) $e_j - e_i < W$, where $W$ is a window size;
else go to (5).
(4) For each $X_{j,t}$:
$C_{j,m} \Leftarrow (1-\alpha)\,C_{j,m} + \alpha\,X_{j,t}$, where $C_{j,m} = \arg\min_{C_{j,l}} d(X_{j,t}, C_{j,l})$;
$C_{i,n} \Leftarrow (1+\alpha)\,C_{i,n} - \alpha\,X_{j,t}$, where $C_{i,n} = \arg\min_{C_{i,l}} d(X_{j,t}, C_{i,l})$.
(5) For each $X_{j,t}$:
$C_{j,m} \Leftarrow (1-\varepsilon\alpha)\,C_{j,m} + \varepsilon\alpha\,X_{j,t}$, where $C_{j,m} = \arg\min_{C_{j,l}} d(X_{j,t}, C_{j,l})$.
2. Enhancements
We propose the following steps to further enhance the VQ-based solution: (1) a heuristic weighted distance (HWD), (2) the combination of HWD and GVQ, and (3) the combination of PNDM and GVQ.
2.1 Heuristic Weighted Distance
The PNDM weights are inversely proportional to the partition variances of the feature components, as shown in Equation (6). It has been shown that the variances of the cepstral coefficients decrease with increasing index: $v_i > v_{i+1}$, $1 \le i \le D-1$, where $i$ is the vector element index, which reflects the frequency band. The higher the index, the smaller the feature value and its variance.
We considered a heuristic weighted distance

$$d_h(X, C_{k,j}) = \sum_{i=1}^{D} w_i(S, D)\,(x_i - c_{k,j,i})^2 \quad (7)$$

with weights calculated by

$$w_i(S, D) = 1 + c(S, D)\,(i-1) \quad (8)$$

where $c(S, D)$ is a function of both the codebook size $S$ and the feature vector dimension $D$. For a given codebook, $S$ and $D$ are fixed, and thus $c(S, D)$ is a constant.
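Equations (7)-(8) translate directly into code. A small sketch, assuming a caller-supplied constant c(S, D) (the paper finds this constant by exhaustive search, which is not shown):

```python
def hwd_weights(c, D):
    """Eq. (8): w_i(S, D) = 1 + c(S, D) * (i - 1) for i = 1..D."""
    return [1.0 + c * i for i in range(D)]

def hwd_distance(x, code, w):
    """Eq. (7): weighted squared Euclidean distance that lifts up
    higher-order feature vector components."""
    return sum(wi * (xi - ci) ** 2 for wi, xi, ci in zip(w, x, code))
```

Since the weights grow linearly with the component index, differences in the small-variance higher-order cepstral components contribute more to the distance than under the plain Euclidean metric.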
The value of $c(S, D)$ is estimated experimentally by performing an exhaustive search to achieve the maximum identification rate on a given sample test dataset.
2.2 Combination of HWD and GVQ
The combination of the HWD and the GVQ is achieved by simply replacing the original square of the Euclidean distance with the HWD of Equation (7), and adjusting the GVQ updating parameter $\alpha$ whenever needed.
2.3 Combination of PNDM and GVQ
Combining PNDM with the GVQ requires slightly more work, because the GVQ alters the partitions and thus their component variances. We have used the following algorithm to overcome this problem.
Algorithm to Combine PNDM with the GVQ Discriminative Training:
(1) Use the LBG algorithm to generate initial LBG codebooks;
(2) Calculate PNDM weights using the LBG codebooks, and produce PNDM-weighted LBG codebooks, which are LBG codebooks appended with the PNDM weights;
(3) Perform GVQ training with the PNDM distance function, and generate the initial PNDM+GVQ codebooks by replacing the LBG codes with the GVQ codes;
(4) Recalculate the PNDM weights using the PNDM+GVQ codebooks, and produce the final PNDM+GVQ codebooks by replacing the old PNDM weights with the new ones.
3. Experimental Comparison of VQ-based Algorithms
3.1 Testing Data and Procedures
168 speakers in the TEST section of the TIMIT corpus are used for the SI experiment, and 190 speakers from DR1, DR2, DR3 of the TRAIN section are used for estimating the $c(S, D)$ parameter. Each speaker has 10 good-quality recordings at 16 kHz, 16 bits/sample, stored as WAVE files in NIST format. Two of them, SA1.WAV and SA2.WAV, are used for testing, and the rest for training codebooks. We did not perform silence removal on the WAVE files, so that others could reproduce the environment with no additional complication of VAD algorithms and their parameters. An MFCC program converts all the WAVE files in a directory into one feature vector file, in which all the feature vectors are indexed by speaker and recording.
For each value of the feature vector dimension, D = 30, 40, 50, 60, 70, 80, 90, one training file and one testing file are created. They are used by all the algorithms to train codebooks of size S = 16, 32, 64, and to perform the identification test, respectively.
The MFCC feature vectors are calculated as follows: 1) divide the entire utterance into blocks of 512 samples with 256-sample overlap; 2) perform pre-emphasis filtering with coefficient 0.97; 3) multiply by a Hamming window and perform a short-time FFT; 4) apply the standard mel-frequency triangular filter banks to the squared magnitude of the FFT; 5) apply the logarithm to the sum of the outputs of each individual filter; 6) apply the DCT to the entire set of data resulting from all filters; 7) drop the zeroth coefficient to produce the cepstral coefficients; 8) after all blocks are processed, calculate the mean over the entire time duration and subtract it from the cepstral coefficients; 9) calculate the first-order time derivatives of the cepstral coefficients and concatenate them after the cepstral coefficients to form a feature vector. For example, a filter bank of size 16 will produce 30-dimensional feature vectors.
Due to project time constraints, the HWD parameter c(S, D) was estimated at S = 16, 32, 64 and D = 40, 80, so that it achieves the highest identification rate on the 190-speaker dataset of the TRAIN section. For other values of S and D, it was interpolated or extrapolated from the optimized samples. The results are shown in the bottom section of Table 1. The identification experiment was then performed using the 168-speaker dataset from the TEST section. We used different datasets for c(S, D) estimation, codebook training, and identification-rate testing, to produce objective results.
3.2 Testing Results
Table 1 shows the identification rates for the various algorithms. The value of the learning parameter α is displayed after the GVQ title, and the parameter c(S, D) is displayed in the bottom section.
Combinations of the algorithms are indicated by a "+" sign between their name abbreviations.
Table 1. Identification rates (%) and parameters for the various VQ-based algorithms tested, where the first row is the feature vector dimension D and the first column is the codebook size S.
The baseline algorithm performs poorest, as expected. The plain HWD, PNDM, and GVQ all show enhancements over the baseline. The combination methods further improve on the plain methods. PNDM+GVQ performs best at codebook sizes 16 and 32, while HWD+GVQ is better at codebook size 64. The highest score of the test is 99.7%, corresponding to a single miss in 336 utterances from 168 speakers. It outperforms the reported rate of 98.4% obtained using the GMM with WPP features.
4. Conclusion
A new approach combining the weighted distance measure and discriminative training is proposed to enhance VQ-based solutions for speech independent speaker identification. An alternative heuristic weighted distance measure was explored, which lifts up higher-order MFCC feature vector components using a linear formula. Two new algorithms combining the heuristic weighted distance and the partition normalized distance with group vector quantization discriminative training were developed, gathering the power of both the weighted distance measure and the discriminative training. Experiments showed that the proposed methods outperform the corresponding single-approach VQ-based algorithms, and even more powerful GMM-based solutions. Further research on the heuristic weighted distance is being conducted, particularly for small codebook sizes.
Research on Speech Signal Enhancement Algorithms Based on Adaptive Filtering
Adaptive filtering is a signal processing method that can be used for speech signal enhancement. Speech signal enhancement refers to techniques that improve speech communication quality by removing noise and increasing intelligibility, and it has broad practical value. This article examines adaptive-filtering-based speech enhancement algorithms and introduces their principles, applications, and strengths and weaknesses.
I. Principle of Adaptive Filtering Algorithms
Adaptive filtering automatically adjusts the filter parameters according to the statistical properties of the input signal. The main idea is to adapt the filter response so that it automatically fits the signal characteristics of different environments. In speech enhancement, the most common adaptive filtering algorithms are the least mean squares (LMS) algorithm and the recursive least squares (RLS) algorithm.
The LMS algorithm updates the filter coefficients by stochastic gradient descent on the instantaneous squared error; it is simple and robust but converges relatively slowly. The RLS algorithm minimizes an exponentially weighted least-squares cost and converges much faster, at the price of higher computational complexity.
II. Applications of Adaptive Filtering in Speech Signal Enhancement
Adaptive filtering is widely applied in speech enhancement, chiefly for noise suppression, echo cancellation, and speech enhancement itself.
1. Noise suppression. Noise is one of the main factors degrading speech communication quality. Traditional denoising methods usually use fixed filter parameters and achieve limited results, whereas adaptive filtering algorithms adjust dynamically to the statistics of the noise, and can therefore suppress it more effectively and improve intelligibility.
2. Echo cancellation. In speech communication, echoes arise as the audio signal propagates, distorting and blurring the speech. An adaptive filter can model and estimate the echo signal and subtract it from the original speech, improving the quality and clarity of the call.
3. Speech enhancement. Speech enhancement improves audibility and recognition rates by filtering out background noise and improving speech quality. An adaptive filter adjusts its parameters so as to separate the speech signal from the background noise, thereby enhancing the speech.
III. Strengths and Weaknesses of Adaptive Filtering Algorithms
Adaptive filtering algorithms have both advantages and limitations in speech enhancement.
Advantages:
1. Adaptivity: the filter parameters adjust automatically to the signal characteristics of the environment, so the method copes with different noise conditions.
2. Real-time operation: adaptive algorithms usually converge quickly and can perform efficient speech enhancement in real-time systems.
3. Effectiveness: compared with fixed filters, adaptive algorithms enhance and suppress signal components more precisely, improving speech communication quality.
Research on Speech Enhancement Algorithms Based on Adaptive Filtering
Chapter 1: Introduction
Speech signal enhancement has long been a research focus in speech signal processing. This article studies speech enhancement algorithms based on adaptive filtering. An adaptive filter adjusts its coefficients automatically according to the statistical properties of the signal, so that the filtering result is insensitive to environmental noise and to changes in signal characteristics. Adaptive filtering has been widely applied to speech enhancement, denoising, accompaniment separation, and other fields. Taking speech enhancement as the example, this article studies adaptive filtering algorithms to improve speech quality and raise speech recognition accuracy, and, combined with experimental analysis, explores the application of adaptive filtering to speech enhancement.
Chapter 2: Overview of Speech Enhancement
2.1 Definition. Speech signal enhancement recovers and repairs a degraded speech signal through a series of signal processing methods, so that it becomes clear and intelligible.
2.2 Goals. The goal of speech enhancement is to make the speech signal natural, clear, stable, and easy to recognize through various signal processing techniques, chiefly by reducing noise, improving the signal-to-noise ratio, compensating for signal loss, and raising overall quality.
2.3 Applications. Speech enhancement is widely used in speech recognition, speech synthesis, telephony, communications, and broadcasting. Its most representative application is speech recognition: in noisy environments, enhancement can markedly improve recognition accuracy.
2.4 Methods. Speech enhancement methods include time-domain enhancement, frequency-domain enhancement, wavelet-domain enhancement, and adaptive filtering enhancement, of which adaptive filtering is the most commonly used.
Chapter 3: Adaptive Filtering Technology
3.1 Definition. Adaptive filtering automatically adjusts the filter coefficients according to the statistical properties of the signal so as to achieve effective filtering.
3.2 Classification. Adaptive filters divide into linear and nonlinear adaptive filters, of which linear adaptive filtering is the most common.
3.3 Principle. An adaptive filter adjusts its coefficients automatically according to the statistics of the input signal (such as autocorrelation and cross-correlation coefficients), so that the filtering result is insensitive to environmental noise and to changes in signal characteristics. Adaptive filtering is widely applied to signal enhancement, denoising, accompaniment separation, and other fields. In speech enhancement, an adaptive filter can reduce noise, reinforce the speech content, and improve speech recognition accuracy.
Chapter 4: Speech Enhancement Algorithms Based on Adaptive Filtering
4.1 Principle. A speech enhancement algorithm based on adaptive filtering consists of three main steps: preprocessing, adaptive filtering, and post-processing.
Research on Speech Enhancement Technology Based on Adaptive Filtering
Chapter 1: Preface
Speech enhancement is an important technology in speech processing: it improves the quality of the speech signal, giving higher accuracy and reliability in telephony, speech recognition, audio transcription, and related tasks. Adaptive filtering, one of the commonly used speech enhancement techniques, has been applied widely in audio signal processing. This article discusses the application of adaptive filtering to speech enhancement from two angles: its principle and its implementation methods.
Chapter 2: Principle of Adaptive Filtering
Adaptive filtering continually adjusts the filter coefficients so that the mean square error (MSE) between the filter output and the desired signal is minimized, thereby filtering out noise and enhancing the signal. Adaptive filters are chiefly based on the least mean squares (LMS) algorithm or the recursive least squares (RLS) algorithm. The LMS algorithm is simple and widely used, and is often employed to reduce the influence of excitation noise in a system on the speech signal.
Chapter 3: Implementation Methods of Adaptive Filtering
Implementations of adaptive filtering include forward filtering, backward filtering, and bidirectional filtering. Forward filtering operates in the time domain, and its result is strongly affected by the system delay. Backward filtering, based on the z-transform, can filter more precisely but is limited by its computational complexity, so its applications are relatively restricted. Bidirectional filtering combines the advantages of forward and backward filtering and achieves still more precise results, but its higher computational cost makes it unsuitable for real-time scenarios.
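The bidirectional idea can be illustrated with a toy zero-phase smoother in Python: the same causal one-pole low-pass filter is run forward and then backward over the block, cancelling the phase delay of a single forward pass. This is a sketch of the concept, not any specific published algorithm:

```python
def onepole(x, a=0.7):
    """Causal one-pole low-pass: y[n] = a*y[n-1] + (1-a)*x[n]."""
    y, prev = [], 0.0
    for v in x:
        prev = a * prev + (1 - a) * v
        y.append(prev)
    return y

def filtfilt_like(x, a=0.7):
    """Bidirectional filtering: apply the causal filter forward, then
    backward over the reversed signal, so the phase delays cancel."""
    return onepole(onepole(x, a)[::-1], a)[::-1]
```

Because the whole block must be available before the backward pass, the method is inherently offline, which matches the text's point that bidirectional filtering is unsuitable for real-time use.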
Chapter 4: Applications of Adaptive Filtering in Speech Enhancement
As a common speech enhancement technique, adaptive filtering is used widely in speech recognition, call quality control, and audio transcription. In speech recognition, adaptive filtering reduces the noise in speech and improves recognition accuracy. In call quality control, it effectively reduces call noise and improves clarity and reliability. In audio transcription, it filters out noise and interference from the audio and improves the accuracy and stability of the conversion.
Chapter 5: Conclusion
Adaptive filtering is an important speech enhancement technique with broad application prospects in speech processing. Studying its principles and implementation methods leads to a better understanding of its application scenarios and advantages in speech enhancement, and to higher efficiency and accuracy in practical use.
2010. Speech Enhancement Using the Bionic Wavelet Transform and an Adaptive Threshold Function
Yang Xi, Liu Bing-wu, Yan Fang
School of Information, Beijing Wuzi University, Beijing, China
Abstract: Using the bionic wavelet transform and an adaptive threshold, this paper presents an improved wavelet-based speech enhancement method: adaptive bionic wavelet speech enhancement. Because a model of the human auditory system is built into the wavelet transform, the method's main advantage is that it avoids over-thresholding speech segments, a problem that frequently occurs in conventional wavelet-based speech enhancement schemes. Moreover, it can track variations in the noisy speech without requiring prior estimation of the SNR. As a result, the quality of the enhanced speech is substantially better than with conventional methods.
Introduction: In practice, speech signals inevitably suffer interference during reception and transmission from the surrounding environment, such as the transmission medium, communication equipment, and other speakers' voices. The corrupted signal is the noisy signal. The main purpose of a speech enhancement scheme is to recover clean speech from the noisy signal, reducing listener fatigue while improving the perceptual quality of the speech. Many speech enhancement methods have been proposed over the past decade, but none is perfect, owing to the complexity and non-stationarity of speech signals.
The wavelet transform provides multi-resolution analysis in both the time and frequency domains, so it can analyze non-stationary signals. Recently, the wavelet transform has been applied successfully to signal processing tasks such as speech enhancement. By choosing the wavelet-coefficient threshold appropriately and subtracting it from the noisy wavelet coefficients, white Gaussian noise can be removed effectively.
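The thresholding operation described here can be sketched with a one-level Haar transform in plain Python (a real system would use a deeper decomposition and a noise-derived threshold; both are simplified assumptions in this sketch):

```python
import math

def haar_forward(x):
    """One-level Haar DWT: returns (approximation, detail) coefficients."""
    a = [(x[2 * i] + x[2 * i + 1]) / math.sqrt(2) for i in range(len(x) // 2)]
    d = [(x[2 * i] - x[2 * i + 1]) / math.sqrt(2) for i in range(len(x) // 2)]
    return a, d

def haar_inverse(a, d):
    """Invert the one-level Haar DWT."""
    x = []
    for ai, di in zip(a, d):
        x.append((ai + di) / math.sqrt(2))
        x.append((ai - di) / math.sqrt(2))
    return x

def soft_threshold(coeffs, thr):
    """Shrink each coefficient toward zero by thr (soft thresholding)."""
    return [math.copysign(max(abs(c) - thr, 0.0), c) for c in coeffs]

def denoise(x, thr):
    """Threshold only the detail band, where small coefficients are mostly noise."""
    a, d = haar_forward(x)
    return haar_inverse(a, soft_threshold(d, thr))
```

Detail coefficients below the threshold, which in white noise are mostly noise, are zeroed, while large speech-driven coefficients survive with a small bias; choosing the threshold adaptively is exactly what the paper's method addresses.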
However, the method suffers from severe residual noise and speech distortion. Pinter and Istvan [1] proposed an improved scheme that combines the wavelet transform with the critical bands of auditory perception; they decompose the noisy signal according to the critical bands to reduce speech distortion. Mohammed Bahoura and Jean Rouat [2] proposed wavelet speech enhancement based on the Teager energy operator, which greatly reduces noise by adapting the threshold in the time domain; however, the over-thresholding problem remains when the speech signal is only lightly contaminated by noise. Hu Yi [3] proposed wavelet-thresholded multitaper-spectrum speech enhancement using low-variance spectral estimators, which suppresses residual noise and yields better quality. In this paper, we propose a speech enhancement method using the bionic wavelet transform and an adaptive threshold function.
CHINA SCIENCE & TECHNOLOGY
Speech Enhancement Based on Wiener Filtering under Different Background Noises
Wang Zhenghuan, Wang Junfang, School of Electronic Information, Wuhan University
Introduction
Interference from various noises during transmission degrades speech quality. The purpose of speech enhancement is to recover the original speech signal from the noisy speech. It is applied very widely and is an indispensable preprocessing step for many speech signal processing tasks such as speech recognition and speech coding.
There are many speech enhancement methods, such as spectral subtraction, Wiener filtering, Kalman filtering, and MMSE. The Wiener filter is constructed under the minimum mean square error criterion. Starting from the Wiener filtering method, this article analyzes its performance by examining the enhancement achieved when it is applied to speech under different noise backgrounds.
1. Basic Principle of Wiener Filtering
Speech is short-time stationary, so the signal is usually split into frames and windowed before processing. For one frame, let the original clean speech be $x(m)$ and the noisy speech be $y(m)$, with FFT $Y(f)$. Passing the noisy speech through a Wiener filter $W(f)$ gives an estimate of the clean speech spectrum:

$$\hat{X}(f) = W(f)\,Y(f)$$

The estimation error $E(f)$ is defined as the difference between the clean spectrum $X(f)$ and $\hat{X}(f)$, and the frequency-domain mean square error is $E[|X(f)-\hat{X}(f)|^2]$. To obtain the minimum mean square error filter, this expression is differentiated with respect to $W(f)$ and set to zero. With $P_{yy}(f)$ the auto power spectrum of $Y(f)$ and $P_{xy}(f)$ the cross power spectrum of $Y(f)$ and $X(f)$, the frequency-domain MMSE Wiener filter is

$$W(f) = \frac{P_{xy}(f)}{P_{yy}(f)}$$

For a speech signal with additive noise, the Wiener filter becomes

$$W(f) = \frac{P_x(f)}{P_x(f) + P_n(f)}$$

and substituting $SNR(f) = P_x(f)/P_n(f)$ gives

$$W(f) = \frac{SNR(f)}{SNR(f) + 1}$$

Thus the Wiener filter can be expressed simply in terms of the signal-to-noise ratio: once an estimate of the SNR is available, the Wiener filter can be realized.
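The gain expression W(f) = SNR(f)/(SNR(f)+1) derived above can be sketched as follows; the per-bin SNR estimate from the noisy and noise power spectra is a simple placeholder assumption (real systems use decision-directed smoothing):

```python
def wiener_gain(snr):
    """W(f) = SNR(f) / (SNR(f) + 1): near 1 for strong speech, near 0 in noise."""
    return snr / (snr + 1.0)

def apply_wiener(noisy_power, noise_power):
    """Apply the Wiener gain per frequency bin to the noisy power spectrum.
    The SNR is crudely estimated as max(|Y|^2 / Pn - 1, 0); applying the
    amplitude gain to a power spectrum squares it."""
    out = []
    for py, pn in zip(noisy_power, noise_power):
        snr = max(py / pn - 1.0, 0.0)
        out.append((wiener_gain(snr) ** 2) * py)
    return out
```

Bins whose measured power does not exceed the noise estimate are zeroed, while high-SNR bins pass almost unchanged, which is the per-frequency attenuation behavior described in the text.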
2. Implementation of Wiener Filtering for Speech Enhancement
2.1 The key to Wiener filtering is obtaining the SNR; once the SNR has been estimated, the Wiener filter can be constructed. We now use this Wiener filter to enhance speech under different noise backgrounds and assess the robustness of the Wiener filter by analyzing the enhancement results.
Appendix
ADAPTIVE WIENER FILTERING APPROACH FOR SPEECH ENHANCEMENT
M. A. Abd El-Fattah, M. I. Dessouky, S. M. Diab and F. E. Abd El-Samie
Department of Electronics and Electrical Communications, Faculty of Electronic Engineering, Menoufia University, Menouf, Egypt
E-mails: ************************, *********************
ABSTRACT
This paper proposes the application of the Wiener filter in an adaptive manner in speech enhancement. The proposed adaptive Wiener filter depends on the adaptation of the filter transfer function from sample to sample based on the speech signal statistics (mean and variance). The adaptive Wiener filter is implemented in the time domain rather than in the frequency domain to accommodate the varying nature of the speech signal. The proposed method is compared to the traditional Wiener filter and spectral subtraction methods, and the results reveal its superiority.
Keywords: Speech Enhancement, Spectral Subtraction, Adaptive Wiener Filter
1 INTRODUCTION
Speech enhancement is one of the most important topics in speech signal processing. Several techniques have been proposed for this purpose, such as the spectral subtraction approach, the signal subspace approach, adaptive noise canceling, and the iterative Wiener filter [1-5]. The performance of these techniques depends on the quality and intelligibility of the processed speech signal. The improvement of the speech signal-to-noise ratio (SNR) is the target of most techniques.
Spectral subtraction is the earliest method for enhancing speech degraded by additive noise [1]. This technique estimates the spectrum of the clean (noise-free) signal by subtracting the estimated noise magnitude spectrum from the noisy signal magnitude spectrum while keeping the phase spectrum of the noisy signal. The drawback of this technique is the residual noise.
Another technique is the signal subspace approach [3]. It is used for enhancing a speech signal degraded by uncorrelated additive noise or colored noise [6,7].
The idea of this algorithm is based on the fact that the vector space of the noisy signal can be decomposed into a signal-plus-noise subspace and an orthogonal noise subspace. Processing is performed only on the vectors in the signal-plus-noise subspace, while the noise subspace is removed first. The decomposition of the vector space of the noisy signal is performed by applying an eigenvalue or singular value decomposition, or by applying the Karhunen-Loeve transform (KLT) [8]. Mi et al. have proposed the signal/noise KLT-based approach for colored noise removal [9]. In this approach, noisy speech frames are classified into speech-dominated and noise-dominated frames; the signal KLT matrix is used for the former and the noise KLT matrix for the latter.
In this paper, we present a new technique to improve the signal-to-noise ratio of the enhanced speech by using an adaptive implementation of the Wiener filter. The implementation is performed in the time domain to accommodate the varying nature of the signal.
The paper is organized as follows: Section II reviews the spectral subtraction technique. Section III revisits the traditional Wiener filter in the frequency domain. Section IV proposes the adaptive Wiener filtering approach for speech enhancement. Section V presents a comparative study between the proposed adaptive Wiener filter, the frequency-domain Wiener filter, and the spectral subtraction approach.
2 SPECTRAL SUBTRACTION
Spectral subtraction can be categorized as a non-parametric approach, which simply needs an estimate of the noise spectrum, typically obtained during periods of speaker silence. Let $x(n)$ be a noisy speech signal:

$$x(n) = s(n) + v(n) \quad (1)$$

where $s(n)$ is the clean (noise-free) signal and $v(n)$ is white Gaussian noise.
Assume that the noise and the clean signal are uncorrelated. The spectral subtraction approach estimates the short-term magnitude spectrum of the noise-free signal, $|\hat{S}(\omega)|$, by subtracting the estimated noise magnitude spectrum $|\hat{V}(\omega)|$ from the noisy magnitude spectrum $|X(\omega)|$; it is sufficient to use the noisy phase spectrum as an estimate of the clean speech phase spectrum [10]:

$$\hat{S}(\omega) = \big[\,|X(\omega)| - |\hat{V}(\omega)|\,\big]\exp\big(j\angle X(\omega)\big) \quad (2)$$

The estimated time-domain speech signal is obtained as the inverse Fourier transform of $\hat{S}(\omega)$. Alternatively, a clean signal $s(n)$ can be recovered from the noisy signal $x(n)$ by assuming an estimate of the noise power spectrum $\hat{P}_v(\omega)$, obtained by averaging over multiple frames of a known noise segment. An estimate of the clean short-time squared magnitude spectrum is then [8]:

$$|\hat{S}(\omega)|^2 = \begin{cases} |X(\omega)|^2 - \hat{P}_v(\omega), & \text{if } |X(\omega)|^2 - \hat{P}_v(\omega) \ge 0 \\ 0, & \text{otherwise} \end{cases} \quad (3)$$

It is possible to combine this magnitude spectrum estimate with the measured phase and obtain the short-time Fourier transform (STFT) estimate

$$\hat{S}(\omega) = |\hat{S}(\omega)|\,e^{\,j\angle X(\omega)} \quad (4)$$

A noise-free signal estimate can then be obtained with the inverse Fourier transform. This noise reduction method is a specific case of the general technique given by Weiss et al. and extended by Berouti et al. [2,12].
The spectral subtraction approach can be viewed as a filtering operation in which high-SNR regions of the measured spectrum are attenuated less than low-SNR regions. This formulation can be given in terms of the SNR defined as

$$SNR = \frac{|X(\omega)|^2}{\hat{P}_v(\omega)} \quad (5)$$

Thus, equation (3) can be rewritten as

$$|\hat{S}(\omega)|^2 = |X(\omega)|^2 - \hat{P}_v(\omega) \approx |X(\omega)|^2\left[1 + \frac{1}{SNR}\right]^{-1} \quad (6)$$

An important property of noise suppression using spectral subtraction is that the attenuation characteristics change with the length of the analysis window.
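The subtraction rule of Eqs. (3)-(4) can be sketched per frame in plain Python; a naive O(N^2) DFT keeps the example dependency-free (a real implementation would use an FFT with frame overlap-add):

```python
import cmath

def dft(x):
    """Naive discrete Fourier transform (O(N^2), for illustration only)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft(X):
    """Inverse DFT, returning the real part of the reconstruction."""
    N = len(X)
    return [(sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)) / N).real
            for n in range(N)]

def spectral_subtract(frame, noise_power):
    """Eqs. (3)-(4): subtract the noise power spectrum per bin, floor the
    result at zero, and reuse the noisy phase."""
    X = dft(frame)
    S = []
    for k, Xk in enumerate(X):
        mag2 = max(abs(Xk) ** 2 - noise_power[k], 0.0)
        S.append(cmath.rect(mag2 ** 0.5, cmath.phase(Xk)))
    return idft(S)
```

The half-wave rectification in Eq. (3) is what produces the isolated surviving peaks responsible for the musical noise discussed next.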
A common problem with spectral subtraction is the musicality that results from the rapid appearance and disappearance of spectral peaks over successive frames [13].
3 WIENER FILTER IN FREQUENCY DOMAIN
The Wiener filter is a popular technique that has been used in many signal enhancement methods. Its basic principle is to obtain a clean signal from one corrupted by additive noise. It is required to estimate an optimal filter for the noisy input speech by minimizing the mean square error (MSE) between the desired signal $s(n)$ and the estimated signal $\hat{s}(n)$. The frequency-domain solution to this optimization problem is given by [13]:

$$H(\omega) = \frac{P_s(\omega)}{P_s(\omega) + P_v(\omega)} \quad (7)$$

where $P_s(\omega)$ and $P_v(\omega)$ are the power spectral densities of the clean signal and the noise, respectively. This formula is derived by treating the signal $s$ and the noise $v$ as uncorrelated stationary signals. The signal-to-noise ratio is defined by [13]:

$$SNR = \frac{P_s(\omega)}{\hat{P}_v(\omega)} \quad (8)$$

Incorporating this definition into the Wiener filter equation gives

$$H(\omega) = \left[1 + \frac{1}{SNR}\right]^{-1} \quad (9)$$

The drawback of the Wiener filter is its fixed frequency response at all frequencies and the requirement to estimate the power spectral densities of the clean signal and the noise prior to filtering.
4 THE PROPOSED ADAPTIVE WIENER FILTER
This section presents an adaptive implementation of the Wiener filter that benefits from the varying local statistics of the speech signal. A block diagram of the proposed approach is illustrated in Fig. (1).
In this approach, the estimated local mean $m_x$ and variance $\sigma_x^2$ of the speech signal are exploited.
Figure 1: Typical adaptive speech enhancement system for additive noise reduction.
It is assumed that the additive noise $v(n)$ is zero-mean white noise with variance $\sigma_v^2$. Thus, the power spectrum $P_v(\omega)$ can be approximated by

$$P_v(\omega) = \sigma_v^2 \quad (10)$$

Consider a small segment of the speech signal within which $x(n)$ is assumed stationary. The signal $x(n)$ can be modeled by

$$x(n) = m_x + \sigma_x\, w(n) \quad (11)$$

where $m_x$ and $\sigma_x$ are the local mean and standard deviation of $x(n)$, and $w(n)$ is unit-variance noise. Within this small segment of speech, the Wiener filter transfer function can be approximated by

$$H(\omega) = \frac{P_s(\omega)}{P_s(\omega) + P_v(\omega)} = \frac{\sigma_s^2}{\sigma_s^2 + \sigma_v^2} \quad (12)$$

From Eq. (12), because $H(\omega)$ is constant over the small segment, the impulse response of the Wiener filter is

$$h(n) = \frac{\sigma_s^2}{\sigma_s^2 + \sigma_v^2}\,\delta(n) \quad (13)$$

From Eq. (13), the enhanced speech $\hat{s}(n)$ within this local segment can be expressed as

$$\hat{s}(n) = m_x + \big(x(n) - m_x\big) * \frac{\sigma_s^2}{\sigma_s^2 + \sigma_v^2}\,\delta(n) = m_x + \frac{\sigma_s^2}{\sigma_s^2 + \sigma_v^2}\big(x(n) - m_x\big) \quad (14)$$

If it is assumed that $m_x$ and $\sigma_s$ are updated at each sample, we can write

$$\hat{s}(n) = m_x(n) + \frac{\sigma_s^2(n)}{\sigma_s^2(n) + \sigma_v^2}\big(x(n) - m_x(n)\big) \quad (15)$$

In Eq. (15), the local mean $m_x(n)$ and the term $x(n) - m_x(n)$ are modified separately from segment to segment, and the results are then combined. If $\sigma_s^2$ is much larger than $\sigma_v^2$, the output $\hat{s}(n)$ is primarily due to $x(n)$ and the input signal is not attenuated; if $\sigma_s^2$ is smaller than $\sigma_v^2$, the filtering effect is performed. Note that $m_x$ is identical to $m_s$ when $m_v$ is zero, so $m_x(n)$ in Eq. (15) can be estimated from $x(n)$ by

$$\hat{m}_s(n) = \hat{m}_x(n) = \frac{1}{2M+1}\sum_{k=n-M}^{n+M} x(k) \quad (16)$$

where $2M+1$ is the number of samples in the short segment used in the estimation. To measure the local signal statistics in the system of Figure 1, the algorithm uses the signal variance $\sigma_s^2$.
The specific method used to design the space-variant $h(n)$ is given by (17b). Since $\sigma_x^2 = \sigma_s^2 + \sigma_v^2$, the signal variance may be estimated from $x(n)$ by

$$\hat{\sigma}_s^2(n) = \begin{cases} \hat{\sigma}_x^2(n) - \hat{\sigma}_v^2, & \text{if } \hat{\sigma}_x^2(n) > \hat{\sigma}_v^2 \\ 0, & \text{otherwise} \end{cases} \quad (17a)$$

where

$$\hat{\sigma}_x^2(n) = \frac{1}{2M+1}\sum_{k=n-M}^{n+M}\big(x(k) - \hat{m}_x(n)\big)^2 \quad (17b)$$

With this proposed method, we guarantee that the filter transfer function is adapted from sample to sample based on the speech signal statistics.
5 EXPERIMENTAL RESULTS
For evaluation purposes, we use different speech signals such as the Handel, laughter, and gong signals. White Gaussian noise is added to each signal at different SNRs. The different speech enhancement algorithms, namely the spectral subtraction method, the Wiener filter in the frequency domain, and the proposed adaptive Wiener filter, are applied to the noisy speech signals, and the peak signal-to-noise ratio (PSNR) results of each enhancement algorithm are compared.
In the first experiment, all the above-mentioned algorithms are applied to the Handel signal at different SNRs; the output PSNR results are shown in Fig. (2). The same experiment is repeated for the laughter and gong signals, with results shown in Figs. (3) and (4), respectively. These figures make clear that the proposed adaptive Wiener filter has the best performance across SNRs, giving about 3-5 dB improvement at different SNR values. The nonlinearity between input SNR and output PSNR is due to the adaptive nature of the filter.
Figure 2: PSNR results for the white noise case at -10 dB to +35 dB SNR levels for the Handel signal.
Figure 3: PSNR results for the white noise case at -10 dB to +35 dB SNR levels for the laughter signal.
Figure 4: PSNR results for the white noise case at -10 dB to +35 dB SNR levels for the gong signal.
The results of the different enhancement algorithms for the Handel signal at SNRs of 5, 10, 15, and 20 dB, in both the time and frequency domains, are given in Figs. (5) to (12).
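The sample-by-sample rule of Eqs. (15)-(17) can be sketched in a few lines of Python; the window length and the (assumed known) noise variance are illustrative values:

```python
def adaptive_wiener(x, noise_var, M=3):
    """Time-domain adaptive Wiener filter, Eqs. (15)-(17):
    estimate a local mean and variance over a (2M+1)-sample window, take
    sigma_s^2 = max(sigma_x^2 - sigma_v^2, 0), and apply
    s_hat(n) = m_x(n) + sigma_s^2/(sigma_s^2 + sigma_v^2) * (x(n) - m_x(n)).
    noise_var is assumed known and strictly positive."""
    N = len(x)
    out = []
    for n in range(N):
        lo, hi = max(0, n - M), min(N, n + M + 1)  # window, clipped at edges
        win = x[lo:hi]
        m = sum(win) / len(win)                     # Eq. (16)
        var_x = sum((v - m) ** 2 for v in win) / len(win)   # Eq. (17b)
        var_s = max(var_x - noise_var, 0.0)         # Eq. (17a)
        g = var_s / (var_s + noise_var)
        out.append(m + g * (x[n] - m))              # Eq. (15)
    return out
```

In low-variance (noise-only) regions the gain collapses to zero and the output falls back to the local mean, while high-variance speech regions pass through almost unchanged, which is the behavior Eq. (15) describes.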
These results reveal that the best performance is that of the proposed adaptive Wiener filter.
Figures 5, 7, 9, and 11: time-domain results of the Handel signal at SNR = +5, 10, 15, and 20 dB respectively; each shows (a) the original signal, (b) the noisy signal, (c) spectral subtraction, (d) Wiener filtering, and (e) adaptive Wiener filtering. Figures 6, 8, 10, and 12: the corresponding spectra of the signals in Figs. 5, 7, 9, and 11.
6 CONCLUSION
An adaptive Wiener filter approach for speech enhancement is proposed in this paper. The approach adapts the filter transfer function from sample to sample based on the speech signal statistics (mean and variance). The results indicate that the proposed approach provides the best SNR improvement among the spectral subtraction approach and the traditional frequency-domain Wiener filter.
The results also indicate that the proposed approach treats musical noise better than the spectral subtraction approach and avoids the drawbacks of the frequency-domain Wiener filter.