
Spatial separation of speech signals using amplitude estimation based on interaural comparisons of zero-crossings

Hyung-Min Park a,b,*, Richard M. Stern b

a Department of Electronic Engineering, Sogang University, Seoul 121-742, Republic of Korea
b Language Technologies Institute and Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA 15213, USA

Received 23 January 2008; received in revised form 22 May 2008; accepted 27 May 2008

Abstract

This paper describes an algorithm called zero-crossing-based amplitude estimation (ZCAE) that enhances speech by reconstructing the desired signal from a mixture of two signals using continuously-variable weighting factors, based on pre-processing that is motivated by the well-known ability of the human auditory system to resolve spatially-separated signals. Although most conventional methods of signal separation have been based on interaural time differences (ITDs) derived from cross-correlation information, the ZCAE approach provides sound segregation based on estimates of ITD from comparisons of zero-crossings [Kim, Y.-I., An, S.J., Kil, R.M., Park, H.-M., 2005. Sound segregation based on binaural zero-crossings. In: Proc. European Conf. on Speech Communication and Technology (INTERSPEECH-2005), Lisbon, Portugal, pp. 2325–2328]. These ITD estimates are used to determine the relative contribution of the desired source in a mixture and subsequently to reconstruct a closer approximation to the desired signal. The estimation of relative target intensity in a given time-frequency segment is accomplished by analytically deriving a monotonic function that maps the estimated ITD in each time-frequency segment to the putative relative intensity of each source. The ZCAE method is evaluated by comparing the sample standard deviation of ITD estimates derived using cross-correlation and using zero-crossing information, by comparing the speech recognition accuracy that is obtained by applying the proposed methods to speech in the presence of interfering speech sources, and by comparing recognition accuracy obtained using a continuous weighting versus a binary weighting of the target and masker. It is found that better results are obtained when ITDs are estimated using zero-crossing information rather than cross-correlation information, and when continuous weighting functions are used in place of binary weighting of the target and masker in each time-frequency segment.

© 2008 Elsevier B.V. All rights reserved.

Keywords: Automatic speech recognition; Noise robustness; Speech enhancement; Sound source segregation; Interaural time differences; Zero-crossings

1. Introduction

Noise robustness remains a very important issue in the field of automatic speech recognition (ASR). While ASR systems can achieve high recognition accuracy in controlled and noise-free acoustic environments, the performance of these systems is seriously degraded in more realistic environments which may be corrupted by noise and subjected to other types of distortion. This degradation is mainly due to differences between training and testing environments. Many algorithms have been proposed to compensate for these mismatches (e.g. Juang, 1991; Singh et al., 2002a,b). While these approaches can provide useful improvements in recognition accuracy under many circumstances, they frequently fail to obtain high recognition accuracy in dynamically changing environments with transient sources of disturbance, or in the presence of background speech or background music (e.g. Raj et al., 1997).

0167-6393/$ - see front matter © 2008 Elsevier B.V. All rights reserved. doi:10.1016/j.specom.2008.05.012

* Corresponding author. Address: Department of Electronic Engineering, Sogang University, Seoul 121-742, Republic of Korea. Tel.: +82 2 705 8916; fax: +82 2 706 4216.
E-mail address: hpark@sogang.ac.kr (H.-M. Park)


Speech Communication 51 (2009) 15–25

In contrast, humans can understand speech even in the presence of competing speech or other noise sources (e.g. Assmann and Summerfield, 2004). This observation has motivated the development of many types of signal processing approaches based on aspects of human auditory perception (e.g. Hermansky, 1998; Wang and Brown, 2006). In his treatise on auditory scene analysis (ASA), Bregman (1990) identified various cues that are believed to be used by the human auditory system to segregate a target sound from interfering sources. These cues include fundamental frequency (F0), harmonics, onset or offset times, and source location. While most of the monaural cues such as F0 can be used as the basis for separation in computational ASA systems, their performance is critically dependent on characteristics of the input signal such as the presence or absence of voicing (Brown and Cooke, 1994). Localization cues which exploit small differences in the signals to the two ears, however, can identify the azimuth of sound sources regardless of signal content. Localization plays an important role in the human auditory system's ability to select a particular sound source and track the sound originating from that source. The primary acoustical cues for human sound localization are interaural time differences (ITDs) and interaural intensity differences (IIDs). ITDs serve as the localization cue primarily at frequencies below 1.5 kHz (Strutt, 1907), although the ITDs of the low-frequency envelopes of higher-frequency components of a sound can also be useful for sound localization (e.g. Henning, 1974; Nuetzel and Hafter, 1981). Information based on IIDs is primarily useful at higher frequencies.

Jeffress (1948) proposed a simple and intuitive mechanism that describes the estimation of ITDs based on interaural coincidences of hypothetical neural activity. Jeffress's hypothesis has motivated many computational models that describe and predict binaural processing (e.g. Braasch, 2005; Colburn and Kulkarni, 2005; Stern and Trahiotis, 1996). Most of these include a model of peripheral auditory processing which includes frequency analysis and subsequent nonlinear operations (e.g. Meddis and Hewitt, 1991), a mechanism for estimating the interaural cross-correlation function on a frequency-by-frequency basis, and a mechanism to disambiguate the temporal analysis, typically exploiting the IID of the signal or consistency over frequency. These models have been incorporated into several systems that perform ASR (e.g. Bodden, 1993; Bodden and Anderson, 1995; Tessier et al., 1999; Roman et al., 2003; Palomäki et al., 2004). Recently, Kim et al. estimated ITDs by measuring the time difference between zero-crossings, and showed that with this measure sound sources could be localized or segregated more robustly and accurately than using cross-correlation-based ITD estimation (Kim and Kil, 2004, 2005; Kim and Kil, 2007). In this paper we consider the use of zero-crossings of bandpass-filtered speech as an alternate way of estimating ITD information.

Once ITDs and IIDs are obtained at each frequency of interest, sound segregation or speech recognition can be performed on the basis of their values. Many contemporary algorithms based on ASA have used binary masks that specify which time-frequency components belong to a particular sound source on an "all-or-none" basis (e.g. Roman et al., 2003; Palomäki et al., 2004; Kim et al., 2005). This is clearly an oversimplified description of how sound sources are combined, since each sound source contributes to the mixture to varying extents. For speech recognition applications, this oversimplification may be mitigated by the use of "soft masks" such as those proposed by Barker et al. (2000) and Morris et al. (2001). Over the years several research groups have described systems in which speech signals are reconstructed using spectro-temporal components that are implicitly weighted according to the likelihood that the observed ITD (and in some cases IID) would be appropriate for the desired source location (e.g. Bodden, 1993; Bodden and Anderson, 1995; Tessier et al., 1999). Recently, Srinivasan et al. (2004, 2006) introduced a way to obtain masks that provide continuously-variable estimates of the energy ratio between the desired components of a signal and the total signal. These ratio masks were obtained by first obtaining an empirical characterization of the dependence of ITDs and IIDs on energy ratios, and using this characterization to develop a function that estimates energy ratios from observed ITDs and IIDs.

In contrast to the empirically-derived ratio masks that have been described by other groups, we derive in this paper an analytical relationship between the observed ITD at each frequency and the relative extent to which a desired sound source contributes to a particular time-frequency segment. From this information we develop an estimate of the desired signal in isolation by combining time-frequency segments of the total waveform in proportion to the estimated power of the target signal. As had been demonstrated previously by Srinivasan et al. (2004, 2006), the use of a continuously-variable weighting function rather than a binary mask provides smoother transitions between segments that contain greater and lesser components of the desired target signal, which improves speech recognition accuracy.

In addition to introducing an analytical approach to ratio-mask estimation, we also compare the performance of zero-crossing-based ITD estimation with cross-correlation-based ITD estimation. We will show that the zero-crossing method provides superior performance, both in terms of the sample standard deviation of the ITD estimates and in the resulting speech recognition accuracy.

The remainder of the paper is organized as follows: Section 2 describes the procedure for estimating the desired signal from observations by using continuously-variable masks derived from zero-crossing-based ITD estimation. In Section 3, the standard deviation of the resulting estimates obtained using this method is compared to the corresponding results obtained using cross-correlation-based ITD estimation. Comparisons of speech recognition accuracy using the algorithms considered are described in Section 4. Finally, our conclusions are summarized in Section 5.


2. Algorithm description

Fig. 1 illustrates the overall procedure of the proposed algorithm, which we refer to as zero-crossing-based amplitude estimation (ZCAE). The signals which comprise the inputs to the two sensors of the system are mixtures of target and interfering sources from spatially-separated locations. The input signals are first subjected to frequency analysis, typically accomplished by passing each input through a bank of Gammatone filters which simulates cochlear filtering, and zero-crossings are detected from each filter output. The ITD is estimated from the time differences between the zero-crossings of the signals from the filter outputs at each center frequency, and these estimated ITDs are used to estimate the amplitude ratio which describes the corresponding relative contribution of the desired signal to each component. Finally, an estimate of the desired signal is obtained using a procedure based on the methods of Weintraub (1986) and Brown and Cooke (1994) that compensates for the phase distortion introduced by the Gammatone filters. Specifically, a time-reversed version of the amplitude-weighted signal from each frequency band is convolved with the corresponding Gammatone filter, and the results of each convolution are time-reversed again and summed across frequency to estimate the final output signal. These procedures are described in greater detail below.

2.1. ITD estimation based on zero-crossings

It is well known that the auditory system develops estimates of ITD from the synchronous response of low-frequency auditory-nerve fibers to the fine structure of a sound source, and the exploitation of zero-crossing information is one possible way in which this processing could be achieved. The ITD in each frequency channel is estimated by first identifying the sample points at which the filtered input signal changes from a negative value to a positive value, or vice versa. The exact zero-crossing time (which generally falls between the sample times) is then estimated by linear interpolation based on the actual amplitudes of the signals at the sample points that straddle the zero-crossings. The corresponding zero-crossing time in the second sensor is assumed to be the zero-crossing that is closest in time, and the estimated ITD is defined to be the difference between these two zero-crossing times. Estimates of ITD that are greater than the time needed for a sound wave to travel from one sensor to the other are discarded.

It should be noted that while we use terms such as ITD from the binaural hearing literature, we make use of two sensors that are spaced far more closely than human ears to avoid the effects of spatial aliasing for frequencies up to half the sampling frequency, so the largest possible delay between the sensors is always less than half a period over all frequencies of interest.
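The interpolation-and-pairing procedure of Section 2.1 can be sketched as follows (a simplified Python illustration under our own naming, not the authors' implementation; it uses only positive-going crossings and omits the per-band Gammatone filtering):

```python
import numpy as np

def zero_crossing_times(x, fs):
    """Positive-going zero-crossing instants of a sampled signal, refined by
    linear interpolation between the two samples that straddle each crossing."""
    x = np.asarray(x, dtype=float)
    idx = np.where((x[:-1] < 0.0) & (x[1:] >= 0.0))[0]
    # fraction of a sample past idx at which the linear segment crosses zero
    frac = -x[idx] / (x[idx + 1] - x[idx])
    return (idx + frac) / fs

def estimate_itds(left, right, fs, max_itd):
    """Pair each left-channel crossing with the nearest right-channel crossing
    and keep only differences within the inter-sensor travel time."""
    zl = zero_crossing_times(left, fs)
    zr = zero_crossing_times(right, fs)
    itds = []
    for t in zl:
        j = np.argmin(np.abs(zr - t))
        itd = t - zr[j]
        if abs(itd) <= max_itd:
            itds.append(itd)
    return np.array(itds)
```

Because a sinusoid is nearly linear in the neighborhood of its zero-crossings, the interpolation error of this estimate is far smaller than the sampling period.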

2.2. Analytical derivation of the relationship between the ITD and the relative signal strength

Since processing algorithms using ASA are based on the development of a "mask" that describes which time-frequency components of an input signal are likely to be useful in describing the desired target component, the development of accurate masks is of critical importance to the system's performance. With some exceptions as noted above, and especially the work of Srinivasan et al. (2004, 2006), most previous work has been based on binary masks which select the time-frequency segments where the estimated target energy is greater than the estimated interference energy. Nevertheless, binary masks have the drawback that they cannot describe small differences in the extent to which a desired source contributes to a mixture.


In order to overcome this limitation of binary masks, we developed a reliable method that estimates continuously-variable masks by analytically deriving the relationship between the ITD and the relative contribution of the desired target. Because the two sensors are sufficiently close to avoid the effects of spatial aliasing, and because there is no object between them to cause acoustical shadowing, we assume zero IID between the signals arriving at these sensors. In addition, our derivation of the relationship between the ITD and the contribution of a desired signal is based on the assumption that there is only one target and one interfering source, and that their azimuth angles are known. Many researchers have developed methods that estimate the azimuth angles of the sources composing the observations (e.g. Stern and Colburn, 1978; Lyon, 1983; Lindemann, 1986; Bodden, 1993), and a zero-crossing-based method can also be used to estimate azimuth, providing robust estimates especially for noisy mixtures (Kim and Kil, 2004; Kim and Kil, 2007). Using these methods, one may obtain the azimuth angles of the sources reliably, so the assumption that the azimuth angles are known is not critical.

We now describe a method by which the relative contribution of the target signal can be estimated from the estimates of ITD in each frequency band. Let us approximate the outputs of the Gammatone filters as pure tones, one from each source, as follows:

x_1(t) = A_1 cos(ω_1 t + φ_1) + A_2 cos(ω_2 t + φ_2),   (1a)
x_2(t) = A_1 cos(ω_1 (t − d_1) + φ_1) + A_2 cos(ω_2 (t − d_2) + φ_2),   (1b)

where A_i, ω_i, d_i, and φ_i denote the amplitude, frequency, delay, and phase of the i-th source, respectively. As noted in the previous paragraph, the amplitudes of the signals arriving at the two sensors are assumed to be equal.

Without loss of generality, we assume that a zero-crossing for x_1 occurs at time t_1, and that the nearest zero-crossing for x_2 occurs at t_1 + s, producing an ITD of s. We assume that ω_i (s − d_i) is small, which is especially valid at low frequencies, so x_2(t_1 + s) can be approximated by

x_2(t_1 + s) ≈ −A_1 sin(ω_1 t_1 + φ_1) · ω_1 (s − d_1) − A_2 sin(ω_2 t_1 + φ_2) · ω_2 (s − d_2).   (2)

Since x_2(t_1 + s) = 0,

s (A_1 ω_1 sin(ω_1 t_1 + φ_1) + A_2 ω_2 sin(ω_2 t_1 + φ_2)) ≈ A_1 ω_1 sin(ω_1 t_1 + φ_1) · d_1 + A_2 ω_2 sin(ω_2 t_1 + φ_2) · d_2.   (3)

Since x_1(t_1) = A_1 cos(ω_1 t_1 + φ_1) + A_2 cos(ω_2 t_1 + φ_2) = 0 and s is obtained from the nearest zero-crossing point,

s ≈ [ω_1 √(A_1² − A_2² cos²(ω_2 t_1 + φ_2)) · d_1 + A_2 ω_2 |sin(ω_2 t_1 + φ_2)| · d_2]
    / [ω_1 √(A_1² − A_2² cos²(ω_2 t_1 + φ_2)) + A_2 ω_2 |sin(ω_2 t_1 + φ_2)|]   if A_1 ≥ A_2,

s ≈ [A_1 ω_1 |sin(ω_1 t_1 + φ_1)| · d_1 + ω_2 √(A_2² − A_1² cos²(ω_1 t_1 + φ_1)) · d_2]
    / [A_1 ω_1 |sin(ω_1 t_1 + φ_1)| + ω_2 √(A_2² − A_1² cos²(ω_1 t_1 + φ_1))]   otherwise.   (4)

We assume that the frequencies ω_i are distributed over a narrow band, so that one is approximately the same as the other. In addition, by assuming that the phases φ_i are uniformly distributed over the interval (−π, π), we may consequently assume that ψ_i = ω_i t_1 + φ_i are also uniformly distributed over the same interval. Since Eq. (4) is periodic with period π, one may obtain the mean of the estimated ITD, s̄, which can be approximated by

s̄ ≈ g′_1(A_1, A_2) · d_1 + g′_2(A_1, A_2) · d_2,   (5)

where

g′_1(A_1, A_2) = (1/π) ∫_0^π √(A_1² − A_2² cos² ψ_2) / [√(A_1² − A_2² cos² ψ_2) + A_2 sin ψ_2] dψ_2   if A_1 > A_2,
g′_1(A_1, A_2) = (1/π) ∫_0^π √(1 − cos² ψ_2) / [√(1 − cos² ψ_2) + sin ψ_2] dψ_2   if A_1 = A_2,
g′_1(A_1, A_2) = (1/π) ∫_0^π A_1 sin ψ_1 / [A_1 sin ψ_1 + √(A_2² − A_1² cos² ψ_1)] dψ_1   otherwise,   (6)

and

g′_2(A_1, A_2) = (1/π) ∫_0^π A_2 sin ψ_2 / [√(A_1² − A_2² cos² ψ_2) + A_2 sin ψ_2] dψ_2   if A_1 > A_2,
g′_2(A_1, A_2) = (1/π) ∫_0^π sin ψ_2 / [√(1 − cos² ψ_2) + sin ψ_2] dψ_2   if A_1 = A_2,
g′_2(A_1, A_2) = (1/π) ∫_0^π √(A_2² − A_1² cos² ψ_1) / [A_1 sin ψ_1 + √(A_2² − A_1² cos² ψ_1)] dψ_1   otherwise.   (7)

Using the MATLAB symbolic integration function, we obtain

s̄ ≈ g(A_1, A_2) · d_1 + (1 − g(A_1, A_2)) · d_2,   (8)

where

g(A_1, A_2) = [π(A_1² − A_2²/2) − A_1² arctan(A_2/√(A_1² − A_2²)) − A_2 √(A_1² − A_2²)] / [π(A_1² − A_2²)]   if A_1 > A_2,
g(A_1, A_2) = 1/2   if A_1 = A_2,
g(A_1, A_2) = [−(π/2) A_1² + A_2² arctan(A_1/√(A_2² − A_1²)) + A_1 √(A_2² − A_1²)] / [π(A_2² − A_1²)]   otherwise.   (9)

By introducing the signal-to-interference ratio (SIR) in decibels (dB) according to the relation

SIR = 20 log_10 (A_1/A_2)   (dB),   (10)

where A_1 and A_2 denote the amplitudes of the target and interference signals, respectively, Eq. (8) becomes

s̄ ≈ g(SIR) · d_1 + (1 − g(SIR)) · d_2,   (11)


where g(SIR), obtained by substituting Eq. (10) into Eq. (9), is given by Eq. (12) below.

Fig. 2 displays the mixing function g. Using Eqs. (11) and (12), one can easily relate the estimated ITDs to the SIRs in a mixture. Note that Eq. (11) describes a monotonic function that is suitable for a one-to-one mapping and that does not depend on any frequency-specific parameters, so the same function can be used in all frequency bands.
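The mapping implied by Eqs. (11) and (12) can be sketched numerically as follows (a Python sketch under our own naming; the removable singularity at SIR = 0 dB is handled explicitly):

```python
import math

def mixing_function(sir_db):
    """Continuous weight g(SIR) of Eq. (12): the expected fraction of the
    zero-crossing ITD attributable to the target delay d1.  Function name
    is ours, not the paper's."""
    if abs(sir_db) < 1e-9:
        return 0.5                               # removable singularity at 0 dB
    if sir_db > 0:
        x = 10.0 ** (sir_db / 10.0)              # A1^2 / A2^2 > 1
        num = (math.pi * (x - 0.5)
               - x * math.atan(1.0 / math.sqrt(x - 1.0))
               - math.sqrt(x - 1.0))
        return num / (math.pi * (x - 1.0))
    x = 10.0 ** (-sir_db / 10.0)                 # A2^2 / A1^2 > 1
    num = (-math.pi / 2.0
           + x * math.atan(1.0 / math.sqrt(x - 1.0))
           + math.sqrt(x - 1.0))
    return num / (math.pi * (x - 1.0))

def mean_itd(sir_db, d1, d2):
    """Eq. (11): expected ITD for a target at delay d1, interferer at d2."""
    g = mixing_function(sir_db)
    return g * d1 + (1.0 - g) * d2
```

Inverting `mean_itd` numerically (it is monotonic in SIR) recovers the per-segment SIR estimate from an observed ITD.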

Since most systems that perform binaural analysis estimate ITDs using cross-correlation rather than zero-crossing methods, we also consider the corresponding relationship between the measured ITD in a frequency band and the relative signal strength of the desired target signal based on cross-correlation analysis. We show in the Appendix that the relationship between the ITD estimated using cross-correlation methods and the signal amplitudes A_i of the target and interfering source can be expressed as:

s_cc [ A_1² (sin(ω_1 T) ω_1 cos(2φ_1) + ω_1² T)
      + (2A_1A_2/(ω_1 + ω_2)) sin((ω_1 + ω_2)T/2) (ω_1² + ω_2²) cos(φ_1 + φ_2)
      + (2A_1A_2/(ω_1 − ω_2)) sin((ω_1 − ω_2)T/2) (ω_1² + ω_2²) cos(φ_1 − φ_2)
      + A_2² (sin(ω_2 T) ω_2 cos(2φ_2) + ω_2² T) ]
  ≈ d_1 [ A_1² (sin(ω_1 T) ω_1 cos(2φ_1) + ω_1² T)
        + (2A_1A_2/(ω_1 + ω_2)) sin((ω_1 + ω_2)T/2) ω_1² cos(φ_1 + φ_2)
        + (2A_1A_2/(ω_1 − ω_2)) sin((ω_1 − ω_2)T/2) ω_1² cos(φ_1 − φ_2) ]
  + d_2 [ (2A_1A_2/(ω_1 + ω_2)) sin((ω_1 + ω_2)T/2) ω_2² cos(φ_1 + φ_2)
        + (2A_1A_2/(ω_1 − ω_2)) sin((ω_1 − ω_2)T/2) ω_2² cos(φ_1 − φ_2)
        + A_2² (sin(ω_2 T) ω_2 cos(2φ_2) + ω_2² T) ]
  − [ A_1² sin(ω_1 T) sin(2φ_1) + 2A_1A_2 sin((ω_1 + ω_2)T/2) sin(φ_1 + φ_2)
    + 2A_1A_2 sin((ω_1 − ω_2)T/2) sin(φ_1 − φ_2) + A_2² sin(ω_2 T) sin(2φ_2) ].   (13)

The value of s_cc is, of course, easily obtained by dividing both sides of Eq. (13) by the bracketed expression on the left side of the equation. In deriving this equation we assume that ω_1 ≠ ω_2, because the two frequencies are regarded as random variables that are uniformly distributed within the bandwidth of an analysis band.

Many of the terms of the complicated Eq. (13) above serve primarily to reflect the effects of the interaction between the period of the sinusoidal waveforms and the finite duration of the observation window T. As the observation duration T increases, these "fringe effects" become decreasingly important, and for very large T Eq. (13) converges to

s_cc = (10^(SIR/10) ω_1² d_1 + ω_2² d_2) / (10^(SIR/10) ω_1² + ω_2²),   (14)

with SIR defined as in Eq. (10).

The mixing function g(SIR) used in Eq. (11) is

g(SIR) = [π(10^(SIR/10) − 1/2) − 10^(SIR/10) arctan(1/√(10^(SIR/10) − 1)) − √(10^(SIR/10) − 1)] / [π(10^(SIR/10) − 1)]   if SIR > 0 dB,
g(SIR) = 1/2   if SIR = 0 dB,
g(SIR) = [−π/2 + 10^(−SIR/10) arctan(1/√(10^(−SIR/10) − 1)) + √(10^(−SIR/10) − 1)] / [π(10^(−SIR/10) − 1)]   otherwise.   (12)

2.3. Estimation of a desired signal using continuously-variable weighting factors

Given the relationship between the ITD and the relative contribution of a source as described by Eq. (11), the ITDs estimated at each zero-crossing point are used to estimate weighting factors according to the extent to which the desired target signal is assumed to be present in each time-frequency region of the mixture. This processing results in a series of estimated weights for the desired signal corresponding to each zero-crossing in each frequency band. Weighting factors for all samples in time are obtained by developing a piecewise-linear amplitude modulation function that passes through the values of the weighting coefficients for the desired signal calculated at each zero-crossing point. An estimate of the desired signal in a frequency band is then obtained by multiplying the band-pass-filtered input signal by the corresponding amplitude modulation function. Finally, the enhanced full-band signal is recovered by summing the subband signals across all frequencies after compensating for the phase distortion introduced by the Gammatone filters as described above.
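The weighting and resynthesis steps just described can be sketched as follows (a Python sketch under our own naming; the Gammatone filtering itself is omitted, and `np.interp` holds the first and last weights constant outside the span of the zero-crossings):

```python
import numpy as np

def amplitude_modulation(n_samples, fs, crossing_times, weights):
    """Piecewise-linear amplitude-modulation function passing through the
    per-zero-crossing target weights (Section 2.3)."""
    t = np.arange(n_samples) / fs
    return np.interp(t, crossing_times, weights)

def phase_compensated(band_signal, ir):
    """Time-reverse, convolve with the band's filter impulse response, and
    reverse again, cancelling the filter's phase lag (after Weintraub, 1986,
    and Brown and Cooke, 1994; a simplified sketch)."""
    n = len(band_signal)
    rev = np.convolve(np.asarray(band_signal, dtype=float)[::-1], ir)[:n]
    return rev[::-1]
```

Applied to a signal that has already passed through the filter once, the second (time-reversed) pass yields a zero-phase overall response, so the subband outputs can be summed across frequency without phase distortion.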

3. Comparison of standard deviations of ITDs derived from zero-crossings and cross-correlation

In this section we compare the reliability of estimates of ITD obtained using the zero-crossing and cross-correlation methods, in terms of the sample standard deviation of the ITDs obtained by each of the two analysis methods. Defining the relative signal strength R to be

R = A_1/(A_1 + A_2) = 1/(1 + 10^(−SIR/20)),   (15)

we obtained 100,000 samples of ITDs using Eqs. (4) and (13) with randomly-generated values of the phases φ_i and frequencies ω_i and various values of R. For each sample, φ_i and ω_i were uniformly distributed over the intervals (−π, π) and (ω_cf − BW/2, ω_cf + BW/2), respectively, where ω_cf and BW are the center frequency and bandwidth of the frequency channel. We obtained the sample values with parameter values d_1 = 0 and d_2 = 1/16,000 s.
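The sampling of Eq. (4) can be sketched as follows (our own vectorized restatement, not the authors' code; the corresponding sampling of Eq. (13) follows the same pattern):

```python
import numpy as np

def zc_itd_sample(A1, A2, w1, w2, psi1, psi2, d1, d2):
    """Zero-crossing ITD estimate of Eq. (4) for band-limited tones, where
    psi1 and psi2 are (arrays of) the random phases w_i * t1 + phi_i."""
    if A1 >= A2:
        root = w1 * np.sqrt(A1 ** 2 - (A2 * np.cos(psi2)) ** 2)
        other = A2 * w2 * np.abs(np.sin(psi2))
        return (root * d1 + other * d2) / (root + other)
    root = w2 * np.sqrt(A2 ** 2 - (A1 * np.cos(psi1)) ** 2)
    other = A1 * w1 * np.abs(np.sin(psi1))
    return (other * d1 + root * d2) / (other + root)

# Draw the phases uniformly, as in the text, and inspect the samples.
rng = np.random.default_rng(0)
psi = rng.uniform(-np.pi, np.pi, 200_000)
itds = zc_itd_sample(2.0, 1.0, 1.0, 1.0, psi, psi, 1.0, 0.0)
```

With d_1 = 1 and d_2 = 0, the sample mean of these draws approaches the mixing function g of Eq. (9); for A_1/A_2 = 2 (about 6 dB SIR) it is roughly 0.76.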

Fig. 3 shows the sample standard deviation of estimates of the ITD as a function of the relative signal strength R for three different frequency bands. In this simulation, the frame length was T = 25.6 ms, which corresponds to the frame length used for analysis in the speech recognition experiments described in the following section. Since the frequency of zero-crossings is approximately twice the center frequency, the sample standard deviation for the zero-crossing method was computed from averaged ITDs whose number approximately corresponded to the number of zero-crossings in a frame. In each of the three frequency bands, the variability of ITD estimates obtained using the zero-crossing method is much less than that of estimates obtained using cross-correlation, which suggests that the zero-crossing method is likely to provide more reliable estimates of ITD than the cross-correlation method.

It should be noted that the large differences between the sample standard deviations observed using the zero-crossing and cross-correlation methods are primarily a consequence of the short duration of the analysis frames and the consequent significance of the interaction between the cross-correlation computation and the frame boundaries, as specified in Eq. (13). If the calculations were carried out using a longer frame duration such that Eq. (14) were valid, the difference between standard deviations based on zero-crossing-based versus cross-correlation-based estimates of ITD would be small.

4. Experimental evaluation on speech recognition

As noted in the Introduction, speech-on-speech interference has long been considered one of the most challenging problems in ASR, both because the interfering speech tends to be confused with the target speech and because the non-stationary nature of the masking signal renders most conventional noise-compensation methods ineffective (e.g. Singh et al., 2002a,b). In this section we compare the recognition accuracy obtained using the ZCAE method with correlation-based approaches and baseline processing.

4.1. Experimental design and signal generation

Recognition experiments were conducted using the DARPA Resource Management (RM1) database (Price et al., 1988) and the CMU SPHINX-III speech recognition system. The recognition system is based on fully-continuous hidden Markov models, which are trained on 2880 RM1 sentences recorded in a quiet environment. The test set consists of 600 RM1 sentences. Speech recognition is based on the observed values of 13th-order mel-frequency cepstral coefficients with a frame size of 25.6 ms and a frame rate of 10 ms, developed in the conventional fashion.

Each test utterance was corrupted by a second (interfering) speech signal with the same energy as the test utterance, producing a nominal SIR of 0 dB. To simulate the signals that would be obtained by microphones in close proximity, the target and interfering speech were combined with different simulated sensor-to-sensor delays, corresponding to different putative arrival angles for the target and interfering source. Because the original sampling period is too coarse for this purpose, the original target and interfering speech signals were upsampled by a factor of 4 and added together at the 64-kHz sampling rate, after appropriate ITDs were inserted to simulate the different azimuths of the target and interfering signal. These delays were selected independently and randomly in the range of −3 to 3 samples at 64 kHz, except that the delays for the target and interference were forced to differ from each other under the assumption that the sources are spatially separated. As a result, the net difference in ITDs between the target and the interference ranged from 1 to 6 samples, or 15.6 to 93.8 μs. (For microphones spaced 21 mm apart, this would correspond to differences in azimuth of between about 14.9° and 100.7°.) The target signal was always assumed to be the component with the more positive delay. After combining the target and interfering speech, the resulting signals were downsampled back to 16 kHz. Because we used a nominal spacing of about 21 mm between the sensors to avoid spatial aliasing at 8 kHz, the largest delay between observations from sensor to sensor would be 60.9 μs, which is slightly smaller than the sampling period at 16 kHz.
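The upsample-delay-mix-downsample procedure can be sketched as follows (a deliberately simplified Python illustration under our own naming; it uses linear-interpolation upsampling and circular shifts, whereas an actual experiment would use band-limited resampling):

```python
import numpy as np

def fractional_mix(target, interf, d_t, d_i, up=4):
    """Mix two equal-rate signals with delays of d_t and d_i samples applied
    at the up-times-higher rate, then return to the original rate.
    Uses np.roll, so samples wrap around at the edges (sketch only)."""
    n = len(target)
    hi = np.arange(n * up) / up                       # high-rate time axis
    lo = np.arange(n)
    t_up = np.interp(hi, lo, target)                  # crude 4x upsampling
    i_up = np.interp(hi, lo, interf)
    mixed = np.roll(t_up, d_t) + np.roll(i_up, d_i)   # integer high-rate delays
    return mixed[::up]                                # decimate back
```

A delay of 4 samples at the high rate corresponds to exactly 1 sample at the original rate, which the decimated output reproduces.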

4.2. Signal separation using continuously-variable masks

To obtain subband signals from each sensor signal according to the ZCAE algorithm, we used a 40-channel bank of Gammatone filters with center frequencies spaced linearly in equivalent rectangular bandwidth (ERB) from 170 Hz to 6430 Hz (O'Mard, 2000). The ITDs of the subband signal in each frequency band are converted into estimates of the relative signal strength of the target according to Eqs. (11) and (12), under the assumption that the actual delays for the desired target and interfering speech are known a priori.

For comparison, we also present recognition accuracies based on ITD estimation using cross-correlation. In order to obtain continuously-variable masks from the cross-correlation-based ITDs, we need to derive the relationship between the ITDs and the relative contribution of a source in a mixture. To simplify the derivation, we used the simpler expression in Eq. (14) for the ITD at the maximum of the cross-correlation, s_cc.

Since Eq. (14) is exactly the same as the expression derived by Roman et al. (2003), we adopt their approximation for the mean of s_cc, given by

s̄_cc = (d_1 + d_2)/2 + (1/ω_cf) { arctan[ −((10^(SIR/10) − 1)/(10^(SIR/10) + 1)) tan β ] + kπ },   k ∈ {0, ±1},   (16)

where β = ω_cf (d_2 − d_1)/2 ∈ [0, π]. If β ≤ π/2, k = 0. Otherwise, k = 1 when SIR < 0 dB and k = −1 when SIR > 0 dB (Roman et al., 2003). The time lag corresponding to the maximum of the cross-correlation function was estimated by differentiating a 20th-order polynomial approximation to the cross-correlation function in a region around the observed discrete-time maximum and finding the root of the derivative closest to the discrete-time maximum using Newtonian iteration.
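This approximation can be sketched as follows (a Python sketch under our own naming; ω_cf is the channel center frequency in rad/s):

```python
import math

def roman_mean_cc_itd(sir_db, w_cf, d1, d2):
    """Approximate mean cross-correlation ITD of Eq. (16), after Roman et
    al. (2003).  At 0 dB the estimate sits midway between the delays; as
    SIR grows it slides toward the target delay d1."""
    beta = w_cf * (d2 - d1) / 2.0
    r = 10.0 ** (sir_db / 10.0)
    inner = math.atan(-(r - 1.0) / (r + 1.0) * math.tan(beta))
    if abs(beta) <= math.pi / 2.0:
        k = 0
    elif sir_db < 0.0:
        k = 1
    else:
        k = -1
    return (d1 + d2) / 2.0 + (inner + k * math.pi) / w_cf
```

In the limits of very large positive or negative SIR, the estimate converges to d_1 or d_2, respectively, which matches the behavior of Eq. (14).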

4.3. Signal separation using binary masks

In addition to the recognition results obtained using the continuously-variable masks described above, we also evaluated speech recognition accuracy using binary masks. These binary masks were determined by establishing a hard threshold between the ITD representing the azimuth of the target and the ITD representing the azimuth of the interfering source, in place of the functions relating the estimated ITDs to the relative contribution of a source according to Eqs. (11) or (16). Specifically, the value of the binary mask for the i-th channel and the j-th zero-crossing or j-th frame (depending on whether zero-crossing-based or cross-correlation-based ITD extraction is used) is


m(i, j) = 1 if s_est ≥ (d_1 + d_2)/2, and m(i, j) = 0 otherwise,   (17)

where s_est denotes the estimated ITD, regardless of which method was used to obtain it. Recall that d_1 > d_2, since the target signal is assumed to have a more positive delay than the interfering signal.

4.4. Incorporation of missing-feature reconstruction for signal separation using binary masks

Many conventional methods have employed missing-feature techniques to achieve noise robustness in ASR systems through the use of binary masks. Typical missing-feature techniques, as reviewed by Raj and Stern (2005), either attempt to make optimal decisions while ignoring time-frequency regions that are considered unreliable, or attempt to "fill in" the values of those unreliable features. We employed the cluster-based method of restoring missing features. Briefly, in cluster-based missing-feature restoration it is assumed that the various spectral profiles that represent speech sounds can be clustered into a set of prototypical spectra. For each input frame we first estimate the cluster to which the incoming spectral features most likely belong, from the observed spectral components that are believed to be "present" (or reliable) and that are considered to belong to the target signal on the basis of zero-crossing or cross-correlation information, as described above. The remaining "missing" spectral components are obtained using bounded estimation based on the observed values of the components considered reliable, and on knowledge of the spectral cluster to which the incoming speech is assumed to belong. A detailed description of this approach is available in Raj et al. (2004). When missing-feature reconstruction is used, it is no longer feasible to use the procedures described by Weintraub (1986) and Brown and Cooke (1994) to reconstruct an estimate of the separated desired signal. Instead of reconstructing the desired signal, we obtained spectral features by directly computing frame energies after passing the input signals through the same bank of Gammatone filters used in the earlier experiments. These spectral features were transformed into cepstral features in the usual fashion to train and test a second ASR system. The training and test utterances were the same as described in Section 4.1.

4.5.Experimental results

Fig. 4a presents word error rates (WERs) calculated according to the standard NIST metric for various mixtures of the target and interfering speech combined as described in the previous section. WERs for "clean" target speech presented in isolation are also provided as an upper bound on recognition accuracy, and results for the combined signals without any processing for enhancement are also included as baselines to assess the effectiveness of the processing. As described previously, the signal-to-interference ratio (SIR) of the desired speech source compared to the interfering speech is nominally 0 dB.
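The NIST WER metric referred to above is the word-level Levenshtein distance (substitutions plus deletions plus insertions) divided by the number of reference words. A minimal sketch (not the NIST scoring tool itself):

```python
def word_error_rate(ref, hyp):
    """Word-level Levenshtein distance over reference length (NIST-style WER)."""
    r, h = ref.split(), hyp.split()
    # d[i][j]: minimum edits turning the first i ref words into the first j hyp words
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i                                  # i deletions
    for j in range(len(h) + 1):
        d[0][j] = j                                  # j insertions
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = d[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(r)][len(h)] / len(r)
```

For example, `word_error_rate("set the alarm now", "set an alarm")` counts one substitution and one deletion against four reference words, i.e. 50% WER.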

To obtain a crude characterization of the robustness of the methods considered, we also obtained WERs for the same signals in the presence of white Gaussian noise at an SNR of 20 dB. Because there are two sensor signals, this noise was added to the signals in two different ways. Specifically, the two speech signals were corrupted by identical noise (which could be caused by an additional non-speech source of interference with zero ITD), and they were also corrupted by statistically-independent noise (which could be caused by sensor or measurement noise). Results with identical noise, and with statistically-independent noise, are shown in Fig. 4b and c, respectively.

22 H.-M. Park, R.M. Stern / Speech Communication 51 (2009) 15–25
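The two noise conditions can be sketched as follows. This is illustrative code, not the authors' experimental scripts; the scaling simply enforces the stated SNR on each sensor.

```python
import numpy as np

def add_noise_at_snr(signals, snr_db, identical=True, rng=None):
    """Corrupt sensor signals with white Gaussian noise at a target SNR.

    identical=True  -> the same noise waveform on both sensors, i.e. an
                       additional zero-ITD interfering source;
    identical=False -> statistically-independent noise per sensor, i.e.
                       sensor/measurement noise.
    """
    rng = rng or np.random.default_rng(0)
    snr_lin = 10.0 ** (snr_db / 10.0)
    if identical:
        n = rng.standard_normal(signals[0].shape)
        # Scale the shared noise so 10*log10(P_signal/P_noise) == snr_db
        # with respect to the first sensor signal.
        n *= np.sqrt(np.mean(signals[0] ** 2) / (np.mean(n ** 2) * snr_lin))
        return [x + n for x in signals]
    out = []
    for x in signals:
        n = rng.standard_normal(x.shape)
        n *= np.sqrt(np.mean(x ** 2) / (np.mean(n ** 2) * snr_lin))
        out.append(x + n)
    return out
```

In the identical case the added waveform is literally the same on both sensors, so a binaural processor sees it as a zero-ITD source; in the independent case the two noise waveforms are uncorrelated.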

It is seen that the use of the continuously-variable masks (unsurprisingly) provides much greater recognition accuracy than the use of binary masks, as had been observed in previous studies (e.g. Srinivasan et al., 2004, 2006). It is also evident, though, that signal separation that is accomplished using ITDs estimated from zero-crossing information is more effective in reducing WER than signal separation that is based on cross-correlation information. The WERs are (also unsurprisingly) affected adversely by the addition of noise in all cases.

Fig. 5 depicts speech recognition accuracy as a function of the signal-to-interference ratio (SIR) for the same signal processing procedures that were used for the data in Fig. 4a. It can be seen that the hierarchy of WERs observed in Fig. 4a at 0 dB holds true over a wide range of values of SIR, except that processing using binary masks and ITDs derived from the cross-correlation function produces worse performance than the baseline (no-processing) condition at higher SIRs. Large WERs are observed for the combination of binary masks and cross-correlation-based ITD estimation because signals in frames that are dominated by the interfering signal are removed completely. This typically causes abrupt discontinuities in the spectrogram, which are abnormal for speech.
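The contrast between the two masking styles can be sketched directly: a binary mask zeroes every time-frequency cell whose estimated target fraction falls below a threshold (producing the abrupt spectral discontinuities noted above), while a ratio mask attenuates gradually. The threshold value and array shapes below are illustrative, not those of the paper's experiments.

```python
import numpy as np

def apply_masks(mix_energy, target_ratio, theta=0.5):
    """Weight a time-frequency energy map with a binary vs. a ratio mask.

    target_ratio -- estimated fraction of each cell's energy due to the target
                    (in ZCAE this is inferred from the per-cell ITD estimate)
    theta        -- illustrative decision threshold for the binary mask
    """
    binary = mix_energy * (target_ratio > theta)   # keep-or-discard
    ratio = mix_energy * target_ratio              # continuously-variable weight
    return binary, ratio
```

Cells just below the threshold are discarded entirely by the binary mask but retained, attenuated, by the ratio mask.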

The best performance is observed when the ZCAE method is used, which employs continuously-variable weighting factors estimated from zero-crossing-based ITDs. The resulting WERs obtained with this approach are generally close to the recognition accuracy observed in the absence of an interfering source.

Fig. 6 describes results obtained using the binary masks described in previous sections combined with the cluster-based missing-feature reconstruction techniques developed by Raj et al. (2004). As noted above, the speech recognition system used in conjunction with missing-feature reconstruction was slightly different from the one that had been used to obtain the previous results. Specifically, cepstral features in the present system were developed directly from the log-spectral values, rather than from a reconstructed speech waveform. The results in Fig. 6 are presented to facilitate comparisons across conditions, and they are the same data that had been presented for the results obtained using binary masks in Fig. 5. By comparing the results presented in Figs. 5 and 6 it can be seen that the use of missing-feature reconstruction can reduce the WERs obtained using binary masks very substantially, but not to the extent enjoyed by the use of the continuous ratio masks (except at the highest SIRs, where the effect of the interfering speech source is minimal).

5.Conclusions

We have described the zero-crossing-based amplitude estimation (ZCAE) method, which estimates continuously-variable weighting factors for enhancing the desired signal in the presence of a single interfering source, and we have evaluated it in the context of speech recognition. The use of zero-crossings as the basis for signal separation based on ITD is shown to be superior to the historically more popular use of cross-correlation information to estimate ITDs.

A major contribution of this work is the analytical development of a relationship between estimates of ITD based on zero-crossings in a particular frequency band and the implied ratio of desired signal energy to total energy in that band. This result complements a similar derivation by Srinivasan et al. (2004, 2006) for ITDs that are estimated from the cross-correlation function of the signals to the two sensors. In contrast to empirically-derived transfer functions that fit empirical data relating ITDs to energy ratios using a specific configuration of sources and sensors, the ZCAE method provides a simple monotonic function which can be used with equal validity in all frequency bands and for any source and signal configuration.

The reliability of continuously-variable weighting factors estimated using the ZCAE method has been assessed both by considering the sample standard deviation of the estimated ITDs and the resulting speech recognition accuracy. In both cases, the use of zero-crossings proved to be superior to the use of cross-correlation for the estimation of ITDs, and the use of continuously-variable weights remains superior to the use of binary masks. A limitation of this approach is that so far it has been applied only to the case of a single interfering source in the absence of reverberation. We are encouraged by these results and are working to extend them.

Acknowledgements

This work was supported by the Information and Telecommunication National Scholarship Program sponsored by the Institute of Information Technology Assessment, Korea, by the Sogang University Foundation Research Grants, and by the National Science Foundation (Grant IIS-0420866).

Appendix A. Derivation of the ITDs based on cross-correlation as a function of source amplitudes

In order to derive the ITDs based on cross-correlation as a function of source amplitudes, we follow almost the same derivation as in the case of the zero-crossing-based method. As before, we start from the mixtures given by Eq. (1).

For finite-duration signals in a frame of duration T, the cross-correlation function with time lag \tau is expressed by

c(\tau) = \frac{1}{T} \int_{t=-T/2}^{T/2} x_1(t)\, x_2(t+\tau)\, dt.  (A.1)

For the values of x_1(t) and x_2(t) considered, the integrand in Eq. (A.1) is

x_1(t)\, x_2(t+\tau) = \frac{1}{2} \Big\{ A_1^2 \big[ \cos(2\omega_1 t + \omega_1(\tau-d_1) + 2\phi_1) + \cos(\omega_1(\tau-d_1)) \big]
  + A_1 A_2 \big[ \cos(\omega_1 t + \omega_2(t+\tau-d_2) + \phi_1+\phi_2)
  + \cos(\omega_1 t - \omega_2(t+\tau-d_2) + \phi_1-\phi_2)
  + \cos(\omega_1(t+\tau-d_1) + \omega_2 t + \phi_1+\phi_2)
  + \cos(\omega_1(t+\tau-d_1) - \omega_2 t + \phi_1-\phi_2) \big]
  + A_2^2 \big[ \cos(2\omega_2 t + \omega_2(\tau-d_2) + 2\phi_2) + \cos(\omega_2(\tau-d_2)) \big] \Big\}.  (A.2)

Hence, for \omega_1 \neq \omega_2, the derivative of the cross-correlation function c(\tau) can be written as follows:

\frac{dc(\tau)}{d\tau} = -\frac{1}{2T} \Big\{ A_1^2 \big[ \sin(\omega_1 T)\sin(\omega_1(\tau-d_1)+2\phi_1) + \omega_1 T \sin(\omega_1(\tau-d_1)) \big]
  + A_1 A_2 \Big[ \frac{2\omega_2}{\omega_1+\omega_2} \sin\!\big((\omega_1+\omega_2)T/2\big) \sin(\omega_2(\tau-d_2)+\phi_1+\phi_2)
  + \frac{2\omega_2}{\omega_1-\omega_2} \sin\!\big((\omega_1-\omega_2)T/2\big) \sin(\omega_2(\tau-d_2)-\phi_1+\phi_2)
  + \frac{2\omega_1}{\omega_1+\omega_2} \sin\!\big((\omega_1+\omega_2)T/2\big) \sin(\omega_1(\tau-d_1)+\phi_1+\phi_2)
  + \frac{2\omega_1}{\omega_1-\omega_2} \sin\!\big((\omega_1-\omega_2)T/2\big) \sin(\omega_1(\tau-d_1)+\phi_1-\phi_2) \Big]
  + A_2^2 \big[ \sin(\omega_2 T)\sin(\omega_2(\tau-d_2)+2\phi_2) + \omega_2 T \sin(\omega_2(\tau-d_2)) \big] \Big\}.  (A.3)

The time lag \tau_{cc} at the maximum of the cross-correlation, which corresponds to the estimated ITD, is obtained by setting the derivative of c(\tau) in the equation above to zero. Using the approximation of small \omega_i(\tau-d_i), we obtain the expression below, from which \tau_{cc} is easily obtained:

\tau_{cc} \Big\{ A_1^2 \big[ \sin(\omega_1 T)\, \omega_1 \cos(2\phi_1) + \omega_1^2 T \big]
  + \frac{2A_1A_2}{\omega_1+\omega_2} \sin\!\big((\omega_1+\omega_2)T/2\big) (\omega_1^2+\omega_2^2) \cos(\phi_1+\phi_2)
  + \frac{2A_1A_2}{\omega_1-\omega_2} \sin\!\big((\omega_1-\omega_2)T/2\big) (\omega_1^2+\omega_2^2) \cos(\phi_1-\phi_2)
  + A_2^2 \big[ \sin(\omega_2 T)\, \omega_2 \cos(2\phi_2) + \omega_2^2 T \big] \Big\}
\approx d_1 \Big\{ A_1^2 \big[ \sin(\omega_1 T)\, \omega_1 \cos(2\phi_1) + \omega_1^2 T \big]
  + \frac{2A_1A_2}{\omega_1+\omega_2} \sin\!\big((\omega_1+\omega_2)T/2\big) \omega_1^2 \cos(\phi_1+\phi_2)
  + \frac{2A_1A_2}{\omega_1-\omega_2} \sin\!\big((\omega_1-\omega_2)T/2\big) \omega_1^2 \cos(\phi_1-\phi_2) \Big\}
+ d_2 \Big\{ \frac{2A_1A_2}{\omega_1+\omega_2} \sin\!\big((\omega_1+\omega_2)T/2\big) \omega_2^2 \cos(\phi_1+\phi_2)
  + \frac{2A_1A_2}{\omega_1-\omega_2} \sin\!\big((\omega_1-\omega_2)T/2\big) \omega_2^2 \cos(\phi_1-\phi_2)
  + A_2^2 \big[ \sin(\omega_2 T)\, \omega_2 \cos(2\phi_2) + \omega_2^2 T \big] \Big\}
- \Big\{ A_1^2 \sin(\omega_1 T) \sin(2\phi_1) + 2A_1A_2 \sin\!\big((\omega_1+\omega_2)T/2\big) \sin(\phi_1+\phi_2)
  + 2A_1A_2 \sin\!\big((\omega_1-\omega_2)T/2\big) \sin(\phi_1-\phi_2) + A_2^2 \sin(\omega_2 T) \sin(2\phi_2) \Big\}.  (A.4)
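The qualitative content of the result above — that the lag at the cross-correlation maximum lies between the two source delays d_1 and d_2, pulled toward the delay of the stronger source — can be checked numerically for the two-sinusoid model. The parameters below are illustrative; the frame length is chosen to span whole periods of both tones so that circular correlation evaluates the finite-frame cross-correlation of Eq. (A.1) without truncation error.

```python
import numpy as np

def cross_corr_itd(A1, A2, d1, d2, f1=500.0, f2=750.0, p1=0.3, p2=-0.2,
                   fs=200_000, T=0.02):
    """Lag (in s) at the cross-correlation maximum for two delayed sinusoids."""
    t = np.arange(int(T * fs)) / fs
    x1 = A1 * np.cos(2 * np.pi * f1 * t + p1) + A2 * np.cos(2 * np.pi * f2 * t + p2)
    x2 = (A1 * np.cos(2 * np.pi * f1 * (t - d1) + p1)
          + A2 * np.cos(2 * np.pi * f2 * (t - d2) + p2))
    lags = np.arange(-200, 201)                  # search |tau| <= 1 ms
    # Circular correlation is exact here because T spans whole periods of
    # both tones, so the oscillatory terms in (A.2) average to zero exactly.
    c = [np.mean(x1 * np.roll(x2, -k)) for k in lags]
    return lags[int(np.argmax(c))] / fs
```

With A2 = 0 the estimate recovers d1 exactly (to the lag-grid resolution); as A2 grows the estimate migrates from d1 toward d2, consistent with the weighted combination of delays in the closed-form approximation.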


References

Assmann, P., Summerfield, Q., 2004. The perception of speech under adverse conditions. In: Greenberg, S., Ainsworth, W., Popper, A.N., Fay, R.R. (Eds.), Speech Processing in the Auditory System, Springer Handbook of Auditory Research, Vol. 18. Springer-Verlag, pp. 231–308, Chapter 5.
Barker, J.P., Josifovski, L., Cooke, M.P., Green, P., 2000. Soft decisions in missing data techniques for robust automatic speech recognition. In: Proc. Internat. Conf. on Spoken Language Processing (ICSLP-2000), Beijing, China, pp. 373–376.
Bodden, M., 1993. Modelling human sound-source localization and the cocktail party effect. Acta Acust. 1, 43–55.
Bodden, M., Anderson, T.R., 1995. A binaural selectivity model for speech recognition. In: Proc. European Conf. on Speech Communication and Technology (EUROSPEECH-1995), Madrid, Spain, pp. 127–130.
Braasch, J., 2005. Modelling of binaural hearing. In: Blauert, J. (Ed.), Communication Acoustics. Springer-Verlag, Berlin, Germany, pp. 75–108, Chapter 4.
Bregman, A.S., 1990. Auditory Scene Analysis. MIT Press, Cambridge, MA.
Brown, G.J., Cooke, M.P., 1994. Computational auditory scene analysis. Comput. Speech Lang. 8, 297–336.
Colburn, H.S., Kulkarni, A., 2005. Models of sound localization. In: Fay, R., Popper, T. (Eds.), Sound Source Localization, Springer Handbook of Auditory Research, Vol. 6. Springer-Verlag, pp. 272–316, Chapter 8.
Henning, G.B., 1974. Detectability of interaural delay in high-frequency complex waveforms. J. Acoust. Soc. Amer. 77, 1129–1140.
Hermansky, H., 1998. Should recognizers have ears? Speech Comm. 25 (1–3), 3–27.
Jeffress, L.A., 1948. A place theory of sound localization. J. Comp. Physiol. Psychol. 41, 35–39.
Juang, B.-H., 1991. Speech recognition in adverse environments. Comput. Speech Lang. 5 (3), 275–294.
Kim, Y.-I., Kil, R.M., 2004. Sound source localization based on zero-crossing peak-amplitude coding. In: Proc. Internat. Conf. on Spoken Language Processing (INTERSPEECH-2004), Jeju, Korea, pp. 477–480.
Kim, Y.-I., Kil, R.M., 2007. Estimation of interaural time differences based on zero-crossings in noisy multisource environments. IEEE Trans. Audio Speech Lang. Process. 15, 734–743.
Kim, Y.-I., An, S.J., Kil, R.M., Park, H.-M., 2005. Sound segregation based on binaural zero-crossings. In: Proc. European Conf. on Speech Communication and Technology (INTERSPEECH-2005), Lisbon, Portugal, pp. 2325–2328.
Lindemann, W., 1986. Extension of a binaural cross-correlation model by contralateral inhibition. I. Simulation of lateralization for stationary signals. J. Acoust. Soc. Amer. 80, 1608–1622.
Lyon, R., 1983. A computational model of binaural localization and separation. In: Proc. IEEE Internat. Conf. on Acoustics, Speech, and Signal Processing (ICASSP '83), Boston, MA, pp. 1148–1151.
Meddis, R., Hewitt, M.J., 1991. Virtual pitch and phase sensitivity of a computer model of the auditory periphery. I. Pitch identification. J. Acoust. Soc. Amer. 89 (6), 2866–2882.
Morris, A., Barker, J., Bourlard, H., 2001. From missing data to maybe useful data: soft data modelling for noise robust ASR. In: Proc. IEEE Internat. Workshop on Intelligent Signal Processing, Budapest, Hungary, pp. 153–164.
Nuetzel, J., Hafter, E.R., 1981. Discrimination of interaural delays in complex waveforms: spectral effects. J. Acoust. Soc. Amer. 69, 1112–1118.
O'Mard, L.P., 2000. Development system for auditory modelling. dsam-2.6.62.tar.gz.
Palomäki, K.J., Brown, G.J., Wang, D., 2004. A binaural processor for missing data speech recognition in the presence of noise and small-room reverberation. Speech Comm. 43, 361–378.
Price, P., Fisher, W.M., Bernstein, J., Pallet, D.S., 1988. The DARPA 1000-word resource management database for continuous speech recognition. In: Proc. IEEE Internat. Conf. on Acoustics, Speech, and Signal Processing (ICASSP '88), New York, NY, pp. 651–654.
Raj, B., Stern, R.M., 2005. Missing-feature methods for robust automatic speech recognition. IEEE Signal Process. Mag. 22 (5), 101–116.
Raj, B., Parikh, V., Stern, R.M., 1997. The effects of background music on speech recognition accuracy. In: Proc. IEEE Internat. Conf. on Acoustics, Speech, and Signal Processing (ICASSP '97), Munich, Germany, pp. 851–854.
Raj, B., Seltzer, M.L., Stern, R.M., 2004. Reconstruction of missing features for robust speech recognition. Speech Comm. 43 (4), 275–296.
Roman, N., Wang, D., Brown, G.J., 2003. Speech segregation based on sound localization. J. Acoust. Soc. Amer. 114 (4), 2236–2252.
Singh, R., Raj, B., Stern, R.M., 2002a. Model compensation and matched condition methods for robust speech recognition. In: Davis, G. (Ed.), CRC Handbook on Noise Reduction in Speech Applications. CRC Press, pp. 245–276, Chapter 10.
Singh, R., Stern, R.M., Raj, B., 2002b. Signal and feature compensation methods for robust speech recognition. In: Davis, G. (Ed.), CRC Handbook on Noise Reduction in Speech Applications. CRC Press, pp. 219–244, Chapter 9.
Srinivasan, S., Roman, N., Wang, D., 2004. On binary and ratio time-frequency masks for robust speech recognition. In: Proc. Internat. Conf. on Spoken Language Processing (INTERSPEECH-2004), Jeju, Korea, pp. 2541–2544.
Srinivasan, S., Roman, N., Wang, D., 2006. Binary and ratio time-frequency masks for robust speech recognition. Speech Comm. 48, 1486–1501.
Stern, R.M., Colburn, H.S., 1978. Theory of binaural interaction based on auditory-nerve data. IV. A model of subjective lateral position. J. Acoust. Soc. Amer. 64, 127–140.
Stern, R.M., Trahiotis, C., 1996. Models of binaural perception. In: Gilkey, R., Anderson, T.R. (Eds.), Binaural and Spatial Hearing in Real and Virtual Environments. Lawrence Erlbaum Associates, pp. 499–531, Chapter 24.
Strutt, J.W., 1907. On our perception of sound direction. Philos. Mag. 13, 214–232.
Tessier, E., Berthommier, F., Glotin, H., Choi, S., 1999. A CASA front-end using the localisation cue for segregation and then cocktail-party speech recognition. In: Proc. Internat. Conf. on Speech Processing (ICSP-99), Seoul, Korea, pp. 97–102.
Wang, D., Brown, G.J. (Eds.), 2006. Computational Auditory Scene Analysis: Principles, Algorithms, and Applications. Wiley-IEEE Press, Hoboken, NJ.
Weintraub, M., 1986. A computational model for separating two simultaneous talkers. In: Proc. IEEE Internat. Conf. on Acoustics, Speech, and Signal Processing (ICASSP '86), Tokyo, Japan, pp. 81–84.

