Model_Based_Inversion_of_Dynamic_Range_Compression
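A Detailed Guide to Flow-Based Models
Flow-based models, also known as invertible generative models, are a class of neural network architectures for generative modeling.
Unlike other generative models such as GANs and VAEs, a flow-based model is a single invertible transformation: run in one direction it encodes a sample into a latent code, and run in the other direction it recovers the original sample exactly from that code.
This article explains the principles of flow-based models and lists related references.
First, the core idea of flow-based models is to build a one-to-one mapping between input and output, so that the density of the data follows from a simple base density through the change-of-variables formula.
The main advantage of flow-based models is that the exact likelihood can be computed, with no need for variational inference or other approximation techniques.
Because the mapping is invertible, flow-based models also allow exact latent-variable inference and sampling.
Concretely, a flow-based model usually consists of several invertible layers, each of which is an invertible function from its input to its output.
These functions can be simple element-wise operations, such as affine transformations and channel-wise nonlinearities, or more expressive nonlinear transformations built from convolutional neural networks.
The invertibility of the whole model follows from composing these invertible layers.
Through these layers, a flow-based model maps the original sample space to a simpler latent space, and reconstructs a sample by applying the inverse mapping.
In recent years, flow-based models have been applied widely in image generation, image inpainting, language modeling, and reinforcement learning.
Some references related to flow-based models are listed below for further reading: 1. "Flow++: Improving Flow-Based Generative Models with Variational Dequantization and Architecture Design", a 2019 paper by Jonathan Ho et al., introduces an improved flow-based model that raises generation quality through variational dequantization and architecture design.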
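The invertible-layer idea described above can be illustrated with a minimal sketch of one affine coupling layer on two-dimensional inputs. The functions s_net and t_net are hypothetical stand-ins for learned networks, and the standard normal base density is an assumption made for illustration; this is not the architecture of any particular paper.

```python
import math

def s_net(x1):
    # Hypothetical log-scale function; any function of x1 keeps the layer invertible.
    return math.tanh(x1)

def t_net(x1):
    # Hypothetical shift function.
    return 0.5 * x1

def forward(x):
    """Map a data point x = (x1, x2) to a latent z and return log|det J| as well."""
    x1, x2 = x
    s, t = s_net(x1), t_net(x1)
    z = (x1, x2 * math.exp(s) + t)
    # The Jacobian is triangular with diagonal (1, exp(s)), so log|det J| = s.
    return z, s

def inverse(z):
    """Recover the data point exactly from the latent code."""
    z1, z2 = z
    s, t = s_net(z1), t_net(z1)
    return (z1, (z2 - t) * math.exp(-s))

def log_likelihood(x):
    """Exact log p(x) = log N(z; 0, I) + log|det J| (change of variables)."""
    z, log_det = forward(x)
    log_base = sum(-0.5 * (zi * zi + math.log(2.0 * math.pi)) for zi in z)
    return log_base + log_det
```

Because the first coordinate passes through unchanged, the inverse can recompute the same scale and shift, which is what makes the layer exactly invertible regardless of how complicated s_net and t_net are.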
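Hugging Face Trainer parameters
The Trainer class in the Hugging Face Transformers library is used to train and evaluate models.
Some commonly used parameters are listed below:
1. model: the model to train (required unless a model_init function is supplied instead).
2. args: the training configuration, a TrainingArguments object; if omitted, a default configuration is used.
3. data_collator (optional): the data collator, which assembles a list of dataset elements into a batch that matches the model's inputs.
4. train_dataset (optional): the training dataset.
5. eval_dataset (optional): the evaluation dataset.
6. tokenizer (optional): the model's tokenizer, used to preprocess the input text.
7. compute_metrics (optional): a user-defined metric function for evaluating model performance.
8. callbacks (optional): a list of user-defined callbacks that run specific actions during training.
9. optimizers (optional): a tuple of a custom optimizer and a custom learning rate scheduler.
10. scheduler: not a separate Trainer argument; a custom learning rate scheduler is passed as the second element of optimizers.
11. data_parallel: not a Trainer argument; data parallelism across several GPUs depends on the available devices and on how the training script is launched.
The following settings are fields of TrainingArguments and are passed to the Trainer through args:
12. deepspeed (optional): enables training with the DeepSpeed library, given a DeepSpeed configuration.
DeepSpeed is an open-source library for efficient training and optimization of deep learning models.
13. gradient_accumulation_steps (optional): the number of steps over which gradients are accumulated before an update, which emulates a larger batch size.
14. max_steps (optional): the maximum number of training steps; when set to a positive value it overrides num_train_epochs.
15. num_train_epochs (optional): the number of training epochs.
These are only some of the available parameters; depending on the task and requirements, other parameters may be needed as well.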
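Monte Carlo algorithm for the two-dimensional Ising model
The detailed steps of the Monte Carlo algorithm for the two-dimensional Ising model are as follows:
1. Initialization: generate a two-dimensional spin lattice; each spin can be initialized at random to +1 or -1.
2. Define the parameters: set the number of simulation steps (also called Monte Carlo steps, MC steps), the temperature (T), the external magnetic field (H), and the interaction strength (J).
3. Run the Monte Carlo simulation loop:
o For each MC step:
▪ Repeat the following once per lattice site:
▪ Pick a spin (i,j) at random and look up its nearest neighbors.
▪ Compute the energy difference ΔE that flipping this spin would cause.
▪ If ΔE is less than or equal to 0, accept the flip and flip the spin.
▪ If ΔE is greater than 0, accept the flip with probability exp(-ΔE / T) according to the Metropolis criterion (with the Boltzmann constant set to 1).
o After each MC step, record properties of the spin lattice (for example the mean magnetization and the energy).
o Optionally, check after some number of MC steps whether the system has reached equilibrium.
Run more MC steps if necessary.
4. Analyze the results: use the simulated spin configurations for statistics, for example the mean spin, energy, magnetization, magnetic susceptibility, and heat capacity.
These are the basic steps of the Monte Carlo algorithm for the two-dimensional Ising model.
When implementing it, one may also need to consider boundary conditions (such as periodic boundary conditions), optimizations that improve efficiency, and other factors.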
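The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: periodic boundary conditions, the energy convention E = -J Σ s_i s_j - H Σ s_i over nearest-neighbor pairs, and the Boltzmann constant set to 1. The function names and the cold-start option are choices made for this example.

```python
import math
import random

def delta_energy(spins, i, j, J=1.0, H=0.0):
    """Energy change caused by flipping spin (i, j), with periodic boundaries."""
    n = len(spins)
    neighbors = (spins[(i + 1) % n][j] + spins[(i - 1) % n][j]
                 + spins[i][(j + 1) % n] + spins[i][(j - 1) % n])
    return 2.0 * spins[i][j] * (J * neighbors + H)

def mc_step(spins, T, rng, J=1.0, H=0.0):
    """One MC step: n*n single-spin Metropolis updates at random sites."""
    n = len(spins)
    for _ in range(n * n):
        i, j = rng.randrange(n), rng.randrange(n)
        dE = delta_energy(spins, i, j, J, H)
        # Accept downhill moves always, uphill moves with probability exp(-dE/T).
        if dE <= 0 or rng.random() < math.exp(-dE / T):
            spins[i][j] = -spins[i][j]

def magnetization(spins):
    """Mean spin of the lattice."""
    n = len(spins)
    return sum(map(sum, spins)) / (n * n)

def simulate(n=8, T=1.0, steps=50, seed=0, cold=False):
    """Run the simulation and return the magnetization after every MC step."""
    rng = random.Random(seed)
    if cold:
        spins = [[1] * n for _ in range(n)]  # all spins up
    else:
        spins = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(n)]
    history = []
    for _ in range(steps):
        mc_step(spins, T, rng)
        history.append(magnetization(spins))
    return history
```

Calling simulate(T=1.0, cold=True) stays close to full magnetization because the temperature is well below the critical point, whereas a high temperature such as T=5.0 drives the mean magnetization toward zero.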
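Mastering Generative Adversarial Networks and Variational Autoencoders in Deep Learning
Generative adversarial networks (GANs) and variational autoencoders (VAEs) are two widely used generative models in deep learning.
Each has its own strengths in modeling and generating data.
This article introduces the principles, applications, and development trends of the two models.
1. Generative Adversarial Networks (GANs)
A generative adversarial network (GAN) consists of a generator and a discriminator.
The generator tries to produce fake data that resembles real data, while the discriminator is responsible for telling real data apart from the generated fake data.
During training, the generator and the discriminator are optimized against each other, until the generator is able to produce high-quality fake data.
GANs have many applications.
For example, in image generation GANs can produce realistic face images.
In natural language processing, GANs can also be used to generate articles or poems in a distinctive writing style.
In recent years GANs have also made progress in the pharmaceutical field, where they can generate new molecular structures to support drug discovery research.
The development of GANs also faces some challenges.
First, GAN training is unstable and prone to mode collapse.
Second, GAN training needs large amounts of data and compute, which places high demands on hardware and datasets.
In addition, how to evaluate and quantify the quality of GAN-generated data remains a hard problem.
Compared with GANs, variational autoencoders (VAEs) are a more stable and faster-to-train generative model.
2. Variational Autoencoders (VAEs)
A variational autoencoder (VAE) is a generative model based on a probabilistic model.
It maps the input data to a latent space through an encoder and samples in that latent space.
A decoder then decodes the latent vector into generated data.
Unlike GANs, VAEs are trained by maximizing a variational lower bound (the ELBO) on the log-likelihood of the data.
VAEs also have many applications.
In image generation, VAEs can be used to generate images of faces, animals, and so on.
In natural language processing, VAEs can be used to generate logically and contextually coherent paragraphs.
VAEs can also be applied to data compression and dimensionality reduction.
VAEs face some challenges as well.
For example, data generated by VAEs may be less sharp and less realistic than the output of GANs.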
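The latent-space sampling step and the regularization term used when training a VAE can be sketched as follows. This is a minimal illustration, assuming a diagonal Gaussian encoder output (mean and log-variance per latent dimension) and a standard normal prior; the function names are chosen for this example and the reconstruction error is left as an input.

```python
import math
import random

def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), one value per latent dimension."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]

def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, sigma^2) || N(0, 1) ), summed over latent dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0
                     for m, lv in zip(mu, log_var))

def negative_elbo(recon_error, mu, log_var):
    """Training loss: reconstruction error plus the KL regularizer."""
    return recon_error + kl_to_standard_normal(mu, log_var)
```

Writing the sample as mu + sigma * eps keeps the randomness in eps, so in a real framework gradients can flow through mu and log_var to the encoder.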
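Deep learning-based models
Deep learning-based models are a machine learning approach that uses deep neural networks to process large amounts of data and learn from it.
Deep learning models typically have many parameters and complex network structures, which lets them perform very well on a range of tasks, including image recognition, speech recognition, and natural language processing.
The basic structure of a deep learning model consists of an input layer, hidden layers, and an output layer.
The input layer receives the raw data, the hidden layers transform the input into meaningful feature representations through a series of computations, and the output layer turns the result of the hidden layers into the final output.
Deep learning models can learn and extract features of the input data automatically, which makes them more effective than traditional machine learning methods on many tasks.
Deep learning is applied very widely, including but not limited to:
1. Image recognition: deep learning models can automatically learn and recognize features in images, for example in face recognition and object detection.
2. Natural language processing: deep learning models can process and generate natural language text, for example in machine translation and text generation.
3. Speech recognition: deep learning models can automatically recognize speech and convert it to text, for example in voice assistants and voice search.
4. Recommender systems: deep learning models can recommend relevant content or products based on a user's past behavior and preferences, for example in video and e-commerce recommendation.
5. Medical image analysis: deep learning models can automatically analyze and recognize medical images such as CT scans and MRI images, to assist doctors in diagnosing and treating disease.
Overall, deep learning-based models play an increasingly important role in artificial intelligence and will continue to drive technical development and innovation.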
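Hyperparameter Optimization Tips for Training the Generative Model of a GAN
A generative adversarial network (GAN) is a deep learning model that consists of a generator and a discriminator and produces realistic data samples through adversarial training.
Tuning the hyperparameters is a crucial step when training the generative model.
This article shares some hyperparameter optimization tips for training the generative model of a GAN.
1. Learning rate adjustment
The learning rate is one of the most important hyperparameters of a deep learning model.
For GANs, the choice of learning rate matters especially.
Usually the initial learning rate is set to a small value and then reduced gradually as training proceeds.
This can be done with a learning rate decay method such as exponential decay or cosine annealing.
2. Choice of optimizer for the generator and the discriminator
In a GAN, the choice of optimizer for the generator and the discriminator is also an important hyperparameter decision.
Usually the Adam optimizer can be used for both the generator and the discriminator.
Adam strikes a good balance between convergence speed and model stability.
3. Choice of regularization term
The choice of regularization term is another important hyperparameter in GAN training.
Regularization helps reduce the risk of overfitting and improves the model's generalization.
Usually L1 or L2 regularization can be used to constrain model complexity and prevent overfitting.
4. Choice of batch size
The batch size is another important hyperparameter in GAN training.
A larger batch size usually improves training efficiency, but an overly large batch size also increases memory consumption and can reduce the model's generalization.
Choosing the batch size therefore involves a trade-off, and a suitable value can be found by experiment.
5. Choice of noise input
The noise input is a very important part of GAN training.
It affects the generator's output, so a suitable noise input has to be chosen for training.
Usually noise drawn from a uniform or a normal distribution is used, and suitable distribution parameters are then chosen by experiment.
6. Gradient clipping
Gradient clipping is also an important technique in GAN training.
It helps prevent exploding gradients and improves training stability.
Usually a threshold is set, and the gradient is clipped whenever its norm exceeds the threshold.
7. Choice of training strategy
The choice of training strategy is also a very important part of GAN training.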
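Two of the tips above, learning rate decay (item 1) and gradient clipping by norm (item 6), can be sketched framework-free. This is a minimal illustration in plain Python; the function names and the example values are assumptions made for this sketch, and deep learning frameworks ship their own equivalents.

```python
import math

def exponential_decay(lr0, decay_rate, step):
    """Learning rate after `step` steps of exponential decay."""
    return lr0 * decay_rate ** step

def cosine_annealing(lr0, lr_min, step, total_steps):
    """Learning rate that falls from lr0 to lr_min along half a cosine period."""
    cos_term = 0.5 * (1.0 + math.cos(math.pi * step / total_steps))
    return lr_min + (lr0 - lr_min) * cos_term

def clip_by_norm(grad, max_norm):
    """Rescale the gradient vector if its Euclidean norm exceeds max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= max_norm:
        return list(grad)
    scale = max_norm / norm
    return [g * scale for g in grad]
```

Clipping by norm preserves the direction of the gradient and only limits its length, which is why it is usually preferred over clipping each component separately.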
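Hyperparameter Optimization Tips for Training the Generative Model of a GAN
Generative adversarial networks (GANs) are a deep learning model made up of two neural networks: a generator and a discriminator.
The generator tries to produce data that looks like real samples, while the discriminator tries to distinguish real data from the fake data produced by the generator.
During GAN training, the choice of hyperparameters is crucial for the model's performance and convergence speed.
This article shares some hyperparameter optimization tips for training the generative model of a GAN.
1. Learning rate adjustment
The learning rate is one of the most important hyperparameters of a deep learning model.
In a GAN, the choice of learning rate directly affects performance and convergence speed.
Usually, setting the initial learning rate to … is a good choice.
One can then try different learning rate schedules, such as learning rate decay or dynamic adjustment of the learning rate, to find the best setting.
2. Batch size adjustment
The batch size is another important hyperparameter; it determines how many samples the model uses per update.
In GAN training a fairly large batch size is often used to speed up training, but an overly large batch size can make convergence unstable.
The batch size therefore has to be tuned to find a suitable value.
Usually a batch size of 64 or 128 is a reasonable choice.
3. Choice of activation function
In the generative model of a GAN, the choice of activation function is also an important hyperparameter.
Common activation functions include ReLU, Leaky ReLU, and tanh.
Different activation functions affect training and generation quality differently, so they need to be chosen with care.
Leaky ReLU usually behaves fairly stably in GANs, but other activation functions can be tried to find the one that best suits the model at hand.
4. Noise input
In the generative model of a GAN, the noise input is a very important factor.
The dimension and distribution of the noise input directly affect generation quality.
Noise drawn from a uniform or a normal distribution is a common choice.
Other noise distributions can also be tried to find the one that best suits the model at hand.
5. Regularization methods
Regularization is an important technique for preventing overfitting in deep learning models.
In GAN training, the choice of regularization method has an important effect on the model's generalization and generation quality.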
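A generative adversarial network (GAN) is a deep learning model made up of two parts, a generator and a discriminator.
The generator produces data samples, while the discriminator judges whether a sample is real or fake.
GAN training suffers from several common problems; this article shares some ways to address them.
1. Vanishing and exploding gradients
Vanishing and exploding gradients are common problems in GAN training.
Vanishing gradients means that the gradient shrinks toward zero during backpropagation, so that the model cannot converge; exploding gradients means that the gradient keeps growing, so that the model diverges.
To address this, one can first choose suitable activation functions and initialization methods and adjust the network structure.
Second, one can try gradient clipping, which limits the size of the gradient and prevents it from exploding.
In addition, a sensible learning rate and regularization methods can also alleviate vanishing and exploding gradients.
2. Mode collapse
Mode collapse is a common problem in GAN training.
It means that the generator learns only some of the modes of the data distribution, so that the generated samples lack diversity.
To address mode collapse, one can use diversity-promoting methods, such as adding a diversity penalty to the loss function or using diversity-oriented evaluation metrics to guide training.
One can also increase the capacity of the generator and the discriminator, or inject noise, to increase the diversity of the model.
Designing the loss function carefully and balancing the training objectives of the generator and the discriminator also helps to alleviate mode collapse.
3. Missing modes (modal collapse)
Modal collapse refers to the case in which the generator produces only some of the modes of the data distribution and ignores the others.
To address it, one can use a multimodal loss function and introduce additional supervision to guide the multimodal distribution of the generated samples.
One can also add noise input, enlarge the input space of the generator, and increase sample diversity to make the model more multimodal.
Designing the generator architecture carefully and increasing the depth and width of the network can also alleviate the problem.
4. Unstable training
Training instability is a widespread problem in GAN training.
To address it, one can use stabilization techniques such as gradient penalties and regularization.
In addition, a staged training scheme, that is, training the generator first and then the discriminator, can improve training stability.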
Model-Based Inversion of Dynamic Range Compression
Stanislaw Gorlow, Student Member, IEEE, and Joshua D. Reiss, Member, IEEE

Abstract: In this work it is shown how a dynamic nonlinear time-variant operator, such as a dynamic range compressor, can be inverted using an explicit signal model. By knowing the model parameters that were used for compression one is able to recover the original uncompressed signal from a “broadcast” signal with high numerical accuracy and very low computational complexity. A compressor-decompressor scheme is worked out and described in detail. The approach is evaluated on real-world audio material with great success.

Index Terms: Dynamic range compression, inversion, model-based, reverse audio engineering.

I. INTRODUCTION
Sound or audio engineering is an established discipline employed in many areas that are part of our everyday life without us taking notice of it. But not many know how the audio was produced. If we take sound recording and reproduction or broadcasting as an example, we may imagine that a prerecorded signal from an acoustic source is altered by an audio engineer in such a way that it corresponds to certain criteria when played back. The number of these criteria may be large and usually depends on the context. In general, the said alteration of the input signal is a sequence of numerous forward transformations, the reversibility of which is of little or no interest. But what if one wished to do exactly this, that is to reverse the transformation chain, and what is more, in a systematic and repeatable manner?

The research objective of reverse audio engineering is twofold: to identify the transformation parameters given the input and the output signals, as in [1], and to regain the input signal that goes with the output signal given the transformation parameters. In both cases, an explicit signal model is mandatory. The latter case might seem trivial, but only if the applied transformation is linear and orthogonal and as such perfectly invertible. Yet the forward
transform is often neither linear nor invertible. This is the case for dynamic range compression (DRC), which is commonly described by a dynamic nonlinear time-variant system. The classical linear time-invariant (LTI) system theory does not apply here, so a tailored solution to the problem at hand must be found instead. At this point, we would also like to highlight the fact that neither Volterra nor Wiener model approaches [2]–[4] offer a solution, and neither do describing functions [5], [6]. These are useful tools when identifying a time-invariant or a slowly varying nonlinear system or analyzing the limit cycle behavior of a feedback system with a static nonlinearity.

Manuscript received December 05, 2012; revised February 28, 2013; accepted February 28, 2013. Date of publication March 15, 2013; date of current version March 29, 2013. This work was supported in part by the “Agence Nationale de la Recherche” within the scope of the DReaM project (ANR-09-CORD-006) as well as the laboratory with which the first author is affiliated as part of the “mobilité juniors” program. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Woon-Seng Gan.
S. Gorlow is with the Computer Science Research Laboratory of Bordeaux (LaBRI), CNRS, Bordeaux 1 University, 33405 Talence Cedex, France (e-mail: stanislaw.gorlow@labri.fr).
J. D. Reiss is with the Centre for Digital Music (C4DM), Queen Mary, University of London, London E1 4NS, U.K. (e-mail: josh.reiss@).
Digital Object Identifier 10.1109/TASL.2013.2253099

A method to invert dynamics compression is described in [7], but it requires an instantaneous gain value to be transmitted for each sample of the compressed signal. To provide a means to control the data rate, the gain signal is subsampled and also entropy coded. This approach is highly inefficient as it does not rely on a gain model and is extremely generic. On the other hand, transmitting the uncompressed signal in conjunction with a few typical compression parameters like
threshold, ratio, attack, and release would require a much smaller capacity and yield the best possible signal quality with regard to any thinkable measure.

A more realistic scenario is when the uncompressed signal is not available on the consumer side. This is usually the case for studio music recordings and broadcast material where the listener is offered a signal that is meant to sound “good” to everyone. However, the loudness war [8] has resulted in over-compressed audio material. Over-compression makes a song lose artistic qualities such as excitement or liveliness and desensitizes the ear through a louder volume. There is a need to restore the original signal’s dynamic range and to experience audio free of compression.

In addition to the normalization of the program’s loudness level, the Dolby solution [9], [10] also includes dynamic range expansion. The expansion parameters that help reproduce the original program’s dynamic range are tuned on the broadcaster side and transmitted as metadata together with the broadcast signal. This is a very convenient solution for broadcasters, not least because the metadata is quite compact. Dynamic range expansion is yet another forward transformation rather than a true inversion.

Evidently, none of the previous approaches satisfy the reverse engineering objective of this work. The goal of the present work, hence, is to invert dynamic range compression, which is a vital element not only in broadcasting but also in mastering.

The paper is organized as follows. Section II provides a brief introduction to dynamic range compression and presents the compressor model upon which our considerations are based.
The data model, the formulation of the problem, and the pursued approach are described next in Section III. The inversion is discussed in detail in Section IV. Section V illustrates how an integral step of the inversion procedure, namely the search for the zero-crossing of a non-linear function, can be solved in an iterative manner by means of linearization. Some other compressor features are discussed in Section VI. The complete algorithm is given in the form of pseudocode in Section VII and its performance is evaluated for different compressor settings in Section VIII. Conclusions are drawn in Section IX, where some directions for future work are mentioned.

1558-7916/$31.00 © 2013 IEEE

Fig. 1. Basic broadband compressor model (feed forward).

II. DYNAMIC RANGE COMPRESSION
Dynamic range compression or simply “compression” is a sound processing technique that attenuates loud sounds and/or amplifies quiet sounds, which in consequence leads to a reduction of an audio signal’s dynamic range. The latter is defined as the difference between the loudest and quietest sound measured in decibel. In the following, we will use the word “compression” having “downward” compression in mind, though the discussed approach is likewise applicable to “upward” compression. Downward compressing means attenuating sounds above a certain threshold while leaving sounds below the threshold unchanged. A sound engineer might use a compressor to reduce the dynamic range of source material for purposes of aesthetics, intelligibility, recording or broadcast limitations.

Fig. 1 illustrates the basic compressor model from ([11], ch. 2) amended by a switchable RMS/peak detector in the side chain making it compatible with the compressor/limiter model from ([12], p. 106). We will hereafter restrict our considerations to this basic model, as the purpose of the present work is to demonstrate a general approach rather than a solution to a specific problem. First, the input signal is split and a copy is sent to the side chain. The detector then
calculates the magnitude or level of the sidechain signal using the root mean square (RMS) or peak as a measure for how loud a sound is ([12], p. 107). The detector’s temporal behavior is controlled by the attack and release parameters. The sound level is compared with the threshold level and, if it exceeds the threshold, a scale factor is calculated which corresponds to the ratio of input level to output level. The knee parameter determines how quickly the compression ratio is reached. At the end of the side chain, the scale factor is fed to a smoothing filter that yields the gain. The response of the filter is controlled by another set of attack and release parameters. Finally, the gain control applies the smoothed gain to the input signal and adds a fixed amount of makeup gain to bring the output signal to a desired level. Such a broadband compressor operates on the input signal’s full bandwidth, treating all frequencies from zero through the highest frequency equally. A detailed overview of all sidechain controls of a basic gain computer is given in, e.g., ([11], ch. 3).

III. DATA MODEL, PROBLEM FORMULATION, AND PROPOSED SOLUTION
A. Data Model and Problem Formulation
The employed data model is based on the compressor from Fig. 1. The following simplifications are additionally made: the knee parameter (“hard” knee) and the makeup gain (fixed at 0 dB) are ignored. The compressor is defined as a single-input single-output (SISO) system, that is both the input and the output are single-channel signals. What follows is a description of each block by means of a dedicated function.

The RMS/peak detector as well as the gain computer build upon a first-order (one-pole) lowpass filter. The sound level or envelope of the input signal is obtained by (1) where represents an RMS detector, and a peak detector. The non-zero smoothing factor, may take on different values, or, depending on whether the detector is in the attack or release phase. The condition for the level detector to enter the attack phase and to choose
over is (2). A formula that converts a time constant into a smoothing factor is given in ([12], p. 109), so e.g., where is the sampling frequency. The static nonlinearity in the gain computer is usually modeled in the logarithmic domain as a continuous piecewise linear function: (3) where is the slope, , and is the threshold in decibel. The slope is further derived from the desired compression ratio according to (4). Equation (3) is equivalently expressed in the linear domain as (5) where , and is the linear scale factor before filtering. The smoothed gain is then calculated as the exponentially-weighted moving average, (6) where the decision for the gain computer to choose the attack smoothing factor instead of is subject to (7). The output signal is finally obtained by multiplying the above gain with the input signal: (8). Due to the fact that the gain is strictly positive, , it follows that (9) where sgn is the signum or sign function. In consequence, it is convenient to factorize the input signal as a product of the sign and the modulus according to (10).

The problem at hand is formulated in the following manner: Given the compressed signal and the model parameters recover the modulus of the original signal from based on . For a more intuitive use, the smoothing factors and may be replaced by the time constants and . The meaning of each parameter is listed below.
The threshold in dB
The compression ratio dB:dB
The detector type (RMS or peak)
The attack time of the envelope filter in ms
The release time of the envelope filter in ms
The attack time of the gain filter in ms
The release time of the gain filter in ms

B. Proposed Solution
The output of the side chain, that is the gain of , given , and , may be written as (11). In (11), denotes a nonlinear dynamic operator that maps the modulus of the input signal onto a sequence of instantaneous gain values according to the compressor model represented. Using (11), (8) can be solved for yielding subject to invertibility of . In order to solve the above equation one requires the knowledge
of , which is unavailable. However, since is a function of , we can express as a function of one independent variable, and in that manner we obtain an equation with a single unknown: (12) where represents the entire compressor. If is invertible, i.e., bijective for all , can be obtained from by (13). And yet, since is unknown, the condition for applying decompression must be predicted from , and , and therefore needs the condition for toggling between the attack and release phases. Depending on the quality of the prediction, the recovered modulus may differ somewhat at transition points from the original modulus, so that in the end (14). In the next section it is shown how such an inverse compressor or decompressor is derived.

IV. INVERSION OF DYNAMIC RANGE COMPRESSION
A. Characteristic Function
For simplicity, we choose the instantaneous envelope value instead of as the independent variable in (12). The relation between the two items is given by (1). From (6) and (8), when (15) (16). From (1), (17) or equivalently (note that by definition) (18). Moreover, (18) has a unique solution if and also are invertible. Moving the expression on the left-hand side over to the right-hand side, we may define (19) which shall be termed the characteristic function. The root or zero-crossing of hence represents the sought-after envelope value. Once is found (see Section V), the current values of , and are updated as per (20) and the decompressed sample is then calculated as (21).

B. Attack-Release Phase Toggle
1) Envelope Smoothing: In case a peak detector is in use, takes on two different values. The condition for the attack phase is then given by (2) and is equivalent to (22). Assuming that the past value of is known at time , what needs to be done is to express the unknown in terms of such that the above equation still holds true. If is rather small, , or equivalently if is sufficiently large, ms at 44.1-kHz sampling, the term in (15) is negligible, so (15) is approximated as (23). Solving (23) for and plugging the result into (22), we obtain (24).
If (24) holds true, the detector is assumed to be in the attack phase.

2) Gain Smoothing: Just like the peak detector, the gain smoothing filter may be in either the attack or release phase. The necessary condition for the attack phase in (7) may also be formulated as (25). But since the current envelope value is unknown, we need to substitute in the above inequality by something that is known. With this in mind, (15) is rewritten as (26). Provided that , and due to the fact that , the expression in square brackets in (26) is smaller than one, and thus during attack (27). Substituting by using (20), and solving (27) for results in (28). If in (25) is substituted by the expression on the right-hand side of (28), (25) still holds true, so the following sufficient condition is used to predict the attack phase of the gain filter: (29). Note that the values of all variables are known whenever (29) is evaluated.

C. Envelope Predictor
An instantaneous estimate of the envelope value is required not only to predict when compression is active, formally according to (5), but also to initialize the iterative search algorithm in Section V. Resorting once more to (15) it can be noted that in the opposite case where , and so (30). The sound level of the input signal at time is therefore (31) which must be greater than the threshold for compression to set in, whereas and are selected based on (24) and (29), respectively.

Fig. 2. Graphical illustration for the iterative search for the zero-crossing.

D. Error Analysis
Consider being estimated from according to (32). The normalized error is then (33) (34). As during attack and during release, respectively. The instantaneous gain can also be expressed as (35) where is the runtime in . Using (35) in (34), the magnitude of the error is given by (36) (37). For , (36) becomes (38) whereas for , (37) converges to infinity: (39). So, the error is smaller for large or short . The smallest possible error is for , which then again depends on the current and the previous value of . The error accumulates if with . The difference between
consecutive -values is signal dependent. The signal envelope fluctuates less and is thus smoother for smaller or longer . is also more stable when the compression ratio is low. For is perfectly constant. The threshold has a negative impact on error propagation. The lower the more the error depends on , since more samples are compressed with different -values. The RMS detector stabilizes the envelope more than the peak detector, which also reduces the error. Furthermore, since usually, the error due to is smaller during release whereas the error due to is smaller during attack. Finally, the error is expected to be larger at transition points between quiet and loud signal passages.

The above error may cause a decision in favor of a wrong smoothing factor in (24), like instead of , e.g. The decision error from (24) then propagates to (29). Given that , the error due to (32) is accentuated by (24) with the consequence that (29) is less reliable than (24). The total error in (29) thus scales with . In regard to (31), reliability of the envelope’s estimate is subject to validity of (24) and (29). A better estimate is obtained when the sound level detector and the gain filter are both in either the attack or release phase. Here too, the estimation error increases with and also with .

V. NUMERICAL SOLUTION OF THE CHARACTERISTIC FUNCTION
An approximate solution to the characteristic function can be found, e.g., by means of linearization. The estimate from (31) may moreover serve as a starting point for an iterative search of an optimum: The criterion for optimality is further chosen as the deviation of the characteristic function from zero, initialized to (40). Thereupon, (19) may be approximated at a given point using the equation of a straight line, , where is the slope and is the -intercept. The zero-crossing is characterized by the equation (41) as shown in Fig. 2. The new estimate of the optimal is found as (42). If is less optimal than , the iteration is stopped and is the final estimate. The iteration is also stopped if is smaller
than some . In the latter case, has the optimal value with respect to the chosen criterion. Otherwise, is set to and is set to after every step and the procedure is repeated until has converged to a more optimal value. The proposed method is a special form of the secant method with a single initial value.

VI. GENERAL REMARKS
A. Stereo Linking
When dealing with stereo signals, one might want to apply the same amount of gain reduction to both channels to prevent image shifting. This is achieved through stereo linking. One way is to calculate the required amount of gain reduction for each channel independently and then apply the larger amount to both channels. The question which arises in this context is which of the two channels was the gain derived from. To give an answer resolving the dilemma of ambiguity, one solution would be to signal which of the channels carries the applied gain. One could then decompress the marked sample and use its gain for the other channel. Although very simple to implement, this approach provokes an additional data rate of 44.1 kbps at 44.1-kHz sampling. A rate-efficient alternative that comes with a higher computational cost is realized in the following way.
First, one decompresses both the left and the right channel independently and in so doing one obtains two estimates and , where subscript shall denote the left channel and subscript the right channel, respectively. In a second step, one calculates the compressed values of and and selects the channel for which holds true. In a final step, one updates the remaining variables using the gain of the selected channel.

B. Lookahead
A compressor with a look-ahead function, i.e., with a delay in the main signal path as in ([12], p. 106), uses past input samples as weighted output samples. Since inverting the process would require future input samples, which are unavailable, the inversion is rendered impossible. and must thus be in sync for the approach to be applied.

C. Clipping and Limiting
Another point worth mentioning is that “hard” clipping and “brick-wall” limiting are special cases of compression with the attack time set to zero and the compression ratio set to . The static nonlinearity in that particular case is a one-to-many mapping, which by definition is noninvertible.

VII. THE ALGORITHM
The complete algorithm is divided into three parts, each of them given as pseudocode below. Algorithm 1 outlines the compressor that corresponds to the model from Sections II–III. Algorithm 2 illustrates the decompressor described in Section IV, and the iterative search from Section V is finally summarized in Algorithm 3. The parameter represents the sampling frequency in kHz.

Algorithm 1 The compressor
function COMP
  for do
    if then
    else
    end if
    if then
    else
    end if
    if then
    else
    end if
  end for
  return
end function

VIII. PERFORMANCE EVALUATION
A. Performance Metrics
To evaluate the inverse approach, the following quantities are measured: the root-mean-square error (RMSE), (43) given in decibel relative to full scale (dBFS), the perceptual similarity between the original and decompressed signal, and the execution time of the decompressor relative to real time (RT).
Furthermore, we present the percentage of compressed samples, the mean number of iterations until convergence per compressed sample, the error rate of the attack-release toggle for the gain smoothing filter, and finally the error rate of the envelope predictor. The perceptual similarity is assessed by PEMO-Q [13], [14] with as metric. The simulations are run in MATLAB on an Intel Core i5-520M CPU.

Algorithm 2 The decompressor
function DECOMP
  for do
    if then
    else
    end if
    if then
    else
    end if
    if then
      CHARFZERO
    else
    end if
  end for
  return
end function

Algorithm 3 The iterative search for the zero-crossing
function CHARFZERO
  repeat
    if then
      return
    end if
  until
  return
end function

B. Computational Results
Fig. 3 shows the inverse output signal for a synthetic input signal using an RMS detector. The inverse signal is obtained from the compressed signal with an error of dBFS. It is visually indistinguishable from the original signal. Due to the fact that the signal envelope is constant most of the time, the error is noticeable only around transition points, which are few.

The decompressor’s performance is further evaluated for some commercial compressor presets. The used audio material consists of 12 items covering speech, sung voice, music, and jingles. All items are normalized to LKFS [15]. The -value in the break condition of Algorithm 3 is set to . A detailed overview of compressor settings and performance figures is given in Tables I–II. The presented results suggest that the decompressed signal is perceptually indistinguishable from the original: the -value is flawless.
This was also confirmed by the authors through informal listening tests.

As can be seen from Table II, the largest inversion error is associated with setting E and the smallest with setting B. For all five settings, the error is larger when an RMS detector is in use. This is partly due to the fact that has a stronger curvature in comparison to . By defining the distance in (40) as , it is possible to attain a smaller error for an RMS detector at the cost of a slightly longer runtime. In most cases, the envelope predictor works more reliably as compared to the toggle switch between attack and release. It can also be observed that the choice of time constants seems to have little impact on the decompressor’s accuracy. The major parameters that affect the decompressor’s performance are and , while the threshold is evidently the predominant one: the RMSE strongly correlates with the threshold level.

Figs. 4–5 show the inversion error as a function of various time constants. These are in the range of typical attack and release times for a limiter (peak) or compressor (RMS) ([12], pp. 109–110). It can be observed that the inversion accuracy depends on the release time of the peak detector and not so much on its attack time for both the envelope and the gain filter, see Figs. 4, 5(b). For the envelope filter, all error curves exhibit a local dip around a release time of 0.5 s. The error increases steeply below that bound but moderately with larger values. In the proximity of 5 s, the error converges to dBFS. With regard to the gain filter, the error behaves in a reverse manner. The curves in Fig. 5(b) exhibit a local peak around 0.5 s with a value of 180 dBFS. It can further be observed in Fig. 4(a) that the curve for ms has a dip where is close to 1 ms, i.e., where is minimal. This is also true for Fig. 4(c) and (d): the lowest error is where the attack and release times are identical. As a general rule, the error that is due to the attack-release switch is smaller for the gain filter in Fig. 5.
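The compressor-decompressor round trip evaluated in this section can be illustrated with a small, self-contained sketch. It follows the verbal description of the model (one-pole level detector, hard-knee gain computer, one-pole gain smoothing, no makeup gain), but the concrete formulas are assumptions made for this example: the smoothing factor is taken as 1 − exp(−2.2/(τ·f_s)), the scale factor as (threshold/level)^(1 − 1/ratio) above the threshold, and the names SideChain, compress, and decompress are hypothetical. In place of the linearized search and the predicted attack-release toggles, the sketch swaps in plain bisection on the sample modulus, which is slower but needs no prediction step.

```python
import math

def smoothing_factor(tau_ms, fs_hz):
    """Assumed mapping from a time constant to a one-pole smoothing factor."""
    return 1.0 if tau_ms <= 0 else 1.0 - math.exp(-2.2 / (tau_ms * 1e-3 * fs_hz))

class SideChain:
    """Detector and gain state shared by the compressor and the decompressor."""

    def __init__(self, thresh_db, ratio, p, b_att, b_rel, g_att, g_rel):
        self.l = 10.0 ** (thresh_db / 20.0)   # linear threshold
        self.slope = 1.0 - 1.0 / ratio        # assumes a finite ratio
        self.p = p                            # 2: RMS detector, 1: peak detector
        self.b_att, self.b_rel = b_att, b_rel
        self.g_att, self.g_rel = g_att, g_rel
        self.vp = 0.0                         # smoothed |x|^p
        self.g = 1.0                          # smoothed gain

    def gain_for(self, a):
        """Return (vp, g) the side chain would produce for a sample modulus a."""
        v_prev = self.vp ** (1.0 / self.p)
        beta = self.b_att if a > v_prev else self.b_rel
        vp = beta * a ** self.p + (1.0 - beta) * self.vp
        v = vp ** (1.0 / self.p)
        f = (self.l / v) ** self.slope if v > self.l else 1.0
        gamma = self.g_att if f < self.g else self.g_rel
        return vp, gamma * f + (1.0 - gamma) * self.g

def compress(x, chain):
    y = []
    for sample in x:
        chain.vp, chain.g = chain.gain_for(abs(sample))
        y.append(chain.g * sample)
    return y

def decompress(y, chain, iters=80):
    """Recover x from y by solving a * g(a) = |y| for the modulus a, per sample."""
    x = []
    for sample in y:
        target = abs(sample)
        lo = hi = target                      # the gain never exceeds 1, so a >= |y|
        while hi * chain.gain_for(hi)[1] < target:
            hi *= 2.0                         # expand until the root is bracketed
        for _ in range(iters):                # bisection on the increasing map a * g(a)
            mid = 0.5 * (lo + hi)
            if mid * chain.gain_for(mid)[1] < target:
                lo = mid
            else:
                hi = mid
        a = 0.5 * (lo + hi)
        chain.vp, chain.g = chain.gain_for(a)  # same state update as the compressor
        x.append(math.copysign(a, sample))
    return x
```

The decompressor must start from the same initial state and use the same parameters as the compressor; it then tracks the compressor's internal state sample by sample, which is the essence of the model-based approach. Bisection works here because, for a finite ratio, the product of modulus and gain grows monotonically with the modulus.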
Looking at Fig. 6 one can see that the error decreases with threshold and increases with compression ratio. At a ratio of 10:1 and beyond, the RMSE scales almost exclusively with the threshold. The lower the threshold, the stronger the error propagates between decompressed samples, which leads to a larger RMSE value. The RMS detector further augments the error because it stabilizes the envelope more than the peak detector. Clearly, the threshold level has the highest impact on the decompressor’s accuracy.

Fig. 3. An illustrative example using an RMS amplitude detector with set to 5 ms, a threshold of dBFS (dashed line in the upper right corner), a compression ratio of 4:1, and set to 1.6 ms for attack and 17 ms for release, respectively. The RMSE is dBFS.

TABLE I. SELECTED COMPRESSOR SETTINGS

TABLE II. PERFORMANCE FIGURES OBTAINED FOR VARIOUS AUDIO MATERIAL (12 ITEMS)

IX. CONCLUSION AND OUTLOOK
This work examines the problem of finding an inverse to a nonlinear dynamic operator such as a digital compressor. The proposed approach is characterized by the fact that it uses an explicit signal model to solve the problem. To find the “dry” or uncompressed signal with high accuracy, it is sufficient to know the model parameters. The parameters can, e.g., be sent together with the “wet” or compressed signal in the form of metadata as is the case with Dolby Volume and ReplayGain [16]. A new bit-stream format is not mandatory, since many digital audio standards, like WAV or MP3, provide means to tag the audio content with “ancillary” data. With the help of the metadata, one can then reverse the compression applied after mixing or before broadcast. This allows the end user to have control over the amount of compression, which may be preferred because the sound engineer has no control over the playback environment or the listener’s individual taste.

Fig. 4. RMSE as a function of typical attack and release times using a peak (upper row) or an RMS amplitude detector (lower row). In the left column, the attack time of the envelope filter is varied while the release time is held constant. The right column shows the reverse case. The time constants of the gain filter are fixed at zero. In all four cases, threshold and ratio are fixed at 32 dBFS and 4:1, respectively.

Fig. 5. RMSE as a function of typical attack and release times using a peak (upper row) or an RMS amplitude detector (lower row). In the left column, the attack time of the gain filter is varied while the release time is held constant. The right column shows the reverse case. The time constants of the envelope filter are fixed at zero. In all four cases, threshold and ratio are fixed at 32 dBFS and 4:1, respectively.

When the compressor parameters are unavailable, they can possibly be estimated from the compressed signal. This may
Volterraand Wiener Model Approaches.New York,NY,USA:Springer Sci-ence+Business Media,2007,ch.3.[3]Y.Avargel and I.Cohen,“Adaptive nonlinear system identificationin the short-time Fourier transform domain,”IEEE Trans.SignalProcess.,vol.57,no.10,pp.3891–3904,Oct.2009.[4]Y.Avargel and I.Cohen,“Modeling and identification of nonlinear sys-tems in the short-time Fourier transform domain,”IEEE Trans.SignalProcess.,vol.58,no.1,pp.291–304,Jan.2010.[5]A.Gelb and W.E.Vander Velde,Multiple-Input Describing Functionsand Nonlinear System Design.New York,NY,USA:McGraw-Hill,1968,ch.1.[6]P.W.J.M.Nuij,O.H.Bosgra,and M.Steinbuch,“Higher-order sinu-soidal input describing functions for the analysis of non-linear systems with harmonic responses,”Mech.Syst.Signal Process.,vol.20,pp.1883–1904,2006.[7]chaise and L.Daudet,“Inverting dynamics compression withminimal side information,”in Proc.DAFx,2008,pp.1–6.[8]E.Vickers,“The loudness war:Background,speculation and recom-mendations,”in Proc.AES Conv.129,Nov.2010.[9]Dolby Digital and Dolby V olume Provide a Comprehensive LoudnessSolution,Dolby Laboratories,2007.[10]Broadcast Loudness Issues:The Comprehensive Dolby Approach,Dolby Laboratories,2011.[11]R.Jeffs,S.Holden,and D.Bohn,Dynamics processor—Technology&Application Tips,Rane Corporation,2005.[12]U.Zölzer,DAFX:Digital Audio Effects,2nd ed.Chichester,WestSussex,U.K.:Wiley,2011,ch.4,The Atrium,Southern Gate,PO19 8SQ.[13]R.Huber and B.Kollmeier,“PEMO-Q—A new method for objectiveaudio quality assessment using a model of auditory perception,”IEEE Trans.Audio Speech Lang.Process.,vol.14,no.6,pp.1902–1911, Nov.2006.[14]HörTech gGmbH,PEMO-Q[Online].Available:http://www.ho-ertech.de/web_en/produkte/pemo-q.shtml,version1.3[15]ITU-R,Algorithms to Measure Audio Programme Loudness and True-Peak Audio Level,Mar.2011,rec.ITU-R BS.1770-2.[16]Hydrogenaudio,ReplayGain[Online].Available:http://wiki.hydroge-/index.php?title=ReplayGain,Feb.2013[17]J.C.Schmidt and J.C.Rutledge,“Multichannel dynamic range 
com-pression for music signals,”in Proc.IEEE ICASSP,1996,vol.2,pp.1013–1016.[18]D.Giannoulis,M.Massberg,and J.D.Reiss,“Digital dynamic rangecompressor design—A tutorial and analysis,”J.Audio Eng.Soc.,vol.60,pp.399–408,2012.[19]M.M.Goodwin and C.Avendano,“Frequency-domain algorithms foraudio signal enhancement based on transient modification,”J.Audio Eng.Soc.,vol.54,pp.827–840,2006.[20]M.Walsh,E.Stein,and J.-M.Jot,“Adaptive dynamics enhancement,”in Proc.AES Conv.130,May2011.[21]M.Zaunschirm,J.D.Reiss,and A.Klapuri,“A sub-band approachto modification of musical transients,”Comput.Music J.,vol.36,pp.23–36,2012.。
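
As a supplementary illustration of the claim in the conclusion that knowing the model parameters suffices to recover the dry signal, the following minimal sketch treats only the simplest special case. It is not the decompressor proposed in this work. It assumes a hard knee, a per-sample peak level, and all time constants equal to zero, so the compressor is memoryless and can be inverted in closed form. The function names (`compress`, `decompress`, `rmse_dbfs`), the test signal, and the definition of the RMSE in dBFS (mean squared error relative to a full scale of 1.0) are assumptions made for this example.

```python
import math

def level_db(sample):
    """Peak level of a single sample in dBFS (full scale assumed to be 1.0)."""
    return 20.0 * math.log10(abs(sample)) if sample != 0.0 else -math.inf

def compress(x, threshold_db, ratio):
    """Memoryless hard-knee compressor: no attack, release, or smoothing."""
    out = []
    for s in x:
        lvl = level_db(s)
        if lvl > threshold_db:
            # Above threshold the output level rises 1/ratio dB per input dB.
            gain_db = (1.0 / ratio - 1.0) * (lvl - threshold_db)
            s *= 10.0 ** (gain_db / 20.0)
        out.append(s)
    return out

def decompress(y, threshold_db, ratio):
    """Inverse of compress(), given the same threshold and ratio."""
    out = []
    for s in y:
        lvl = level_db(s)
        if lvl > threshold_db:
            # Undo the slope: input level = threshold + ratio * (output - threshold).
            gain_db = (ratio - 1.0) * (lvl - threshold_db)
            s *= 10.0 ** (gain_db / 20.0)
        out.append(s)
    return out

def rmse_dbfs(reference, estimate):
    """Root-mean-square error between two signals, expressed in dBFS."""
    mse = sum((a - b) ** 2 for a, b in zip(reference, estimate)) / len(reference)
    return 10.0 * math.log10(mse) if mse > 0.0 else -math.inf

# Hypothetical test signal: 200 samples of a sine with peak amplitude 0.8.
x = [0.8 * math.sin(2.0 * math.pi * 5.0 * n / 200.0) for n in range(200)]
y = compress(x, threshold_db=-20.0, ratio=4.0)        # "wet" signal
x_hat = decompress(y, threshold_db=-20.0, ratio=4.0)  # recovered "dry" signal

print("RMSE wet vs. dry       :", rmse_dbfs(x, y), "dBFS")
print("RMSE recovered vs. dry :", rmse_dbfs(x, x_hat), "dBFS")
```

In this memoryless case the only remaining error is floating-point rounding. With nonzero attack and release times the gain depends on past samples, which is where the error propagation discussed in connection with Fig. 6 comes from, and this closed-form inverse no longer applies.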