Discriminative linear transforms for feature normalization and speaker adaptation
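The discriminator() function
A discriminator() function is used in artificial intelligence and machine learning to tell different kinds of data apart.
It usually assigns each input to one of two classes, real or fake, in order to judge whether the input is genuine.
In machine learning, discriminator() functions are widely used to classify the samples of a data set.
For example, when training a generative adversarial network (GAN), the function is used to distinguish generated images from real ones.
In that setting, discriminator() is the discriminator network: it has to decide whether a generated image resembles the real images.
In deep learning, a discriminator() function can also classify other kinds of data, such as images, text, and speech.
During training, the labeled samples in the data set are presented to the function, which learns how to classify them so that later predictions and decisions become more accurate and reliable.
Beyond these uses, a discriminator() function can also help identify spam, malware, and fraud.
By separating the samples in a data set into real and fake, it can flag information that is not genuine and so help users avoid scams and fraud.
When writing a discriminator() function, keep the following points in mind:
1. Build a suitable training data set, so that the function can classify the full range of data it will encounter.
2. Use appropriate algorithms and techniques to train and optimize the function, so that it identifies each class accurately.
3. Keep tuning the function's parameters, so that it performs well across different scenarios and applications.
In short, a discriminator() function is a useful tool for classifying and distinguishing many kinds of data.
Continued improvement of its accuracy makes it easier to meet the challenges of data classification and recognition, and increases the practical value of machine learning and artificial intelligence.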
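The three points above (prepare labeled data, train, tune parameters) can be sketched with a very small example. The code below is only an illustration under stated assumptions: it uses plain logistic regression as a stand-in for a discriminator, the toy data are made up, and the names `train_discriminator` and `discriminator` and the values of `lr` and `epochs` are hypothetical choices, not part of any library.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_discriminator(samples, labels, lr=0.5, epochs=2000):
    """Fit the weights w and bias b of a logistic model by gradient descent."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y  # gradient of the log loss with respect to the logit
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


def discriminator(x, w, b):
    """Estimated probability that x belongs to the 'real' class (label 1)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


# Made-up, linearly separable toy data: label 1 = real, label 0 = fake.
real = [[2.0, 2.1], [1.8, 2.4], [2.2, 1.9]]
fake = [[-2.0, -1.9], [-2.3, -2.2], [-1.7, -2.0]]
w, b = train_discriminator(real + fake, [1] * len(real) + [0] * len(fake))

print(discriminator([2.0, 2.0], w, b) > 0.5)
print(discriminator([-2.0, -2.0], w, b) > 0.5)
```

A GAN discriminator is normally a neural network trained against a generator; the logistic model here only shows the real/fake decision in its simplest form.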
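An Algorithm for Computing DFT Using Arithmetic Fourier Transform

ZHANG Xian-chao1, WU Ji-gang1, JIANG Zeng-rong2, CHEN Guo-liang1
(1. Dept. of Computer Science & Technology, Univ. of Science & Technology of China, Hefei 230027, China; 2. Dept. of System Engineering & Mathematics, National Univ. of Defense Technology, Changsha 410073, China)

Acta Electronica Sinica, Vol. 28, No. 5, May 2000

Abstract: The discrete Fourier transform (DFT) plays an important role in digital signal processing and many other fields. In this paper, a new Fourier analysis technique called the arithmetic Fourier transform (AFT) is used to compute the DFT. The algorithm needs only O(N) multiplications. Its process is simple and it has a unified formula, which overcomes the disadvantage of the traditional fast method (FFT) for arbitrary lengths, whose program is complex and contains many subroutines. The algorithm is easily performed in parallel and is especially suitable for VLSI design. For a DFT whose length contains large prime factors, and especially for a DFT of prime length, it is faster than the traditional FFT method. The algorithm opens up a new approach to the fast computation of the DFT.
Key words: discrete Fourier transform (DFT); arithmetic Fourier transform (AFT); fast Fourier transform (FFT)
CLC number: TN917   Document code: A   Article ID: 0372-2112(2000)05-0105-03

1 Introduction
The discrete Fourier transform (DFT) plays an important role in digital signal processing and many other fields. However, the DFT is expensive to compute: an N-point DFT needs O(N^2) multiplications and additions. Fast computation of the DFT is therefore very important. In 1965, Cooley and Tukey introduced the fast Fourier transform (FFT), which reduced the cost of an N-point DFT from O(N^2) to O(N log N) and opened the era of fast DFT computation. FFT computation is still fairly involved, though, and its formulas differ for different DFT lengths, so an FFT program for arbitrary lengths is very complex and contains many subroutines.
In 1988, Tufts and Sadasiv [1] proposed a method that uses the Möbius inversion formula to compute the Fourier coefficients of a continuous function, and named it the arithmetic Fourier transform (AFT). The AFT has many good properties: it needs only O(N) multiplications, and the algorithm is simple and highly parallel, which makes it especially suitable for VLSI design. It quickly attracted wide attention and has been applied in digital image processing and other fields. The AFT has become an important Fourier analysis technique following the FFT [2~5].
Using the relation between the DFT and the Fourier coefficients of a continuous function, the AFT can be used to compute the DFT. This approach keeps the good properties of the AFT and has a unified formula. Extensive experiments show that, compared with direct computation, the AFT method reduces the DFT computation time by 90%; for DFTs whose length contains large prime factors, and especially for prime lengths, it is faster than the traditional FFT. It therefore opens a new route to fast DFT computation.

2 The arithmetic Fourier transform
This paper uses the algorithm of [3]. Let $A(t)$ be a function of period $T$ whose Fourier series has finitely many terms:
$$A(t) = a_0 + \sum_{n=1}^{N} a_n \cos 2\pi n f_0 t + \sum_{n=1}^{N} b_n \sin 2\pi n f_0 t \tag{1}$$
where $f_0 = 1/T$ and $a_0 = \frac{1}{T}\int_0^T A(t)\,dt$. Let
$$B(2n, \alpha) = \frac{1}{2n} \sum_{m=0}^{2n-1} (-1)^m A\!\left(\frac{m}{2n}T + \alpha T\right), \quad -1 < \alpha < 1 \tag{2}$$
Then the Fourier coefficients $a_n$ and $b_n$ can be computed as
$$a_n = \sum_{l=1,3,5,\ldots}^{[N/n]} \mu(l)\, B(2nl, 0), \qquad b_n = \sum_{l=1,3,5,\ldots}^{[N/n]} \mu(l)\,(-1)^{(l-1)/2}\, B\!\left(2nl, \frac{1}{4nl}\right), \quad n = 1, \ldots, N \tag{3}$$
where
$$\mu(l) = \begin{cases} 1, & l = 1 \\ (-1)^r, & l = p_1 p_2 \cdots p_r \ \text{(distinct primes)} \\ 0, & \exists\, p \ \text{such that}\ p^2 \mid l \end{cases}$$
is the Möbius function.
This is the AFT. Its cost is: additions: $N^2 + [N/2] + [N/3] + \cdots + 1 - 2N$; multiplications: $2N$.
The AFT needs a large number of nonuniformly spaced samples of the function. In practice, by the Nyquist sampling theorem, computing the first N Fourier coefficients requires only 2N uniformly spaced samples within one period. Zero-order interpolation can then be used to resolve the mismatch between the two sets of sample points. References [2, 3] analyze this in detail, so it is not repeated here.

3 The AFT algorithm for the DFT
3.1 Definition and properties of the DFT
Definition 1. Let $X_n$ be a sequence of length $N$. Its DFT is defined as
$$Y_k = \sum_{n=0}^{N-1} X_n w^{nk}, \quad k = 0, 1, \ldots, N-1; \quad w = e^{-i 2\pi/N} \tag{4}$$
Property 1. Write $X_n \Leftrightarrow Y_k$ to indicate that the sequence $Y_k$ is the DFT of the sequence $X_n$, and let $G_n \Leftrightarrow H_k$. Then
$$pX_n + qG_n \Leftrightarrow pY_k + qH_k \tag{5}$$
Hence the DFT of a complex sequence can be computed from the DFTs of two real sequences, and this paper discusses only the DFT of real sequences.
Property 2. Let $X_n$ be a real sequence with $X_n \Leftrightarrow Y_k$. Then
$$\mathrm{Re}\, Y_k = \mathrm{Re}\, Y_{N-k}, \qquad \mathrm{Im}\, Y_k = -\mathrm{Im}\, Y_{N-k} \tag{6}$$
where $\mathrm{Re}\, Y_k$ and $\mathrm{Im}\, Y_k$ denote the real and imaginary parts of $Y_k$. Therefore, for the DFT of an N-point real sequence, only $\mathrm{Re}\, Y_k$ and $\mathrm{Im}\, Y_k$ for $k = 0, \ldots, \lceil N/2 \rceil$ need to be computed.

3.2 The AFT algorithm for the DFT
The DFT of a discrete sequence is closely related to the Fourier coefficients of a continuous function. If the sequence $X_n$ is obtained by discretizing a function $A(t)$ on an interval $[0, T]$, and the Fourier series of $A(t)$ contains only the first $N/2$ terms, i.e.,
$$A(t) = a_0 + \sum_{k=1}^{\lceil N/2 \rceil - 1} a_k \cos 2\pi k f_0 t + \sum_{k=1}^{\lceil N/2 \rceil - 1} b_k \sin 2\pi k f_0 t \tag{7}$$
then the DFT $Y_k$ and the Fourier coefficients are related by
$$\mathrm{Re}\, Y_k = \lceil N/2 \rceil\, a_k / 2, \qquad \mathrm{Im}\, Y_k = \lceil N/2 \rceil\, b_k / 2, \quad k = 0, \ldots, \lceil N/2 \rceil \tag{8}$$
The function in (7) is a band-limited signal. For a general function, the "=" in (8) must be replaced by "≈" [7].
Therefore, the DFT of the sequence $X_n$ can be computed from the Fourier coefficients of the function $A(t)$. For a given sequence $X_n$, note that on any interval there are infinitely many functions that yield $X_n$ after discretization. Equation (8) holds approximately for all of these interpolating functions (only the function in (7) satisfies (8) exactly) [7]. The zero-order interpolation implementation of the AFT in effect replaces the original function by the zero-order interpolating function among them. Moreover, in that implementation only the discretized sequence takes part in the computation, and the function itself need not be known. We can therefore choose any interval, treat the sequence $X_n$ as samples on that interval, use the AFT to compute the "Fourier coefficients" in (8), and then obtain the DFT of the sequence from (8). The algorithm is as follows (using the interval [0, 1]):

    for k = 1 to ⌈N/2⌉
        for m = 0 to 2k − 1
            B(2k, 0) := B(2k, 0) + (−1)^m X[Nm/2k + 0.5]
            B(2k, 1/4k) := B(2k, 1/4k) + (−1)^m X[Nm/2k + N/4k + 0.5]
        endfor
        B(2k, 0) := B(2k, 0)/2k
        B(2k, 1/4k) := B(2k, 1/4k)/2k
    endfor
    for j = 0 to N − 1
        a_0 := a_0 + X(j)/N
    endfor
    for k = 1 to ⌈N/2⌉
        for l = 1 to [⌈N/2⌉/k] by 2
            a_k := a_k + μ(l) B(2kl, 0)
            b_k := b_k + μ(l) (−1)^((l−1)/2) B(2kl, 1/4kl)
        endfor
        Re Y_k := ⌈N/2⌉ a_k/2
        Re Y_{N−k} := Re Y_k
        Im Y_k := ⌈N/2⌉ b_k/2
        Im Y_{N−k} := −Im Y_k
    endfor

Fig. 1 Program of the AFT algorithm for the DFT

The error of the AFT method comes mainly from the zero-order interpolation. Extensive experiments show that, compared with the FFT, the error is acceptable (some experimental results are given in the appendix).

4 Performance of the algorithm
4.1 The program
The AFT algorithm for the DFT has a unified and simple formula, so its program is also very simple (Fig. 1).
[Fig. 2 Flow of the AFT algorithm for the DFT]
For comparison, consider the flow of the FFT.
[Fig. 3 Flow of the FFT algorithm]
The FFT program contains a large number of subroutines, and these subroutines are fairly complex. The FFT program for prime-length DFTs is especially complex. An FFT program for arbitrary-length DFTs is therefore very complex.
4.2 Computational efficiency
The AFT method reduces the multiplication count of the DFT from O(N^2) to O(N) and reduces the computation time by 90%. When the DFT length N is a power of 2, the FFT is faster than the AFT method. For general lengths, the AFT method is faster than the FFT when N contains large prime factors, and slower when all factors of N are small. When N itself is a large prime, the AFT method is faster than the FFT. Some experimental results are given in the appendix for comparison.
In particular, the FFT computation for prime-length DFTs is very complex and hard to apply in practice, whereas the AFT method is simple and provides a good fast algorithm for prime-length DFTs. Table 1 compares the efficiency of the two algorithms in more detail.

Table 1
    Length              521       911       971       1483      2417
    FFT efficiency      67.30%    68.03%    72.50%    71.23%    76.22%
    AFT efficiency      91.39%    91.78%    91.63%    91.81%    91.83%

4.3 Parallelism
The AFT is highly parallel and especially suitable for VLSI design. Many VLSI designs have been proposed and applied in digital image processing and other fields. The AFT algorithm for the DFT inherits these advantages and is likewise highly parallel.

5 Conclusion and outlook
This paper uses the arithmetic Fourier transform (AFT) to compute the DFT. The method brings the advantages of the AFT into DFT computation and opens a new route to fast DFT computation. Combining the AFT method with the FFT can further increase the speed of DFT computation.

References
[1] D. W. Tufts and G. Sadasiv. The arithmetic Fourier transform. IEEE ASSP Mag., Jan. 1988: 13~17
[2] I. S. Reed, D. W. Tufts, Xiao Yu, T. K. Truong, M. T. Shih and X. Yin. Fourier analysis and signal processing by use of Möbius inversion formula. IEEE Trans. Acoust. Speech Signal Processing, Mar. 1990, 38(3): 458~470
[3] I. S. Reed, Ming Tang Shih, T. K. Truong, R. Hendon and D. W. Tufts. A VLSI architecture for simplified arithmetic Fourier transform algorithm. IEEE Trans. Signal Processing, May 1993, 40(5): 1122~1132
[4] H. Park and V. K. Prasanna. Modular VLSI architectures for computing the arithmetic Fourier transform. IEEE Trans. Signal Processing, June 1993, 41(6): 2236~2246
[5] F. P. Lovine and S. Tantaratana. Some alternate realizations of the arithmetic Fourier transform. Conference on Signals, Systems and Computers, 1993 (Cat. 93CH3312-6): 310~314
[6] Jiang Zengrong, Zeng Yonghong, Yu Pinneng. Fast Algorithms. Changsha: National University of Defense Technology Press, 1993 (in Chinese)
[7] E. O. Brigham. The Fast Fourier Transform. Shanghai: Shanghai Scientific and Technical Publishers, 1976 (in Chinese)

Appendix: detailed experimental results (machine: 586 PC, clock: 166 MHz; unit: seconds)

Power-of-2 lengths
    Length    AFT method    Radix-2 FFT    Direct
    256       0.00516       0.00240        0.11
    512       0.01860       0.00440        0.44
    1024      0.07580       0.01100        1.81

Prime lengths
    Length    AFT method    FFT       Direct
    521       0.0379        0.1439    0.44
    971       0.1340        0.4400    1.60
    1483      0.3103        1.0904    3.79
    2417      0.8206        2.3899    10.75

Arbitrary lengths
    Length    Factorization    AFT method    FFT      Direct
    1346      2 × 637          0.27          0.44     3.14
    2986      2 × 1483         1.26          2.14     14.82
    3579      3 × 1193         1.81          1.92     22.16
    4637      4637             3.08          21.42    37.47
    5574      2 × 3 × 929      4.45          2.47     52.29
    6436      4 × 1609         5.94          3.57     72.62
    7893      3 × 3 × 877      8.96          1.92     105.49

Maximum relative error
    Length    Part         AFT method       FFT
    1024      real         2.1939 × 10^-2   2.3328 × 10^-2
    1024      imaginary    2.1938 × 10^-2   9.9342 × 10^-2
    2048      real         4.2212 × 10^-3   1.1967 × 10^-2
    2048      imaginary    6.1257 × 10^-3   4.9385 × 10^-2
    4096      real         2.3697 × 10^-3   6.0592 × 10^-3
    4096      imaginary    2.0422 × 10^-3   2.4615 × 10^-3

Zhang Xianchao was born in 1971. He received the B.S. and M.S. degrees from the National University of Defense Technology in 1994 and 1998, respectively, and is now a Ph.D. candidate at the University of Science and Technology of China. His main research interests include fast and parallel computation for signal processing.
Wu Jigang was born in 1963. He is an associate professor at Yantai University and is now a Ph.D. candidate at the University of Science and Technology of China. His main research interests include algorithm design and analysis.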
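The AFT formulas for B(2n, α) and for the coefficients a_n and b_n given above can be exercised on a band-limited test function. The following is a minimal sketch, not the authors' program: it evaluates a continuous function at the exact (nonuniform) sample points instead of using zero-order interpolation of a sequence, and the names `mobius`, `alternating_average`, `aft_coefficients` and the test coefficients are assumptions introduced for illustration.

```python
import math


def mobius(n):
    """Möbius function mu(n), computed by trial division."""
    if n == 1:
        return 1
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:  # p^2 divides the original n
                return 0
            result = -result
        p += 1
    if n > 1:  # one remaining prime factor
        result = -result
    return result


def alternating_average(signal, two_n, alpha, period=1.0):
    """B(2n, alpha): alternating-sign average of 2n samples of the signal."""
    total = 0.0
    for m in range(two_n):
        total += (-1) ** m * signal(m * period / two_n + alpha * period)
    return total / two_n


def aft_coefficients(signal, num_terms, period=1.0):
    """Return ([a_1..a_N], [b_1..b_N]) via Möbius inversion over odd l."""
    a = [0.0] * (num_terms + 1)
    b = [0.0] * (num_terms + 1)
    for n in range(1, num_terms + 1):
        for l in range(1, num_terms // n + 1, 2):
            mu = mobius(l)
            if mu == 0:
                continue
            sign = (-1) ** ((l - 1) // 2)
            a[n] += mu * alternating_average(signal, 2 * n * l, 0.0, period)
            b[n] += mu * sign * alternating_average(
                signal, 2 * n * l, 1.0 / (4 * n * l), period)
    return a[1:], b[1:]


# Hypothetical test signal with known coefficients (period T = 1, N = 5).
true_a = [0.5, -1.0, 0.25, 0.0, 2.0]
true_b = [1.5, 0.75, -0.5, 1.0, -0.25]


def signal(t):
    value = 0.3  # a_0, which the alternating sums cancel out
    for n, (an, bn) in enumerate(zip(true_a, true_b), start=1):
        value += an * math.cos(2 * math.pi * n * t)
        value += bn * math.sin(2 * math.pi * n * t)
    return value


a, b = aft_coefficients(signal, 5)
print(max(abs(x - y) for x, y in zip(a + b, true_a + true_b)) < 1e-9)
```

The only multiplications are the per-sum scalings by 1/(2n) and the sign factors; everything else is additions of samples, which is what gives the AFT its low multiplication count.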
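Usage of f.linear
Linear regression is a common machine learning algorithm for predicting the value of a continuous variable.
It models a linear relationship between the independent variables and the dependent variable and builds a predictive model from it.
In practice, the f.linear function is often used as the linear map of such a model.
This article introduces the usage of f.linear and explains, step by step, how to use it for linear regression analysis.
1. Introduction
f.linear (torch.nn.functional.linear) is a PyTorch function that applies a linear transformation to its input, which is the core operation of a linear regression model.
It takes an input tensor together with a weight tensor and an optional bias tensor.
Each row of the input tensor is treated as one sample, and each sample contains one or more features.
For example, 100 samples with 3 features each can be represented as a tensor of shape (100, 3).
2. Parameters of f.linear
f.linear has three main parameters: input, weight, and bias.
input is the input tensor, weight is the weight tensor of the linear layer, and bias is the bias tensor of the linear layer.
Each parameter is described below.
- input: the input tensor, of shape (batch_size, n_features).
Here batch_size is the number of samples and n_features is the number of features per sample.
- weight: the weight tensor, of shape (output_features, input_features), where output_features is the number of output features of the linear layer and input_features is the number of input features.
- bias: the bias tensor, of shape (output_features,).
When a bias is given, it is added to the output features.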
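The shape conventions above can be illustrated without PyTorch. The sketch below is a plain-Python stand-in, assuming the computation y = x · weightᵀ + bias for each input row; the function name `linear` and the sample numbers are hypothetical and only mirror the (batch_size, n_features), (output_features, input_features), and (output_features,) shapes described in the text.

```python
def linear(inputs, weight, bias=None):
    """Compute y = x @ weight^T + bias for every row x of inputs.

    inputs: batch_size rows, each with in_features numbers
    weight: out_features rows, each with in_features numbers
    bias:   out_features numbers, or None for no bias
    """
    outputs = []
    for x in inputs:
        row = []
        for j, w_row in enumerate(weight):
            if len(w_row) != len(x):
                raise ValueError(
                    "weight must have shape (out_features, in_features)")
            y = sum(w * xi for w, xi in zip(w_row, x))
            if bias is not None:
                y += bias[j]
            row.append(y)
        outputs.append(row)
    return outputs


inputs = [[1.0, 2.0, 3.0], [0.0, -1.0, 4.0]]   # shape (2, 3)
weight = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]]   # shape (2, 3)
bias = [10.0, 0.0]                             # shape (2,)
print(linear(inputs, weight, bias))  # [[8.0, 3.0], [6.0, 1.5]]
```

Note that the weight is stored as (output_features, input_features), so each output feature is the dot product of one weight row with the sample, and the result has shape (batch_size, output_features).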
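3. Linear regression with f.linear
Next, we walk through an example of how to use f.linear for linear regression.
Suppose we have a data set that contains two features, the area and the price of houses.
We want to build a linear regression model that predicts the price of a house.
First, we need to prepare the data set.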
IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 19, NO. 1, JANUARY 2008

MPCA: Multilinear Principal Component Analysis of Tensor Objects

Haiping Lu, Student Member, IEEE, Konstantinos N. (Kostas) Plataniotis, Senior Member, IEEE, and Anastasios N. Venetsanopoulos, Fellow, IEEE

Abstract—This paper introduces a multilinear principal component analysis (MPCA) framework for tensor object feature extraction. Objects of interest in many computer vision and pattern recognition applications, such as 2-D/3-D images and video sequences are naturally described as tensors or multilinear arrays. The proposed framework performs feature extraction by determining a multilinear projection that captures most of the original tensorial input variation. The solution is iterative in nature and it proceeds by decomposing the original problem to a series of multiple projection subproblems. As part of this work, methods for subspace dimensionality determination are proposed and analyzed. It is shown that the MPCA framework discussed in this work supplants existing heterogeneous solutions such as the classical principal component analysis (PCA) and its 2-D variant (2-D PCA). Finally, a tensor object recognition system is proposed with the introduction of a discriminative tensor feature selection mechanism and a novel classification strategy, and applied to the problem of gait recognition. Results presented here indicate MPCA’s utility as a feature extraction tool. It is shown that even without a fully optimized design, an MPCA-based gait recognition module achieves highly competitive performance and compares favorably to the state-of-the-art gait recognizers.

Index Terms—Dimensionality reduction, feature extraction, gait recognition, multilinear principal component analysis (MPCA), tensor objects.

I. INTRODUCTION

The term tensor object is used here to denote a multidimensional object, the elements of which are to be addressed by more than two indices [1].
The number of indices used in the description defines the order of the tensor object and each index defines one of the so-called “modes.” Many image and video data are naturally tensor objects. For example, color images are 3-D (third-order tensor) objects with column, row, and color modes [2]. Gait silhouette sequences, the input to most if not all gait recognition algorithms [3]–[7], as well as other grayscale video sequences can be viewed as third-order tensors with column, row, and time modes. Naturally, color video sequences are fourth-order tensors with the addition of a color mode. In the most active area of biometrics research, namely, that of face recognition, 3-D face detection and recognition using 3-D information with column, row, and depth modes, in other words, a third-order tensor, has emerged as an important research direction [8]–[10]. Moreover, the research problem of matching still probe images to surveillance video sequences can be viewed as a pattern recognition problem in a third-order tensorial setting [11].

Manuscript received May 14, 2006; revised October 31, 2006 and January 2, 2007; accepted March 1, 2007. This work was supported in part by the Ontario Centres of Excellence through the Communications and Information Technology Ontario Partnership Program and the Bell University Labs, University of Toronto, Toronto, ON, Canada. This paper was presented in part at the Biometrics Consortium/IEEE 2006 Biometrics Symposium, Baltimore, MD, September 19–21, 2006. H. Lu and K. N. Plataniotis are with The Edward S. Rogers Sr. Department of Electrical and Computer Engineering, University of Toronto, Toronto, ON M5S 3G4, Canada (e-mail: kostas@). A. N. Venetsanopoulos was with The Edward S. Rogers Sr. Department of Electrical and Computer Engineering, University of Toronto, Toronto, ON M5S 3G4, Canada. He is now with the Ryerson University, Toronto, ON M5B 2K3, Canada. Digital Object Identifier 10.1109/TNN.2007.901277

Beyond biometrics signal analysis, many other computer vision and pattern recognition tasks can be also viewed as problems in a multilinear domain. Such tasks include 3-D object recognition tasks [12] in machine vision, medical image analysis, and content-based retrieval, space-time analysis of video sequences for gesture recognition [13] and activity recognition [14] in human-computer interaction (HCI), and space-time super resolution [15] for digital cameras with limited spatial and temporal resolution. The wide range of applications explains the authors’ belief that a comprehensive study of a specialized feature extraction problem, such as multilinear feature extraction, is worthwhile. A typical tensor object in pattern recognition or machine vision applications is commonly specified in a high-dimensional tensor space. Recognition methods operating directly on this space suffer from the so-called curse of dimensionality [16]: Handling high-dimensional samples is computationally expensive and many classifiers perform poorly in high-dimensional spaces given a small number of training samples. However, since the entries of a tensor object are often highly correlated with surrounding entries, it is reasonable to assume that the tensor objects encountered in most applications of interest are highly constrained and thus the tensors are confined to a subspace, a manifold of intrinsically low dimension [16], [17]. Feature extraction or dimensionality reduction is thus an attempt to transform a high-dimensional data set into a low-dimensional equivalent representation while retaining most of the information regarding the underlying structure or the actual physical phenomenon [18]. Principal component analysis (PCA) is a well-known unsupervised linear technique for dimensionality reduction. 
The central idea behind PCA is to reduce the dimensionality of a data set consisting of a larger number of interrelated variables, while retaining as much as possible the variation present in the original data set [19]. This is achieved by transforming to a new set of variables, the so-called principal components (PCs), which are uncorrelated, and ordered so that the first few retain most of the original data variation. Naive application of PCA to tensor objects requires their reshaping into vectors with high dimensionality (vectorization), which obviously results in high processing cost in terms of increased computational and memory demands. For example, vectorizing a typical gait silhouette sequence of size (120 × 80 × 20) results in a vector with dimensionality (192 000 × 1), the singular value decomposition (SVD) or eigendecomposition processing of which may be beyond the computing processing capabilities of many computing devices. Beyond implementation issues, it is well understood that reshaping breaks the natural structure and correlation in the original data, removing redundancies and/or higher order dependencies present in the original data set and losing potentially more compact or useful representations that can be obtained in the original form [20]. Vectorization as PCA preprocessing ignores the fact that tensor objects are naturally multidimensional objects, e.g., gait sequences are 3-D objects, instead of 1-D objects. Therefore, a dimensionality reduction algorithm operating directly on a tensor object rather than its vectorized version is desirable. Recently, dimensionality reduction solutions representing images as matrices (second-order tensors) rather than vectors (first-order tensors) have been introduced. A 2-D PCA algorithm is proposed in [21], where the image covariance matrix is constructed using image matrices as inputs.
However, a linear transformation is applied only to the right-hand side of the input image matrices. As a result, image data is projected in one mode only, resulting in poor dimensionality reduction. The less restrictive 2-D PCA algorithm introduced in [20] takes into account the spatial correlation of the image pixels within a localized neighborhood. Two linear transforms are applied to both the left- and the right-hand sides of the input image matrices. Thus, projections in both modes are calculated and better dimensionality reduction results are obtained according to [22]. Similarly to the solutions introduced in [21] and [22], the so-called tensor subspace analysis algorithm of [23] represents the input image as a matrix residing in a tensor space and attempts to detect local geometrical structure in that tensor space by learning a lower dimensional tensor subspace. For the theoretically inclined reader, it should be noted that there are some recent developments in the analysis of higher order tensors. The higher order singular value decomposition (HOSVD) solution, which extends SVD to higher order tensors, was formulated in [24] and its computation leads to the calculation of $N$ (the order) different matrix SVDs of unfolded matrices. An alternating least square (ALS) algorithm for the best rank-$(R_1, R_2, \ldots, R_N)$ approximation of higher order tensors was studied in [1], where tensor data was projected into a lower dimensional tensor space iteratively. The application of HOSVD truncation and the best rank-$(R_1, R_2, \ldots, R_N)$ approximation to dimensionality reduction in independent component analysis (ICA) was discussed in [25]. These multilinear algorithms have been used routinely for multiple factor analysis [26], [27], where input data such as images are still represented as vectors but with these vectors arranged into a tensor for the subsequent analysis of the multiple factors involved in image/video formation.
It should be added that in [25]–[27], the tensor data under consideration is projected in the original coordinate without data centering. However, for classification/recognition applications where eigenproblem solutions are attempted,the eigendecomposition in each mode can be influenced by the mean (average) of the data set. Recently, there have been several attempts to develop multilinear subspace algorithms for tensor object feature extraction and classification. In [28], a heuristic MPCA approach based on HOSVD was proposed. The MPCA formulation in [29] targets optimal reconstruction applications (where data is not centered) with a solution built in a manner similar to that of [1]. It should be noted that the solution in [29] was focused on reconstruction not recognition and that it did not cover a number of important algorithmic issues, namely, initialization, termination, convergence, and subspace dimensionality determination. When applied to the problem of tensor object recognition, the methodology described in [29] uses all the entries in the projected tensor for recognition although the discrimination power of these entries varies considerably. There is also a recent work on multilinear discriminant analysis (MLDA) [30], [31], named discriminant analysis with tensor representation (DATER), where an iterative algorithm similar to ALS of [1] is utilized in order to maximize a tensor-based discriminant criterion. Unfortunately, this MLDA variant does not converge and it appears to be extremely sensitive to parameter settings [32]. As the number of possible subspace dimensions for tensor objects is extremely high (e.g., there are 225 280 possible subspace dimensions for the gait recognition problem considered in this work), exhaustive testing for determination of parameters is not feasible. Consequently, the algorithmic solution of [30] and [31] cannot be used to effectively determine subspace dimensionality in a comprehensive and systematic manner. 
Motivated by the works briefly reviewed here, this paper introduces a new MPCA formulation for tensor object dimensionality reduction and feature extraction. The proposed solution follows the classical PCA paradigm. Operating directly on the original tensorial data, the proposed MPCA is a multilinear algorithm performing dimensionality reduction in all tensor modes seeking those bases in each mode that allow projected tensors to capture most of the variation present in the original tensors. The main contributions of this paper include the following. 1) The introduction of a new MPCA framework for tensor object dimensionality reduction and feature extraction using tensor representation. The framework is introduced from the perspective of capturing the original tensors’ variation. It provides a systematic procedure to determine effective representations of tensor objects. This contrasts to previous work such as those reported in [16], [26], and [27], where vector, not tensor, representation was used, and the works reported in [20], [21], and [23], where matrix representation was utilized. It also differs from the works reported in [1], [24], and [25], where tensor data were processed as part of a reconstruction/regression solution. Furthermore, unlike previous attempts, such as the one in [29], design issues of paramount importance in practical applications, such as the initialization, termination, convergence of the algorithm, and the determination of the subspace dimensionality, are discussed in detail. 2) The definition of eigentensors and $n$-mode eigenvalues as counterparts of the eigenvectors and eigenvalues in classical PCA. The geometrical interpretation of these concepts is provided, enabling a deeper understanding of the main principles and facilitating the application of multilinear feature extraction.
3) The presentation of a recognition system that selects discriminative tensor features from tensor objects and uses a novel weighting method for classification. This differs from traditional vector-based object recognition systems [16] that often encounter computational and memory difficulties when dealing with tensor object inputs. It also differs from [29], where all of the projected features were used for recognition. 4) The development of a solution to the gait recognizer by representing gait sequences as tensor samples and extracting discriminative features from them. This is a more natural approach that differs from [3]–[7], where either silhouettes or heuristic features derived from silhouettes were used as features. The rest of this paper is organized as follows. Section II introduces basic multilinear algebra notations, concepts, and the notion of multilinear projection for dimensionality reduction. In Section III, the problem of MPCA is formulated and an iterative solution is presented. Initialization procedures, termination criteria, convergence, and subspace dimensionality are discussed in detail. The connection to PCA and 2-D PCA is illustrated. The computational aspects of the proposed framework are also discussed in this section. The problem of tensor object recognition is discussed in Section IV. Section V lists experiments on both synthetic data sets and true application data. Synthetic data sets are used to verify the properties of the proposed methodology while gait data sets are used to demonstrate performance on a recognition problem of particular importance. Finally, Section VI summarizes the major findings of this work. II. MULTILINEAR PROJECTION OF TENSOR OBJECTS This section briefly reviews some basic multilinear concepts used in the MPCA framework development and introduces the multilinear projection of tensor objects for the purpose of dimensionality reduction. A. 
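Notations and Basic Multilinear Algebra
Table I lists the fundamental symbols defined in this paper. The notations followed are those decreed by convention in the multilinear algebra, pattern recognition, and adaptive learning literature. Thus, in this paper, vectors are denoted by lowercase boldface letters, e.g., $\mathbf{x}$; matrices by uppercase boldface, e.g., $\mathbf{U}$; and tensors by calligraphic letters, e.g., $\mathcal{A}$. Their elements are denoted with indices in brackets. Indices are denoted by lowercase letters and span the range from 1 to the uppercase letter of the index, e.g., $n = 1, 2, \ldots, N$. To indicate part of a vector/matrix/tensor, “:” denotes the full range of the corresponding index and $n_1 : n_2$ denotes indices ranging from $n_1$ to $n_2$. Throughout this paper, the discussion is restricted to real-valued vectors, matrices, and tensors since the targeted applications, such as holistic gait recognition using binary silhouettes, involve real data only. The extension to the complex valued data sets is out of the scope of this work and it will be the focus of a forthcoming paper.

[Table I: List of symbols]

An $N$th-order tensor is denoted as $\mathcal{A} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$. It is addressed by $N$ indices $i_n$, $n = 1, \ldots, N$, and each $i_n$ addresses the $n$-mode of $\mathcal{A}$. The $n$-mode product of a tensor $\mathcal{A}$ by a matrix $\mathbf{U} \in \mathbb{R}^{J_n \times I_n}$, denoted by $\mathcal{A} \times_n \mathbf{U}$, is a tensor with entries $(\mathcal{A} \times_n \mathbf{U})(i_1, \ldots, i_{n-1}, j_n, i_{n+1}, \ldots, i_N) = \sum_{i_n} \mathcal{A}(i_1, \ldots, i_N) \cdot \mathbf{U}(j_n, i_n)$. The scalar product of two tensors $\mathcal{A}, \mathcal{B} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ is defined as $\langle \mathcal{A}, \mathcal{B} \rangle = \sum_{i_1} \sum_{i_2} \cdots \sum_{i_N} \mathcal{A}(i_1, \ldots, i_N) \cdot \mathcal{B}(i_1, \ldots, i_N)$ and the Frobenius norm of $\mathcal{A}$ is defined as $\|\mathcal{A}\|_F = \sqrt{\langle \mathcal{A}, \mathcal{A} \rangle}$. The $i_n$th “$n$-mode slice” of $\mathcal{A}$ is an $(N-1)$th-order tensor obtained by fixing the $n$-mode index of $\mathcal{A}$ to be $i_n$: $\mathcal{A}(:, \ldots, :, i_n, :, \ldots, :)$. The “$n$-mode vectors” of $\mathcal{A}$ are defined as the $I_n$-dimensional vectors obtained from $\mathcal{A}$ by varying the index $i_n$ while keeping all the other indices fixed. A rank-1 tensor $\mathcal{A}$ equals the outer product of $N$ vectors, $\mathcal{A} = \mathbf{u}^{(1)} \circ \mathbf{u}^{(2)} \circ \cdots \circ \mathbf{u}^{(N)}$, which means that $\mathcal{A}(i_1, i_2, \ldots, i_N) = \mathbf{u}^{(1)}(i_1) \cdot \mathbf{u}^{(2)}(i_2) \cdots \mathbf{u}^{(N)}(i_N)$ for all values of indices. Unfolding $\mathcal{A}$ along the $n$-mode is denoted as $\mathbf{A}_{(n)} \in \mathbb{R}^{I_n \times (I_1 \times \cdots \times I_{n-1} \times I_{n+1} \times \cdots \times I_N)}$. The column vectors of $\mathbf{A}_{(n)}$ are the $n$-mode vectors of $\mathcal{A}$. Fig. 1 illustrates the 1-mode (column mode) unfolding of a third-order tensor. Following standard multilinear algebra, any tensor $\mathcal{A}$ can be expressed as the product
$$\mathcal{A} = \mathcal{S} \times_1 \mathbf{U}^{(1)} \times_2 \mathbf{U}^{(2)} \times \cdots \times_N \mathbf{U}^{(N)} \tag{1}$$
where $\mathcal{S} = \mathcal{A} \times_1 \mathbf{U}^{(1)T} \times_2 \mathbf{U}^{(2)T} \times \cdots \times_N \mathbf{U}^{(N)T}$ and $\mathbf{U}^{(n)} = (\mathbf{u}^{(n)}_1 \mathbf{u}^{(n)}_2 \cdots \mathbf{u}^{(n)}_{I_n})$ is an orthogonal $I_n \times I_n$ matrix.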
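The unfolding, n-mode product, and Frobenius norm defined above can be tried on a small tensor. This is a minimal sketch under stated assumptions: tensors are plain nested lists, modes are numbered from 0 rather than 1, the column order of the unfolding is an implementation choice that need not match the paper's, and all helper names (`shape`, `entry`, `build`, `unfold`, `mode_product`, `frobenius_norm`) are hypothetical.

```python
import math
from itertools import product


def shape(t):
    dims = []
    while isinstance(t, list):
        dims.append(len(t))
        t = t[0]
    return tuple(dims)


def entry(t, idx):
    for i in idx:
        t = t[i]
    return t


def build(dims, fn):
    """Nested-list tensor whose entry at index tuple idx is fn(idx)."""
    def rec(prefix):
        if len(prefix) == len(dims):
            return fn(tuple(prefix))
        return [rec(prefix + [i]) for i in range(dims[len(prefix)])]
    return rec([])


def unfold(t, n):
    """n-mode unfolding: a matrix whose columns are the n-mode vectors."""
    dims = shape(t)
    others = [range(d) for k, d in enumerate(dims) if k != n]
    cols = [[entry(t, rest[:n] + (i,) + rest[n:]) for i in range(dims[n])]
            for rest in product(*others)]
    return [[col[i] for col in cols] for i in range(dims[n])]


def mode_product(t, u, n):
    """A x_n U: entry (.., j, ..) = sum_i A(.., i, ..) * U[j][i]."""
    dims = list(shape(t))
    in_dim = dims[n]
    dims[n] = len(u)

    def fn(idx):
        j = idx[n]
        return sum(entry(t, idx[:n] + (i,) + idx[n + 1:]) * u[j][i]
                   for i in range(in_dim))
    return build(dims, fn)


def frobenius_norm(t):
    dims = shape(t)
    return math.sqrt(sum(entry(t, idx) ** 2
                         for idx in product(*(range(d) for d in dims))))


# A 2 x 3 x 2 tensor holding the integers 1..12.
A = build((2, 3, 2), lambda idx: idx[0] * 6 + idx[1] * 2 + idx[2] + 1)
print(unfold(A, 1))                       # 3 rows, one per mode-1 index
print(mode_product(A, [[1, 1, 1]], 1))    # sums along mode 1
P = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]     # orthogonal (permutation) matrix
print(abs(frobenius_norm(mode_product(A, P, 1)) - frobenius_norm(A)) < 1e-12)
```

The last line checks on one example that multiplying a mode by an orthogonal matrix leaves the Frobenius norm unchanged.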
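Since $\mathbf{U}^{(n)}$ has orthonormal columns, $\|\mathcal{A}\|_F^2 = \|\mathcal{S}\|_F^2$ [1]. A matrix representation of this decomposition can be obtained by unfolding $\mathcal{A}$ and $\mathcal{S}$ as
$$\mathbf{A}_{(n)} = \mathbf{U}^{(n)} \cdot \mathbf{S}_{(n)} \cdot \left(\mathbf{U}^{(n+1)} \otimes \mathbf{U}^{(n+2)} \otimes \cdots \otimes \mathbf{U}^{(N)} \otimes \mathbf{U}^{(1)} \otimes \mathbf{U}^{(2)} \otimes \cdots \otimes \mathbf{U}^{(n-1)}\right)^T \tag{2}$$
where $\otimes$ denotes the Kronecker product. The decomposition can also be written as
$$\mathcal{A} = \sum_{i_1=1}^{I_1} \sum_{i_2=1}^{I_2} \cdots \sum_{i_N=1}^{I_N} \mathcal{S}(i_1, i_2, \ldots, i_N) \times \mathbf{u}^{(1)}_{i_1} \circ \mathbf{u}^{(2)}_{i_2} \circ \cdots \circ \mathbf{u}^{(N)}_{i_N} \tag{3}$$
i.e., any tensor $\mathcal{A}$ can be written as a linear combination of $I_1 \times I_2 \times \cdots \times I_N$ rank-1 tensors. This decomposition is used in the following to formulate multilinear projection for dimensionality reduction.

[Fig. 1: Visual illustration of the 1-mode unfolding of a third-order tensor.]
[Fig. 2: Visual illustration of multilinear projection: (a) projection in the 1-mode vector space and (b) 2-mode and 3-mode vectors.]

B. Tensor Subspace Projection for Dimensionality Reduction
An $N$th-order tensor $\mathcal{X}$ resides in the tensor (multilinear) space $\mathbb{R}^{I_1} \otimes \mathbb{R}^{I_2} \otimes \cdots \otimes \mathbb{R}^{I_N}$, where $\mathbb{R}^{I_1}, \mathbb{R}^{I_2}, \ldots, \mathbb{R}^{I_N}$ are the $N$ vector (linear) spaces [23]. For typical image and video tensor objects such as 3-D face images and gait sequences, although the corresponding tensor space is of high dimensionality, tensor objects typically are embedded in a lower dimensional tensor subspace (or manifold), in analogy to the (vectorized) face image embedding problem where vector image inputs reside in a low-dimensional subspace of the original input space [33]. Thus, it is possible to find a tensor subspace that captures most of the variation in the input tensor objects and it can be used to extract features for recognition and classification applications. To achieve this objective, $P_n < I_n$ orthonormal basis vectors (principal axes) of the $n$-mode linear space $\mathbb{R}^{I_n}$ are sought for each mode $n$ and a tensor subspace $\mathbb{R}^{P_1} \otimes \mathbb{R}^{P_2} \otimes \cdots \otimes \mathbb{R}^{P_N}$ is formed by these linear subspaces. Let $\tilde{\mathbf{U}}^{(n)}$ denote the $I_n \times P_n$ matrix containing the $P_n$ orthonormal $n$-mode basis vectors. The projection of $\mathcal{X}$ onto the tensor subspace $\mathbb{R}^{P_1} \otimes \mathbb{R}^{P_2} \otimes \cdots \otimes \mathbb{R}^{P_N}$ is defined as $\mathcal{Y} = \mathcal{X} \times_1 \tilde{\mathbf{U}}^{(1)T} \times_2 \tilde{\mathbf{U}}^{(2)T} \times \cdots \times_N \tilde{\mathbf{U}}^{(N)T}$. The projection of an $n$-mode vector of $\mathcal{X}$ by $\tilde{\mathbf{U}}^{(n)T}$ is computed as the inner product between the $n$-mode vector and the rows of $\tilde{\mathbf{U}}^{(n)T}$. Fig. 2 provides a visual illustration of the multilinear projection. In Fig.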
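2(a), a third-order tensor $\mathcal{A} \in \mathbb{R}^{10 \times 8 \times 6}$ is projected in the 1-mode vector space by a projection matrix $\mathbf{B}^{(1)T} \in \mathbb{R}^{5 \times 10}$, resulting in the projected tensor $\mathcal{A} \times_1 \mathbf{B}^{(1)T} \in \mathbb{R}^{5 \times 8 \times 6}$. In the 1-mode projection, each 1-mode vector of $\mathcal{A}$ of length 10 is projected by $\mathbf{B}^{(1)T}$ to obtain a vector of length 5, as the differently shaded vectors indicate in Fig. 2(a). Similarly, Fig. 2(b) depicts the 2-mode and 3-mode vectors.

III. MULTILINEAR PRINCIPAL COMPONENT ANALYSIS
In this section, an MPCA solution to the problem of dimensionality reduction for tensor objects is introduced, researched, and analyzed. Before formally stating the objective, the following definition is needed.
Definition 1: Let $\{\mathcal{A}_m, m = 1, \ldots, M\}$ be a set of $M$ tensor samples in $\mathbb{R}^{I_1} \otimes \mathbb{R}^{I_2} \otimes \cdots \otimes \mathbb{R}^{I_N}$. The total scatter of these tensors is defined as $\Psi_{\mathcal{A}} = \sum_{m=1}^{M} \|\mathcal{A}_m - \bar{\mathcal{A}}\|_F^2$, where $\bar{\mathcal{A}}$ is the mean tensor calculated as $\bar{\mathcal{A}} = \frac{1}{M} \sum_{m=1}^{M} \mathcal{A}_m$. The $n$-mode total scatter matrix of these samples is then defined as $\mathbf{S}^{(n)}_{T_{\mathcal{A}}} = \sum_{m=1}^{M} (\mathbf{A}_{m(n)} - \bar{\mathbf{A}}_{(n)})(\mathbf{A}_{m(n)} - \bar{\mathbf{A}}_{(n)})^T$, where $\mathbf{A}_{m(n)}$ is the $n$-mode unfolded matrix of $\mathcal{A}_m$.

[Fig. 3: Pseudocode implementation of the proposed MPCA algorithm.]

The previous statement leads to the following formal definition of the problem to be solved. A set of $M$ tensor objects $\{\mathcal{X}_1, \mathcal{X}_2, \ldots, \mathcal{X}_M\}$ is available for training. Each tensor object $\mathcal{X}_m \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ assumes values in a tensor space $\mathbb{R}^{I_1} \otimes \mathbb{R}^{I_2} \otimes \cdots \otimes \mathbb{R}^{I_N}$, where $I_n$ is the $n$-mode dimension of the tensor. The MPCA objective is to define a multilinear transformation $\{\tilde{\mathbf{U}}^{(n)} \in \mathbb{R}^{I_n \times P_n}, n = 1, \ldots, N\}$ that maps the original tensor space $\mathbb{R}^{I_1} \otimes \mathbb{R}^{I_2} \otimes \cdots \otimes \mathbb{R}^{I_N}$ into a tensor subspace $\mathbb{R}^{P_1} \otimes \mathbb{R}^{P_2} \otimes \cdots \otimes \mathbb{R}^{P_N}$ (with $P_n < I_n$, for $n = 1, \ldots, N$): $\mathcal{Y}_m = \mathcal{X}_m \times_1 \tilde{\mathbf{U}}^{(1)T} \times_2 \tilde{\mathbf{U}}^{(2)T} \times \cdots \times_N \tilde{\mathbf{U}}^{(N)T}$, $m = 1, \ldots, M$, such that $\{\mathcal{Y}_m \in \mathbb{R}^{P_1} \otimes \mathbb{R}^{P_2} \otimes \cdots \otimes \mathbb{R}^{P_N}, m = 1, \ldots, M\}$ captures most of the variations observed in the original tensor objects, assuming that these variations are measured by the total tensor scatter. In other words, the MPCA objective is the determination of the $N$ projection matrices $\{\tilde{\mathbf{U}}^{(n)} \in \mathbb{R}^{I_n \times P_n}, n = 1, \ldots, N\}$ that maximize the total tensor scatter $\Psi_{\mathcal{Y}}$:
$$\{\tilde{\mathbf{U}}^{(n)}, n = 1, \ldots, N\} = \arg\max_{\tilde{\mathbf{U}}^{(1)}, \tilde{\mathbf{U}}^{(2)}, \ldots, \tilde{\mathbf{U}}^{(N)}} \Psi_{\mathcal{Y}} \tag{4}$$
Here, the dimensionality $P_n$ for each mode is assumed to be known or predetermined. Discussions on the adaptive determination of $P_n$, when it is not known in advance, will be presented in Section III-F.
A.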
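MPCA Algorithm
To the best of the authors’ knowledge, there is no known optimal solution which allows for the simultaneous optimization of the $N$ projection matrices. Since the projection to an $N$th-order tensor subspace consists of $N$ projections to $N$ vector subspaces, $N$ optimization subproblems can be solved by finding the $\tilde{\mathbf{U}}^{(n)}$ that maximizes the scatter in the $n$-mode vector subspace. This is discussed in Theorem 1.
Theorem 1: Let $\{\tilde{\mathbf{U}}^{(n)}, n = 1, \ldots, N\}$ be the solution to (4). Then, given all the other projection matrices $\tilde{\mathbf{U}}^{(1)}, \ldots, \tilde{\mathbf{U}}^{(n-1)}, \tilde{\mathbf{U}}^{(n+1)}, \ldots, \tilde{\mathbf{U}}^{(N)}$, the matrix $\tilde{\mathbf{U}}^{(n)}$ consists of the $P_n$ eigenvectors corresponding to the largest $P_n$ eigenvalues of the matrix
$$\mathbf{\Phi}^{(n)} = \sum_{m=1}^{M} \left(\mathbf{X}_{m(n)} - \bar{\mathbf{X}}_{(n)}\right) \cdot \tilde{\mathbf{U}}_{\Phi^{(n)}} \cdot \tilde{\mathbf{U}}_{\Phi^{(n)}}^T \cdot \left(\mathbf{X}_{m(n)} - \bar{\mathbf{X}}_{(n)}\right)^T \tag{5}$$
where
$$\tilde{\mathbf{U}}_{\Phi^{(n)}} = \left(\tilde{\mathbf{U}}^{(n+1)} \otimes \tilde{\mathbf{U}}^{(n+2)} \otimes \cdots \otimes \tilde{\mathbf{U}}^{(N)} \otimes \tilde{\mathbf{U}}^{(1)} \otimes \tilde{\mathbf{U}}^{(2)} \otimes \cdots \otimes \tilde{\mathbf{U}}^{(n-1)}\right) \tag{6}$$
Proof: The proof of Theorem 1 is given in Appendix I-B.
Since the product $\tilde{\mathbf{U}}_{\Phi^{(n)}} \cdot \tilde{\mathbf{U}}_{\Phi^{(n)}}^T$ depends on $\tilde{\mathbf{U}}^{(1)}, \ldots, \tilde{\mathbf{U}}^{(n-1)}, \tilde{\mathbf{U}}^{(n+1)}, \ldots, \tilde{\mathbf{U}}^{(N)}$, the optimization of $\tilde{\mathbf{U}}^{(n)}$ depends on the projections in other modes and there is no closed-form solution to this maximization problem. Instead, from Theorem 1, an iterative procedure can be utilized to solve (4), along the lines of the pseudocode summarized in Fig. 3. The input tensors are centered first: $\{\tilde{\mathcal{X}}_m = \mathcal{X}_m - \bar{\mathcal{X}}, m = 1, \ldots, M\}$. With initializations through full projection truncation (FPT), which is to be discussed in detail in Section III-C, the projection matrices are computed one by one with all the others fixed (local optimization). The local optimization procedure can be repeated, in a similar fashion as the ALS method [34], until the result converges or a maximum number of iterations is reached.

[Fig. 4: Visual illustration of (a) total scatter tensor, (b) 1-mode eigenvalues, (c) 2-mode eigenvalues, and (d) 3-mode eigenvalues.]

Remark 1: The issue of centering has been ignored in the existing tensor processing literature. In the authors’ opinion, the main reason for the apparent lack of studies on the problem of tensor data centering is that previously published works focused predominantly on tensor approximation and reconstruction.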
It should be pointed out that for the approximation/reconstruction problem, centering is not essential, as the (sample) mean is the main focus of attention. However, in recognition applications where the solutions involve eigenproblems, noncentering (in other words, an average different from zero) can potentially affect the per-mode eigendecomposition and lead to a solution that captures the variation with respect to the origin rather than capturing the true variation of the data (with respect to the data center). Remark 2: The effects of the ordering of the projection matrices to be computed have been studied empirically in this work and simulation results presented in Section V indicate that altering the ordering of the projection matrix computation does not result in significant performance differences in practical situations. In the following sections, several issues pertinent to the development and implementation of the MPCA algorithm are discussed. First, in-depth understanding of the MPCA framework is provided. The properties of full projection are analyzed, and the geometric interpretation of the $n$-mode eigenvalues is introduced together with the concept of eigentensor. Next, the initialization method and the construction of termination criteria are described and convergence issues are also discussed. Finally, methods for subspace dimensionality determination are proposed and the connection to PCA and 2-D PCA is discussed, followed by computational issues.
B. Full Projection
With respect to this analysis, the term full projection refers to the multilinear projection for MPCA with $P_n = I_n$ for $n = 1, \ldots, N$. In this case, $\tilde{\mathbf{U}}_{\Phi^{(n)}} \cdot \tilde{\mathbf{U}}_{\Phi^{(n)}}^T$ is an identity matrix, as it can be seen from the pertinent lemma listed in Appendix I-C. As a result, $\mathbf{\Phi}^{(n)}$ reduces to $\mathbf{\Phi}^{(n)*} = \sum_{m=1}^{M} (\mathbf{X}_{m(n)} - \bar{\mathbf{X}}_{(n)})(\mathbf{X}_{m(n)} - \bar{\mathbf{X}}_{(n)})^T$, with $\mathbf{\Phi}^{(n)*}$ determined by the input tensor samples only and independent of other projection matrices.
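The optimal $\tilde{U}^{(n)} = U^{(n)*}$ is then obtained as the matrix comprised of the eigenvectors of $\Phi^{(n)*}$ directly without iteration, and the total scatter $\Psi_{\mathcal{X}}$ in the original data is fully captured. However, there is no dimensionality reduction through this full projection. From the properties of eigendecomposition, it can be concluded that if all eigenvalues (per mode) are distinct, the full projection matrices (corresponding eigenvectors) are also distinct and that the full projection is unique (up to sign) [35].

To interpret the geometric meanings of the $n$-mode eigenvalues, the total scatter tensor $\mathcal{Y}^{*}_{var}$ of the full projection is introduced as an extension of the total scatter matrix [36]. Each entry of the tensor $\mathcal{Y}^{*}_{var}$ is defined as

$$\mathcal{Y}^{*}_{var}(i_1, i_2, \ldots, i_N) = \sum_{m=1}^{M} \left[ \left(\mathcal{Y}^{*}_{m} - \bar{\mathcal{Y}}^{*}\right)(i_1, i_2, \ldots, i_N) \right]^{2} \quad (7)$$

where $\mathcal{Y}^{*}_{m}$ is the full projection of $\mathcal{X}_m$ and $\bar{\mathcal{Y}}^{*} = \frac{1}{M} \sum_{m=1}^{M} \mathcal{Y}^{*}_{m}$. Using the previous definition, it can be shown that for the so-called full projection ($P_n = I_n$ for all $n$), the $i_n$th $n$-mode eigenvalue $\lambda^{(n)*}_{i_n}$ is the sum of all the entries of the $i_n$th $n$-mode slice of $\mathcal{Y}^{*}_{var}$:

$$\lambda^{(n)*}_{i_n} = \sum_{i_1} \cdots \sum_{i_{n-1}} \sum_{i_{n+1}} \cdots \sum_{i_N} \mathcal{Y}^{*}_{var}(i_1, \ldots, i_{n-1}, i_n, i_{n+1}, \ldots, i_N) \quad (8)$$

In this paper, the eigenvalues are all arranged in decreasing order. Fig. 4 shows visually what the $n$-mode eigenvalues represent. In this graph, third-order tensors, e.g., short sequences (three frames) of images with size $5 \times 4$, are projected to a tensor space of size $5 \times 4 \times 3$ (full projection) so that a total scatter tensor $\mathcal{Y}^{*}_{var} \in \mathbb{R}^{5 \times 4 \times 3}$ is obtained.

Using (3), each tensor $\mathcal{X}_m$ can be written as a linear combination of $P_1 \times P_2 \times \cdots \times P_N$ rank-1 tensors $\tilde{\mathcal{U}}_{p_1 p_2 \ldots p_N} = \tilde{u}^{(1)}_{p_1} \circ \tilde{u}^{(2)}_{p_2} \circ \cdots \circ \tilde{u}^{(N)}_{p_N}$. These rank-1 tensors will be called, hereafter, eigentensors. Thus, the projected tensor $\mathcal{Y}_m$ can be viewed as the projection onto these eigentensors, with each entry of $\mathcal{Y}_m$ corresponding to one eigentensor. These definitions and illustrations for MPCA help with understanding the MPCA framework in the following discussions.

C. Initialization by Full Projection Truncation

FPT is used to initialize the iterative solution for MPCA, where the first $P_n$ columns of the full projection matrix $U^{(n)*}$ are kept to give an initial projection matrix $\tilde{U}^{(n)}$. The corresponding total scatter is denoted as $\Psi_{\mathcal{Y}_0}$, and this initialization is equivalent to the HOSVD-based solution in [28].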
Although this FPT initialization is not the optimal solution to (4), it is bounded and is considered a good starting point for the iterative procedure, as will be discussed in the following.

IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 19, NO. 1, JANUARY 2008

Remark 3: There are other choices of initialization such as truncated identity matrices [20], [23], [31] (named as pseudoidentity matrices) and random matrices. Simulation studies (reported in Section V) indicate that although in practical applications the initialization step may not have a significant impact in terms of performance, it can affect the speed of convergence of the iterative solution. Since FPT results in much faster convergence, it is the one utilized throughout this work for initialization purposes.

In studying the optimality, with respect to (4), of the initialization procedure, let us assume, without loss of generality, that the 1-mode eigenvectors are truncated, in other words, only the first $P_1 < I_1$ 1-mode eigenvectors are kept. In this case, Theorem 2 applies.

Theorem 2: Let $U^{(n)*}$ and $\Lambda^{(n)*}$ be the matrix of the eigenvectors of $\Phi^{(n)*}$ and the eigenvalues of $\Phi^{(n)*}$, respectively, and [...]. Keep only the first $P_1 < I_1$ eigenvectors with [...] to get [...], where [...] and [...]. Let $\hat{\Phi}^{(n)*}$ correspond to [...], and the matrix of its eigenvectors and its eigenvalues be $\hat{U}^{(n)*}$ and $\hat{\Lambda}^{(n)*}$, respectively. Then [...]. For $n \neq 1$ (other modes), $\hat{\lambda}^{(n)*}_{i_n} \leq \lambda^{(n)*}_{i_n}$. Furthermore, for each mode, $\hat{\lambda}^{(n)*}_{i_n} < \lambda^{(n)*}_{i_n}$ at least for one value of $i_n$.

Proof: The proof is given in Appendix I-D.

It can be seen from Theorem 2 that if a nonzero eigenvalue is truncated in one mode, the eigenvalues in all the other modes tend to decrease in magnitude and the corresponding eigenvectors change accordingly. Thus, the eigendecomposition needs to be recomputed in all the other modes, i.e., the projection matrices in all the other modes need to be updated. Since from Theorem 1 the computations of all the projection matrices are interdependent, the update of a projection matrix $\tilde{U}^{(n^*)}$ updates the matrices $\{\tilde{U}_{\Phi^{(n)}}, n \neq n^*\}$ as well. Consequently, the projection matrices in all the other modes no longer consist of the eigenvectors of the corresponding (updated) $\Phi^{(n)}$ and they need to be updated. The update continues until the termination criterion, which is discussed in Section III-D, is satisfied.

Fig. 4 provides a visual illustration of Theorem 2. Removal of a basis vector in one mode results in eliminating a slice of $\mathcal{Y}^{*}_{var}$. In Fig. 4, if the last nonzero (fifth) 1-mode eigenvalue is discarded [shaded in Fig. 4(b)], the corresponding (fifth) 1-mode slice of $\mathcal{Y}^{*}_{var}$ is removed [shaded in Fig. 4(a)], resulting in a truncated total scatter tensor $\hat{\mathcal{Y}}^{*}_{var} \in \mathbb{R}^{4 \times 4 \times 3}$. Discarding this slice will affect all eigenvalues in the remaining modes, whose corresponding slices have a nonempty overlap with the discarded 1-mode slice. In Fig. 4(c) and (d), the shaded part indicates the removed 1-mode slice corresponding to the discarded eigenvalue.

Having proven the nonoptimality of FPT with respect to the objective function (4), we proceed to derive the bounds for FPT in Theorem 3.

Theorem 3: Let $\lambda^{(n)*}_{i_n}$ denote the $i_n$th $n$-mode eigenvalue for the $n$-mode full projection matrix. The upper and lower bounds for $\Psi_{\mathcal{X}} - \Psi_{\mathcal{Y}_0}$, the loss of variation due to the FPT (measured by the total scatter), are derived as follows:

$$\max_{n} \sum_{i_n = P_n + 1}^{I_n} \lambda^{(n)*}_{i_n} \;\leq\; \Psi_{\mathcal{X}} - \Psi_{\mathcal{Y}_0} \;\leq\; \sum_{n=1}^{N} \sum_{i_n = P_n + 1}^{I_n} \lambda^{(n)*}_{i_n} \quad (9)$$

Proof: The proof is given in Appendix I-D.

From (9), it can be seen that the tightness of the bounds is determined by the eigenvalues in each mode. The bounds can be observed in Fig. 4. For instance, truncation of the last eigenvector in each of the three modes results in another truncated total scatter tensor $\hat{\mathcal{Y}}^{*}_{var}$, and thus the difference between $\Psi_{\mathcal{X}}$ and $\Psi_{\mathcal{Y}_0}$ (the sum of all entries in $\mathcal{Y}^{*}_{var}$ and $\hat{\mathcal{Y}}^{*}_{var}$, respectively) is upper bounded by the total of the sums of all the entries in each truncated slice and lower bounded by the maximum sum of all the entries in each truncated slice. For FPT, the gap between the actual loss of variation and the upper bound is due to the multiple counts of the overlaps between the discarded slice in one mode and the discarded slices in the other modes of $\mathcal{Y}^{*}_{var}$.

The tightness of the bounds depends on the order $N$, the eigenvalue characteristics (distribution) such as the number of zero-valued eigenvalues, and the degree of truncation $P_n$. For example, for $N = 1$, which is the case of PCA, the lower and upper bounds in (9) coincide and the FPT is the optimal solution, so no iterations are necessary. Larger $N$ results in more terms in the upper bound and tends to lead to a looser bound, and vice versa. In addition, if all the truncated eigenvectors correspond to zero-valued eigenvalues, $\Psi_{\mathcal{Y}_0} = \Psi_{\mathcal{X}}$, since the upper bound in (9) is then zero, and the FPT results in the optimal solution.

D. Termination

The termination criterion is determined in this paper using the objective function $\Psi_{\mathcal{Y}}$. In particular, the iterative procedure terminates if $(\Psi_{\mathcal{Y}_k} - \Psi_{\mathcal{Y}_{k-1}}) < \eta$, where $\Psi_{\mathcal{Y}_k}$ and $\Psi_{\mathcal{Y}_{k-1}}$ are the resulting total scatter from the $k$th and $(k-1)$th iterations, respectively, and $\eta$ is a user-defined small threshold (e.g., [...]). In other words, the iterations stop if there is little improvement in the resulting total scatter (the objective function). In addition, the maximum number of iterations allowed is set to $K$ for computational consideration.

E. Convergence of the MPCA Algorithm

The derivation of Theorem 1 (Appendix I-B) implies that, per iteration, the total scatter $\Psi_{\mathcal{Y}}$ is a nondecreasing function (as it either remains the same or increases), since each update of the projection matrix $\tilde{U}^{(n^*)}$ in a given mode $n^*$ maximizes $\Psi_{\mathcal{Y}}$, while the projection matrices in all the other modes are considered fixed. On the other hand, $\Psi_{\mathcal{Y}}$ is upper bounded by $\Psi_{\mathcal{X}}$ (the variation in the original samples)
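Classification of positively curved homogeneous Finsler spaces: a new approach in the even-dimensional case (in English)
Xu Xiyun (徐熙昀); Xu Ming (许明)
[Journal] Journal of Capital Normal University (Natural Science Edition) (《首都师范大学学报(自然科学版)》)
[Year (volume), issue] 2024 (45) 1
[Abstract] This paper introduces the classification of positively curved homogeneous Finsler manifolds. In the even-dimensional case, a new method is given which proves that an even-dimensional smooth coset space admits a positively curved homogeneous Finsler metric if and only if it admits a positively curved homogeneous Riemannian metric.
[Total pages] 7 (pp. 124-130)
[Authors] Xu Xiyun; Xu Ming
[Affiliation] School of Mathematical Sciences, Capital Normal University
[Language of main text] Chinese
[Chinese Library Classification] O186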
Linear Discriminant Analysis

Linear discriminant analysis (LDA) is a statistical technique used in machine learning and pattern recognition, primarily for dimensionality reduction and classification. This article goes through LDA step by step: its working principles, its assumptions, and how to apply it in practice.

1. Introduction to Linear Discriminant Analysis (LDA)
LDA, also known as Fisher's Linear Discriminant, is a classical statistical technique that is widely used for pattern recognition and classification problems. It aims to find a linear combination of features that characterizes or separates two or more classes of data points. The key idea is to maximize the between-class scatter while minimizing the within-class scatter.

2. Assumptions of LDA
LDA makes several assumptions about the data:
- The classes are approximately linearly separable (LDA produces linear decision boundaries).
- The features are normally distributed within each class.
- The covariance matrices of the features are equal for all classes.
These assumptions matter because violating them may lead to misleading results or inaccurate classifications.

3. Steps in LDA
The step-by-step procedure for performing LDA on a given dataset is as follows.
Step 1: Data preparation
- Gather the dataset, ensuring that it contains labeled instances representing the different classes.
- Split the dataset into training and testing subsets.
Step 2: Compute class means
- Calculate the mean vector for each class, which holds the average value of each feature for that class.
Step 3: Compute the scatter matrices
- Compute the within-class scatter matrix (Sw), which measures the variation within each class.
- Compute the between-class scatter matrix (Sb), which measures the variation between classes.
Step 4: Solve the generalized eigenvalue problem
- Compute the eigenvectors (e1, e2, ..., ed) and their corresponding eigenvalues (λ1, λ2, ..., λd) of (Sw^(-1) * Sb), where d is the number of features.
Step 5: Select discriminant features
- Sort the eigenvalues in descending order and choose the k largest. With C classes, Sb has rank at most C - 1, so at most C - 1 eigenvalues are nonzero and k should not exceed C - 1.
- The eigenvectors corresponding to these k largest eigenvalues are the discriminant axes.
Step 6: Transform the data
- Project the original dataset onto the new space spanned by the selected discriminant axes.
Step 7: Classification and evaluation
- Train a classifier (e.g., logistic regression, support vector machines) on the transformed dataset.
- Evaluate the performance of the classifier using appropriate metrics (e.g., accuracy, precision, recall).

4. Advantages and Limitations of LDA
Advantages:
- LDA reduces the dimensionality of the dataset while preserving the class-discriminatory information.
- It projects correlated features onto a small number of discriminant axes. Note, however, that strongly collinear features can make Sw ill-conditioned, in which case some form of regularization is usually needed.
- LDA is a linear method with a closed-form solution, which makes it computationally efficient.
Limitations:
- LDA assumes that the data is normally distributed, which may not hold for all real-world datasets.
- It may not work well with imbalanced class distributions.
- LDA is a linear method, so it may not capture complex nonlinear relationships in the data.

5. Applications of LDA
LDA has found applications in various domains, including:
- Face recognition: LDA can be used to extract discriminative features from facial images.
- Document classification: LDA can be used to classify documents from features such as term frequencies. (Topic modeling uses a different method that shares the acronym, Latent Dirichlet Allocation.)
- Bioinformatics: LDA has been applied to analyze gene expression data and identify genes related to different classes.

6. Conclusion
Linear discriminant analysis is a powerful statistical technique for dimensionality reduction and classification. By maximizing the between-class scatter and minimizing the within-class scatter, LDA finds the most discriminative directions for separating the classes. Although it has certain assumptions and limitations, LDA has proven effective in a wide range of applications, making it a valuable tool in machine learning and pattern recognition.
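
The core of Steps 2 to 6 can be sketched for the two-class case, where the leading eigenvector of Sw^(-1) * Sb is proportional to Sw^(-1) * (m1 - m0), so no eigen-solver is needed. The following is a minimal sketch in plain Python, not a reference implementation: the function names, the toy data, and the midpoint threshold used as a classifier are all illustrative assumptions.

```python
def mean_vector(rows):
    """Column-wise mean of a list of equal-length feature vectors (Step 2)."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def within_class_scatter(classes):
    """Sw: sum over classes and samples of (x - m_c)(x - m_c)^T (Step 3)."""
    d = len(classes[0][0])
    sw = [[0.0] * d for _ in range(d)]
    for rows in classes:
        m = mean_vector(rows)
        for x in rows:
            diff = [xi - mi for xi, mi in zip(x, m)]
            for i in range(d):
                for j in range(d):
                    sw[i][j] += diff[i] * diff[j]
    return sw


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("Sw is singular; regularization would be needed")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def fit_fisher_lda(class0, class1):
    """Return the discriminant axis w = Sw^(-1)(m1 - m0) and a threshold."""
    m0, m1 = mean_vector(class0), mean_vector(class1)
    sw = within_class_scatter([class0, class1])
    w = solve(sw, [b - a for a, b in zip(m0, m1)])
    # Assumed decision rule: midpoint between the projected class means.
    p0 = sum(wi * mi for wi, mi in zip(w, m0))
    p1 = sum(wi * mi for wi, mi in zip(w, m1))
    return w, (p0 + p1) / 2.0


def predict(w, threshold, x):
    """Project x onto w (Step 6) and compare with the threshold."""
    return int(sum(wi * xi for wi, xi in zip(w, x)) > threshold)


# Hypothetical toy data: two well-separated 2-D classes.
class0 = [[1.0, 2.0], [2.0, 3.0], [3.0, 3.0], [2.0, 1.0]]
class1 = [[6.0, 5.0], [7.0, 8.0], [8.0, 6.0], [7.0, 6.0]]
w, threshold = fit_fisher_lda(class0, class1)
```

For more than two classes, the same scatter matrices would be passed to a generalized eigenvalue solver instead of the linear solve used here.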
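[Machine Learning] Several Approaches to Semi-Supervised Learning

1. Self-training algorithm
This is the earliest algorithm proposed for semi-supervised learning, and also one of the simplest.

2. Multi-view algorithm
This is generally used for datasets whose features can be split naturally. Consider the special case in which each data point is described by two sets of features: each data point is viewed as a combination of the two feature sets and is then processed with the co-training algorithm. Co-training algorithms implicitly rely on the cluster assumption or the manifold assumption. They use two or more learners; during learning, these learners pick a number of high-confidence unlabeled examples and label them for each other, so that the models are updated.
Balcan and Blum (2006) show that co-training can be quite effective, and that in the extreme case only one labeled point is needed to learn the classifier. Zhou et al. (2007) give a co-training algorithm using Canonical Correlation Analysis which also needs only one labeled point. Dasgupta et al. (2001) provide a PAC-style theoretical analysis.

3. Generative models
A generative model is used as the classifier, and the probability that an unlabeled example belongs to each class is treated as a set of missing parameters. The EM algorithm is then used to estimate both the labels and the model parameters. Such algorithms can be seen as clustering around a small number of labeled examples, and are an early approach that uses the cluster assumption directly.
The greedy nature of the EM algorithm makes it prone to getting stuck in local optima, so the algorithm depends strongly on the choice of initial values. A common remedy is to run it repeatedly from several sets of initial values and keep the best solution, or to obtain optimized parameters with a more elaborate optimization algorithm (such as split-and-merge EM). Although these measures reduce the sensitivity to initialization, they add considerable computational cost.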
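
The self-training idea in item 1 can be sketched as a loop: fit a classifier on the labeled points, pseudo-label the unlabeled points it is most confident about, add them to the training set, and repeat. The sketch below is illustrative only. The nearest-centroid base learner, the distance-margin confidence score, and all function and parameter names are assumptions made for this example and do not come from the text above.

```python
import math


def centroids(points, labels):
    """Mean point of each class."""
    sums, counts = {}, {}
    for p, y in zip(points, labels):
        acc = sums.setdefault(y, [0.0] * len(p))
        for i, v in enumerate(p):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict_with_confidence(cents, p):
    """Return (label, margin); margin is the gap between the two nearest centroids.

    Requires at least two classes among the centroids.
    """
    dists = sorted((math.dist(p, c), y) for y, c in cents.items())
    (d0, y0), (d1, _) = dists[0], dists[1]
    return y0, d1 - d0


def self_train(labeled, labels, unlabeled, per_round=2, max_rounds=10):
    """Grow the labeled set with the most confident pseudo-labels each round."""
    points, ys = list(labeled), list(labels)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        if not pool:
            break
        cents = centroids(points, ys)
        scored = sorted(
            ((predict_with_confidence(cents, p), p) for p in pool),
            key=lambda item: item[0][1],
            reverse=True,
        )
        for (label, _margin), p in scored[:per_round]:
            points.append(p)
            ys.append(label)
            pool.remove(p)
    return centroids(points, ys)


# Hypothetical toy data: one labeled point per class plus five unlabeled points.
final_centroids = self_train(
    [[0.0, 0.0], [10.0, 10.0]],
    [0, 1],
    [[1.0, 1.0], [2.0, 0.0], [9.0, 9.0], [8.0, 10.0], [4.0, 4.0]],
)
```

Because pseudo-labels are never revisited in this sketch, an early mistake can propagate to later rounds, which is the usual weakness of plain self-training.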
线性代数英语词汇大集合========================================================================= Aadjont(adjugate) of matrix A A 的伴随矩阵augmented matrix A 的增广矩阵Bblock diagonal matrix 块对角矩阵block matrix 块矩阵basic solution set 基础解系CCauchy-Schwarz inequality 柯西- 许瓦兹不等式characteristic equation 特征方程characteristic polynomial 特征多项式coffcient matrix 系数矩阵cofactor 代数余子式cofactor expansion 代数余子式展开column vector 列向量commuting matrices 交换矩阵consistent linear system 相容线性方程组Cramer's rule 克莱姆法则Cross- product term 交叉项DDeterminant 行列式Diagonal entries 对角元素Diagonal matrix 对角矩阵Dimension of a vector space V 向量空间V 的维数Eechelon matrix 梯形矩阵eigenspace 特征空间eigenvalue 特征值eigenvector 特征向量eigenvector basis 特征向量的基elementary matrix 初等矩阵elementary row operations 行初等变换Ffull rank 满秩fundermental set of solution 基础解系Ggrneral solution 通解Gram-Schmidt process 施密特正交化过程Hhomogeneous linear equations 齐次线性方程组Iidentity matrix 单位矩阵inconsistent linear system 不相容线性方程组indefinite matrix 不定矩阵indefinit quatratic form 不定二次型infinite-dimensional space 无限维空间inner product 内积inverse of matrix A 逆矩阵JKLlinear combination 线性组合linearly dependent 线性相关linearly independent 线性无关linear transformation 线性变换lower triangular matrix 下三角形矩阵Mmain diagonal of matrix A 矩阵的主对角matrix 矩阵Nnegative definite quaratic form 负定二次型negative semidefinite quadratic form 半负定二次型nonhomogeneous equations 非齐次线性方程组nonsigular matrix 非奇异矩阵nontrivial solution 非平凡解norm of vector V 向量V 的范数normalizing vector V 规范化向量Oorthogonal basis 正交基orthogonal complemen t 正交补orthogonal decomposition 正交分解orthogonally diagonalizable matrix 矩阵的正交对角化orthogonal matrix 正交矩阵orthogonal set 正交向量组orthonormal basis 规范正交基orthonomal set 规范正交向量组Ppartitioned matrix 分块矩阵positive definite matrix 正定矩阵positive definite quatratic form 正定二次型positive semidefinite matrix 半正定矩阵positive semidefinite quadratic form 半正定二次型Qquatratic form 二次型Rrank of matrix A 矩阵A 的秩r(A )reduced echelon matrix 最简梯形阵row vector 行向量Sset spanned by { } 由向量{ } 所生成similar matrices 相似矩阵similarity transformation 相似变换singular matrix 奇异矩阵solution set 
解集合standard basis 标准基standard matrix 标准矩阵Isubmatrix 子矩阵subspace 子空间symmetric matrix 对称矩阵Ttrace of matrix A 矩阵A 的迹tr ( A )transpose of A 矩阵A 的转秩triangle inequlity 三角不等式trivial solution 平凡解Uunit vector 单位向量upper triangular matrix 上三角形矩阵Vvandermonde matrix 范得蒙矩阵vector 向量vector space 向量空间WZzero subspace 零子空间zero vector 零空间==============================================================================向量:vector 向量的长度(模):零向量: zero vector负向量: 向量的加法:addition 三角形法则:平行四边形法则:多边形法则减法向量的标量乘积:scalar multiplication 向量的线性运算线性组合:linear combination 线性表示,线性相关(linearly dependent),线性无关(linearly independent),原点(origin)位置向量(position vector)线性流形(linear manifold)线性子空间(linear subspace)基(basis)仿射坐标(affine coordinates),仿射标架(affine frame),仿射坐标系(affine coordinate system)坐标轴(coordinate axis)坐标平面卦限(octant)右手系左手系定比分点线性方程组(system of linear equations齐次线性方程组(system of homogeneous linear equations)行列式(determinant)维向量向量的分量(component)向量的相等和向量零向量负向量标量乘积维向量空间(vector space)自然基行向量(row vector)列向量(column vector)单位向量(unit vector)直角坐标系(rectangular coordinate system),直角坐标(rectangular coordinates),射影(projection)向量在某方向上的分量,正交分解,向量的夹角,内积(inner product)标量积(scalar product),数量积,方向的方向角,方向的方向余弦;二重外积外积(exterior product),向量积(cross product),混合积(mixed product,scalar triple product)==================================================================================(映射(mapping)),(象(image)),(一个原象(preimage)),(定义域(domain)),(值域(range)),(变换(transformation)),(单射(injection)),(象集),(满射(surjection)),(一一映射,双射(bijection)),(原象),(映射的复合,映射的乘积),(恒同映射,恒同变换(identity mapping)),(逆映射(inverse mapping));(置换(permutation)),(阶对称群(symmetric group)),(对换(transposition)),(逆序对),(逆序数),(置换的符号(sign)),(偶置换(even permutation)),(奇置换(odd permutation));行列式(determinant),矩阵(matrix),矩阵的元(entry),(方阵(square matrix)),(零矩阵(zero matrix)),(对角元),(上三角形矩阵(upper triangular matrix)),(下三角形矩阵(lower triangular matrix)),(对角矩阵(diagonal matrix)),(单位矩阵(identity matrix)),转置矩阵(transpose matrix),初等行变换(elementary row transformation),初等列变换(elementary column transformation);(反称矩阵(skew-symmetric 
matrix));子矩阵(submatrix),子式(minor),余子式(cofactor),代数余子式(algebraic cofactor),(范德蒙德行列式(Vandermonde determinant));(未知量),(系数矩阵),(方程的系数(coefficient)),(常数项(constant)),(线性方程组的解(solution)),(增广矩阵(augmented matrix)),(零解);子式的余子式,子式的代数余子式===================================================================================线性方程组与线性子空间(阶梯形方程组),(方程组的初等变换),行阶梯矩阵(row echelon matrix),主元,简化行阶梯矩阵(reduced row echelon matrix),(高斯消元法(Gauss elimination)),(解向量),(同解),(自反性(reflexivity)),(对称性(symmetry)),(传递性(transitivity)),(等价关系(equivalence));(齐次线性方程组的秩(rank));(主变量),(自由位置量),(一般解),向量组线性相关,向量组线性无关,线性组合,线性表示,线性组合的系数,(向量组的延伸组);线性子空间,由向量组张成的线性子空间;基,坐标,(自然基),向量组的秩;(解空间),线性子空间的维数(dimension),齐次线性方程组的基础解系(fundamental system of solutions);(平面束(pencil of planes))(导出组),线性流形,(方向子空间),(线性流形的维数),(方程组的特解);(方程组的零点),(方程组的图象),(平面的一般方程),(平面的三点式方程),(平面的截距式方程),(平面的参数方程),(参数),(方向向量);(直线的方向向量),(直线的参数方程),(直线的标准方程),(直线的方向系数),(直线的两点式方程),(直线的一般方程);=====================================================================================矩阵的秩与矩阵的运算线性表示,线性等价,极大线性无关组;(行空间,列空间),行秩(row rank),列秩(column rank),秩,满秩矩阵,行满秩矩阵,列满秩矩阵;线性映射(linear mapping),线性变换(linear transformation),线性函数(linear function);(零映射),(负映射),(矩阵的和),(负矩阵),(线性映射的标量乘积),(矩阵的标量乘积),(矩阵的乘积),(零因子),(标量矩阵(scalar matrix)),(矩阵的多项式);(退化的(degenerate)方阵),(非退化的(non-degenerate)方阵),(退化的线性变换),(非退化的线性变换),(逆矩阵(inverse matrix)),(可逆的(invertible),(伴随矩阵(adjoint matrix));(分块矩阵(block matrix)),(分块对角矩阵(block diagonal matrix));初等矩阵(elementary matrix),等价(equivalent);(象空间),(核空间(kernel)),(线性映射的秩),(零化度(nullity))==================================================================================== transpose of matrix 倒置矩阵; 转置矩阵【数学词汇】transposed matrix 转置矩阵【机械专业词汇】matrix transpose 矩阵转置【主科技词汇】transposed inverse matrix 转置逆矩阵【数学词汇】transpose of a matrix 矩阵的转置【主科技词汇】permutation matrix 置换矩阵; 排列矩阵【主科技词汇】singular matrix 奇异矩阵; 退化矩阵; 降秩矩阵【主科技词汇】unitary matrix 单式矩阵; 酉矩阵; 幺正矩阵【主科技词汇】Hermitian matrix 厄密矩阵; 埃尔米特矩阵; 艾米矩阵【主科技词汇】inverse matrix 逆矩阵; 反矩阵; 反行列式; 矩阵反演; 矩阵求逆【主科技词汇】matrix notation 矩阵符号; 矩阵符号表示; 矩阵记号; 
矩阵运算【主科技词汇】state transition matrix 状态转变矩阵; 状态转移矩阵【航海航天词汇】torque master 转矩传感器; 转矩检测装置【主科技词汇】spin matrix 自旋矩阵; 旋转矩阵【主科技词汇】moment matrix 动差矩阵; 矩量矩阵【航海航天词汇】Jacobian matrix 雅可比矩阵; 导数矩阵【主科技词汇】relay matrix 继电器矩阵; 插接矩阵【主科技词汇】matrix notation 矩阵表示法; 矩阵符号【航海航天词汇】permutation matrix 置换矩阵【航海航天词汇】transition matrix 转移矩阵【数学词汇】transition matrix 转移矩阵【机械专业词汇】transitionmatrix 转移矩阵【航海航天词汇】transition matrix 转移矩阵【计算机网络词汇】transfer matrix 转移矩阵【物理词汇】rotation matrix 旋转矩阵【石油词汇】transition matrix 转换矩阵【主科技词汇】circulant matrix 循环矩阵; 轮换矩阵【主科技词汇】payoff matrix 报偿矩阵; 支付矩阵【主科技词汇】switching matrix 开关矩阵; 切换矩阵【主科技词汇】method of transition matrices 转换矩阵法【航海航天词汇】stalling torque 堵转力矩, 颠覆力矩, 停转转矩, 逆转转矩【航海航天词汇】thin-film switching matrix 薄膜转换矩阵【航海航天词汇】rotated factor matrix 旋转因子矩阵【航海航天词汇】transfer function matrix 转移函数矩阵【航海航天词汇】transition probability matrix 转移概率矩阵【主科技词汇】energy transfer matrix 能量转移矩阵【主科技词汇】fuzzy transition matrix 模糊转移矩阵【主科技词汇】canonical transition matrix 规范转移矩阵【主科技词汇】matrix form 矩阵式; 矩阵组织【主科技词汇】stochastic state transition matrix 随机状态转移矩阵【主科技词汇】fuzzy state transition matrix 模糊状态转移矩阵【主科技词汇】matrix compiler 矩阵编码器; 矩阵编译程序【主科技词汇】test matrix 试验矩阵; 测试矩阵; 检验矩阵【主科技词汇】matrix circuit 矩阵变换电路; 矩阵线路【主科技词汇】reducible matrix 可简化的矩阵; 可约矩阵【主科技词汇】matrix norm 矩阵的模; 矩阵模; 矩阵模量【主科技词汇】rectangular matrix 矩形矩阵; 长方形矩阵【主科技词汇】running torque 额定转速时的转矩; 旋转力矩【航海航天词汇】transposed matrix 转置阵【数学词汇】covariance matrix 协变矩阵; 协方差矩阵【主科技词汇】unreduced matrix 未约矩阵; 不可约矩阵【主科技词汇】receiver matrix 接收机矩阵; 接收矩阵变换电路【主科技词汇】torque 传动转矩; 转矩; 阻力矩【航海航天词汇】pull-in torque 启动转矩; 输入转矩, 同步转矩, 整步转矩【航海航天词汇】parity matrix 奇偶校验矩阵; 一致校验矩阵【主科技词汇】bus admittance matrix 母线导纳矩阵; 节点导纳矩阵【主科技词汇】matrix printer 矩阵式打印机; 矩阵形印刷机; 点阵打印机【主科技词汇】dynamic matrix 动力矩阵; 动态矩阵【航海航天词汇】connection matrix 连接矩阵; 连通矩阵【主科技词汇】characteristic matrix 特征矩阵; 本征矩阵【主科技词汇】regular matrix 正则矩阵; 规则矩阵【主科技词汇】flexibility matrix 挠度矩阵; 柔度矩阵【主科技词汇】citation matrix 引文矩阵; 引用矩阵【主科技词汇】relational matrix 关系矩阵; 联系矩阵【主科技词汇】eigenmatrix 本征矩阵; 特征矩阵【主科技词汇】system matrix 系统矩阵; 体系矩阵【主科技词汇】system matrix 系数矩阵; 系统矩阵【航海航天词汇】recovery diode 
matrix 恢复二极管矩阵; 再生式二极管矩阵【主科技词汇】inverse of a square matrix 方阵的逆矩阵【主科技词汇】torquematic transmission 转矩传动装置【石油词汇】torque balancing device 转矩平衡装置【航海航天词汇】torque measuring device 转矩测量装置【主科技词汇】torque measuring apparatus 转矩测量装置【航海航天词汇】torque-tube type suspension 转矩管式悬置【主科技词汇】steering torque indicator 转向力矩测定仪; 转向转矩指示器【主科技词汇】magnetic dipole moment matrix 磁偶极矩矩阵【主科技词汇】matrix addressing 矩阵寻址; 矩阵寻址时频矩阵编址; 时频矩阵编址【航海航天词汇】stiffness matrix 劲度矩阵; 刚度矩阵; 劲度矩阵【航海航天词汇】first-moment matrix 一阶矩矩阵【主科技词汇】matrix circuit 矩阵变换电路; 矩阵电路【计算机网络词汇】reluctance torque 反应转矩; 磁阻转矩【主科技词汇】pull-in torque 启动转矩; 牵入转矩【主科技词汇】induction torque 感应转矩; 异步转矩【主科技词汇】nominal torque 额定转矩; 公称转矩【航海航天词汇】phototronics 矩阵光电电子学; 矩阵光电管【主科技词汇】column matrix 列矩阵; 直列矩阵【主科技词汇】inverse of a matrix 矩阵的逆; 逆矩阵【主科技词汇】lattice matrix 点阵矩阵【数学词汇】lattice matrix 点阵矩阵【物理词汇】canonical matrix 典型矩阵; 正则矩阵; 典型阵; 正则阵【航海航天词汇】moment matrix 矩量矩阵【主科技词汇】moment matrix 矩量矩阵【数学词汇】dynamic torque 动转矩; 加速转矩【主科技词汇】indecomposable matrix 不可分解矩阵; 不能分解矩阵【主科技词汇】printed matrix wiring 印刷矩阵布线; 印制矩阵布线【主科技词汇】decoder matrix circuit 解码矩阵电路; 译码矩阵电路【航海航天词汇】scalar matrix 标量矩阵; 标量阵; 纯量矩阵【主科技词汇】array 矩阵式组织; 数组; 阵列【计算机网络词汇】commutative matrix 可换矩阵; 可交换矩阵【主科技词汇】标准文档实用文案。
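Discriminant Function Analysis: Overview and Explanation

1. Introduction
1.1 Overview
This article introduces and explains discriminant function analysis (DFA). DFA is a statistical analysis method that distinguishes individuals belonging to different groups or classes by constructing discriminant functions. It can be applied in a wide range of disciplines, including the social sciences, medicine, and engineering. By identifying the variables that best separate the sample groups, DFA provides valuable information for classification and prediction.
1.2 Article structure
This article describes DFA in the following order. First, we briefly introduce the basic concepts and background of DFA. Next, we discuss its practical applications in different fields. We then explain its basic principles and steps in depth. After that, we discuss the advantages and limitations of DFA and point out the factors that need to be considered. Finally, we demonstrate and explain how to use DFA for data analysis on a concrete dataset and discuss the results.
1.3 Purpose
This article aims to give readers a thorough understanding of discriminant function analysis. Both beginners and researchers who are already familiar with the method can benefit from it. The article helps readers understand the concept of DFA and its application to practical problems, and encourages further study and experimentation to improve their understanding of, and skill with, the method.

2. Main text
2.1 Introduction to Discriminant Function Analysis
Discriminant function analysis is a statistical method for studying the differences between two or more known classes. It is a supervised learning method that aims to find one or more discriminant functions that assign observations from different classes to the correct class.
2.2 Application areas of Discriminant Function Analysis
Discriminant function analysis is widely used in many scientific fields and practical problems. Some common application areas are:
- Medical research: DFA can be used to diagnose disease, predict patients' response to treatment, and distinguish the effects of different drugs on the body.
- Social science: it can help social scientists understand the differences in characteristics between population groups and analyze demographic data.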
Deep Neural Networks for Object Detection
Christian Szegedy, Alexander Toshev, Dumitru Erhan
Google, Inc.
{szegedy,toshev,dumitru }@

Abstract
Deep Neural Networks (DNNs) have recently shown outstanding performance on image classification tasks [14]. In this paper we go one step further and address the problem of object detection using DNNs, that is, not only classifying but also precisely localizing objects of various classes. We present a simple and yet powerful formulation of object detection as a regression problem to object bounding box masks. We define a multi-scale inference procedure which is able to produce high-resolution object detections at a low cost by a few network applications. State-of-the-art performance of the approach is shown on Pascal VOC.

1 Introduction
As we move towards more complete image understanding, having more precise and detailed object recognition becomes crucial. In this context, one cares not only about classifying images, but also about precisely estimating the class and location of objects contained within the images, a problem known as object detection.
The main advances in object detection were achieved thanks to improvements in object representations and machine learning models. A prominent example of a state-of-the-art detection system is the Deformable Part-based Model (DPM) [9]. It builds on carefully designed representations and kinematically inspired part decompositions of objects, expressed as a graphical model. Using discriminative learning of graphical models allows for building high-precision part-based models for a variety of object classes.
Manually engineered representations in conjunction with shallow discriminatively trained models have been among the best performing paradigms for the related problem of object classification as well [17]. In the last years, however, Deep Neural Networks (DNNs) [12] have emerged as a powerful machine learning model.
DNNs exhibit major differences from traditional approaches for classification. First, they are deep architectures which have the capacity to learn more complex models than shallow ones [2]. This expressivity and robust training algorithms allow for learning powerful object representations without the need to hand-design features. This has been empirically demonstrated on the challenging ImageNet classification task [5] across thousands of classes [14, 15].
In this paper, we exploit the power of DNNs for the problem of object detection, where we not only classify but also try to precisely localize objects. The problem we are addressing here is challenging, since we want to detect a potentially large number of object instances with varying sizes in the same image using a limited amount of computing resources.
We present a formulation which is capable of predicting the bounding boxes of multiple objects in a given image. More precisely, we formulate a DNN-based regression which outputs a binary mask of the object bounding box (and portions of the box as well), as shown in Fig. 1. Additionally, we employ a simple bounding box inference to extract detections from the masks. To increase localization precision, we apply the DNN mask generation in a multi-scale fashion on the full image as well as on a small number of large image crops, followed by a refinement step (see Fig. 2).

It seems that these methods all use regression to determine the bounding box. Without a bounding box, would the task turn into segmentation? And how would one go a step further and segment the object out? By this account, object detection seems to go one step beyond object recognition: it has to determine not only the class of the object but also its location.
Deep CORAL: Correlation Alignment for Deep Domain Adaptation
Baochen Sun and Kate Saenko
University of Massachusetts Lowell, Boston University
arXiv:1607.01719v1 [cs.CV] 6 Jul 2016

Abstract. Deep neural networks are able to learn powerful representations from large quantities of labeled input data, however they cannot always generalize well across changes in input distributions. Domain adaptation algorithms have been proposed to compensate for the degradation in performance due to domain shift. In this paper, we address the case when the target domain is unlabeled, requiring unsupervised adaptation. CORAL [1] is a "frustratingly easy" unsupervised domain adaptation method that aligns the second-order statistics of the source and target distributions with a linear transformation. Here, we extend CORAL to learn a nonlinear transformation that aligns correlations of layer activations in deep neural networks (Deep CORAL). Experiments on standard benchmark datasets show state-of-the-art performance.

1 Introduction
Many machine learning algorithms assume that the training and test data are independent and identically distributed (i.i.d.). However, this assumption rarely holds in practice as the data is likely to change over time and space. Even though state-of-the-art Deep Convolutional Neural Network features are invariant to low-level cues to some degree [2,3,4], Donahue et al. [5] showed that they still are susceptible to domain shift. Instead of collecting labelled data and training a new classifier for every possible scenario, unsupervised domain adaptation methods [6,7,8,9,10,1] try to compensate for the degradation in performance by transferring knowledge from labelled source domains to unlabelled target domains. A recently proposed CORAL method [1] aligns the second-order statistics of the source and target distributions with a linear transformation. Even though it is "frustratingly easy", it works well for unsupervised domain adaptation. However, it relies on a linear transformation and is not end-to-end: it needs to first extract features, apply the transformation, and then train an SVM classifier in a separate step.
In this work, we extend CORAL to incorporate it directly into deep networks by constructing a differentiable loss function that minimizes the difference between source and target correlations: the CORAL loss. Compared to CORAL, our proposed Deep CORAL approach learns a non-linear transformation that is more powerful and also works seamlessly with deep CNNs. We evaluate our method on standard benchmark datasets and show state-of-the-art performance.

2 Related Work
Previous techniques for unsupervised adaptation consisted of re-weighting the training point losses to more closely reflect those in the test distribution [11,12] or finding a transformation in a lower-dimensional manifold that brings the source and target subspaces closer together. Re-weighting based approaches often assume a restricted form of domain shift (selection bias) and are thus not applicable to more general scenarios. Geodesic methods [13,7] bridge the source and target domain by projecting source and target onto points along a geodesic path [13], or finding a closed-form linear map that transforms source points to target [7]. [14,8] align the subspaces by computing the linear map that minimizes the Frobenius norm of the difference between the top n eigenvectors.
In contrast, CORAL [1] minimizes domain shift by aligning the second-order statistics of source and target distributions.
Adaptive deep neural networks have recently been explored for unsupervised adaptation. DLID [15] trains a joint source and target CNN architecture with two adaptation layers. DDC [16] applies a single linear kernel to one layer to minimize Maximum Mean Discrepancy (MMD), while DAN [17] minimizes MMD with multiple kernels applied to multiple layers. ReverseGrad [18] adds a binary classifier to explicitly confuse the two domains.
Our proposed Deep CORAL approach is similar to DDC, DAN, and ReverseGrad in the sense that a new loss (CORAL loss) is added to minimize the difference in learned feature covariances across domains, which is similar to minimizing MMD with a polynomial kernel. However, it is more powerful than DDC (which aligns sample means only), much simpler to optimize than DAN and ReverseGrad, and can be integrated into different layers or architectures seamlessly.

3 Deep CORAL
We address the unsupervised domain adaptation scenario where there are no labelled training data in the target domain, and propose to leverage both the deep features pre-trained on a large generic domain (such as ImageNet [19]) and the labelled source data. In the meantime, we also want the final learned features to work well on the target domain. The first goal can be achieved by initializing the network parameters from the generic pre-trained network and fine-tuning it on the labelled source data. For the second goal, we propose to minimize the difference in second-order statistics between the source and target feature activations, i.e. the CORAL loss. Figure 1 shows a sample Deep CORAL architecture using our proposed correlation alignment layer for deep domain adaptation. We refer to Deep CORAL as any deep network incorporating the CORAL loss for domain adaptation.

Fig. 1. Sample Deep CORAL architecture with source data and target data as inputs. For generalization and simplicity, it is based on AlexNet [20]. Integrating it into other layers or network architectures should be straightforward.

3.1 CORAL Loss
We first describe the CORAL loss between two domains for a single feature layer. Suppose we are given source-domain training examples $D_S = \{x_i\}$, $x \in \mathbb{R}^d$ with labels $L_S = \{y_i\}$, $i \in \{1, \ldots, L\}$, and unlabeled target data $D_T = \{u_i\}$, $u \in \mathbb{R}^d$. Suppose the number of source and target data are $n_S$ and $n_T$ respectively. Here both $x$ and $u$ are the $d$-dimensional deep layer activations $\phi(I)$ of input $I$ that we are trying to learn. Suppose $D_S^{ij}$ ($D_T^{ij}$) indicates the $j$-th dimension of the $i$-th source (target) data example and $C_S$ ($C_T$) denote the feature covariance matrices.
We define the CORAL loss as the distance between the second-order statistics (covariances) of the source and target features:

$$\ell_{CORAL} = \frac{1}{4d^2} \left\| C_S - C_T \right\|_F^2 \quad (1)$$

where $\| \cdot \|_F^2$ denotes the squared matrix Frobenius norm. The covariance matrices of the source and target data are given by:

$$C_S = \frac{1}{n_S - 1} \left( D_S^{\top} D_S - \frac{1}{n_S} (\mathbf{1}^{\top} D_S)^{\top} (\mathbf{1}^{\top} D_S) \right) \quad (2)$$

$$C_T = \frac{1}{n_T - 1} \left( D_T^{\top} D_T - \frac{1}{n_T} (\mathbf{1}^{\top} D_T)^{\top} (\mathbf{1}^{\top} D_T) \right) \quad (3)$$

where $\mathbf{1}$ is a column vector with all elements equal to 1. The gradient with respect to the input features can be calculated using the chain rule:

$$\frac{\partial \ell_{CORAL}}{\partial D_S^{ij}} = \frac{1}{d^2 (n_S - 1)} \left( \left( D_S^{\top} - \frac{1}{n_S} (\mathbf{1}^{\top} D_S)^{\top} \mathbf{1}^{\top} \right)^{\top} (C_S - C_T) \right)^{ij} \quad (4)$$

$$\frac{\partial \ell_{CORAL}}{\partial D_T^{ij}} = -\frac{1}{d^2 (n_T - 1)} \left( \left( D_T^{\top} - \frac{1}{n_T} (\mathbf{1}^{\top} D_T)^{\top} \mathbf{1}^{\top} \right)^{\top} (C_S - C_T) \right)^{ij} \quad (5)$$

We use batch covariances and the network parameters are shared between two networks.

3.2 End-to-end Domain Adaptation with CORAL Loss
We describe our method by taking a multi-class classification problem as the running example. As mentioned before, the final deep features need to be both discriminative enough to train a strong classifier and invariant to the difference between source and target domains. Minimizing the classification loss itself is likely to lead to overfitting to the source domain, causing reduced performance on the target domain. On the other hand, minimizing the CORAL loss alone might lead to degenerated features. For example, the network could project all of the source and target data to a single point, making the CORAL loss trivially zero. However, no strong classifier can be constructed on these features.
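
The CORAL loss of Section 3.1 can be illustrated with a small plain-Python sketch. It is not the authors' Caffe implementation: the function names and toy batches are assumptions made for this example, and the covariance is computed from centered outer products, which is algebraically the same quantity as the expression in (2) and (3).

```python
def covariance(data):
    """Unbiased d x d feature covariance of an n x d list of rows (n >= 2)."""
    n, d = len(data), len(data[0])
    mean = [sum(col) / n for col in zip(*data)]
    cov = [[0.0] * d for _ in range(d)]
    for row in data:
        diff = [v - m for v, m in zip(row, mean)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += diff[i] * diff[j] / (n - 1)
    return cov


def coral_loss(source, target):
    """Squared Frobenius distance between covariances, scaled by 1 / (4 d^2)."""
    d = len(source[0])
    cs, ct = covariance(source), covariance(target)
    sq_frob = sum((cs[i][j] - ct[i][j]) ** 2 for i in range(d) for j in range(d))
    return sq_frob / (4 * d * d)


# Hypothetical activations: four examples with d = 2 features each.
source_batch = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
scaled_batch = [[2.0 * v for v in row] for row in source_batch]
shifted_batch = [[v + 5.0 for v in row] for row in source_batch]

loss_scaled = coral_loss(source_batch, scaled_batch)
loss_shifted = coral_loss(source_batch, shifted_batch)
```

Shifting every example by a constant leaves the covariance unchanged, so the loss only reacts to differences in second-order statistics such as the rescaling in `scaled_batch`; differences in the means are not penalized.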
Joint training with both the classification loss and CORAL loss is likely to learn features that work well on the target domain:

$$\ell = \ell_{CLASS} + \sum_{i=1}^{t} \lambda_i \ell_{CORAL} \tag{6}$$

where $t$ denotes the number of CORAL loss layers in a deep network and $\lambda$ is a weight that trades off the adaptation with classification accuracy on the source domain. As we show below, these two losses act as counterparts and reach an equilibrium at the end of training, where the final features are expected to work well on the target domain.

4 Experiments

We evaluate our method on a standard domain adaptation benchmark, the Office dataset [6]. The Office dataset contains 31 object categories from an office environment in 3 image domains: Amazon, DSLR, and Webcam.

We follow the standard protocol of [7,17,5,16,18] and use all the labelled source data and all the target data without labels. Since there are 3 domains, we conduct experiments on all 6 shifts, taking one domain as the source and another as the target.

In this experiment, we apply the CORAL loss to the last classification layer as it is the most general case: most deep classifier architectures (e.g., convolutional, recurrent) contain a fully connected layer for classification. Applying the CORAL loss to other layers or other network architectures should be straightforward.

The dimension of the last fully connected layer (fc8) was set to the number of categories (31) and initialized with $\mathcal{N}(0, 0.005)$. The learning rate of fc8 is set to 10 times that of the other layers as it was trained from scratch. We initialized the other layers with the parameters pre-trained on ImageNet [19] and kept the original layer-wise parameter settings. In the training phase, we set the batch size to 128, base learning rate to $10^{-3}$, weight decay to $5 \times 10^{-4}$, and momentum to 0.9. The weight of the CORAL loss ($\lambda$) is set in such a way that at the end of training the classification loss and CORAL loss are roughly the same. This seems to be a reasonable choice as we want to have a feature representation
that is both discriminative and also minimizes the distance between the source and target domains. We used Caffe [21] and BVLC Reference CaffeNet for all of our experiments.

We compare to 7 recently published methods: CNN [20] (no adaptation), GFK [7], SA [8], TCA [22], CORAL [1], DDC [16], DAN [17]. GFK, SA, and TCA are manifold-based methods that project the source and target distributions into a lower-dimensional manifold and are not end-to-end deep methods. DDC adds a domain confusion loss to AlexNet and fine-tunes it on both the source and target domain. DAN is similar to DDC but utilizes a multi-kernel selection method for better mean embedding matching and adapts in multiple layers. For direct comparison, DAN in this paper uses the hidden layer fc8. For GFK, SA, TCA, and CORAL, we use the fc7 feature fine-tuned on the source domain (FT7 in [1]) as it achieves better performance than generic pre-trained features, and train a linear SVM [8,1]. To have a fair comparison, we use accuracies reported by other authors with exactly the same setting or conduct experiments using the source code provided by the authors.

From Table 1 we can see that Deep CORAL (D-CORAL) achieves better average performance than CORAL and the other 6 baseline methods. In 3 out of 6 shifts, it achieves the highest accuracy. For the other 3 shifts, the margin between D-CORAL and the best baseline method is very small (≤ 0.7).

Method     A→D        A→W        D→A        D→W        W→A        W→D        AVG
GFK        52.4±0.0   54.7±0.0   43.2±0.0   92.1±0.0   41.8±0.0   96.2±0.0   63.4
SA         50.6±0.0   47.4±0.0   39.5±0.0   89.1±0.0   37.6±0.0   93.8±0.0   59.7
TCA        46.8±0.0   45.5±0.0   36.4±0.0   81.1±0.0   39.5±0.0   92.2±0.0   56.9
CORAL      65.7±0.0   64.3±0.0   48.5±0.0   96.1±0.0   48.2±0.0   99.8±0.0   70.4
CNN        63.8±0.5   61.6±0.5   51.1±0.6   95.4±0.3   49.8±0.4   99.0±0.2   70.1
DDC        64.4±0.3   61.8±0.4   52.1±0.8   95.0±0.5   52.2±0.4   98.5±0.4   70.6
DAN        65.8±0.4   63.8±0.4   52.8±0.4   94.6±0.5   51.9±0.5   98.8±0.6   71.3
D-CORAL    66.8±0.6   66.4±0.4   52.8±0.2   95.7±0.3   51.5±0.3   99.2±0.1   72.1

Table 1. Object recognition accuracies for all 6 domain shifts on the standard Office dataset with deep features, following the standard unsupervised
adaptation protocol.

To get a better understanding of Deep CORAL, we generate three plots for domain shift A→W. In Figure 2(a) we show the training (source) and testing (target) accuracies for training with vs. without CORAL loss. We can clearly see that adding the CORAL loss helps achieve much better performance on the target domain while maintaining strong classification accuracy on the source domain.

Fig. 2. Detailed analysis of shift A→W for training w/ vs. w/o CORAL loss. (a): training and test accuracies for training w/ vs. w/o CORAL loss. We can see that adding CORAL loss helps achieve much better performance on the target domain while maintaining strong classification accuracy on the source domain. (b): classification loss and CORAL loss for training w/ CORAL loss. As the last fully connected layer is randomly initialized with $\mathcal{N}(0, 0.005)$, CORAL loss is very small while classification loss is very large at the beginning. After training for a few hundred iterations, these two losses are about the same. (c): CORAL distance for training w/o CORAL loss (setting the weight to 0). The distance is getting much larger (>100 times larger compared to training w/ CORAL loss).

In Figure 2(b) we visualize both the classification loss and the CORAL loss for training w/ CORAL loss. As the last fully connected layer is randomly initialized with $\mathcal{N}(0, 0.005)$, in the beginning the CORAL loss is very small while the classification loss is very large. After training for a few hundred iterations, these two losses are about the same. In Figure 2(c) we show the CORAL distance between the domains for training w/o CORAL loss (setting the weight to 0). We can see that the distance is getting much larger (>100 times larger compared to training w/ CORAL loss). Comparing Figure 2(b) and Figure 2(c), we can see that even though the CORAL loss is not always decreasing during training, if we set its weight to 0, the distance between source and target domains becomes much larger. This is reasonable as fine-tuning without
domain adaptation is likely to overfit the features to the source domain. Our CORAL loss constrains the distance between source and target domain during the fine-tuning process and helps to maintain an equilibrium where the final features work well on the target domain.

5 Conclusion

In this work, we extended CORAL, a simple yet effective unsupervised domain adaptation method, to perform end-to-end adaptation in deep neural networks. Experiments on standard benchmark datasets show state-of-the-art performance. Deep CORAL works seamlessly with deep networks and can be easily integrated into different layers or network architectures.

References

1. Sun, B., Feng, J., Saenko, K.: Return of frustratingly easy domain adaptation. In: AAAI. (2016)
2. Peng, X., Sun, B., Ali, K., Saenko, K.: What do deep cnns learn about objects? In: ICLR Workshop Track. (2015)
3. Peng, X., Sun, B., Ali, K., Saenko, K.: Learning deep object detectors from 3d models. In: ICCV. (2015)
4. Sun, B., Peng, X., Saenko, K.: Generating large scale image datasets from 3d cad models. In: CVPR’15 Workshop on The Future of Datasets in Vision. (2015)
5. Donahue, J., Jia, Y., Vinyals, O., Hoffman, J., Zhang, N., Tzeng, E., Darrell, T.: Decaf: A deep convolutional activation feature for generic visual recognition. In: ICML. (2014)
6. Saenko, K., Kulis, B., Fritz, M., Darrell, T.: Adapting visual category models to new domains. In: ECCV. (2010)
7. Gong, B., Shi, Y., Sha, F., Grauman, K.: Geodesic flow kernel for unsupervised domain adaptation. In: CVPR. (2012)
8. Fernando, B., Habrard, A., Sebban, M., Tuytelaars, T.: Unsupervised visual domain adaptation using subspace alignment. In: ICCV. (2013)
9. Sun, B., Saenko, K.: From virtual to reality: Fast adaptation of virtual object detectors to real domains. In: BMVC. (2014)
10. Sun, B., Saenko, K.: Subspace distribution alignment for unsupervised domain adaptation. In: BMVC. (2015)
11. Jiang, J., Zhai, C.: Instance Weighting for Domain Adaptation in NLP. In: ACL. (2007)
12. Huang, J., Smola, A.J., Gretton, A., Borgwardt, K.M., Schölkopf, B.: Correcting sample
selection bias by unlabeled data. In: NIPS. (2006)
13. Gopalan, R., Li, R., Chellappa, R.: Domain adaptation for object recognition: An unsupervised approach. In: ICCV. (2011)
14. Harel, M., Mannor, S.: Learning from multiple outlooks. In: ICML. (2011)
15. Chopra, S., Balakrishnan, S., Gopalan, R.: Dlid: Deep learning for domain adaptation by interpolating between domains. In: ICML Workshop. (2013)
16. Tzeng, E., Hoffman, J., Zhang, N., Saenko, K., Darrell, T.: Deep domain confusion: Maximizing for domain invariance. CoRR abs/1412.3474 (2014)
17. Long, M., Cao, Y., Wang, J., Jordan, M.I.: Learning transferable features with deep adaptation networks. In: ICML. (2015)
18. Ganin, Y., Lempitsky, V.: Unsupervised domain adaptation by backpropagation. In: ICML. (2015)
19. Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: Imagenet: A large-scale hierarchical image database. In: CVPR. (2009)
20. Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: NIPS. (2012)
21. Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., Guadarrama, S., Darrell, T.: Caffe: Convolutional architecture for fast feature embedding. arXiv preprint arXiv:1408.5093 (2014)
22. Pan, S.J., Tsang, I.W., Kwok, J.T., Yang, Q.: Domain adaptation via transfer component analysis. In: IJCAI. (2009)
On the Recent Use of Local Binary Patterns for Face Authentication

Sébastien Marcel, Yann Rodriguez and Guillaume Heusch

Abstract

This paper presents a survey on the recent use of Local Binary Patterns (LBPs) for face recognition. LBP is becoming a popular technique for face representation. It is a non-parametric kernel which summarizes the local spatial structure of an image and it is invariant to monotonic gray-scale transformations. This is a very interesting property in face recognition. In this paper, we describe the LBP technique and different approaches proposed in the literature to represent and to recognize faces. The most representative ones are considered for experimental comparison on a common face authentication task. For that purpose, the XM2VTS and BANCA databases are used according to their respective experimental protocols.

Index Terms

Face Recognition, Face Authentication, Local Binary Patterns.

I. INTRODUCTION

Local Binary Pattern (LBP) is becoming a popular technique for face representation as well as for image representation in general. Recently, LBP has been applied to the specific problem of face recognition [29], [1], [10], [31], [32], [22], [9]. The LBP is a non-parametric kernel which summarizes the local spatial structure of an image. Moreover, it is invariant to monotonic gray-scale transformations, hence the LBP representation may be less sensitive to changes in illumination. This is a very interesting property in face recognition. Indeed, one of the major problems in face recognition systems is to deal with variations in illumination. In a realistic scenario, it is very likely that the lighting conditions of the probe image do not correspond to those of the gallery image, hence there is a need to handle such variations. This probably explains the recent success of Local Binary Patterns in the face recognition community.
In this paper, we will thus address only the problem of lighting variations both in artificial and realistic conditions. The reader should note that one of the databases we are using includes slight facial expression and pose variations. Moreover, in an authentication scenario, the claimant is supposed to cooperate with the system and thus we do not consider databases with large facial expression and head pose changes.

We propose in this paper an overview of different LBP techniques proposed for face recognition in general and we experimentally compare the most representative ones on the face authentication task. Face authentication (or verification) involves confirming or denying the identity claimed by a person (one-to-one matching). In contrast, face identification (or recognition) attempts to establish the identity of a given person out of a closed pool of N people (one-to-N matching). Both modes are generally grouped under the generic term face recognition. Authentication and identification share the same preprocessing and feature extraction steps and a large part of the classifier design. However, both modes target distinct applications. In authentication mode, people are supposed to cooperate with the system (the claimant wants to be accepted). The main applications are access control systems, such as computer or mobile device log-in, building gate control, and digital multimedia access. On the other hand, in identification mode, people are generally not concerned by the system and often do not even want to be identified. Potential applications include video surveillance (public places, restricted areas) and information retrieval (police databases, video or photo album annotation/identification).

S. Marcel, Y. Rodriguez and G. Heusch are at the IDIAP Research Institute, Avenue du Simplon 4, 1920 Martigny, Switzerland. E-mail: marcel@idiap.ch
Manuscript received May 1, 2006; revised October 20, 2006; accepted May 25, 2007.

The problem of face authentication has been addressed by different researchers using various
approaches. Thus, the performance of face authentication systems has steadily improved over the last few years. For a comparison of different approaches see [30]. These approaches can be divided mainly into discriminant approaches and generative approaches. A discriminant approach takes a binary decision (whether or not the input face is a client) and therefore has to be trained explicitly on both client data and impostor data. A generative approach computes the likelihood of an observation or a set of observations given a client model and compares it to the corresponding likelihood given a generic model (referred to as the world model). Examples of discriminant classifiers are Multi-Layer Perceptrons (MLPs) and Support Vector Machines (SVMs) [13]. Examples of generative methods are Gaussian Mixture Models (GMMs) [5], Hidden Markov Models (HMMs) [19], [4] or simply a metric [15], [14]. Both discriminant and generative approaches can use local features (local observations of particular facial features) or holistic features (the whole face is considered as an input). Examples of holistic features are gray-scale face images or their projections onto a Principal Component subspace (referred to as PCA or Eigenfaces [26]) or a Linear Discriminant subspace (referred to as LDA or Fisherfaces [3], [6]). Examples of local features are blocks extracted from an image or transforms of these blocks such as the Discrete Cosine Transform (DCT) or others. Finally, the decision to accept or reject a claim depends on a score (distance measure, MLP output or likelihood ratio) which could be either above (accept) or under (reject) a given threshold.

This paper is organized as follows. First, we introduce Local Binary Patterns and several variants or extensions. Then, we describe the main different approaches for the recognition of faces with LBP. Finally, we present experimental results, we draw conclusions and we discuss future research directions.

II. LOCAL BINARY PATTERNS

In this section, we introduce the original LBP operator as
well as several extensions (multi-scale LBP, uniform LBP, improved LBP) and variants (Extended LBP and Census Transforms).

A. The Original LBP

The Local Binary Pattern (LBP) operator is a non-parametric 3x3 kernel which summarizes the local spatial structure of an image. It was first introduced by Ojala et al. [20] who showed the high discriminative power of this operator for texture classification. At a given pixel position $(x_c, y_c)$, LBP is defined as an ordered set of binary comparisons of pixel intensities between the center pixel and its eight surrounding pixels (Figure 1).

Fig. 1. Calculating the original LBP code: the intensities are compared with the center to give a binary pattern (example: binary 00111001, decimal 57).

The decimal form of the resulting 8-bit word (LBP code) can be expressed as follows:

$$LBP(x_c, y_c) = \sum_{n=0}^{7} s(i_n - i_c)\, 2^n \tag{1}$$

where $i_c$ corresponds to the grey value of the center pixel $(x_c, y_c)$, $i_n$ to the grey values of the 8 surrounding pixels, and function $s(x)$ is defined as:

$$s(x) = \begin{cases} 1 & \text{if } x \geq 0 \\ 0 & \text{if } x < 0 \end{cases} \tag{2}$$

Note that each bit of the LBP code has the same significance level and that two successive bit values may have a totally different meaning. Actually, the LBP code may be interpreted as a kernel structure index. By definition, the LBP operator is unaffected by any monotonic gray-scale transformation which preserves the pixel intensity order in a local neighborhood (Figure 2).

Fig. 2. Original image (left) processed by the LBP operator (right).

Due to its texture discriminative property and its very low computational cost, LBP is becoming very popular in pattern recognition. Recently, LBP has been applied for instance to face detection [12], face localization [11], face recognition [29], [1], [10], [31], [32], image retrieval [25], motion detection [8] or visual inspection [27].¹

B. The Multi-Scale LBP

Later, Ojala et al. [21] extended their original LBP operator to a circular neighborhood of different radius size. Their $LBP_{P,R}$ notation refers to P equally spaced pixels on a circle of radius R. For instance, the
$LBP_{8,2}$ operator is illustrated in Figure 3.

Fig. 3. Examples of extended LBP operators.

C. The Uniform LBP

In [21], the authors also noticed that most of the texture information was contained in a small subset of LBP patterns. These patterns, called uniform patterns, contain at most two bitwise 0 to 1 or 1 to 0 transitions (circular binary code). 11111111, 00000110 or 10000111 are for instance uniform patterns. They mainly represent primitive micro-features such as lines, edges, corners. $LBP^{u2}_{P,R}$ denotes the extended LBP operator (u2 for only uniform patterns, labelling all remaining patterns with a single label).

D. The Improved LBP

Recently, new extensions of LBP have appeared. For instance, Jin et al. [12] remarked that LBP features miss the local structure under certain circumstances, and thus they introduced the Improved Local Binary Pattern (ILBP). The main difference between ILBP and LBP lies in the comparison of all the pixels (including the center pixel) with the mean of all the pixels in the kernel (Figure 4). The decimal form of the resulting 9-bit word (ILBP code) can be expressed as follows:

$$ILBP(x_c, y_c) = \sum_{n=0}^{8} s(i_n - i_m)\, 2^n \tag{3}$$

where $i_m$ corresponds to the mean grey value of all the pixels, and function $s(x)$ is defined as in Equation 2.
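The operators of Equations (1) to (3) and the uniform-pattern criterion can be sketched in a few lines of Python. This is only an illustrative sketch: the text does not fix the order in which the neighbours are visited, so the clockwise ordering starting at the top-left pixel, the position of the center pixel as the last ILBP bit, and all function names are assumptions.

```python
# Sketch of the 3x3 LBP code (Eq. 1), the ILBP code (Eq. 3) and the
# uniform-pattern test. The neighbour ordering is an assumption: clockwise,
# starting at the top-left pixel.
NEIGHBOUR_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
                     (1, 1), (1, 0), (1, -1), (0, -1)]

def s(x):
    """Step function of Eq. (2)."""
    return 1 if x >= 0 else 0

def lbp_code(patch):
    """8-bit LBP code of a 3x3 patch given as a list of 3 rows."""
    centre = patch[1][1]
    return sum(s(patch[1 + dr][1 + dc] - centre) << n
               for n, (dr, dc) in enumerate(NEIGHBOUR_OFFSETS))

def ilbp_code(patch):
    """9-bit ILBP code: all 9 pixels are compared with the patch mean.
    The center pixel is taken as bit 8 (an assumption)."""
    pixels = [patch[1 + dr][1 + dc] for dr, dc in NEIGHBOUR_OFFSETS]
    pixels.append(patch[1][1])
    mean = sum(pixels) / 9
    return sum(s(p - mean) << n for n, p in enumerate(pixels))

def is_uniform(code, bits=8):
    """True if the circular bit string has at most two 0/1 transitions."""
    transitions = sum(((code >> n) & 1) != ((code >> ((n + 1) % bits)) & 1)
                      for n in range(bits))
    return transitions <= 2

patch = [[90, 120, 60],
         [50, 100, 130],
         [110, 80, 70]]

code = lbp_code(patch)    # neighbours >= 100 set bits 1, 3 and 6
icode = ilbp_code(patch)  # the patch mean is 90.0
```

With this ordering the toy patch yields an alternating bit pattern, which `is_uniform` rejects, whereas the three example patterns listed in Section II-C (11111111, 00000110 and 10000111) are accepted.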
¹ A more exhaustive list of applications can be found on the Oulu University web site at: http://www.ee.oulu.fi/research/imag/texture/lbp/lbp.php

Fig. 4. Calculating the ILBP code: the intensities are compared with the mean (100.1) to give a binary pattern (example: binary 001001001, decimal 73).

E. The Extended LBP

Huang et al. [11] pointed out that LBP can only reflect the first derivative information of images, but could not represent the velocity of local variation. To solve this problem, they proposed an Extended version of Local Binary Patterns (ELBP) that encodes the gradient magnitude image in addition to the original image. For that purpose, they simply applied kernels $LBP^{u2}_{8,1}$, $LBP^{u2}_{8,2}$ and $LBP^{u2}_{8,3}$ both to the original image and the gradient image. As a consequence, this method cannot be considered as an extension of the LBP operator.

F. Census Transforms

We finally point out that, at approximately the same time as the original LBP operator was introduced by Ojala [20], Zabih and Woodfill [28] proposed a very similar local structure feature. This feature, called Census Transform, also maps the local neighborhood surrounding a pixel to a bit string. With respect to LBP, the Census Transform only differs by the order of the bit string. Later, the Census Transform has been extended to become the Modified Census Transform (MCT) [7]. Again, one can point out the same similarity between ILBP and MCT (also published at the same time).

III. FACE RECOGNITION SYSTEMS USING LOCAL BINARY PATTERNS

In this section, we present the most representative face recognition systems using LBP (Ahonen [1], Zhang [29], LBP/JSBoost [10], LBP/MAP [22], INORM LBP [9]).

A. Ahonen System

In [1], Ahonen proposed a face recognition system based on an LBP representation of the face. The individual sample image is divided into R small non-overlapping blocks (or regions) of the same size. Histograms of LBP codes $H^r$, with $r \in \{1, 2, ..., R\}$, are calculated over each block and then concatenated into a single histogram representing the face image. A block histogram can be defined as:

$$H^r(i) = \sum_{(x,y) \in \text{block}_r} I(f(x,y) = i), \quad i = 1, ..., N, \tag{4}$$

where N is the number of bins (number of different labels produced by the LBP operator), $f(x,y)$ the LBP label² at pixel $(x,y)$ and $I$ the indicator function.

Fig. 5. LBP description of the face (LBP code, concatenated histogram).

² Note that $LBP(x,y)$, the LBP operator value, may not be equal to $f(x,y)$ which is the label assigned to the LBP operator value. With the $LBP^{u2}_{P,R}$ operator, for instance, all non-uniform patterns are labelled with a single label.

This model contains information on three different levels (Figure 5): (1) LBP code labels for the local histograms (pixel level), (2) local histograms (region level) and (3) a concatenated histogram which builds a global description of the face image (image level). Because some regions are supposed to contain more information (such as eyes), Ahonen proposes an empirical method to assign weights to each region. For classification, a nearest-neighbor classifier is used with the Chi square ($\chi^2$) dissimilarity measure, defined as follows:

$$\chi^2(S, M) = \sum_{r,i} \frac{(S^r(i) - M^r(i))^2}{S^r(i) + M^r(i)}, \tag{5}$$

where S and M correspond to the sample and the model histograms. Ahonen investigated several variants of LBP including $LBP_{P,R}$ (varying the radius parameter) and $LBP^{u2}$. However, he reported best results with $LBP^{u2}_{8,2}$.

B. Zhang System

Following the work of Ahonen, Zhang et al. [29] underlined some limitations. First, the size and position of each region are fixed, which limits the size of the available feature space. Second, the region weighting method is not optimal. To overcome these limitations, they propose to shift and scale a scanning window over pairs of images, extract the local LBP histograms and compute a dissimilarity measure between the corresponding local histograms. If both images are from the same identity, the dissimilarity measures are labelled as positive features, otherwise as negative features. Classification is performed with AdaBoost learning, which solves the feature selection and classifier design problem. Optimal position/size, weight
and selection of the regions are then chosen by the boosting procedure. A comparative study with Ahonen's method showed similar results. Zhang et al.'s system, however, uses far fewer features (local LBP histograms).

C. Huang System

More recently, Huang et al. [10] proposed an improved version of Zhang et al.'s system. Their method is based on a modified version of the boosting procedure called JSBoost which incorporates Jensen-Shannon (JS) divergence into AdaBoost learning. This JS divergence provides a more appropriate dissimilarity measure between two classes than previously proposed measures. More stable and effective weak classifiers are learnt by JSBoost. This improved feature selection leads to slightly better recognition results with a significantly smaller number of features.

D. Rodriguez-Marcel System

In [22], Rodriguez et al. proposed to use a generative approach. This method, called LBP/MAP, considers local histograms as probability distributions and computes a log-likelihood ratio instead of a Chi square dissimilarity. A generic face model is represented by a collection of LBP histograms. Then, a client-specific model is obtained by an adaptation technique (Figure 6) from this generic model under a probabilistic framework.

Fig. 6. Illustration of the adaptation of LBP histograms (world data, client data).

E. System using LBP as an Image Preprocessing

A completely different approach has also been proposed. Heusch et al. [9] suggested to use the LBP (more precisely $LBP_{8,2}$) directly as an illumination normalization technique (Figure 7). This method, called INORM LBP, applies the LBP operator on every pixel of the image and computes the LBP code.
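The per-pixel step just described, together with the invariance to monotonic gray-scale transformations noted in Section II, can be sketched as follows. This is a simplified sketch under stated assumptions: it uses the basic 3x3 operator rather than $LBP_{8,2}$, it skips the image border, the neighbour ordering is arbitrary, and the function name and toy image are hypothetical.

```python
# Sketch of LBP as an image preprocessing step: every interior pixel is
# replaced by its LBP code. Assumptions: basic 3x3 operator (the text uses
# LBP(8,2)), border pixels skipped, clockwise neighbour order from top-left.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp_image(img):
    """Return the image of LBP codes for all interior pixels of a 2-D list."""
    rows, cols = len(img), len(img[0])
    out = []
    for r in range(1, rows - 1):
        out_row = []
        for c in range(1, cols - 1):
            centre = img[r][c]
            code = sum((1 if img[r + dr][c + dc] >= centre else 0) << n
                       for n, (dr, dc) in enumerate(OFFSETS))
            out_row.append(code)
        out.append(out_row)
    return out

image = [[10, 20, 30, 40],
         [50, 60, 70, 80],
         [15, 25, 35, 45],
         [55, 65, 75, 85]]

# A strictly increasing gray-scale transform (here 2*v + 30) stands in for a
# global illumination change; it preserves the local intensity order.
brighter = [[2 * v + 30 for v in row] for row in image]

codes = lbp_image(image)
same = codes == lbp_image(brighter)  # the LBP image does not change
```

Only global monotonic changes are covered by this check; the side lighting of the darkened XM2VTS set is not a global monotonic transform, so the sketch says nothing about robustness in that case.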
Each LBP code becomes a pixel in the INORM LBP image. Then, standard face recognition techniques such as LDA/NC [15] or DCT/HMM [4] can be used to solve the verification task.

Fig. 7. Robustness to illumination changes.

IV. EXPERIMENTS AND RESULTS

In this section, we provide comparative experiments with several systems introduced in Section III on two face authentication benchmark databases, namely XM2VTS and BANCA, which we briefly describe in this section. All algorithms have been developed using the Torchvision library (torch3vision.idiap.ch) and experiments have been performed using the pyVerif (pyverif.idiap.ch) biometric verification toolkit.

We implemented the systems of Ahonen [1] and Zhang [29], but also LBP/MAP [22] as well as two standard state-of-the-art methods combined with two different image preprocessing techniques: histogram equalization (HEQ) and LBP (INORM LBP) [9]. The first system is a combination of Linear Discriminant Analysis with a Normalized Correlation (LDA/NC) based on a holistic representation of the face [23]. The second one is a generative approach based on the Discrete Cosine Transform and Hidden Markov Models (DCT/HMM) with a local description of the face [4]. We did not implement LBP/JSBoost, the system of Huang et al. [10], as the authors claim that LBP/JSBoost is an improvement of Zhang et al.'s system.

A. Databases

The XM2VTS database [17] contains synchronized video and speech data from 295 subjects, recorded during four sessions taken at one-month intervals. The subjects were divided into a set of 200 training clients, 25 evaluation impostors and 70 test impostors. Recordings were acquired over a period of five months under controlled conditions (blue background, artificial illumination) for the standard set. The darkened set contains four images of each subject acquired with side lighting. Figure 8 shows images coming from both sets of the database.

We performed the experiments following the Lausanne Protocol Configuration I. Concerning darkened set experiments, the protocol is the same
but for the testing phase: it is done on the darkened images.

The BANCA database [2] was designed to test multi-modal identity verification with various acquisition devices under several scenarios (controlled, degraded and adverse). In the experiments described here we used the face images from the English corpus, containing 52 subjects, equally divided into two groups g1 and g2 used for development and evaluation alternatively. Each subject participated in 12 recording sessions. Each of these sessions contains two video recordings: one true client access and one impostor attack. Image acquisition was performed with two different cameras: a cheap analogue web-cam, and a high-quality digital camera, under several realistic scenarios: controlled (high-quality camera, uniform background, controlled lighting), degraded (web-cam, non-uniform background) and adverse (high-quality camera, arbitrary conditions). See Figure 8 for example images of each scenario.

Fig. 8. Comparison of XM2VTS (a: standard and darkened) and BANCA (b: uncontrolled conditions) image conditions.

Whereas the XM2VTS database contains face images in controlled conditions, BANCA is a much more challenging database with face images recorded in an uncontrolled environment (complex background, difficult lighting conditions).

In the BANCA protocol, seven distinct configurations for the training and testing policy have been defined. In our experiments, the configurations referred to as Match Controlled (Mc), Unmatched Adverse (Ua), Unmatched Degraded (Ud), Pooled Test (P) and Grand Test (G) are used. All of the listed configurations, except protocol G, use the same training conditions: each client is trained using images from the first recording session of the controlled scenario. Testing is then performed on images taken from the controlled scenario (Mc), adverse scenario (Ua), degraded scenario (Ud), while (P) does the test for each of the previously described configurations. The protocol G uses training images from the first recording sessions of scenarios
controlled, degraded and adverse.

B. Experimental Setup

1) Feature Extraction: For both XM2VTS and BANCA databases, face images are extracted to a common size of 80×64 (rows × columns), according to the provided ground-truth eye positions. We used this representation for all the methods we implemented. For LBP methods, face images are decomposed into 8×8 blocks (R = 80 blocks) and histograms of LBP codes are then computed over each block r.

2) Performance Measure: To assess authentication performance, the Half Total Error Rate (HTER) is generally used:

$$HTER(\theta) = \frac{FAR(\theta) + FRR(\theta)}{2}, \tag{6}$$

where FAR is the false alarm rate, FRR the false rejection rate and θ the decision threshold. To correspond to a realistic situation, θ is chosen a priori on the validation set at Equal Error Rate (EER).

For experiments on the XM2VTS database, we use all available training client images to build the generic model. For BANCA experiments, the generic model was trained with the additional set of images provided with the database, referred to as world data (independent of the subjects in the client database).

C. Results and Discussion

Table I reports comparative results for the Ahonen, Zhang and LBP/MAP systems, as well as for the state-of-the-art methods LDA/NC and DCT/HMM, both using HEQ and INORM LBP image preprocessing. We also report the only result from LBP/JSBoost [10] obtained on BANCA (protocol G only).

From the results on XM2VTS (standard set), we first remark that several LBP methods obtain state-of-the-art results. Secondly, we notice that compared to the two other methods which use an LBP representation of the face, LBP/MAP performs clearly better on both databases and all protocols. On protocol G, where more client training data is available, LBP/MAP clearly outperforms the improved version of the Zhang system (LBP/JSBoost).

We also notice that LBP-based generative methods (INORM LBP + DCT/HMM and LBP/MAP) perform better than the two other LBP-based methods for all conditions. However, it must be noted that these methods (Ahonen and Zhang) have been
originally designed for the face identification problem. We finally point out that, as reported in [29] for identification, the Ahonen and Zhang methods give similar results at least on the XM2VTS standard set.

TABLE I
HTER PERFORMANCE COMPARISON FOR TWO STATE-OF-THE-ART METHODS (LDA/NC AND DCT/HMM) AND LBP SYSTEMS, FOR THE XM2VTS DATABASE AND BANCA DATABASE.

Models                  XM2VTS         BANCA
                        Std    Dark    Mc     Ud     Ua     P      G
HEQ + LDA/NC            3.2    10.6    4.8    13.8   20.8   15.2   7.1
HEQ + DCT/HMM           2.0    37.3    4.1    22.5   18.9   18.0   4.6
INORM LBP + LDA/NC      5.6    9.7     6.2    13.3   20.5   15.3   7.4
INORM LBP + DCT/HMM     1.4    9.6     2.1    9.2    16.1   12.6   1.2
LBP Ahonen              3.4    22.6    8.3    14.3   23.1   20.8   10.4
LBP Zhang               3.9    35.6    9.7    26.4   23.6   25.3   9.3
LBP/MAP                 1.4    12.9    7.3    10.7   22.6   19.2   5.0
LBP/JSBoost [10]        -      -       -      -      -      -      10.7

More importantly, the degradation of all systems when tested on the darkened set of XM2VTS and on the unmatched conditions of BANCA should be noted. However, once again LBP-based generative methods are the most robust to these illumination changes and to the mismatch. Finally, according to the results the best system is INORM LBP + DCT/HMM, that is, when LBP is used as a preprocessing step and an additional face recognition technique is used. Indeed, all LBP-based face recognition techniques perform histogram comparison. Therefore, we believe there might be a large potential for performance improvement by using more appropriate generative models of Local Binary Patterns.

V. CONCLUSION AND FUTURE WORK

In this paper, we presented a survey on some recent uses of Local Binary Patterns (LBPs) for face recognition. We described the LBP technique as well as several different approaches proposed in the literature to represent and to recognize faces. We selected the most representative ones to perform an experimental comparison on a face authentication task. The XM2VTS and BANCA databases were used according to their respective experimental protocols.

Results have shown that LBP-based methods obtained state-of-the-art results and that some of them even outperformed the state-of-the-art. Another interesting conclusion from
the results is the suggestion to combine Local Binary Patterns and generative models. We believe this might be a novel research direction to investigate.

REFERENCES

[1] T. Ahonen, A. Hadid and M. Pietikäinen, “Face recognition with local binary patterns”, European Conference on Computer Vision, Prague, 469–481, 2004.
[2] E. Bailly-Baillière, S. Bengio, F. Bimbot, M. Hamouz, J. Kittler, J. Mariéthoz, J. Matas, K. Messer, V. Popovici, F. Porée, B. Ruiz and J.P. Thiran, “The BANCA database and evaluation protocol”, International Conference on Audio- and Video-Based Biometric Person Authentication, Guildford, UK, 2003.
[3] P. Belhumeur, J.P. Hespanha and D.J. Kriegman, “Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection”, European Conference on Computer Vision, Cambridge, United Kingdom, 45–58, 1996.
[4] F. Cardinaux, C. Sanderson and S. Bengio, “Face verification using adapted generative models”, IEEE Conference on Automatic Face and Gesture Recognition, 2004.
[5] F. Cardinaux, C. Sanderson and S. Marcel, “Comparison of MLP and GMM classifiers for face verification on XM2VTS”, International Conference on Audio- and Video-Based Biometric Person Authentication, Guildford, UK, 911–920, 2003.
[6] P.A. Devijver and J. Kittler, “Pattern Recognition: A Statistical Approach”, Prentice-Hall, Englewood Cliffs, N.J., 1982.
[7] B. Fröba and A. Ernst, “Face detection with the modified census transform”, IEEE Conference on Automatic Face and Gesture Recognition, 2004.
[8] M. Heikkilä, M. Pietikäinen and J. Heikkilä, “A texture-based method for detecting moving objects”, British Machine Vision Conference, London, UK, Volume 1, 187–196, 2004.
[9] G. Heusch, Y. Rodriguez and S. Marcel, “Local Binary Patterns as an Image Preprocessing for Face Authentication”, IEEE International Conference on Automatic Face and Gesture Recognition, 9–14, 2006.
[10] X. Huang, S.Z. Li and Y. Wang, “Jensen-shannon boosting learning for object recognition”, IEEE International Conference on Computer Vision and Pattern Recognition, San Diego, USA, 2005.
[11] X. Huang, S. Li, and Y. Wang, “Shape localization based
on statistical method using extended local binary pattern”,InternationalConference on Image and Graphics,Hong Kong,China,184–187,2004.[12]H.Jin,Q.Liu,H.Lu and X.Tong,“Face detection using improved LBP under bayesian framework”,International Conference onImage and Graphics,Hong Kong,China.306–309,2004.[13]K.Jonsson,J.Matas,J.Kittler,and Y.Li,“Learning support vectors for face verification and recognition”,International Conferenceon Automatic Face and Gesture Recognition,208–213,2000.[14]J.Kittler,R.Ghaderi,T.Windeatt and G.Matas,“Face verification via ECOC”,British Machine Vision Conference,593–602,2001.[15]Y.Li,J.Kittler and J.Matas,“On matching scores of LDA-based face verification”,British Machine Vision Conference,2000.[16]S.Lucey and T.Chen,“A GMM parts based face representation for improved verification through relevance adaptation”,IEEEInternational Conference on Computer Vision and Pattern Recognition,Washington D.C.,USA.2004.[17]K.Messer,J.Matas,J.Kittler,J.Luettin and G.Maitre,“XM2VTSDB:The Extended M2VTS Database”,International Conferenceon Audio and Video-based Biometric Person Authentication,1999.[18]K.Messer,J.Kittler,M.Sadeghi,M.Hamouz,A.Kostyn,S.Marcel,S.Bengio,F.Cardinaux,C.Sanderson,N.Poh,Y.Rodriguezand al.:“Face authentication competition on the BANCA database”,International Conference on Biometric Authentication,Hong Kong, 2004.[19] A.Nefian and M.Hayes,“Face recognition using an embedded HMM”,IEEE Conference on Audio and Video-based Biometric PersonAuthentication,19–24,1999.[20]T.Ojala and M.Pietik¨a inen and D.Harwood,“A comparative study of texture measures with classification based on feature distributions”,Pattern Recognition,V olume29,51–59,1996.[21]T.Ojala and M.Pietik¨a inen and T.M¨a enp¨a¨a,“Multiresolution gray-scale and rotation invariant texture classification with loval binarypatterns”,IEEE Transactions on Pattern Analysis and Machine Intelligence,V olume24,971–987,2002.[22]Y.Rodriguez and S.Marcel,“Face Authentication Using Adapted 
Local Binary Pattern Histograms”,European Conference on ComputerVision,V olume4,321–332,2006.[23]M.Sadeghi,J.Kittler,A.Kostin and K.Messer,“A comparative study of automatic face verification algorithms on the banca database”,International Conference on Audio-and Video-Based Biometric Person Authentication,Guilford,UK,35–43,2003.[24] C.Sanderson and K.Paliwal,“Fast features for face authentication under illumination direction changes”,Pattern Recognition Letters,2409–2419,2003.[25]V.Takala,T.Ahonen and M.Pietik¨a inen,“Block-based methods for image retrieval using local binary patterns”,ScandinavianConference on Image Analysis,Joensuu,Finland,882–891,2005.[26]M.Turk and A.Pentland,“Eigenface for recognition”Journal of Cognitive Neuro-science,V olume3,70–86,1991.[27]M.Turtinen,M.Pietik¨a inen and O.Silven,“Visual characterization of paper using isomap and local binary patterns”,Conference onMachine Vision Applications,Tsukuba Science City,Japan,210–213,2005.[28]R.Zabih and J.Woodfill,“A non-parametric approach to visual correspondence”,IEEE Transactions on Pattern Analysis and Machineintelligence,1996.[29]G.Zhang,X.Huang,S.Li,Y.Wang and X.Wu,“Boosting local binary pattern(LBP)-based face recognition”,Chinese Conferenceon Biometric Recognition,Guangzhou,China,179–186,2004.[30]W.Zhao,R.Chellappa,P.J.Phillips and A.Rosenfeld,“Face Recognition:A literature Survey”,ACM Computing Surveys,V olume35,Number3,399–458,2003.[31]J.Zhao,H.Wang,H.Ren and S.-C.Kee,“LBP Discriminant Analysis for Face Verification”,IEEE Workshop on Face RecognitionGrand Challenge Experiments,V olume3,2005.[32]Z.Wenchao,S.Shiguang,G.Wen,C.Xilin and Z.Hongming,“Local Gabor Binary Pattern Histogram Sequence(LGBPHS):ANovel Non-Statistical Model for Face Representation and Recognition”,IEEE International Conference on Computer Vision,V olume1, 786–791,2005.。
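Mineral Resources Development and Utilization Plan: Content Requirements for Preparation and Review Outline
I. Overview
(I) Location of the mining area, administrative affiliation, and nature of the enterprise.
For a mine undergoing reconstruction or expansion, describe the mine's current status, characteristics, and main existing problems.
(II) Basis for preparation
(1) Briefly describe the progress of preliminary project work and the status of any letters of intent on the project with the parties concerned.
(2) List the titles of the main basic documents on which the development and utilization plan is based.
Examples include the geological exploration report for the mining area as approved by the reserves administration authority, mineral processing test reports, processing and utilization test reports, preliminary engineering geology assessment data, and hydrological and water supply data for the mining area.
For mines undergoing reconstruction or expansion, actual production data should be provided, such as the current general layout plan of the mine, the deposit development system diagram, the current stope status map, and a list of the main mining and processing equipment.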
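II. Current Demand and Forecast for Mineral Products
(I) Domestic demand and market supply for the mineral
1. Current status of the mineral product and trends in its processing and utilization.
2. Forecast of near-term and long-term domestic demand and main sales destinations.
(II) Product price analysis
1. Current domestic prices of the mineral product.
2. Price stability and price trends of the mineral product.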
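III. Overview of Mineral Resources
(I) General overview of the mining area
1. Overall planning of the mining area.
2. Overview of the mineral resources in the mining area.
3. Relationship between this design and the overall development of the mining area.
(II) Resource overview of the design project
1. Geological and structural characteristics of the deposit.
2. Mining technical conditions and hydrogeological conditions of the deposit.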