Performance Comparison of Face Recognition using Transform Domain Techniques
Design and Implementation of a Pedestrian Red-Light-Running Evidence System Based on Face Tracking and Recognition
WEI Yong; WAN Xu; XU Haili; SHEN Biao

Abstract: To address the frequent phenomenon of pedestrians running red lights at intersections, a pedestrian red-light-running evidence-collection solution based on face tracking and face recognition is proposed. The system consists of a front-end capture part and a back-end comparison and query part, and mainly includes an information acquisition module, a face detection and tracking module, an alarm module, and a comparison and recognition module. Faces are detected with an improved AdaBoost algorithm, and a multi-face tracking algorithm based on Camshift and trajectory prediction is proposed to overcome the drawbacks of the traditional Camshift algorithm, namely that the tracking target must be selected manually, only a single target can be tracked, and background regions cause strong interference, thereby achieving real-time tracking of multiple faces. The back-end comparison and query part uses a convolutional neural network based method for face comparison and recognition. Results from actual deployment show that the system is stable and runs in real time, and can effectively curb the "Chinese-style street crossing" phenomenon.

Journal: Modern Electronics Technique
Year (Volume), Issue: 2018 (041) 019
Pages: 4 (P36-39)
Keywords: pedestrian red-light running; multi-function alarm; face detection; face tracking; face capture; face recognition
Authors: WEI Yong; WAN Xu; XU Haili; SHEN Biao
Affiliations: School of Mechanical Engineering, Nantong University, Nantong, Jiangsu 226019, China; Nanjing Lantai Traffic Facilities Co., Ltd., Nanjing, Jiangsu 210019, China
Language: Chinese
CLC number: TN911.73-34; TP391.4

0 Introduction
In recent years, with the rapid development of urban road traffic, the number of motor vehicles has been increasing.
At urban intersections in China, pedestrians crossing against the red light is a common sight; once the number of waiting pedestrians reaches about four, collective red-light running is most easily triggered, a behavior popularly known as "Chinese-style street crossing" [1-4].
Running red lights not only endangers pedestrians' personal safety but can also cause traffic accidents and disrupt traffic order.
To improve urban traffic order, protect people's lives, and promote civilized travel, pedestrian red-light-running behavior needs to be detected and identified, with corresponding reminders and penalties applied, so as to raise pedestrians' awareness of road-traffic rules [5-6].
This paper proposes a pedestrian red-light-running evidence-collection system based on face tracking and recognition, using informatized and intelligent means to strengthen the management of urban road traffic and to advance the construction of intelligent transportation and smart cities.
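The front-end capture pipeline described in the abstract above, AdaBoost-based face detection followed by Camshift tracking of each detected face, can be sketched with standard OpenCV primitives. This is a minimal illustration under stated assumptions, not the authors' implementation: OpenCV's stock Haar cascade stands in for the improved AdaBoost detector, plain CamShift stands in for the trajectory-prediction variant, and the camera index is arbitrary.

```python
import cv2

# Assumption: OpenCV's bundled frontal-face Haar cascade stands in for the
# paper's improved AdaBoost detector.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
term = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)

cap = cv2.VideoCapture(0)          # assumed camera index
ok, frame = cap.read()
faces = cascade.detectMultiScale(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY), 1.1, 5)

# Build one hue histogram per detected face; each histogram seeds a CamShift tracker.
trackers = []
hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
for (x, y, w, h) in faces:
    roi = hsv[y:y + h, x:x + w]
    hist = cv2.calcHist([roi], [0], None, [16], [0, 180])
    cv2.normalize(hist, hist, 0, 255, cv2.NORM_MINMAX)
    trackers.append({"window": (x, y, w, h), "hist": hist})

while True:
    ok, frame = cap.read()
    if not ok:
        break
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    for t in trackers:
        backproj = cv2.calcBackProject([hsv], [0], t["hist"], [0, 180], 1)
        # CamShift updates each face window independently (multi-face tracking).
        box, t["window"] = cv2.CamShift(backproj, t["window"], term)
        pts = cv2.boxPoints(box).astype("int32")
        cv2.polylines(frame, [pts], True, (0, 255, 0), 2)
    cv2.imshow("tracking", frame)
    if cv2.waitKey(1) == 27:
        break
```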
第3"卷第2期2018年6月金陵科技学院学报JOURNAL OF JINLING INSTITUTE OF TECHNOLOGYVol. 3"! No.2June,2018D O I: 10.16515#.cnki.32-1722/n.2018.02.0001特征脸算法人脸识别系统胡勇,朱莹莹(金陵科技学院软件工程学院,江苏南京211169)摘要:人脸识别技术,是指对给定的一个人脸图像,从存储的已知身份的人脸图像库中识别出该人身份的一种技术。
其主要应用于证件验证、刑侦破案、门禁系统、视频监控等领域。
实现了基于特征脸算法的人脸识别系统,在实验中采用的人脸图像数据库为E SSE X大学的F A C E S9"人脸图像数据库。
该系统的优点是识别速度快、准确率高,具有一定的实际应用价值。
关键词:人脸识别(特征脸;F A C E S9"人脸图像数据库;算法中图分类号:T P391.1文献标识码:A文章编号"672 - 755X(2018)02 - 0001 - 0"Face Recognition System Based on Eigenface AlgorithmHU Yong,ZHU Ying-ying(Jingling Institute of Technology,N anjing211169, China)Abstract:Face recognition is a technique that identifies a person!s identity from a given face image in a face image database.It is widely used in the fields of certificate veri investigation,access control system,video surveillance and so on.A face recognition systembased on the eigenface algorithm is implemented in this paper.The database used in the experiment is the FACES94 face image database of ESSEX University.The proposed system has theadvantages of high recognition rate,high accuracy rate and practical application value.Key words:face recognition;eigenface;FACES9" face image database;algorithm树林里没有两片一模一样的树叶,人世间也没有两张一模一样的面孔。
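A minimal sketch of the eigenface pipeline implemented in the paper above: project training faces onto the leading principal components of the training set and classify a probe by its nearest neighbour in that subspace. The array shapes and the number of components retained are illustrative assumptions, not values from the paper.

```python
import numpy as np

def train_eigenfaces(X, k=50):
    """X: (n_images, n_pixels) matrix of flattened grayscale training faces."""
    mean = X.mean(axis=0)
    A = X - mean
    # High-dimensional trick: eigendecompose the small (n x n) matrix A A^T.
    vals, vecs = np.linalg.eigh(A @ A.T)
    order = np.argsort(vals)[::-1][:k]
    eigenfaces = A.T @ vecs[:, order]                      # (n_pixels, k)
    eigenfaces /= np.linalg.norm(eigenfaces, axis=0)
    weights = A @ eigenfaces                               # training projections
    return mean, eigenfaces, weights

def recognize(probe, mean, eigenfaces, weights, labels):
    """Nearest-neighbour classification in eigenface space."""
    w = (probe - mean) @ eigenfaces
    d = np.linalg.norm(weights - w, axis=1)
    return labels[int(np.argmin(d))]
```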
Method of Face Recognition Based on Red-Black Wavelet Transform and PCA
Yuqing He, Huan He, and Hongying Yang
Department of Opto-Electronic Engineering, Beijing Institute of Technology, Beijing 100081, P.R. China

Abstract. With the development of man-machine interfaces and recognition technology, face recognition has become one of the most important research topics in the domain of biometric feature recognition. Nowadays, PCA (Principal Component Analysis) has been applied to recognition on many face databases and has achieved good results. However, PCA has its limitations: a large volume of computation and low discriminative ability. In view of these limitations, this paper puts forward a face recognition method based on the red-black wavelet transform and PCA. An improved histogram equalization is used for image pre-processing in order to compensate for illumination. Then, the red-black wavelet sub-band that contains the information of the original image is used to extract features and perform matching.
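The overall flow of this method, illumination-compensating histogram equalization followed by a wavelet decomposition whose low-frequency sub-band feeds PCA and matching, can be sketched as below. A standard Haar DWT from PyWavelets is used as a stand-in for the red-black wavelet transform, and plain global histogram equalization stands in for the paper's improved variant; both substitutions are assumptions of this sketch.

```python
import cv2
import numpy as np
import pywt

def wavelet_feature(gray_face):
    """gray_face: 2-D uint8 face image, already cropped and resized."""
    eq = cv2.equalizeHist(gray_face)                             # illumination compensation (stand-in)
    approx, _details = pywt.dwt2(eq.astype(np.float32), "haar")  # low-frequency sub-band
    return approx.ravel()

# PCA is then trained on the wavelet features of the gallery (see the eigenface
# sketch earlier); a probe is matched to the gallery entry whose PCA projection
# is closest to the probe's projection.
```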
Vol.33,No.1ACTA AUTOMATICA SINICA January,2007 A Low-dimensional Illumination Space Representation ofHuman Faces for Arbitrary Lighting ConditionsHU Yuan-Kui1WANG Zeng-Fu1Abstract The proposed method for low-dimensional illumination space representation(LDISR)of human faces can not only synthesize a virtual face image when given lighting conditions but also estimate lighting conditions when given a face image.The LDISR is based on the observation that9basis point light sources can represent almost arbitrary lighting conditions for face recognition application and different human faces have a similar LDISR.The principal component analysis(PCA)and the nearest neighbor clustering method are adopted to obtain the9basis point light sources.The9basis images under the9basis point light sources are then used to construct an LDISR which can represent almost all face images under arbitrary lighting conditions. Illumination ratio image(IRI)is employed to generate virtual face images under different illuminations.The LDISR obtained from face images of one person can be used for other people.Experimental results on image reconstruction and face recognition indicate the efficiency of LDISR.Key words LDISR,basis image,illumination ratio image,face recognition1IntroductionIllumination variation is one of the most important fac-tors which reduce significantly the performance of face recognition system.It has been proved that the variations between images of the same face due to illumination are almost always larger than image variations due to change in face identity[1].So eliminating the effects due to illumi-nation variations relates directly to the performance and practicality of face recognition system.To handle face image variations due to changes in ligh-ting conditions,many methods have been proposed thus far.Generally,the approaches to cope with variation in appearance due to illumination fall into three kinds[2]: invariant features,such as edge maps,imagesfiltered with2D Gabor-like functions,derivatives of the gray-level image,images with Log transformations and the re-cently reported quotient image[3]and self-quotient image[4]; variation-modeling,such as subspace methods[5∼7],illumi-nation cone[8∼10];and canonical forms,such as methods in [11,12].This paper investigates the subspace methods for illumi-nation representation.Hallinan et al.[5,6]proposed an eigen subspace method for face representation.This method firstly collected frontal face images of the same person un-der different illuminations as training set,and then used principal component analysis(PCA)method to get the eigenvalues and eigenvectors of the training set.They concluded that5±2eigenvectors would suffice to model frontal face images under arbitrary illuminations.The ex-perimental results indicated that this method can recon-struct frontal face images with variant lightings using a few eigenvectors.Different from Hallinan,Shashua[7]pro-posed that under the assumption of Lambertian surface, three basis images shot under three linearly independent light sources could reconstruct frontal face images under arbitrary lightings.This method was proposed to discount the lighting effects but not to explain lighting conditions. 
Belhumeur et al.[8,9]proved that face images with the same pose under different illumination conditions form a convex cone,called illumination cone,and the cone can be repre-sented in a9dimensional space[10].This method performs well but it needs no less than seven face images for each Received January11,2006;in revised form March28,2006 Supported by Open Foundation of National Laboratory of Pattern Recognition,P.R.China.1.Department of Automation,University of Science and Technol-ogy of China,Hefei230027,P.R.ChinaDOI:10.1360/aas-007-0009person to estimate the3D face shape and the irradiance map.Basri&Jacobs[13]and Ramamoorthi[14,15]indepen-dently applied the spherical harmonic representation and explained the low dimensionality of differently illuminated face images.They theoretically proved that the images of a convex Lambertian object obtained under a wide variety of lighting conditions can be approximated accurately with a 9D linear subspace,explaining prior empirical results[5∼7]. However,both of them assumed that the3D surface normal and albedo(or unit albedo)were known.This assumption limits the application of this algorithm.The above research results theoretically and empirically indicate that frontal face images obtained under a wide variety of lighting conditions can be approximated accu-rately with a low-dimensional linear subspace.However, all the above subspace methods construct a subspace from training images for each human face,which is not only cor-responding to the illumination conditions but also to the face identity.The subspaces,in which the intrinsic infor-mation(shape and albedo)and the extrinsic information (lightings)are mixed,are not corresponding to the lighting conditions distinctly.Otherwise,a large training image set would be needed in the learning stage and3d face model might be needed.In this paper,a low-dimensional illumination space rep-resentation(LDISR)of human faces for arbitrary lighting conditions is proposed,which can handle the problems that can not be solved well in the existing methods to a certain extent.The key idea underlying our model is that any lighting condition can be represented by9basis point light sources.The9basis images under the9basis point light sources construct an LDISR,which separates the intrinsic and the extrinsic information and can both estimate ligh-ting conditions when given a face image and synthesize a virtual face image when given lighting condition combin-ing with the illumination ratio image(IRI)method.The method in[10]and the proposed method in this paper have some similarities,but they have some essential differences also.The former needs to build one subspace for each per-son,and the latter only needs to build one subspace for one selected person.Furthermore,the9D illumination space built in the former case is not corresponding to the lighting conditions distinctly,and in our case once the correspon-ding illumination space is built,it can be used to generate virtual frontal face images of anybody under arbitrary illu-minations by using the warping technology and IRI method developed.These virtual images are then used for the pur-pose of both training and recognition.The experiments onc 2007by Acta Automatica Sinica.All rights reserved.10ACTA AUTOMATICA SINICA Vol.33 Fig.1The positions corresponding to the dominant pointlight sourcesYale Face Database B indicate that the proposed methodcan improve the performance of face recognition efficiently.2Constructing the LDISRSince any given set of lighting conditions can be 
exactlyexpressed as a sum of point light sources,a surface patch sradiance illuminated by two light sources is the sum of thecorresponding radiances when the two light sources are ap-plied separately.More detail was discussed in[5].In thissection,PCA and clustering based method are adopted tofind the basis point light sources,which are able to repre-sent arbitrary lighting conditions.The needed3D face model was obtained using a3D ima-ging machine3DMetrics TM.Then the3D face model ob-tained was used to generate the training images.Moveafloodlight by increments of10degrees to each position(θi,ϕj)to generate image p(θi,ϕj),whereθis the eleva-tion andϕis the azimuth.Typicallyϕ∈[−120◦,120◦]andθ∈[−90◦,90◦].Totally,427images were generated,denoted as{pk ,k=1,···,427}.We use PCA tofind the dominant components for the finite set of images.Since the PCA is used on the images of the same human face with different lighting conditions, the dominant eigenvectors do not reflect the facial shape but the lighting conditions.So the above eigenvectors can be used to represent lighting conditions.In this paper,the lighting subspace is constructed not using the eigenvectors directly but the light sources corresponding to the eigen-vectors.According to the ratio of the corresponding eigen-value to the sum of all the eigenvalues,thefirst60 eigenvalues containing the99.9%energy were selected. And the60corresponding eigenvectors were selected as the principal components.Denote thefirst60eigenvectors as{u i,i=1,···,60}.For the i th eigenvector u i,thecorresponding training image is pj ,where u i and pjsatisfyu T i pj =maxk∈{1, (427){u T i pk}(1)The positions of the60dominant point light sources are shown in Fig.1.By investigating the positions of the dominant point light sources,it can be found that the dominant point light sources are distributed by certain rules.They are distributed almost symmetrically and cluster together in regions such as the frontal,the side,the below,and the above of head.The nearest neighbor clustering method is adopted here to get the basis light positions.Considering the effects of point light sources in different elevation and azimuth,some rules are employed for clustering:1.When the elevation is below−60◦or above60◦,clus-tering is done based on the differences of values in elevation.2.When the elevation is in range[−60◦,60◦],clusteringis donebased on the Euclidian distances in space.Fig.2The clustering result of thefirst60eigenvectors.By adopting the nearest nerghbor clustering method,the 60dominant light sources can be classified into9classes. 
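Given the nine basis images stacked as the columns of A, the lighting fit of equation (4) and the ratio-image relighting of equation (7) reduce to a least-squares solve followed by two pixelwise operations. A minimal numpy sketch, which ignores the ASM alignment and shape-warping steps and adds a small constant to avoid division by zero (both assumptions of this sketch):

```python
import numpy as np

def relight(A, I0_m, I_bx, I0_a, eps=1e-6):
    """
    A    : (n_pixels, 9) basis images of the reference person M (the LDISR)
    I0_m : (n_pixels,)   image of M under the normal lighting s0
    I_bx : (n_pixels,)   image of person B under the unknown lighting s_x
    I0_a : (n_pixels,)   image of person A under the normal lighting s0
    returns the virtual image of A under lighting s_x
    """
    lam, *_ = np.linalg.lstsq(A, I_bx, rcond=None)  # eq. (4): lighting parameters
    I_mx = A @ lam                                  # M rendered under lighting s_x
    R_x = I_mx / (I0_m + eps)                       # eq. (7): illumination ratio image
    return I0_a * R_x                               # relit image of person A
```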
The clustering result is shown in Fig.2.When the geometric center of each class is regarded as the basis position,the9 basis light positions are shown in Table1.From the above procedure,it is known that point light sources in the9basis positions are dominant and princi-pal components in the lighting space,and they can express arbitrary lighting conditions.The9basis images obtained under the9basis point light sources respectively construct a low-dimensional illumination space representation(LD-ISR)of human face,which can express frontal face images under arbitrary illuminations.Because different human faces have similar3D shapes[3,16],the LDISR of different faces is also similar.As an approximation,it can be as-sumed that different persons have the same LDISR,which has been discussed in[17].Denote the9basis images obtained under9basis lights are I i,i=1,···,9,the LDISR of human face can be de-noted as A=[I1,I2,···,I9].The face image under lighting s x can be expressed asI x=Aλλ(2) whereλ=[λ1,λ2,···,λ9]T,0≤λi≤1is the lighting pa-rameters of image I x and can be calculated by minimizing the energy function E(λ):E(λ)= Aλλ−I x 2(3) So we can getλ=A+I x(4) whereA+=(A T A)−1A T(5)No.1Hu Yuan-Kui and WANG Zeng-Fu:A Low-dimensional Illumination Space Representation of (11)Table1Positions of the9basis light sourceslight123456789Elevationθ(degree)017.525.7364468.6-33.3-35-70Elevationϕ(degree)0-47.544.4-10888-385-9522.5(a)Input image(b)ASM alignment(c)Warped mean shape(d)The virtual images generated under different lightingsFig.3Generating virtual images using the9D illuminationspace and the IRIGiven an image of human face for learning images,thelighting parametersλcan be calculated by(4),and thevirtual face images can be generated by(2)by using thelighting conditionλ.In order to use the LDISR learnedfrom one human face to generate virtual images of other hu-man faces,the illumination ratio image(IRI)based methodis adopted in next section.3Generating virtual images u-sing illumination ratio-image(IRI)Denote the light sources as s i,i=0,1,2,···,respec-tively,where s0is the normal light source,and I ji the imageunder light source s i for the person with index j.The IRIis based on the assumption that a face is a convex surfacewith a Lambertian function.A face image can be describedasI(u,v)=ρ(u,v)n(u,v)·l(6)where,ρ(u,v)is the albedo of point(u,v),n(u,v)is thesurface normal at(u,v),and l is the direction of light.Different from the quotient image[3],illumination ratioimage is defined as follows[11,18,19,20].R i(u,v)=I ij(u,v)I0j(u,v)(7)From(6)and(7),we haveR i(u,v)=ρj(u,v)n T(u,v)·s iρj(u,v)n T(u,v)·s0=n T(u,v)·s in T(u,v)·s0(8)Equation(8)shows that the IRI can be determined only by the surface normal of a face and the light sources,which is independent of specific albedo.Since different human faces have the similar surface normal[3,16],the IRIs of dif-ferent people under the same lighting condition can be con-sidered to be the same.In order to eliminate the effect due to shapes of different faces,the following procedure should be done.Firstly,all faces can be warped to the same shape, and then the IRI is computed.In this paper,an ASM based method is used to perform the face alignment and all faces will then be warped to the predefined mean shape.After the procedure,all faces will have a quite similar3D shape. That is to say,with the same illumination,IRI is the same for different people.The corresponding face image under arbitrary lighting condition can be generated from the IRI. 
Finally the face image is warped back to its original shape.From(7),we haveI ij(u,v)=I0j(u,v)R i(u,v)(9)Equation(9)means that,given the IRI under s i and the face image under the normal lighting,we can relight the face under s i.The face relighting problem can be defined as follows. Given one image,I a0,of people A under the normal ligh-ting s0,and one image,I bx,of another people B under some specific lighting s x,how to generate the image,I ax, of people A under lighting S x.Unlike[11,18],the IRI under each lighting is unknown in this paper.Given image I bx,the IRI under lighting s x can be calcu-lated using the LDISR described in Section2.Assume the LDISR,A,is learned from images of people M.The ligh-ting parameter,λx,of image I bx is solved by the least-square methodA T Aλλx=A T I bx(10) Aλλx is the image of people M under lighting s x,denoted as I mx.The IRI under lighting s x can be calculated byR x(u,v)=I xm(u,v)/I0m(u,v)(11)where I0m is the image of people M under normal lighting. After the IRI under lighting s x is calculated,the face image of people A can be relit under lighting s x by I xa(u,v)= I0a(u,v)R x(u,v).In general,given face image I0y of arbitrary face Y under lighting s0,face image of Y under arbitrary lighting can be generated by the following procedure:1.Detect face region I0y and align it using ASM;2.Warp I0y to the mean shape T0;3.Relight T0using the IRI under lighting s k:T k(u,v)=T0(u,v)R k(u,v);4.Reverse-warp the texture T k to its original shape toget the relit image I kyFig.3shows some relighting results on Yale Face Database B.In the experiments,the LDISR was con-structed by the nine basis images of people DZF(not in-cluded in Yale Face Database B).For each image under12ACTA AUTOMATICA SINICA Vol.33abec fd gFig.4Results of image reconstruction.a)Original images.b)Images reconstructed by 5-eigenimages.c)Images reconstructed by 3-basis images.d)Images reconstructed by the LDISR.e)The differences corresponding to the images in b).f)The differences corresponding to the images in c).g)The differences corresponding to the images in d).normal lighting in Yale Face Database B,the virtual im-ages under other 63lightings were generated.It should be highlighted that in the original IRI method [11,18],to calculate the IRI,the image under nor-mal lighting and the image under specific lighting must be of the same people.The LDISR based method proposed in this paper breaks this limitation and the face image used in the algorithm can be of different people.In addition,when no face image under normal lighting is available,the virtual image can be generated by using the given λx from (2).And the IRI will then be calculated according to the virtual image.4Experimental results4.12D image reconstructionThe experiment was based on the 427frontal face images under different lightings described in Section 2.In this experiment,three image reconstruction methods were im-plemented:5-eigenimages representation method proposed by Hallinan [5],a linear combination of 3-basis images pro-posed by Shashua [7],and the LDISR based method.The face images under different lightings were reconstructed and the performances were evaluated by the differences between the original and the reconstructed images.According to [5],PCA was adopted to train the 427images and the eigenvectors corresponding to the first 5eigenvalues were selected to construct face illumination sub-space I.According to [7],the selected 3basis images under three point light sources respectively were used to construct face 
illumination subspace II.The LDISR constructed by the nine basis images was the face illumination subspace III.The total 427face images were reconstructed by the three face illumination subspace,respectively.Some original images are shown in Fig.4a),and the images reconstructed using face illumination I,II,III are shown in Fig.4b),c),and d),respectively.The corre-sponding differences are shown in Fig.4e),f),and g),respectively.It can be concluded from Fig.4that the performances of the 5-eigenimages representation method and the LDISR are comparative,and they are both better than that of the 3-basis images representation method.When the variation due to lighting condition is large (Fig.4c),columns 2,3,and 4),the differences between the original and the recon-structed images are very large (Fig.4f),columns 2,3,and 4),especially when there are shadows in face images.To evaluate more rigorously,the fit function defined in [5]was adopted.The quality of the reconstruction can be measured by the goodness of the fit function:ε=1−I rec −I in 2I in 2(12)where I rec is the reconstructed image,and I in is the original image.The values of the fit function corresponding to all the 427reconstructions by three methods are shown in Fig.5.From Fig.5,it can be seen that the fitness of images reconstructed by the 5-eigenimages representation method and the LDISR to the original image is very good,while the 3-basis images representation method is not so good.When the variation in lighting is larger (corresponding to the abscissas are 50and 280in Fig.5)the performance of the LDISR is better than that of the 5-eigenimages repre-sentation method.Besides,the 5-eigenimages and the 3-basis images rep-resentation methods need multiple images of each person,and train one model for each person.However,the LDISR trains one model using 9basis images of one person,and can be used for other person by warping technique.4.2Face recognition with variant lightings based on virtual imagesIn this experiment,the LDISR and the IRI method were combined to generate virtual face images,which were used for face recognition with variant lightings.The experiments were based on the Yale Face Database B [10].64frontal face images of each person under 64different lightings were selected,and there were 640images of 10persons.TheNo.1Hu Yuan-Kui and WANG Zeng-Fu:A Low-dimensional Illumination Space Representation of ···13Fig.5The values of fit function corresponding to thereconstruction by three methods.images have been divided into five subsets according to the angles the light source direction makes with the camera axis [10]:Subset 1(up to 12◦),subset 2(up to 25◦),subset 3(up to 50◦),subset 4(up to 77◦),and subset 5(others).Correlation,PCA,and LDA methods were adopted for face recognition.For correlation method,the image under normal lighting of each person was the template image and the rest 63images of each person were test images.For PCA and LDA methods,three images of each person (of which the angles the light source direction makes with the camera axis are the smallest)were training images,and the rest were test images.The LDISR was constructed by the nine basis images of people DZF (not included in Yale Face Database B).For each frontal face image in Yale Face Database B,the virtual images corresponding to the other 63lightings were gener-ated using the LDISR and IRI.In order to decrease the effect of illumination,we used gamma intensity correction (GIC).Here γ=4.The three recognition methods were performed on the original 
images,images with GIC and virtual images with GIC.The results are shown in Fig.6,where correlation,PCA and LDA correspond to the results for the original images,GIC correlation,GIC PCA,and GIC LDA correspond to the results for the images with GIC,and GIC virtual correlation,GIC virtual PCA,and GIC virtual LDA correspond to the results for the virtual images with GIC.Fig.6illustrates that the recognition accuracy for the virtual images is improved greatly.When the variations due to illumination are larger,the improvement is greater.The recognition rates of correlation,PCA,and LDA on the virtual images are 87.24%,87.99%,and 90.5%,respec-tively.For subset 1,subset 2,and subset 3,in which the variations due to illumination are small,the performance of three recognition methods are comparable,while in sub-set 4and subset 5,LDA performs better.This indicates that the classifying ability of LDA is better than others.In the future,we will validate the proposed method on larger face database.5ConclusionThis paper proposes a method to construct an LDISR u-sing the 9basis images under the 9basis point light sources.The LDISR can represent almost all face images underar-Fig.6The results of Face recognition on Yale face Database Bbitrary lighting conditions.The LDISR combined with the IRI is corresponding to the lighting conditions distinctly,and can estimate lighting conditions when given a face im-age and synthesize a virtual face image when given lighting conditions.The experiments of reconstruction illustrate that the representation ability of LDISR is better than the 5-eigenimages and 3-basis images representation methods.The experiments on Yale Face Database B confirm the abi-lity of LDISR in synthesizing a virtual face image and in-dicate that the virtual face images can improve greatly the accuracy of face recognition under variant lightings.The main advantage of the proposed model is that it can be used to generate virtual images of anybody only from 9basis face images of one person.And at the same time,the method need not know the lighting conditions or pre-calculate the IRI.References1Moses Y,Adini Y,Ullman S.Face recognition:the problem of compensating for changes in illumination direction.In:Pro-ceedings of the Third European Conference on Computer Vision.Stockholm,Sweden.Springer-Verlag,1994,286∼2962Sim T,Kanade T,Combining models and exemplars for face recognition:An illuminating example,In:Proceedings of CVPR 2001Workshop on Models versus Exemplars in Computer Vi-sion,Hawaii,USA.IEEE,2001,1∼103Shashua A,Riklin-Raviv T,The quotient image:Class-based re-rendering and recognition with varying illuminations.IEEE Transactions on Pattern Analysis and Machine Intelligence ,2001,23(2):129∼1394Wang H,Li Stan Z,Wang Y.Face recognition under varying lighting conditions using self quotient image.In:Proceedings of the 6th International Conference on Automatic Face and Gesture Recognition,Seoul,Korea.IEEE,2004,819∼8245Hallinan P W.A low-dimensional representation of human faces for arbitrary lighting conditions.In:Proceedings of IEEE Com-puter Society Conference on Computer Vision and Pattern Recognition,Seattle,USA.IEEE,1994,995∼9996Epstein R,Hallinan P W,Yuille A L.5±2eigenimages suffice:an empirical investigation of low dimensional lighting models.In:Proceedings of IEEE Workshop on Physics-Based Vision,Boston,USA.IEEE,1995,108∼1167Shashua A.On photometric issues in 3D visual recognition from a single 2D image.International Journal of Computer Vision ,1997,21(1):99∼1228Belhumeur P,Kriegman D.What 
is the set of images of an object under all possible illumination conditions.International Journal of Computer Vision ,1998,28(3):245∼2609Georghiades A S,Belhumeur P N,Kriegman D J.From few to many:Illumination cone models for face recognition under variable lighting and pose.IEEE Transactions on Pattern Analysis and Machine Intelligence ,2001,23(6):643∼66014ACTA AUTOMATICA SINICA Vol.3310Lee K,Jeffrey Ho,Kriegman D.Nine points of light:Acquiring subspaces for face recognition under variable lighting.In:Pro-ceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition,Hawaii,USA.IEEE,2001(1), 519∼52611Gao W,Shan S,Chai X,Fu X.Virtual face image generation for illumination and pose insensitive face recognition.In:Proceed-ings of IEEE International Conference on Acoustics,Speech,and Signal Processing,Hong Kong.IEEE,2003,776∼77912Zhao W,Chellappa R.Robust face recognition using symmetric shape-from-shading.Technical Report CARTR-919,1999,Cen-ter for Automation Research,University of Maryland,College Park,MD.13Basri R,Jacobs mbertian reflectance and linear subspaces, In:Proceedings of the8th IEEE Computer Society International Conference On Computer Vision,Vancouver,Canada.IEEE, 2001,2:383∼39014Ramamoorthi R,Hanrahan P.An efficient representation for ir-radiance environment maps.In:Proceedings of the28th Annual Conference on Computer Graphics and Interactive Techniques, Los Angeles,USA:ACM Press,2001,497∼50015Ramamoorthi R.Analytic PCA construction for theoretical analysis of lighting variability in images of a Lambertian object, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2002,24(10):1322∼133316Chen H F,Belhumeur P N,Jacobs D W.In search of illumi-nation invariants.In:Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition,South Carolina,USA.IEEE,2000,1:54∼26117Wang H,Li Stan Z,Wang Y,Zhang W,Illumination modeling and normalization for face recognition,In:Proceedings of IEEE International Workshop on Analysis and Modeling of Faces and Gestures,Nice,France.IEEE,2003,104∼11118Zhao J,Su Y,Wang D,Luo S.Illumination ratio image:Synthe-sizing and recognition with varying illuminations.Pattern Recog-nition Letters,2003,24(15):2703∼271019Wen Z,Liu Z,Huang T S.Face relighting with radiance envi-ronment maps.In:Proceedings of IEEE Computer Society Con-ference on Computer Vision and Pattern Recognition,Madison, USA.IEEE,2003,2:158∼16520Qing L,Shan S,Chen X.Face relighting for face recognition under generic illumination.In:Proceedings of IEEE Interna-tional Conference on Acoustics,Speech,and Signal Processing, Montreal,Canada.IEEE,2004,5:733∼736HU Yuan-Kui Ph.D.candidate in the De-partment of Automation at University of Sci-ence and Technology of China.His researchinterests include face recognition,image pro-cessing,and pattern recognition.W ANG Zeng-Fu Professor of the Depart-ment of Automation at University of Scienceand Technology of China.His current re-search interests include audio&vision in-formation processing,intelligent robots,andpattern recognition.Corresponding author ofthis paper.E-mail:zfwang@。
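The reconstruction-quality measure used in the experiments of the paper above, the fit function of equation (12), can be computed directly:

```python
import numpy as np

def fit_quality(reconstructed, original):
    """Goodness of fit, eq. (12): 1 - ||I_rec - I_in||^2 / ||I_in||^2."""
    num = np.linalg.norm(reconstructed - original) ** 2
    den = np.linalg.norm(original) ** 2
    return 1.0 - num / den
```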
Journal of Terahertz Science and Electronic Information Technology, Vol. 21, No. 9, Sept. 2023

Face recognition system based on pAUC optimization of the Receiver Operating Characteristic
TANG Linruize, BAI Zhongxin, ZHANG Xiaolei* (School of Marine Science and Technology, Northwestern Polytechnical University, Xi'an Shaanxi 710072, China)

Abstract: Deep-learning-based face recognition outperforms traditional methods in many application scenarios; its loss functions fall roughly into two categories: verification-based and identification-based. Verification losses match the pipeline of open-set face recognition but are difficult to implement, so the best-performing face recognition algorithms to date are built on identification losses, usually composed of softmax output units and a cross-entropy loss; identification losses, however, do not align the training process with the evaluation procedure. For the open-set face recognition task, this paper proposes a new verification loss that maximizes the partial area under the Receiver Operating Characteristic (ROC) curve (pAUC), together with a class-center learning strategy that improves training efficiency and makes the proposed verification loss strongly comparable to identification losses. Experimental results on five large-scale unconstrained face datasets show that the proposed method is highly competitive with state-of-the-art face recognition methods.
Keywords: face recognition; partial Area Under Curve optimization; loss function; class centers
CLC number: TP391.4  Document code: A  doi: 10.11805/TKYDA2021258

Partial Area Under Curve optimization for face recognition systems
TANG Linruize, BAI Zhongxin, ZHANG Xiaolei* (School of Marine Science and Technology, Northwestern Polytechnical University, Xi'an Shaanxi 710072, China)

Abstract: Deep learning based face recognition has outperformed traditional methods in many application scenarios. There are two main lines of research on designing loss functions for face recognition, i.e., verification and identification. Verification loss functions match the pipeline of open-set face recognition, but they are hard to implement. Therefore, most state-of-the-art deep learning methods for face recognition adopt identification loss functions with softmax output units and cross-entropy loss. Nevertheless, the identification loss does not match the training process with the evaluation procedure. A verification loss function is proposed for open-set face recognition to maximize the partial area under the Receiver Operating Characteristic (ROC) curve, i.e., the partial Area Under Curve (pAUC). A class-center learning method is also proposed to improve training efficiency, which is critical for the proposed loss function to be comparable to the identification loss in performance. Experimental results on five large-scale unconstrained face recognition benchmarks show that the proposed method is highly competitive with state-of-the-art face recognition methods.
Keywords: face recognition; partial Area Under Curve optimization; loss function; class centers

At present, face representations embedded by deep convolutional neural networks (DCNNs) are the method of choice for face recognition [1-4].
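Although the loss proposed in the paper is a differentiable surrogate trained end-to-end, the quantity it targets, the partial area under the ROC curve over a low false-accept range, is straightforward to evaluate on verification scores. A minimal sketch of an empirical pAUC computation (the false-positive-rate range is an arbitrary illustrative choice):

```python
import numpy as np
from sklearn.metrics import roc_curve, auc

def partial_auc(labels, scores, max_fpr=1e-3):
    """labels: 1 for genuine pairs, 0 for impostor pairs; scores: similarity."""
    fpr, tpr, _ = roc_curve(labels, scores)
    mask = fpr <= max_fpr
    # Close the region at max_fpr by interpolation, then integrate.
    fpr_p = np.append(fpr[mask], max_fpr)
    tpr_p = np.append(tpr[mask], np.interp(max_fpr, fpr, tpr))
    return auc(fpr_p, tpr_p) / max_fpr   # normalized to [0, 1]
```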
Comment on "100% accuracy in automatic face recognition" (abstract): Jenkins and Burton reported that an image-averaging technique raised automatic face recognition to 100% accuracy and could also be applied to photo-ID documents; we doubt that the feasibility of image averaging for identity documents is adequately supported by the evidence.
In automatic face recognition, the gallery face images are first enrolled and encoded for subsequent search.
A probe image is then compared with every encoded image in the gallery, and a match is flagged when the probe can be matched and identified.
In the study in question, Jenkins and Burton used photographs of celebrities as probe images to test the match hit rate of the FaceVACS system used by a genealogy website.
Merging the probe images of each celebrity into a single new probe image raised the overall hit rate against the probe database from 54% to 100%.
The authors therefore concluded that image averaging can markedly improve automatic face recognition, and inferred that applying averaged faces to identity documents would greatly reduce face recognition errors.
The 100% accuracy may, however, have been achievable merely because the averages incorporated images that were already recognizable on their own.
To address this concern, the authors ran a further test using only the images that could not be identified in Study 1, raising the hit rate from 0% to 80%.
They therefore attributed the improved accuracy to the averaging process.
However, part of this improvement may be attributable to the manual face registration performed before averaging, which corrected the facial geometry so that all probe face images were frontal and aligned to a common standard.
By greatly reducing image variability, the registration step alone could turn previously unidentifiable photographs into identifiable ones.
Moreover, standardized, registered faces may make automatic face finding easier and help the detection algorithms operate normally, which could also raise the hit rate.
It is therefore possible that registration, rather than image averaging alone, was responsible for the improved hit rate.
Jenkins and Burton argue that image averaging improves the stability of face representations.
However, this interpretation is overstated.
The claim that using averaged images on identity documents would reduce recognition errors lacks sufficient evidence, especially because this is not a task equivalent to the experiments that were actually carried out.
In particular, the experiments used an online database as the gallery and averaged images as probes, and the online recognition system only returns the closest matching photograph in the database.
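The image-averaging step at the center of this debate is itself simple once the probe photographs have been registered to a common frontal shape: the averaged probe is just the pixelwise mean of the aligned images. A minimal sketch, which assumes the registration that the commentary argues does much of the work has already been carried out:

```python
import numpy as np

def average_probe(aligned_faces):
    """aligned_faces: list of same-size, registered grayscale face arrays."""
    stack = np.stack([f.astype(np.float64) for f in aligned_faces])
    return stack.mean(axis=0)   # averaged probe image submitted to the matcher
```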
Resampling for Face RecognitionXiaoguang Lu and Anil K.JainDept.of Computer Science&Engineering,Michigan State UniversityEast Lansing,MI48824{lvxiaogu,jain}@Abstract.A number of applications require robust human face recog-nition under varying environmental lighting conditions and different fa-cial expressions,which considerably vary the appearance of human face.However,in many face recognition applications,only a small number oftraining samples for each subject are available;these samples are not ableto capture all the facial appearance variations.We utilize the resamplingtechniques to generate several subsets of samples from the original train-ing dataset.A classic appearance-based recognizer,LDA-based classifier,is applied to each of the generated subsets to construct a LDA represen-tation for face recognition.The classification results from each subset areintegrated by two strategies:majority voting and the sum rule.Experi-ments conducted on a face database containing206subjects(2,060faceimages)show that the proposed approaches improve the recognition ac-curacy of the classical LDA-based face classifier by about7percentages.1IntroductionHuman face recognition has been drawing a lot of attention in the past decade.A number of face recognition algorithms have been investigated[21]and several commercial face recognition products[9][20]are available.However,robust face recognition in unconstrained environments is still a very challenging problem.A face recognition system has two stages,training and test.In real appli-cations,current face recognition systems encounter difficulties due to the small number of available training face images and complicated facial variations dur-ing the testing stage.In other words,available training samples are not very representative.Human face appearance has a lot of variations resulting from varying lighting conditions,different head poses and facial expressions.Exam-ples of these variations for one subject are illustrated in Fig.1.In real-world situations,only a small number of samples for each subject are available for training.These samples cannot capture all the possible facial variations.Among the face recognition algorithms,appearance-based approaches[2][19], which utilize the intensity or intensity-derived features of original images,have been successfully developed[21][13].The dimensionality of the feature vector This research was supported by NSF IUC on Biometrics(CITeR),at West Virginia University.used by these methods is often very high while the training sample size is rela-tively small.The classifier based on such training data may be biased and have a large variance,resulting in a poor performance[10][17].To improve the per-formance of the weak classifiers,a number of approaches have been presented [4][6][8].Breiman[4]proposed a bootstrapping-aggregating(bagging)method. The training set is randomly resampled with replacement to generate indepen-dent bootstrap replicates.A classifier is developed based on each replicate.Fi-nally,the majority voting is applied to integrate results of all the classifiers. 
Freund and Schapire[6]have developed a boosting algorithm,which trains a se-ries of classifiers based on the reweighted training set in a sequential mode.The final decision is made by majority voting.In the random subspace method[8], classifiers are constructed in the random subspaces of the feature space.Simple majority voting is used as thefinal decision rule.Skurichina and Duin analyzed these methods for linear classifiers[17][16].Bolle et al.[3]used the bootstrap techniques for evaluating authentication systems.The boosting has been used to several applications,such as text categorization[15]and image retrieval[18]. Guo and Zhang[7]applied boosting for fast face recognition.Fig.1.Facial variations under different lighting conditions and facial expressions for the same subject[1]We propose a resampling-integration scheme for face recognition.A resam-pling technique is utilized to generate a number of subsets from the original training dataset.A classic appearance-based face recognizer based on the LDA representation is constructed on each of the generated subsets.Two integration strategies,majority voting and the sum rule,are used to combine the classifica-tion results to arrive at thefinal decision.In section2,the resampling and integration scheme is presented.Section3 provides the experimental results and discussion.Conclusions are summarized in section4.2Resampling and Integration2.1System OverviewOur resampling-integration scheme is illustrated in Fig.2.The training dataset contains a small number of sample face images.A number of subsets are gener-ated by resampling the training set.Each subset S i is used to train a classifier C i.In the test stage,the test image is loaded into each component classifier.Two strategies,(i)simple majority voting,and(ii)the sum rule,are used to integrate the outputs of component classifiers;the classifier outputs can be either the clas-sification labels or the matching scores.Currently,the face recognizer based onLDA representation is used as the component classifier,but this framework does not limit the component classifiers to be of the same type.Fig.2.The Resampling-Integration scheme for face recognition.S1to S K are the subsets resampled from the original training dataset.C1to C K are classifiers trained using the corresponding subsets.Here,K is the total number of subsets2.2LDA-based Face ClassifierA two-dimensional face image is considered as a vector,by concatenating each row(or column)of the image.Let X=(x1,x2,...,x i,...,x N)denote the data matrix,where N is the number of face images in the training set.Each x i is a face vector of dimension n,concatenated from a p×p face image,where n represents the total number of pixels in the face image and n=p×p.The Linear Discriminant Analysis(LDA)[5][2]representation is a linear transformation from the original image vector to a projection feature vector,i.e.Y=W T LDA X,(1) where Y is the d×N feature vector matrix,d is the dimension of the feature vector,d n and W LDA is the transformation matrix,derived byW LDA=arg maxW W T S B WW T S W W,(2)where S B is the between-class scatter matrix and S W is the within-class scattermatrix,S B=ci=1N i(x i−m)(x i−m)T,(3)S W =c i =1x k ∈X i (x k −m i )(x k −m i )T .(4)In the above expression,N i is the number of training samples in class i ;c is the number of distinct classes;m is the mean vector of all the samples,i.e.,m = N i =1x i ;m i is the mean vector of samples belonging to class i and X i represents the set of samples belonging to class i .In the face 
recognition problem,if the within-class scatter matrix S W is singular,due to the facts that the rank of S W is at most (N −c )and the number of training samples is generally less than the dimensionality of the face image (number of pixels),PCA [19]can be used to reduce the dimensionality of the original face image space [2]prior to applying LDA.LDA derives a low dimensional representation of a high dimensional face feature vector space.The face vector is projected by the transformation matrix W LDA .The projection coefficients are used as the feature representation of each face image.The matching score between the test face image and the training image is calculated as the cosine value of the angle between their coefficients vectors.A larger matching score means a better match.2.3ResamplingThe resampling module generates a number of subsets from the original training set.A number of resampling methods have been proposed in the literatures.For instance,in classic bagging [4],a random sampling with replacement is used to generate independent bootstrap replicates where the size of the subset is the same as that of the original set.In the LDA based face recognition,both intra-and inter-class information (between-class scatter matrix and within-class scatter matrix)are utilized,so our sampling strategy does not randomly sample the whole training set,but does randomly sampling within each class (subject),subject to the following conditions:1.The number of sample images for each subject in the subset is equal or as equal as possible.2.Sampling within each class is achieved based on a uniform distribution.The requirements listed above may not be the optimal ones,but work well as demonstrated by the empirical evaluation.2.4IntegrationAfter resampling,several LDA-based classifiers are constructed.The matching scores between the test face image and the training images are computed by each component classifier.Let MS (i,j )be the matching score between the test image and the j th training image,calculated by the i th component classifier.For the i th component classifier,the classification result for the test image is the subject label,denoted by Label (i ).This classification can be achieved by the nearest neighbor rule.Two strategies for integration are applied,namely the simple majority voting and the sum rule.1.Simple majority votingAssign the test image with the label which appears most frequently in Label(i),where i=1...K.2.The sum ruleCalculate MS j= Ki=1MS(i,j).Assign the test image with the label of theJ th training image,such thatJ=arg maxjMS j.(5)The integration rules may not give desired results when the number of com-ponent classifiers(K)is too small.But due to the resampling scheme presented, here K could be as large as needed.3Experiments and DiscussionOur database is a union of four different face databases,which are available in the public domain(see table1).It contains2,060face images of206subjects, with10images per subject.The set of face images contains variations in pose, illumination and expression.Some images in the individual databases were not selected for our experiments because they either had out-of-plane rotation by more than45degrees or were occluded due to sun glasses or a scarf.Sample images from the databases are shown in Fig.3.Face images are closely cropped to include only the internal facial structures such as the eyebrows,eyes,nose and mouth,and aligned by the centers of the two eyes.All cropped images are resized to42×42pixels.Each image vector is normalized to 
be of unit length.Table1.Database descriptionFace database number of subjects Variations includedORL[14]40Slight pose and expressionYale[1]15Illumination and expressionAR[12]120Illumination and expressionNLPR+MSU31Slight pose and expression(collected by the authors)The entire face database is divided into two parts.Nine images of each subject are used to construct the original training data and the remaining one is used for testing.This partition is repeated10different times so that every image of the subject can be used for testing.The recognition accuracy is the average of these ten different test sets.In resampling,8of9images for each subject are randomly selected according to the uniformly distributed seeds between1and9.The sampling is without replacement.Each subject has different random generatedFig.3.Representative face images in the database.(a)ORL,(b)Yale,(c)AR and(d) NLPR+MSUseeds.Consequently,each resampled subset contains8x206=1,648images.The LDA-based classifier is trained on this subset.The component classifiers compute the cosine value of the angle between the two projection coefficients vectors(one from the test image and the other from the database image)as the matching score.Database image with the best match is used to determine the classification result of the input image from the com-ponent classifier.The recognition accuracy of different face recognition schemes is listed in table2.Figure4shows some images which were misclassified by the classic LDA-based face recognizer but correctly classified using the proposed scheme.Table2.Recognition accuracy(The number of resampled subsets,K=20.)Without resampling Resampling+Majority Voting Resampling+Sum rule81.0%88.7%87.9%Fig.4.Examples which are misclassified by classic LDA-based face recognizer but correctly classified using the proposed schemeThe number of subsets,K,is decided empirically.In order to analyze the influence of K in our scheme,we conducted experiments with different settings of K values,from1to20.Figure5demonstrates the recognition accuracy of theproposed recognition schemes as the number of subsets changes.These results show that the proposed resampling-integration scheme generally improves the performance of the LDA-based face classifier as K increases up to20.Fig.5.Recognition accuracy with respect to the number of subsets4Conclusions and Future WorkThe resampling-integration scheme is proposed to improve the recognition ac-curacy of face classification.The resampling is designed to generate a number of subsets,which are used to train the component classifiers(parameter adjust-ment).The integration rules are applied to combine the outputs of component classifiers for thefinal decision.Two integration rules are presented and cor-responding experiments are carried out.The LDA-based face classifier is inte-grated into the scheme with the corresponding resampling design.Experiments conducted on a face database containing206subjects(2,060face images)show that the recognition accuracy of the classical LDA-based face classifier is im-proved by applying the proposed scheme.The system framework is scalable in terms of the number of subsets,the type of component classifiers and resampling techniques.Different resampling techniques can be explored in this scheme.Since the subsets are resampled randomly,the resulting component classifiers may have different weights in thefinal decision.Some classifier selection technique can be applied.Although in our experiments,all component classifiers are LDA-based,the presented 
scheme does not limit the type of the component classifier. However,currently there is no guarantee that the proposed scheme always works for any type of classifiers.From the perspective of classifier combination,many other integration rules can be tried out[11].References1.Yale University face database.</projects/yalefaces/yalefaces.html>.2.P.N.Belhumeur,J.P.Hespanha,and D.J.Kriegman.Eigenfaces vs.Fisherfaces:Recognition using class specific linear projection.IEEE Trans.Pattern Analysisand Machine Intelligence,19(7):711–720,Jul.1997.3.R.Bolle,N.Ratha,and S.Pankanti.Evaluating authentication systems usingbootstrap confidence intervals.In Proc.1999IEEE Workshop on Automatic Iden-tification Advanced Technologies,pages9–13,Morristown NJ,1999.4.Leo Breiman.Bagging predictors.Machine Learning,24(2):123–140,1996.5.R.O.Duda,P.E.Hart,and D.G.Stork.Pattern Classification.Wiley,New York,2nd edition,2000.6.Yoav Freund and Robert E.Schapire.Experiments with a new boosting algorithm.In Proc.International Conference on Machine Learning,pages148–156,1996.7.Guo-Dong Guo and Hong-Jiang Zhang.Boosting for fast face recognition.InProc.IEEE ICCV Workshop on Recognition,Analysis,and Tracking of Faces andGestures in Real-Time Systems,pages96–100,2001.8.T.K.Ho.The random subspace method for constructing decision forests.IEEETrans.Pattern Analysis and Machine Intelligence,20(8):832–844,1998.9.Identix.</>.Minnetonka,MN.10. A.K.Jain and B.Chandrasekaran.Dimensionality and sample size considerationsin pattern recognition practice.Handbook of Statistics,P.R.Krishnaiah and L.N.Kanal(eds.),2:835–855,1987.11.J.Kittler,M.Hatef,R.Duin,and J.Matas.On combining classifiers.IEEE Trans.Pattern Analysis and Machine Intelligence,20(3):226–239,1998.12. A.M.Martinez and R.Benavente.The ar face database.CVC Tech.Report#24,Jun.1998.13.P.Jonathon Phillips,Hyeonjoon Moon,Syed A.Rizvi,and Patrick J.Rauss.Theferet evaluation methodology for face-recognition algorithms.IEEE Trans.PatternAnalysis and Machine Intelligence,22(10):1090–1104,2000.14.Ferdinando Samaria and Andy Harter.Parameterisation of a stochastic modelfor human face identification.In Proc.2nd IEEE Workshop on Applications ofComputer Vision,Sarasota FL,Dec.1994.15.R.E.Schapire and Y.Singer.Boostexter:A boosting-based system for text cate-gorization.Machine Learning,39(2-3):135–168,May/June2000.16.M.Skurichina and R.P.W.Duin.Bagging for linear classifiers.Pattern Recognition,31(7):909–930,1998.17.M.Skurichina and R.P.W.Duin.Bagging,boosting and the random subspacemethod for linear classifiers.Pattern Analysis and Applications,5(2):121–135,2002.18.K.Tieu and P.Viola.Boosting image retrieval.In Proc.CVPR,2000.19.M.Turk and A.Pentland.Eigenfaces for recognition.Journal of Cognitive Neu-roscience,3(1):71–86,Mar.1991.20.Viisage.</>.Littleton,MA.21.W.Zhao,R.Chellappa,A.Rosenfeld,and P.J.Phillips.Face recognition:A lit-erature survey.CVL Technical Report,University of Maryland,October2000.<ftp:///TRs/CVL-Reports-2000/TR4167-zhao.ps.gz>.。
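The resampling-integration scheme of the paper above can be sketched with scikit-learn components: per-subject subsampling of the training set, one PCA+LDA classifier per subset, and fusion by the sum rule or majority voting. The LDA classifier's posteriors stand in for the paper's cosine matching scores in LDA space, and the component counts are illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.pipeline import make_pipeline

def resample_per_class(X, y, per_class=8, rng=None):
    """Draw `per_class` samples per subject, uniformly, without replacement."""
    rng = np.random.default_rng(rng)
    idx = []
    for c in np.unique(y):
        members = np.flatnonzero(y == c)
        take = min(per_class, members.size)
        idx.extend(rng.choice(members, size=take, replace=False))
    idx = np.array(idx)
    return X[idx], y[idx]

def train_ensemble(X, y, k=20):
    """One PCA+LDA classifier per resampled subset (k component classifiers)."""
    models = []
    for seed in range(k):
        Xs, ys = resample_per_class(X, y, rng=seed)
        clf = make_pipeline(PCA(n_components=150), LinearDiscriminantAnalysis())
        models.append(clf.fit(Xs, ys))
    return models

def predict_sum_rule(models, probe):
    # Sum rule: accumulate per-class scores (LDA posteriors here) over all classifiers.
    scores = sum(m.predict_proba(probe.reshape(1, -1))[0] for m in models)
    return models[0].classes_[int(np.argmax(scores))]

def predict_majority_vote(models, probe):
    votes = [m.predict(probe.reshape(1, -1))[0] for m in models]
    vals, counts = np.unique(votes, return_counts=True)
    return vals[int(np.argmax(counts))]
```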
Image Processing Techniques in Face Recognition

The human face, as a highly universal biometric trait that can be captured without physical contact, is increasingly used for identity verification.
Face recognition refers specifically to computer techniques that perform identity verification by analyzing and comparing visual facial feature information.
Face recognition technology has a wide range of applications and can be used in security verification systems, medicine, archive management, bank and customs monitoring systems, automatic access control systems, and so on [1].
Compared with identification methods based on other biometric traits such as fingerprints and irises, face recognition is friendlier, more convenient, and less intrusive.
Because of its enormous application prospects and its unmatched advantages, face recognition has become a hot topic in pattern recognition and artificial intelligence.
Image preprocessing is an important step in the face recognition process.
Owing to differences in acquisition conditions, input images often suffer from noise and insufficient contrast.
To ensure consistency in face size, face position, and image quality across face images, the images must be preprocessed.
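A minimal sketch of the kind of preprocessing referred to here: geometric normalization to a fixed size plus grayscale conversion, noise suppression, and contrast enhancement. The target size and filter parameters are arbitrary illustrative choices.

```python
import cv2

def preprocess_face(bgr_face, size=(64, 64)):
    gray = cv2.cvtColor(bgr_face, cv2.COLOR_BGR2GRAY)
    gray = cv2.resize(gray, size)                 # normalize face size (after cropping/alignment)
    gray = cv2.GaussianBlur(gray, (3, 3), 0)      # suppress acquisition noise
    return cv2.equalizeHist(gray)                 # improve contrast under uneven lighting
```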
1. Basic content and process of face recognition

Face recognition can generally be described as follows: given a still or dynamic image, identify one or more persons in the image using an existing face database. Broadly speaking, its research content includes the following five aspects:

(1) Face detection: detecting the presence of faces in various scenes and determining their positions. This task is mainly affected by illumination, noise, head tilt, and various kinds of occlusion.

(2) Face representation: determining how to describe the detected faces and the known faces in the database. Common representations include geometric features (e.g., Euclidean distances, curvatures, angles), algebraic features (e.g., matrix eigenvectors), fixed feature templates, eigenfaces, and moiré patterns.

(3) Face identification: what is usually meant by face recognition, i.e., comparing the face to be identified with the known faces in the database and returning the relevant information. The core of this process is choosing an appropriate face representation and matching strategy.

(4) Facial expression analysis: analyzing the expression of the face to be recognized and classifying it.
Key Points of Facial Recognition Algorithms

Facial recognition algorithms leverage computer vision and machine learning techniques to detect, analyze, and map facial features. These algorithms are often used in security, surveillance, and access control systems, as well as in consumer applications such as photo tagging and social media filters.

The process of facial recognition typically involves the following steps:

1. Face detection: the algorithm identifies the presence of a face in an image or video frame.
2. Feature extraction: key facial features, such as the eyes, nose, mouth, and other unique characteristics, are extracted and mapped.
3. Feature representation: the extracted features are converted into a numerical representation that can be easily compared and processed.
4. Matching: the feature representation of an unknown face is compared against a database of known faces to identify potential matches.

Types of facial recognition algorithms:

Local Binary Patterns (LBP): LBP algorithms analyze the texture of facial features by comparing the brightness values of adjacent pixels.
Scale-Invariant Feature Transform (SIFT): SIFT algorithms detect and describe key points in an image, which are then used for matching.
Histogram of Oriented Gradients (HOG): HOG algorithms build histograms of image gradient orientations, which are then used as features.
Convolutional Neural Networks (CNNs): CNNs are deep learning models that have achieved state-of-the-art performance in facial recognition tasks.

Accuracy and limitations: The accuracy of facial recognition algorithms depends on factors such as the quality of the input image, the algorithm itself, and the size and diversity of the training dataset. While facial recognition algorithms have made significant progress in recent years, they are not yet foolproof and can be susceptible to errors caused by facial expressions, lighting conditions, and aging.

Ethical considerations: The use of facial recognition raises ethical concerns related to privacy, surveillance, and bias. It is crucial to ensure that these algorithms are used in a responsible and transparent manner, with appropriate safeguards in place to protect individual rights and freedoms.
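As an illustration of the feature-extraction and matching steps listed above, the following sketch computes HOG descriptors with scikit-image and compares two face crops by cosine similarity; the crop size, HOG parameters, and decision threshold are all assumptions made for illustration.

```python
import numpy as np
from skimage.feature import hog

def hog_descriptor(gray_face):
    """gray_face: 2-D array, already detected, cropped, and resized (e.g. 128x128)."""
    return hog(gray_face, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2), block_norm="L2-Hys")

def same_person(face_a, face_b, threshold=0.85):
    """Decide whether two face crops match by cosine similarity of HOG features."""
    a, b = hog_descriptor(face_a), hog_descriptor(face_b)
    cos = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    return cos >= threshold
```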
Learning invariant representations and applicationsto face verificationQianli Liao,Joel Z Leibo,and Tomaso PoggioCenter for Brains,Minds and MachinesMcGovern Institute for Brain ResearchMassachusetts Institute of TechnologyCambridge MA02139lql@,jzleibo@,tp@AbstractOne approach to computer object recognition and modeling the brain’s ventralstream involves unsupervised learning of representations that are invariant to com-mon transformations.However,applications of these ideas have usually been lim-ited to2D affine transformations,e.g.,translation and scaling,since they are eas-iest to solve via convolution.In accord with a recent theory of transformation-invariance[1],we propose a model that,while capturing other common con-volutional networks as special cases,can also be used with arbitrary identity-preserving transformations.The model’s wiring can be learned from videos oftransforming objects—or any other grouping of images into sets by their depictedobject.Through a series of successively more complex empirical tests,we studythe invariance/discriminability properties of this model with respect to differenttransformations.First,we empirically confirm theoretical predictions(from[1])for the case of2D affine transformations.Next,we apply the model to non-affinetransformations;as expected,it performs well on face verification tasks requiringinvariance to the relatively smooth transformations of3D rotation-in-depth andchanges in illumination direction.Surprisingly,it can also tolerate clutter“trans-formations”which map an image of a face on one background to an image of thesame face on a different background.Motivated by these empiricalfindings,wetested the same model on face verification benchmark tasks from the computervision literature:Labeled Faces in the Wild,PubFig[2,3,4]and a new datasetwe gathered—achieving strong performance in these highly unconstrained casesas well.1IntroductionIn the real world,two images of the same object may only be related by a very complicated and highly nonlinear transformation.Far beyond the well-studied2D affine transformations,objects may rotate in depth,receive illumination from new directions,or become embedded on different backgrounds;they might even break into pieces or deform—melting like Salvador Dali’s pocket watch[5]—and still maintain their identity.Two images of the same face could be related by the transformation from frowning to smiling or from youth to old age.This notion of an identity-preserving transformation is considerably more expansive than those normally considered in com-puter vision.We argue that there is much to be gained from pushing the theory(and practice)of transformation-invariant recognition to accommodate this unconstrained notion of a transformation. Throughout this paper we use the formalism for describing transformation-invariant hierarchical architectures developed by Poggio et al.(2012).In[1],the authors propose a theory which,they argue,is general enough to explain the strong performance of convolutional architectures across awide range of tasks(e.g.[6,7,8])and possibly also the ventral stream.The theory is based on the premise that invariance to identity-preserving transformations is the crux of object recognition. 
The present paper has two primary points.First,we provide empirical support for Poggio et al.’s theory of invariance(which we review in section2)and show how various pooling methods for convolutional networks can all be understood as building invariance since they are all equivalent to special cases of the model we study here.We also measure the model’s invariance/discriminability with face-matching tasks.Our use of computer-generated image datasets lets us completely control the transformations appearing in each test,thereby allowing us to measure properties of the repre-sentation for each transformation independently.Wefind that the representation performs well even when it is applied to transformations for which there are no theoretical guarantees—e.g.,the clutter “transformation”which maps an image of a face on one background to the same face on a different background.Motivated by the empiricalfinding of strong performance with far less constrained transformations than those captured by the theory,in the paper’s second half we apply the same approach to face-verification benchmark tasks from the computer vision literature:Labeled Faces in the Wild,Pub-Fig[2,3,4],and a new dataset we gathered.All of these datasets consist of photographs taken under natural conditions(gathered from the internet).Wefind that,despite the use of a very simple classifier—thresholding the angle between face representations—our approach still achieves results that compare favorably with the current state of the art and even exceed it in some cases.2Template-based invariant encodings for objects unseen during trainingWe conjecture that achieving invariance to identity-preserving transformations without losing dis-criminability is the crux of object recognition.In the following we will consider a very expansive notion of‘transformation’,butfirst,in this section we develop the theory for2D affine transforma-tions1.Our aim is to compute a unique signature for each image x that is invariant with respect to a group of transformations G.We consider the orbit{gx|g2G}of x under the action of the group.In this section,G is the2D affine group so its elements correspond to translations,scalings,and in-plane rotations of the image(notice that we use g to denote both elements of G and their representations, acting on vectors).We regard two images as equivalent if they are part of the same orbit,that is,if they are transformed versions of one another(x0=gx for some g2G).The orbit of an image is itself invariant with respect to the group.For example,the set of images obtained by rotating x is exactly the same as the set of images obtained by rotating gx.The orbit is also unique for each object:the set of images obtained by rotating x only intersects with the set of images obtained by rotating x0when x0=gx.Thus,an intuitive method of obtaining an invariant signature for an image,unique to each object,is just to check which orbit it belongs to.We can assume access to a stored set of orbits of template images⌧k;these template orbits could have been acquired by unsupervised learning—possibly by observing objects transform and associating temporally adjacent frames(e.g.[9,10]).The key fact enabling this approach to object recognition is this:It is not necessary to have all the template orbits beforehand.Even with a small,sampled,set of template orbits,not including the actual orbit of x,we can still compute an invariant signature.Observe that when g is unitary h gx,⌧k i=h x,g 1⌧k i.That is,the inner product of the transformed 
image with a template is the same as the inner product of the image with a transformed template.This is true regardless of whether x is in the orbit of⌧k or not.In fact,the test image need not resemble any of the templates (see[11,12,13,1]).Consider g t⌧k to be a realization of a random variable.For a set{g t⌧k,|t=1,...,T}of images sampled from the orbit of the template⌧k,the distribution of h x,g t⌧k i is invariant and unique to each object.See[1]for a proof of this fact in the case that G is the group of2D affine transformations.1See[1]for a more complete exposition of the theory.of an invariant.Fol-(1)where is a smooth version of the step function ( (x )=0for x 0, (x )=1for x >0), is the resolution (bin-width)parameter and n =1,...,N .Figure 1shows the results of an experiment demonstrating that the µk n (x )are invariant to translation and in-plane rotation.Since each face has its own characteristic empirical distribution function,it also shows that these signatures could be used to discriminate between them.Table 1reports the average Kolmogorov-Smirnov (KS)statistics comparing signatures for images of the same face,and for different faces:Mean (KS same )⇠0=)invariance and Mean (KS different )>0=)discriminability.1Figure 1:Example signatures (empirical distribution functions—CDFs)of images depicting two different faces under affine transformations.(A)shows in-plane rotations.Signatures for the upper and lower face are shown in red and purple respectively.(B)Shows the analogous experiment with translated faces.Note:In order to highlight the difference between the two distributions,the axes do not start at 0.Since the distribution of the h x,g t ⌧k i is invariant,we have many choices of possible signatures.Most notably,we can choose any of its statistical moments and these may also be invariant—or nearly so—in order to be discriminative and “invariant for a task”it only need be the case that for each k ,the distributions of the h x,g t ⌧k i have different moments.It turns out that many different convolutional networks can be understood in this framework 2.The differences between them cor-respond to different choices of 1.the set of template orbits (which group),2.the inner product (more generally,we consider the template response function g ⌧k (·):=f (h ·,g t ⌧k i ),for a possibly non-linear function f —see [1])and 3.the moment used for the signature.For example,a simple neural-networks-style convolutional net with one convolutional layer and one subsampling layer (no bias term)is obtained by choosing G =translations and µk (x )=mean (·).The k -th filter is the template ⌧k .The network’s nonlinearity could be captured by choosing g ⌧k (x )=tanh(x ·g ⌧k );note the similarity to Eq.(1).Similar descriptions could be given for modern convolutional nets,e.g.[6,7,11].It is also possible to capture HMAX [14,15]and related models (e.g.[16])with this framework.The “simple cells”compute normalized dot products or Gaussian radial basis functions of their inputs with stored templates and “complex cells”compute,for example,µk (x )=max(·).The templates are normally obtained by translation or scaling of a set of fixed patterns,often Gabor functions at the first layer and patches of natural images in subsequent layers.3Invariance to non-affine transformationsThe theory of [1]only guarantees that this approach will achieve invariance (and discriminability)in the case of affine transformations.However,many researchers have shown good performance of related architectures on object recognition tasks that 
seem to require invariance to non-affine trans-formations (e.g.[17,18,19]).One possibility is that achieving invariance to affine transformations2The computation can be made hierarchical by using the signature as the input to a subsequent layer.of the full object recognition problem.While not dismissing that approximateinvariance to many non-affine transformations can be achieved as long as the system’s operation is restricted to certain nice object classes[20,21,22].A nice class with respect to a transformation G(not necessarily a group)is a set of objects that alltransform similarly to one another under the action of G.For example,the2D transformation map-ping a profile view of one person’s face to its frontal view is similar to the analogous transformationof another person’s face in this sense.The two transformations will not be exactly the same since anytwo faces differ in their exact3D structure,but all faces do approximately share a gross3D structure,so the transformations of two different faces will not be as different from one another as would,forexample,the image transformations evoked by3D rotation of a chair versus the analogous rotationof a clock.Faces are the prototypical example of a class of objects that is nice with respect to manytransformations3.(A) ROTATION IN DEPTH (B) ILLUMINATIONFigure2:Example signatures(empirical distribution functions)of images depicting two differentfaces under non-affine transformations:(A)Rotation in depth.(B)Changing the illumination direc-tion(lighting from above or below).Figure2shows that unlike in the affine case,the signature of a test face with respect to template facesat different orientations(3D rotation in depth)or illumination conditions is not perfectly invariant(KS same>0),though it still tolerates substantial transformations.These signatures are also use-ful for discriminating faces since the empirical distribution functions are considerably more variedbetween faces than they are across images of the same face(Mean(KS different)>Mean(KS same),table1).Table2reports the ratios of within-class discriminability(negatively related to invari-ance)and between-class discriminability for moment-signatures.Lower values indicate both bettertransformation-tolerance and stronger discriminability.Transformation Mean(KS same)Mean(KS different)Translation0.0000 1.9420In-plane rotation0.216019.1897Out-of-plane rotation 2.8698 5.2950Illumination 1.9636 2.8809Table1:Average Kolmogorov-Smirnov statistics comparing the distributions of normalized innerproducts across transformations and across objects(faces).Transformation MEAN L1L2L5MAXTranslation0.00000.00000.00000.00000.0000In-plane rotation0.00310.00310.00330.00420.0030Out-of-plane rotation0.30450.30450.30160.29230.1943Illumination0.71970.71970.69940.64050.2726 Table2:Table of ratios of“within-class discriminability”to“between-class discriminability”forone template kµ(x i) µ(x j)k2.within:x i,x j depict the same face,and between:x i,x j depictdifferent faces.Columns are different statistical moments used for pooling(computingµ(x)).3It is interesting to consider the possibility that faces co-evolved along with natural visual systems in order to be highly recognizable.4Towards the fully unconstrained taskThefinding that this templates-and-signatures approach works well even in the difficult cases of3D-rotation and illumination motivates us to see how far we can push it.We would like to accommodate a totally-unconstrained notion of invariance to identity-preserving transformations.In particular, we 
investigate the possibility of computing signatures that are invariant to all the task-irrelevant variability in the datasets used for serious computer vision benchmarks.In the present paper we focus on the problem of face-verification(also called pair-matching).Given two images of new faces,never encountered during training,the task is to decide if they depict the same person or not. We used the following procedure to test the templates-and-signatures approach on face verification problems using a variety of different datasets(seefig.4A).First,all images were preprocessed with low-level features(e.g.,histograms of oriented gradients(HOG)[23]),followed by PCA using all the images in the training set and z-score-normalization4.At test-time,the k-th element of the signature of an image x is obtained byfirst computing all the h x,g t⌧k i where g t⌧k is the t-th image of the k-th template person—both encoded by their projection onto the training set’s principal components—then pooling the results.We used h·,·i=normalized dot product,andµk(x)=mean(·).At test time,the classifier receives images of two faces and must classify them as either depicting the same person or not.We used a simple classifier that merely computes the angle between the signatures of the two faces(via a normalized dot product)and responds“same”if it is above afixed threshold or“different”if below threshold.We chose such a weak classifier since the goal of these simulations was to assess the value of the signature as a feature representation.We expect that the overall performance levels could be improved for most of these tasks by using a more sophisticated classifier5.We also note that,after extracting low-level features,the entire system only employs two operations:normalized dot products and pooling.The images in the Labeled Faces in the Wild(LFW)dataset vary along so many different dimensions that it is difficult to try to give an exhaustive list.It contains natural variability in,at least,pose, lighting,facial expression,and background[2](example images infig.3).We argue here that LFW and the controlled synthetic data problems we studied up to now are different in two primary ways. 
First,in unconstrained tasks like LFW,you cannot rely on having seen all the transformations of any template.Recall,the theory of[1]relies on previous experience with all the transformations of tem-plate images in order to recognize test images invariantly to the same transformations.Since LFW is totally unconstrained,any subset of it used for training will never contain all the transformations that will be encountered at test time.Continuing to abuse the notation from section2,we can say that the LFW database only samples a small subset of G,which is now the set of all transformations that occur in LFW.That is,for any two images in LFW,x and x0,only a small(relative to|G|)subset of their orbits are in LFW.Moreover,{g|gx2LFW}and{g0|g0x02LFW}almost surely do not overlap with one another6.The second important way in which LFW differs from our synthetic image sets is the presence of clutter.Each LFW face appears on many different backgrounds.It is commmon to consider clut-ter to be a separate problem from that of achieving transformation-invariance,indeed,[1]conjec-tures that the brain employs separate mechanisms,quite different from templates and pooling—e.g.4PCA reduces thefinal algorithm’s memory requirements.Additionally,it is much more plausible that the brain could store principal components than directly memorizing frames of past visual experience.A network of neurons with Hebbian synapses(modeled by Oja’s rule)—changing its weights online as images are presented—converges to the network that projects new inputs onto the eigenvectors of its past input’s covariance [24].See also[1]for discussion of this point in the context of the templates-and-signatures approach.5Our classifier is unsupervised in the sense that it doesn’t have any free parameters tofit on training data. 
However,our complete system is built using labeled data for the templates,so from that point-of-view it may be considered supervised.On the other hand,we also believe that it could be wired up by an unsupervised process—probably involving the association of temporally-adjacent frames—so there is also a sense in which the entire system could be considered,at least in principle,to be unsupervised.We might say that,insofar as our system models the ventral stream,we intend it as a(strong)claim about what the brain could learn via unsupervised mechanisms.6The brain also has to cope with sampling and its effects can be strikingly counterintuitive.For example, Afraz et al.showed that perceived gender of a face is strongly biased toward male or female at different locations in the visualfield;and that the spatial pattern of these biases was distinctive and stable over time for each individual[25].These perceptual heterogeneity effects could be due to the templates supporting the task differing in the precise positions(transformations)at which they were encountered during development.we removed connected subsets of each orbit.In both cases the test used the entire orbit and never contained any of the same faces as the training phase.It is arguable which case is a better model of the real situation,but we note that even in the worse case,performance is surprisingly high—even with large percentages of the orbit discarded.Figure 3C shows that signatures produced by pooling over clutter conditions give good performance on a face-verification task with faces embedded on ing templates with the appropriate background size for each test,we show that our models continue to perform well as we increase the size of the background while the performance of standard HOG features declines.204050556065707580859095Percentage A c c u r a c y Consecutive240.50.60.70.80.91A U CBackground (A) LFW IMAGES (B) NON-UNIFORM SAMPLING (C) BACKGROUND VARIATION TASK Figure 3:(A)Example images from Labeled Faces in the Wild.(B)Non-uniform sampling sim-ulation.The abscissa is the percentage of frames discarded from each template’s transformation sequence,the ordinate is the accuracy on the face verification task.(C)Pooling over variation in the background.The abscissa is the background size (10scales),and the ordinate is the area under the ROC curve (AUC)for the face verification task.5Computer vision benchmarks:LFW,PubFig,and SUFR-WAn implication of the argument in sections 2and 4,is that there needs to be a reasonable number of images sampled from each template’s orbit.Despite the fact that we are now considering a totally unconstrained set of transformations,i.e.any number of samples is going to be small relative to |G |,we found that approximately 15images g t ⌧k per face is enough for all the face verification tasks we considered.15is a surprisingly manageable number,however,it is still more images than LFW has for most individuals.We also used the PubFig83dataset,which has the same problem as LFW,and a subset of the original PubFig dataset.In order to ensure we would have enough images from each template orbit,we gathered a new dataset—SUFR-W 8—with ⇠12,500images,depicting 450individuals.The new dataset contains similar variability to LFW and PubFig but tends to have more images per individual than LFW (there are at least 15images of each individual).The new dataset does not contain any of the same individuals that appear in either LFW or PubFig/PubFig83.7We obtained 3D models of faces from FaceGen (Singular Inversions 
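Before turning to the benchmark results, the verification procedure described at the start of this section can be summarized in code: low-level features (assumed here to be precomputed HOG vectors) are projected by PCA and z-scored, each probe is encoded by the mean normalized dot product with every template person's images, and two faces are declared "same" when the angle between their signatures is small. This is a simplified sketch under stated assumptions; the array sizes, threshold, and random stand-in data are illustrative, not the paper's settings.

```python
# Simplified sketch of the features -> PCA -> signature -> angle-threshold pipeline.
import numpy as np

def fit_projection(train_feats, n_components=50):
    """PCA components and z-score statistics estimated from training features."""
    mean = train_feats.mean(axis=0)
    centered = train_feats - mean
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    comps = vt[:n_components]
    projected = centered @ comps.T
    return mean, comps, projected.mean(axis=0), projected.std(axis=0) + 1e-8

def encode(feat, mean, comps, z_mean, z_std):
    return ((feat - mean) @ comps.T - z_mean) / z_std

def signature(encoded_probe, template_people):
    """template_people: list of (T_k, d) arrays, the encoded images of each template
    person; the k-th signature element is the mean normalized dot product over that
    person's images. Replacing the mean with a histogram of the responses would give
    the empirical-distribution-function signature of Eq. (1)."""
    sig = []
    x = encoded_probe / (np.linalg.norm(encoded_probe) + 1e-12)
    for person in template_people:
        p = person / (np.linalg.norm(person, axis=1, keepdims=True) + 1e-12)
        sig.append(float((p @ x).mean()))
    return np.array(sig)

def same_person(sig_a, sig_b, threshold=0.8):
    cos = float(sig_a @ sig_b) / (np.linalg.norm(sig_a) * np.linalg.norm(sig_b) + 1e-12)
    return cos >= threshold

# Toy usage with random vectors standing in for HOG features.
rng = np.random.default_rng(0)
train = rng.standard_normal((200, 512))
mean, comps, z_mean, z_std = fit_projection(train)
people = [np.stack([encode(v, mean, comps, z_mean, z_std)
                    for v in rng.standard_normal((15, 512))]) for _ in range(10)]
a = signature(encode(rng.standard_normal(512), mean, comps, z_mean, z_std), people)
b = signature(encode(rng.standard_normal(512), mean, comps, z_mean, z_std), people)
print(same_person(a, b))
```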
Inc.)and rendered them with Blender ().8See paper [26]for details.Data available at /0.2Our Model w/ scrambledidentities --- AUC: 0.681Our Model w/ random noise templates--- AUC: 0.649(a) Inputs (b) Features (c) Signatures (d) VerificationHOGHOGPerson 1Person 2Person 3Person 4emplate preparation esting Figure 4:(A)Illustration of the model’s processing pipeline.(B)ROC curves for the new dataset using templates from the training set.The second model (red)is a control model that uses HOG features directly.The third (control)model pools over random images in the dataset (as opposed to images depicting the same person).The fourth model pools over random noise images.1. Detection2. Alignment3. RecognitionSignatureFigure 5:(A)The complete pipeline used for all experiments.(B)The performance of four different models on PubFig83,our new dataset,PubFig and LFW.For these experiments,Local Binary Pat-terns (LBP),Local Phase Quantization (LPQ),Local Ternary Patterns (LTP)were used [27,28,29];they all perform very similarly to HOG—just slightly better (⇠1%).These experiments used non-detected and non-aligned face images as inputs—thus the errors include detection and alignment errors (about 1.5%of faces are not detected and 6-7%of the detected faces are significantly mis-aligned).In all cases,templates were obtained from our new dataset (excluding 30images for a testing set).This sacrifices some performance (⇠1%)on each dataset but prevents overfitting:we ran the exact same model on all 4datasets .(C)The ROC curves of the best model in each dataset.Figure 4B shows ROC curves for face verification with the new dataset.The blue curve is our model.The purple and green curves are control experiments that pool over images depicting different indi-viduals,and random noise templates respectively.Both control models performed worse than raw HOG features (red curve).For all our PubFig,PubFig83and LFW experiments (Fig.5),we ignored the provided training data.Instead,we obtained templates from our new dataset.For consistency,we applied the same detection/alignment to all images.The alignment method we used ([30])produced images that were somewhat more variable than the method used by the authors of the LFW dataset (LFW-a)—the performance of our simple classifier using raw HOG features on LFW is 73.3%,while on LFW-a it is 75.6%.Even with the very simple classifier,our system’s performance still compares favorably with the current state of the art.In the case of LFW,our model’s performance exceeds the current state-of-the-art for an unsupervised system (86.2%using LQP —Local Quantized Patterns [31]—Note:these features are not publicly available;otherwise we would have tried using them for preprocess-ing),though the best supervised systems do better9.The strongest result in the literature for face verification with PubFig8310is70.2%[4]—which is6.2%lower than our best model.6DiscussionThe templates-and-signatures approach to recognition permits many seemingly-different convolu-tional networks(e.g.ConvNets and HMAX)to be understood in a common framework.We have argued here that the recent strong performance of convolutional networks across a variety of tasks (e.g.,[6,7,8])is explained because all these problems share a common computational crux:the need to achieve representations that are invariant to identity-preserving transformations.We argued that when studying invariance,the appropriate mathematical objects to consider are the orbits of images under the action of a transformation and their associated probability 
distribu-tions.The probability distributions(and hence the orbits)can be characterized by one-dimensional projections—thus justifying the choice of the empirical distribution function of inner products with template images as a representation for recognition.In this paper,we systematically investigated the properties of this representation for two affine and two non-affine transformations(tables1and 2).The same probability distribution could also be characterized by its statistical moments.Inter-estingly,we found when we considered more difficult tasks in the second half of the paper,rep-resentations based on statistical moments tended to outperform the empirical distribution function. There is a sense in which this result is surprising,since the empirical distribution function contains more invariant“information”than the moments—on the other hand,it could also be expected that the moments ought to be less noisy estimates of the underlying distribution.This is an interesting question for further theoretical and experimental work.Unlike most convolutional networks,our model has essentially no free parameters.In fact,the pipeline we used for most experiments actually has no operations at all besides normalized dot prod-ucts and pooling(also PCA when preparing templates).These operations are easily implemented by neurons[32].We could interpret the former as the operation of“simple cells”and the latter as “complex cells”—thus obtaining a similar view of the ventral stream to the one given by[33,16,14] (and many others).Despite the classifier’s simplicity,our model’s strong performance on face verification benchmark tasks is quite encouraging(Fig.5).Future work could extend this approach to other objects,and other tasks.Acknowledgments This material is based upon work supported by the Center for Brains,Minds and Machines(CBMM),funded by NSF STC award CCF-1231216.References[1]T.Poggio,J.Mutch,F.Anselmi,J.Z.Leibo,L.Rosasco,and A.Tacchetti,“The computational magic ofthe ventral stream:sketch of a theory(and why some deep architectures work),”MIT-CSAIL-TR-2012-035,2012.[2]G.B.Huang,M.Mattar,T.Berg,and E.Learned-Miller,“Labeled faces in the wild:A database for study-ing face recognition in unconstrained environments,”in Workshop on faces in real-life images:Detection, alignment and recognition(ECCV),(Marseille,Fr),2008.[3]N.Kumar,A.C.Berg,P.N.Belhumeur,and S.K.Nayar,“Attribute and Simile Classifiers for FaceVerification,”in IEEE International Conference on Computer Vision(ICCV),(Kyoto,JP),pp.365–372, Oct.2009.[4]N.Pinto,Z.Stone,T.Zickler,and D.D.Cox,“Scaling-up Biologically-Inspired Computer Vision:ACase-Study on Facebook,”in IEEE Computer Vision and Pattern Recognition,Workshop on Biologically Consistent Vision,2011.[5]S.Dali,“The persistence of memory(1931).”Museum of Modern Art,New York,NY.[6]A.Krizhevsky,I.Sutskever,and G.Hinton,“ImageNet classification with deep convolutional neuralnetworks,”in Advances in neural information processing systems,vol.25,(Lake Tahoe,CA),2012.9Note:Our method of testing does not strictly conform to the protocol recommended by the creators of LFW[2]:we re-aligned(worse)the faces.We also use the identities of the individuals during training.10The original PubFig dataset was only provided as a list of URLs from which the images could be down-loaded.Now only half the images remain available.On the original dataset,the strongest performance reported is78.7%[3].The authors of that study also made their features available,so we estimated the performance of their features 
on the available subset of images(using SVM).We found that an SVM classifier,using their features,and our cross-validation splits gets78.4%correct—3.3%lower than our best model.。
Abstract—The task of face recognition has been actively researched in recent years. This paper provides an up-to-date review of major human face recognition research. We first present an overview of face recognition and its applications. Then, a literature review of the most recent face recognition techniques is presented. Description and limitations of face databases which are used to test the performance of these face recognition algorithms are given. A brief summary of the face recognition vendor test (FRVT) 2002, a large scale evaluation of automatic face recognition technology, and its conclusions are also given. Finally, we give a summary of the research results.Keywords—Combined classifiers, face recognition, graph matching, neural networks.I.I NTRODUCTIONACE recognition is an important research problem spanning numerous fields and disciplines. This because face recognition, in additional to having numerous practical applications such as bankcard identification, access control, Mug shots searching, security monitoring, and surveillance system, is a fundamental human behaviour that is essential for effective communications and interactions among people.A formal method of classifying faces was first proposed in[1]. The author proposed collecting facial profiles as curves, finding their norm, and then classifying other profiles by their deviations from the norm. This classification is multi-modal, i.e. resulting in a vector of independent measures that could be compared with other vectors in a database.Progress has advanced to the point that face recognition systems are being demonstrated in real-world settings [2]. The rapid development of face recognition is due to a combination of factors: active development of algorithms, the availability of a large databases of facial images, and a method for evaluating the performance of face recognition algorithms.In the literatures, face recognition problem can be formulated as: given static (still) or video images of a scene, identify or verify one or more persons in the scene by comparing with faces stored in a database.When comparing person verification to face recognition, there are several aspects which differ. First, a client – an authorized user of a personal identification system – is Manuscript received February 22, 2005.A. S. Tolba is with the Information Systems Department, Mansoura University, Egypt, (e-mail: tolba1954@)).A. H. EL-Baz is with the Mathematics Department, Damietta Faculty of Science, New Damietta, Egypt, and doing PhD research on pattern recognition (phone: 0020-57-403980; Fax: 0020-57–403868; e-mail: ali_elbaz@).A. H. EL-Harby is with the Mathematics Department, Damietta Faculty of Science, New Damietta, Egypt, (e-mail: elharby@). assumed to be co-operative and makes an identity claim. Computationally this means that it is not necessary to consult the complete set of database images (denoted model images below) in order to verify a claim. An incoming image (referred to as a probe image) is thus compared to a small number of model images of the person whose identity is claimed and not, as in the recognition scenario, with every image (or some descriptor of an image) in a potentially large database. Second, an automatic authentication system must operate in near-real time to be acceptable to users. 
Finally, in recognition experiments, only images of people from the training database are presented to the system, whereas the case of an imposter (most likely a previously unseen person) is of utmost importance for authentication.Face recognition is a biometric approach that employs automated methods to verify or recognize the identity of a living person based on his/her physiological characteristics. In general, a biometric identification system makes use of either physiological characteristics (such as a fingerprint, iris pattern, or face) or behaviour patterns (such as hand-writing, voice, or key-stroke pattern) to identify a person. Because of human inherent protectiveness of his/her eyes, some people are reluctant to use eye identification systems. Face recognition has the benefit of being a passive, non intrusive system to verify personal identity in a “natural” and friendly way.In general, biometric devices can be explained with a three-step procedure (1) a sensor takes an observation. The type of sensor and its observation depend on the type of biometric devices used. This observation gives us a “Biometric Signature” of the individual. (2) a computer algorithm “normalizes” the biometric signature so that it is in the same format (size, resolution, view, etc.) as the signatures on the system’s database. The normalization of the biometric signature gives us a “Normalized Signature” of the individual.(3) a matcher compares the normalized signature with the set (or sub-set) of normalized signatures on the system's database and provides a “similarity score” that compares the individual's normalized signature with each signature in the database set (or sub-set). What is then done with the similarity scores depends on the biometric system’s application?Face recognition starts with the detection of face patterns in sometimes cluttered scenes, proceeds by normalizing the face images to account for geometrical and illumination changes, possibly using information about the location and appearance of facial landmarks, identifies the faces using appropriate classification algorithms, and post processes the results using model-based schemes and logistic feedback [3].The application of face recognition technique can be categorized into two main parts: law enforcement application and commercial application. Face recognition technology isFace Recognition: A Literature ReviewA. S. Tolba, A.H. El-Baz, and A.A. El-HarbyFprimarily used in law enforcement applications, especially Mug shot albums (static matching) and video surveillance (real-time matching by video image sequences). The commercial applications range from static matching of photographs on credit cards, ATM cards, passports, driver’s licenses, and photo ID to real-time matching with still images or video image sequences for access control. Each application presents different constraints in terms of processing.All face recognition algorithms consistent of two major parts: (1) face detection and normalization and (2) face identification. Algorithms that consist of both parts are referred to as fully automatic algorithms and those that consist of only the second part are called partially automatic algorithms. Partially automatic algorithms are given a facial image and the coordinates of the center of the eyes. Fully automatic algorithms are only given facial images. 
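The three-step biometric procedure outlined above (sense, normalize, match against a database of normalized signatures) can be reduced to a schematic sketch. The "normalization" here is deliberately crude, resize plus zero-mean unit-norm, and merely stands in for the geometric and illumination normalization a real system would perform; names and toy data are illustrative assumptions.

```python
# Schematic sense -> normalize -> match flow (illustrative stand-in only).
import numpy as np

def normalize_signature(image, size=(32, 32)):
    """Crude normalization: nearest-neighbour resize to a fixed resolution,
    then zero-mean / unit-norm the flattened pixels."""
    h, w = image.shape
    rows = np.arange(size[0]) * h // size[0]
    cols = np.arange(size[1]) * w // size[1]
    resized = image[np.ix_(rows, cols)].astype(np.float64).ravel()
    resized -= resized.mean()
    return resized / (np.linalg.norm(resized) + 1e-12)

def match(probe, gallery):
    """Return (best identity, similarity score) by comparing the normalized probe
    signature against every normalized signature in the gallery."""
    probe_sig = normalize_signature(probe)
    scores = {name: float(probe_sig @ sig) for name, sig in gallery.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

# Toy usage: a two-person gallery of random "images".
rng = np.random.default_rng(0)
gallery = {name: normalize_signature(rng.integers(0, 256, (64, 48)))
           for name in ("alice", "bob")}
print(match(rng.integers(0, 256, (64, 48)), gallery))
```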
On the other hand, the development of face recognition over the past years allows an organization into three types of recognition algorithms, namely frontal, profile, and view-tolerant recognition, depending on the kind of images and the recognition algorithms. While frontal recognition certainly is the classical approach, view-tolerant algorithms usually perform recognition in a more sophisticated fashion by taking into consideration some of the underlying physics, geometry, and statistics. Profile schemes as stand-alone systems have a rather marginal significance for identification, (for more detail see [4]). However, they are very practical either for fast coarse pre-searches of large face database to reduce the computational load for a subsequent sophisticated algorithm, or as part of a hybrid recognition scheme. Such hybrid approaches have a special status among face recognition systems as they combine different recognition approaches in an either serial or parallel order to overcome the shortcoming of the individual components.Another way to categorize face recognition techniques is to consider whether they are based on models or exemplars. Models are used in [5] to compute the Quotient Image, and in [6] to derive their Active Appearance Model. These models capture class information (the class face), and provide strong constraints when dealing with appearance variation. At the other extreme, exemplars may also be used for recognition. The ARENA method in [7] simply stores all training and matches each one against the task image. As far we can tell, current methods that employ models do not use exemplars, and vice versa. This is because these two approaches are by no means mutually exclusive. Recently, [8] proposed a way of combining models and exemplars for face recognition. In which, models are used to synthesize additional training images, which can then be used as exemplars in the learning stage of a face recognition system.Focusing on the aspect of pose invariance, face recognition approaches may be divided into two categories: (i) global approach and (ii) component-based approach. In global approach, a single feature vector that represents the whole face image is used as input to a classifier. Several classifiers have been proposed in the literature e.g. minimum distance classification in the eigenspace [9,10], Fisher’s discriminant analysis [11], and neural networks [12]. Global techniques work well for classifying frontal views of faces. However, they are not robust against pose changes since global features are highly sensitive to translation and rotation of the face. To avoid this problem an alignment stage can be added before classifying the face. Aligning an input face image with a reference face image requires computing correspondence between the two face images. The correspondence is usually determined for a small number of prominent points in the face like the center of the eye, the nostrils, or the corners of the mouth. Based on these correspondences, the input face image can be warped to a reference face image.In [13], an affine transformation is computed to perform the warping. Active shape models are used in [14] to align input faces with model faces. A semi-automatic alignment step in combination with support vector machines classification was proposed in [15]. An alternative to the global approach is to classify local facial components. 
The main idea of component based recognition is to compensate for pose changes by allowing a flexible geometrical relation between the components in the classification stage.In [16], face recognition was performed by independently matching templates of three facial regions (eyes, nose and mouth). The configuration of the components during classification was unconstrained since the system did not include a geometrical model of the face. A similar approach with an additional alignment stage was proposed in [17]. In [18], a geometrical model of a face was implemented by a 2D elastic graph. The recognition was based on wavelet coefficients that were computed on the nodes of the elastic graph. In [19], a window was shifted over the face image and the DCT coefficients computed within the window were fed into a 2D Hidden Markov Model.Face recognition research still face challenge in some specific domains such as pose and illumination changes. Although numerous methods have been proposed to solve such problems and have demonstrated significant promise, the difficulties still remain. For these reasons, the matching performance in current automatic face recognition is relatively poor compared to that achieved in fingerprint and iris matching, yet it may be the only available measuring tool for an application. Error rates of 2-25% are typical. It is effective if combined with other biometric measurements.Current systems work very well whenever the test image to be recognized is captured under conditions similar to those of the training images. However, they are not robust enough if there is variation between test and training images [20]. Changes in incident illumination, head pose, facial expression, hairstyle (include facial hair), cosmetics (including eyewear) and age, all confound the best systems today.As a general rule, we may categorize approaches used to cope with variation in appearance into three kinds: invariant features, canonical forms, and variation- modeling. The first approach seeks to utilize features that are invariant to the changes being studied. For instance, the Quotient Image [5] is (by construction) invariant to illumination and may be used to recognize faces (assumed to be Lambertian) when lighting conditions change.The second approach attempts to “normalize” away the variation, either by clever image transformations or by synthesizing a new image (from the given test image) in some“canonical” or “prototypical” form. Recognition is then performed using this canonical form. Examples of this approach include [21,22]. In [21], for instance, the test image under arbitrary illumination is re-rendered under frontal illumination, and then compared against other frontally-illuminated prototypes.The third approach of variation-modeling is self explanatory: the idea is to learn, in some suitable subspace, the extent of the variation in that space. This usually leads to some parameterization of the subspace(s). Recognition is then performed by choosing the subspace closest to the test image, after the latter has been appropriately mapped. In effect, the recognition step recovers the variation (e.g. pose estimation) as well as the identity of the person. For examples of this technique, see [18, 23, 24 and 25].Despite the plethora of techniques, and the valiant effort of many researchers, face recognition remains a difficult, unsolved problem in general. 
While each of the above approaches works well for the specific variation being studied, performance degrades rapidly when other variations are present. For instance, a feature invariant to illumination works well as long as pose or facial expression remains constant, but fails to be invariant when pose or expression is changed. This is not a problem for some applications, such as controlling access to a secured room, since both the training and test images may be captured under similar conditions. However, for general, unconstrained recognition, none of these techniques are robust enough.Moreover, it is not clear that different techniques can be combined to overcome each other’s limitations. Some techniques, by their very nature, exclude others. For example, the Symmetric Shape-from-Shading method of [22] relies on the approximate symmetry of a frontal face. It is unclear how this may be combined with a technique that depends on side profiles, where the symmetry is absent.We can make two important observations after surveying the research literature: (1) there does not appear to be any feature, set of features, or subspace that is simultaneously invariant to all the variations that a face image may exhibit, (2) given more training images, almost any technique will perform better. These two factors are the major reasons why face recognition is not widely used in real-world applications. The fact is that for many applications, it is usual to require the ability to recognize faces under different variations, even when training images are severely limited.II.L ITERATURE R EVIEW OF F ACE R ECOGNITION T ECHNIQUES This section gives an overview on the major human face recognition techniques that apply mostly to frontal faces, advantages and disadvantages of each method are also given. The methods considered are eigenfaces (eigenfeatures), neural networks, dynamic link architecture, hidden Markov model, geometrical feature matching, and template matching. The approaches are analyzed in terms of the facial representations they used.A.EigenfacesEigenface is one of the most thoroughly investigated approaches to face recognition. It is also known as Karhunen- Loève expansion, eigenpicture, eigenvector, and principal component. References [26, 27] used principal component analysis to efficiently represent pictures of faces. They argued that any face images could be approximately reconstructed by a small collection of weights for each face and a standard face picture (eigenpicture). The weights describing each face are obtained by projecting the face image onto the eigenpicture. Reference [28] used eigenfaces, which was motivated by the technique of Kirby and Sirovich, for face detection and identification.In mathematical terms, eigenfaces are the principal components of the distribution of faces, or the eigenvectors of the covariance matrix of the set of face images. The eigenvectors are ordered to represent different amounts of the variation, respectively, among the faces. Each face can be represented exactly by a linear combination of the eigenfaces. It can also be approximated using only the “best” eigenvectors with the largest eigenvalues. The best M eigenfaces construct an M dimensional space, i.e., the “face space”. The authors reported 96 percent, 85 percent, and 64 percent correct classifications averaged over lighting, orientation, and size variations, respectively. 
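A condensed sketch of the eigenface computation just described: the eigenfaces are the leading eigenvectors of the covariance of the training faces (obtained here via an SVD, which avoids forming the covariance matrix explicitly), each face is represented by its projection onto them, and recognition is nearest-neighbour in that "face space". The data and dimensions below are random placeholders, not any of the cited datasets.

```python
# Condensed eigenfaces sketch (illustrative data and sizes).
import numpy as np

def compute_eigenfaces(faces, m=20):
    """faces: (N, d) matrix of flattened training face images."""
    mean_face = faces.mean(axis=0)
    centered = faces - mean_face
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean_face, vt[:m]                      # (d,), (m, d) eigenfaces

def project(face, mean_face, eigenfaces):
    return (face - mean_face) @ eigenfaces.T      # the face's weight vector

def reconstruct(weights, mean_face, eigenfaces):
    return mean_face + weights @ eigenfaces       # approximation from m eigenfaces

def nearest_identity(probe_weights, gallery_weights):
    """Classify by the closest gallery face in the m-dimensional face space."""
    dists = np.linalg.norm(gallery_weights - probe_weights, axis=1)
    return int(dists.argmin())

# Toy usage: 100 random 32x32 "faces".
rng = np.random.default_rng(0)
faces = rng.standard_normal((100, 32 * 32))
mean_face, eigfaces = compute_eigenfaces(faces, m=20)
w = project(faces[0], mean_face, eigfaces)
print(np.linalg.norm(faces[0] - reconstruct(w, mean_face, eigfaces)))
```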
Their database contained 2,500 images of 16 individuals.As the images include a large quantity of background area, the above results are influenced by background. The authors explained the robust performance of the system under different lighting conditions by significant correlation between images with changes in illumination. However, [29] showed that the correlation between images of the whole faces is not efficient for satisfactory recognition performance. Illumination normalization [27] is usually necessary for the eigenfaces approach.Reference [30] proposed a new method to compute the covariance matrix using three images each was taken in different lighting conditions to account for arbitrary illumination effects, if the object is Lambertian. Reference [31] extended their early work on eigenface to eigenfeatures corresponding to face components, such as eyes, nose, and mouth. They used a modular eigenspace which was composed of the above eigenfeatures (i.e., eigeneyes, eigennose, and eigenmouth). This method would be less sensitive to appearance changes than the standard eigenface method. The system achieved a recognition rate of 95 percent on the FERET database of 7,562 images of approximately 3,000 individuals. In summary, eigenface appears as a fast, simple, and practical method. However, in general, it does not provide invariance over changes in scale and lighting conditions. Recently, in [32] experiments with ear and face recognition, using the standard principal component analysis approach , showed that the recognition performance is essentially identical using ear images or face images and combining the two for multimodal recognition results in a statistically significant performance improvement. For example, the difference in the rank-one recognition rate for the day variation experiment using the 197-image training sets is90.9% for the multimodal biometric versus 71.6% for the ear and 70.5% for the face.There is substantial related work in multimodal biometrics. For example [33] used face and fingerprint in multimodal biometric identification, and [34] used face and voice. However, use of the face and ear in combination seems more relevant to surveillance applications.B.Neural NetworksThe attractiveness of using neural networks could be due to its non linearity in the network. Hence, the feature extraction step may be more efficient than the linear Karhunen-Loève methods. One of the first artificial neural networks (ANN) techniques used for face recognition is a single layer adaptive network called WISARD which contains a separate network for each stored individual [35]. The way in constructing a neural network structure is crucial for successful recognition. It is very much dependent on the intended application. For face detection, multilayer perceptron [36] and convolutional neural network [37] have been applied. For face verification, [38] is a multi-resolution pyramid structure. Reference [37] proposed a hybrid neural network which combines local image sampling, a self-organizing map (SOM) neural network, and a convolutional neural network. The SOM provides a quantization of the image samples into a topological space where inputs that are nearby in the original space are also nearby in the output space, thereby providing dimension reduction and invariance to minor changes in the image sample. The convolutional network extracts successively larger features in a hierarchical set of layers and provides partial invariance to translation, rotation, scale, and deformation. 
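The hybrid system described above combines local image sampling, a SOM, and a convolutional network; the sketch below is not that architecture. It is only a generic, minimal convolutional classifier (assuming PyTorch is available) meant to illustrate the idea of successively larger features extracted by stacked convolution and pooling layers, followed by a classification layer. The layer sizes and the number of identities are illustrative assumptions.

```python
# Generic minimal convolutional face classifier (not the cited hybrid architecture).
import torch
import torch.nn as nn

class TinyFaceNet(nn.Module):
    def __init__(self, n_identities=40):           # e.g., 40 subjects (assumed)
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                        # 64x64 -> 32x32
            nn.Conv2d(8, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                        # 32x32 -> 16x16
        )
        self.classifier = nn.Linear(16 * 16 * 16, n_identities)

    def forward(self, x):                           # x: (batch, 1, 64, 64)
        h = self.features(x)
        return self.classifier(h.flatten(start_dim=1))

# Toy forward pass on a random batch of grayscale 64x64 "face" images.
model = TinyFaceNet()
logits = model(torch.randn(4, 1, 64, 64))
print(logits.shape)                                 # torch.Size([4, 40])
```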
The authors reported 96.2% correct recognition on ORL database of 400 images of 40 individuals.The classification time is less than 0.5 second, but the training time is as long as 4 hours. Reference [39] used probabilistic decision-based neural network (PDBNN) which inherited the modular structure from its predecessor, a decision based neural network (DBNN) [40]. The PDBNN can be applied effectively to 1) face detector: which finds the location of a human face in a cluttered image, 2) eye localizer: which determines the positions of both eyes in order to generate meaningful feature vectors, and 3) face recognizer. PDNN does not have a fully connected network topology. Instead, it divides the network into K subnets. Each subset is dedicated to recognize one person in the database. PDNN uses the Guassian activation function for its neurons, and the output of each “face subnet” is the weighted summation of the neuron outputs. In other words, the face subnet estimates the likelihood density using the popular mixture-of-Guassian model. Compared to the AWGN scheme, mixture of Guassian provides a much more flexible and complex model for approximating the time likelihood densities in the face space. The learning scheme of the PDNN consists of two phases, in the first phase; each subnet is trained by its own face images. In the second phase, called the decision-based learning, the subnet parameters may be trained by some particular samples from other face classes. The decision-based learning scheme does not use all the training samples for the training. Only misclassified patterns are used. If the sample is misclassified to the wrong subnet, the rightful subnet will tune its parameters so that its decision-region can be moved closer to the misclassified sample.PDBNN-based biometric identification system has the merits of both neural networks and statistical approaches, and its distributed computing principle is relatively easy to implement on parallel computer. In [39], it was reported that PDBNN face recognizer had the capability of recognizing up to 200 people and could achieve up to 96% correct recognition rate in approximately 1 second. However, when the number of persons increases, the computing expense will become more demanding. In general, neural network approaches encounter problems when the number of classes (i.e., individuals) increases. Moreover, they are not suitable for a single model image recognition test because multiple model images per person are necessary in order for training the systems to “optimal” parameter setting.C.Graph MatchingGraph matching is another approach to face recognition. Reference [41] presented a dynamic link structure for distortion invariant object recognition which employed elastic graph matching to find the closest stored graph. Dynamic link architecture is an extension to classical artificial neural networks. Memorized objects are represented by sparse graphs, whose vertices are labeled with a multiresolution description in terms of a local power spectrum and whose edges are labeled with geometrical distance vectors. Object recognition can be formulated as elastic graph matching which is performed by stochastic optimization of a matching cost function. They reported good results on a database of 87 people and a small set of office items comprising different expressions with a rotation of 15 degrees.The matching process is computationally expensive, taking about 25 seconds to compare with 87 stored objects on a parallel machine with 23 transputers. 
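Elastic graph matching, as described above, optimizes a cost that trades off similarity of node features against geometric distortion of the graph edges. The sketch below only evaluates such a cost for a given correspondence between model and image nodes; the stochastic optimization over correspondences performed by the cited systems is omitted, and the feature vectors, weight, and toy graph are illustrative assumptions.

```python
# Graph-matching cost for a fixed node correspondence (the search step is omitted).
import numpy as np

def graph_match_cost(model_pos, model_feat, image_pos, image_feat,
                     edges, distortion_weight=0.1):
    """model_pos/image_pos: (N, 2) node coordinates; *_feat: (N, d) node features
    (e.g., local multiresolution descriptors); edges: list of (i, j) index pairs."""
    # Node term: dissimilarity of corresponding local features.
    node_cost = np.linalg.norm(model_feat - image_feat, axis=1).sum()
    # Edge term: how much each edge is stretched or shrunk in the image graph.
    edge_cost = 0.0
    for i, j in edges:
        d_model = np.linalg.norm(model_pos[i] - model_pos[j])
        d_image = np.linalg.norm(image_pos[i] - image_pos[j])
        edge_cost += (d_model - d_image) ** 2
    return node_cost + distortion_weight * edge_cost

# Toy usage: a 4-node graph, with the image graph slightly deformed.
rng = np.random.default_rng(0)
pos = rng.uniform(0, 100, (4, 2))
feat = rng.standard_normal((4, 8))
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(graph_match_cost(pos, feat, pos + rng.normal(0, 2, (4, 2)),
                       feat + 0.1 * rng.standard_normal((4, 8)), edges))
```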
Reference [42] extended the technique and matched human faces against a gallery of 112 neutral frontal view faces. Probe images were distorted due to rotation in depth and changing facial expression. Encouraging results on faces with large rotation angles were obtained. They reported recognition rates of 86.5% and 66.4% for the matching tests of 111 faces of 15 degree rotation and 110 faces of 30 degree rotation to a gallery of 112 neutral frontal views. In general, dynamic link architecture is superior to other face recognition techniques in terms of rotation invariance; however, the matching process is computationally expensive.D.Hidden Markov Models (HMMs)Stochastic modeling of nonstationary vector time series based on (HMM) has been very successful for speech applications. Reference [43] applied this method to human face recognition. Faces were intuitively divided into regions such as the eyes, nose, mouth, etc., which can be associated with the states of a hidden Markov model. Since HMMs require a one-dimensional observation sequence and images are two-dimensional, the images should be converted into either 1D temporal sequences or 1D spatial sequences.In [44], a spatial observation sequence was extracted from a face image by using a band sampling technique. Each face image was represented by a 1D vector series of pixel observation. Each observation vector is a block of L lines and there is an M lines overlap between successive observations. An unknown test image is first sampled to an observation sequence. Then, it is matched against every HMMs in the model face database (each HMM represents a different subject). The match with the highest likelihood is considered the best match and the relevant model reveals the identity of the test face.The recognition rate of HMM approach is 87% using ORL database consisting of 400 images of 40 individuals. A pseudo 2D HMM [44] was reported to achieve a 95% recognition rate in their preliminary experiments. Its classification time and training time were not given (believed to be very expensive). The choice of parameters had been based on subjective intuition.E.Geometrical Feature MatchingGeometrical feature matching techniques are based on the computation of a set of geometrical features from the picture of a face. The fact that face recognition is possible even at coarse resolution as low as 8x6 pixels [45] when the single facial features are hardly revealed in detail, implies that the overall geometrical configuration of the face features is sufficient for recognition. The overall configuration can be described by a vector representing the position and size of the main facial features, such as eyes and eyebrows, nose, mouth, and the shape of face outline.One of the pioneering works on automated face recognition by using geometrical features was done by [46] in 1973. Their system achieved a peak performance of 75% recognition rate on a database of 20 people using two images per person, one as the model and the other as the test image. References [47,48] showed that a face recognition program provided with features extracted manually could perform recognition apparently with satisfactory results. Reference [49] automatically extracted a set of geometrical features from the picture of a face, such as nose width and length, mouth position, and chin shape. There were 35 features extracted form a 35 dimensional vector. The recognition was then performed with a Bayes classifier. 
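A bare-bones sketch of geometrical feature matching as just described: each face is reduced to a short vector of measurements (nose width, mouth position, chin shape, and so on, represented here by random placeholders), and recognition ranks gallery faces by a scaled distance between vectors. The per-feature scaling is a crude stand-in for the statistics a Bayes or mixture-distance classifier would model; all sizes and data are illustrative.

```python
# Geometrical-feature matching sketch with illustrative measurements.
import numpy as np

def nearest_faces(probe_features, gallery_features, feature_scales=None):
    """gallery_features: (N, k) matrix of k geometrical measurements per face.
    feature_scales optionally whitens each measurement before computing distances."""
    if feature_scales is None:
        feature_scales = gallery_features.std(axis=0) + 1e-8
    diffs = (gallery_features - probe_features) / feature_scales
    dists = np.linalg.norm(diffs, axis=1)
    order = np.argsort(dists)
    return order[:3], dists[order[:3]]     # top-3 candidate matches and distances

# Toy usage: a gallery of 47 faces, each described by 35 measurements (sizes as in
# the section above), with the probe being a noisy re-measurement of face 5.
rng = np.random.default_rng(0)
gallery = rng.standard_normal((47, 35))
probe = gallery[5] + 0.05 * rng.standard_normal(35)
print(nearest_faces(probe, gallery))
```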
They reported a recognition rate of 90% on a database of 47 people.Reference [50] introduced a mixture-distance technique which achieved 95% recognition rate on a query database of 685 individuals. Each face was represented by 30 manually extracted distances. Reference [51] used Gabor wavelet decomposition to detect feature points for each face image which greatly reduced the storage requirement for the database. Typically, 35-45 feature points per face were generated. The matching process utilized the information presented in a topological graphic representation of the feature points. After compensating for different centroid location, two cost values, the topological cost, and similarity cost, were evaluated. The recognition accuracy in terms of the best match to the right person was 86% and 94% of the correct person's faces was in the top three candidate matches.In summary, geometrical feature matching based on precisely measured distances between features may be most useful for finding possible matches in a large database such as a Mug shot album. However, it will be dependent on the accuracy of the feature location algorithms. Current automated face feature location algorithms do not provide a high degree of accuracy and require considerable computational time.F.Template MatchingA simple version of template matching is that a test image represented as a two-dimensional array of intensity values is compared using a suitable metric, such as the Euclidean distance, with a single template representing the whole face. There are several other more sophisticated versions of template matching on face recognition. One can use more than one face template from different viewpoints to represent an individual's face.A face from a single viewpoint can also be represented by a set of multiple distinctive smaller templates [49,52]. The face image of gray levels may also be properly processed before matching [53]. In [49], Bruneli and Poggio automatically selected a set of four features templates, i.e., the eyes, nose, mouth, and the whole face, for all of the available faces. They compared the performance of their geometrical matching algorithm and template matching algorithm on the same database of faces which contains 188 images of 47 individuals. The template matching was superior in recognition (100 percent recognition rate) to geometrical matching (90 percent recognition rate) and was also simpler. Since the principal components (also known as eigenfaces or eigenfeatures) are linear combinations of the templates in the data basis, the technique cannot achieve better results than correlation [49], but it may be less computationally expensive. One drawback of template matching is its computational complexity. Another problem lies in the description of these templates. Since the recognition system has to be tolerant to certain discrepancies between the template and the test image, this tolerance might average out the differences that make individual faces unique.In general, template-based approaches compared to feature matching are a more logical approach. In summary, no existing technique is free from limitations. 
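The simple version of template matching described at the start of this subsection can be written down almost literally: the test image and a stored template are compared as 2-D intensity arrays using a normalized correlation score (on zero-mean, unit-variance images this ranks candidates the same way as Euclidean distance). Multiple templates per person, or separate eye/nose/mouth templates, would simply add more comparisons of the same kind. The toy data below are random placeholders.

```python
# Whole-face template matching sketch with normalized correlation.
import numpy as np

def normalized_correlation(test, template):
    a = test.astype(np.float64).ravel()
    b = template.astype(np.float64).ravel()
    a = (a - a.mean()) / (a.std() + 1e-12)
    b = (b - b.mean()) / (b.std() + 1e-12)
    return float(a @ b) / a.size

def best_template_match(test, templates):
    """templates: dict mapping identity -> 2-D template image of the same size."""
    scores = {name: normalized_correlation(test, t) for name, t in templates.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

# Toy usage with random images; a real system would use aligned face crops.
rng = np.random.default_rng(0)
templates = {"id_%d" % i: rng.integers(0, 256, (64, 64)) for i in range(5)}
test = templates["id_3"] + rng.integers(-10, 10, (64, 64))
print(best_template_match(test, templates))
```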
Further efforts are required to improve the performance of face recognition techniques, especially in the wide range of environments encountered in the real world.

G. 3D Morphable Model

The morphable face model is based on a vector space representation of faces [54] that is constructed such that any convex combination of the shape and texture vectors of a set of examples describes a realistic human face. Fitting the 3D morphable model to images can be used in two ways for recognition across different viewing conditions. Paradigm 1: after fitting the model, recognition can be based on the model coefficients, which represent the intrinsic shape and texture of a face and are independent of the imaging conditions. Paradigm 2: three-dimensional face reconstruction can also be employed to generate synthetic views from gallery or probe images [55-58]. The synthetic views are then
A Comparison of Two Models of Face Recognition
Author: Tian Haipeng
Source: 《中国校外教育·高教(下旬)》, 2014, No. 14
Abstract: Among the theoretical models of face recognition, the Bruce-Young model and the interactive activation and competition (IAC) model are the two most important and most influential.
The two models are described in terms of the face recognition process, and their differences are compared.
Keywords: face recognition; theoretical models; comparison

The face is a rather special stimulus that provides rich information such as gender, age, and emotional state. In interpersonal interaction, recognizing the people we interact with depends to a large extent on the various kinds of information the face provides. Research on face recognition dates back to Darwin's time. By the 1970s, Ekman and Friesen had systematically studied the cross-cultural consistency of basic facial expressions and proposed the Facial Action Coding System (FACS), and in the 1980s Bruce and Young, drawing on a large body of behavioral experiments, everyday observation, and clinical findings, proposed what is now the classic model of face recognition processing. To this day, much of the research on face processing is still based on the Bruce-Young face processing model.

1. The Bruce-Young "multi-stage" model of face processing
According to Bruce and Young, face recognition is the cognitive process of establishing an individual's identity from the information in a face. This process involves extracting and processing several kinds of information about the face, such as its pictorial information, structural information, and person identity information. Face recognition in a broader sense also includes the processing of information unrelated to identifying the individual, such as the recognition of facial expressions. The Bruce-Young model was proposed after synthesizing more than twenty years of prior research, both theoretical and empirical, from several fields, and it remains a face recognition model of broad and deep influence. The model describes the various kinds of information required for face recognition and how each kind is processed, and it identifies distinct stages of face processing. In Bruce and Young's model, face processing takes place in two stages. The first is the structural encoding stage, in which the structural features of the face are encoded. The second is the face feature recognition stage, which mainly processes and encodes the non-structural features of the face (such as facial expression) and the identity-related information the face conveys (such as age and social status).
1 Introduction

Face detection is the process of determining the position, size, and pose of any faces contained in an input image. It is a key technology in facial information processing and has in recent years become a widely studied topic attracting considerable attention in pattern recognition and computer vision. It has broad application prospects in face recognition, human-computer interaction, expression recognition, video surveillance, and related areas [1-2].

Skin color is one of the salient features of the human face. It does not depend on the fine details of the face, differs from most backgrounds, is relatively stable, and is insensitive to changes in expression and pose. The input image can therefore be preprocessed by skin color segmentation: using skin color to quickly rule out large non-face regions enables fast, coarse detection of face regions [3-4]. As a preprocessing step for face detection, skin color segmentation offers a clear speed advantage and is crucial to the detection process; its performance directly affects the face detection results [5-6].

For each pixel of a color image, computing its skin color similarity indicates how likely that pixel is to belong to a skin region [7]. Based on the skin color similarity, thresholding can be used to segment the skin regions of the input image. However, a fixed threshold raises the following problem: if the threshold is too high, many skin regions go undetected and are missed; if it is too low, many non-skin regions are falsely detected as skin. This paper therefore presents a method for determining the threshold dynamically and combines the skin color similarity with the resulting dynamic threshold to segment the skin regions of the input image accurately, thereby improving the speed and performance of face detection. The flowchart of the skin color segmentation algorithm based on skin color similarity and a dynamic threshold is shown in Figure 1.
Skin Segmentation Using Similarity of Skin Color and Dynamic Threshold
GUO Song (1), GU Guo-chang (1), CAI Ze-su (2), LIU Hai-bo (1), SHEN Jing (1)
1. School of Computer Science and Technology, Harbin Engineering University, Harbin 150001, China
2. School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
E-mail: guosong@
Citation: GUO Song, GU Guo-chang, CAI Ze-su, et al. Skin segmentation using similarity of skin color and dynamic threshold. Computer Engineering and Applications, 2010, 46(18): 1-3.
Abstract: A method for skin segmentation based on the similarity of skin color and a dynamic threshold is proposed to improve the speed and performance of face detection. First, the similarity of skin color is calculated in YCgCr space. Then a method based on between-cluster variance and within-class scatter is proposed for selecting the threshold dynamically, and skin color segmentation is performed according to the dynamic threshold. The binary image obtained by skin segmentation is further processed to eliminate noise. Experimental results show that the proposed algorithm improves the performance of skin segmentation and can segment skin regions accurately against complex backgrounds, thereby improving the speed and performance of face detection.
Keywords: face detection; skin segmentation; similarity of skin color; threshold segmentation; dynamic threshold selection
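The excerpt above does not spell out the paper's exact dynamic-threshold criterion, so the following Python/NumPy sketch uses clearly labeled stand-ins: a Gaussian chrominance similarity whose mean and covariance are illustrative placeholders rather than the paper's statistics, and an Otsu-style between-class-variance threshold in place of the between-cluster variance and within-class scatter criterion described in the abstract. The conversion from RGB to YCgCr is not shown; the functions assume the Cg and Cr channels are already available as arrays.

```python
import numpy as np

def skin_similarity(cg, cr, mean=(110.0, 155.0), cov=((75.0, 35.0), (35.0, 85.0))):
    """Gaussian skin-color similarity in a (Cg, Cr) chrominance plane.
    The mean/covariance values are placeholders, not the paper's statistics."""
    m = np.array(mean)
    icov = np.linalg.inv(np.array(cov))
    d = np.stack([cg - m[0], cr - m[1]], axis=-1)
    mahal = np.einsum('...i,ij,...j->...', d, icov, d)
    return np.exp(-0.5 * mahal)                      # in (0, 1], higher = more skin-like

def dynamic_threshold(similarity, bins=256):
    """Pick the threshold maximizing between-class variance (Otsu), used as a
    stand-in for the paper's dynamic threshold selection criterion."""
    hist, edges = np.histogram(similarity.ravel(), bins=bins, range=(0.0, 1.0))
    p = hist / hist.sum()
    omega = np.cumsum(p)                             # class-0 probability
    mu = np.cumsum(p * (edges[:-1] + edges[1:]) / 2.0)
    mu_t = mu[-1]
    sigma_b = (mu_t * omega - mu) ** 2 / (omega * (1.0 - omega) + 1e-12)
    return edges[np.argmax(sigma_b)]

def segment_skin(cg, cr):
    """Return a boolean skin mask by thresholding the similarity map."""
    sim = skin_similarity(cg, cr)
    return sim >= dynamic_threshold(sim)
```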
Face Recognition: A Literature Survey
W. ZHAO, Sarnoff Corporation
R. CHELLAPPA, University of Maryland
P. J. PHILLIPS, National Institute of Standards and Technology
A. ROSENFELD, University of Maryland

As one of the most successful applications of image analysis and understanding, face recognition has recently received significant attention, especially during the past several years. At least two reasons account for this trend: the first is the wide range of commercial and law enforcement applications, and the second is the availability of feasible technologies after 30 years of research. Even though current machine recognition systems have reached a certain level of maturity, their success is limited by the conditions imposed by many real applications. For example, recognition of face images acquired in an outdoor environment with changes in illumination and/or pose remains a largely unsolved problem. In other words, current systems are still far away from the capability of the human perception system.

This paper provides an up-to-date critical survey of still- and video-based face recognition research. There are two underlying motivations for us to write this survey paper: the first is to provide an up-to-date review of the existing literature, and the second is to offer some insights into the studies of machine recognition of faces. To provide a comprehensive survey, we not only categorize existing recognition techniques but also present detailed descriptions of representative methods within each category. In addition, relevant topics such as psychophysical studies, system evaluation, and issues of illumination and pose variation are covered.

Categories and Subject Descriptors: I.5.4 [Pattern Recognition]: Applications
General Terms: Algorithms
Additional Key Words and Phrases: Face recognition, person identification

An earlier version of this paper appeared as "Face Recognition: A Literature Survey," Technical Report CAR-TR-948, Center for Automation Research, University of Maryland, College Park, MD, 2000. Authors' addresses: W. Zhao, Vision Technologies Lab, Sarnoff Corporation, Princeton, NJ 08543-5300; email: wzhao@; R. Chellappa and A. Rosenfeld, Center for Automation Research, University of Maryland, College Park, MD 20742-3275; email: {rama,ar}@; P. J. Phillips, National Institute of Standards and Technology, Gaithersburg, MD 20899; email: jonathon@. ACM Computing Surveys, Vol. 35, No. 4, December 2003, pp. 399–458.

1. INTRODUCTION

As one of the most successful applications of image analysis and understanding, face recognition has recently received significant attention, especially during the past few years. This is evidenced by the emergence of face recognition conferences such as the International Conference on Audio- and Video-Based Authentication (AVBPA) since 1997 and the International Conference on Automatic Face and Gesture Recognition (AFGR) since 1995, systematic empirical evaluations of face recognition techniques (FRT), including the FERET [Phillips et al. 1998b, 2000; Rizvi et al. 1998], FRVT 2000 [Blackburn et al. 2001], FRVT 2002 [Phillips et al. 2003], and XM2VTS [Messer et al. 1999] protocols,
and many commercially available systems (Table II). There are at least two reasons for this trend: the first is the wide range of commercial and law enforcement applications, and the second is the availability of feasible technologies after 30 years of research. In addition, the problem of machine recognition of human faces continues to attract researchers from disciplines such as image processing, pattern recognition, neural networks, computer vision, computer graphics, and psychology.

The strong need for user-friendly systems that can secure our assets and protect our privacy without losing our identity in a sea of numbers is obvious. At present, one needs a PIN to get cash from an ATM, a password for a computer, a dozen others to access the internet, and so on. Although very reliable methods of biometric personal identification exist, for example, fingerprint analysis and retinal or iris scans, these methods rely on the cooperation of the participants, whereas a personal identification system based on analysis of frontal or profile images of the face is often effective without the participant's cooperation or knowledge. Some of the advantages/disadvantages of different biometrics are described in Phillips et al. [1998]. Table I lists some of the applications of face recognition.

Table I. Typical Applications of Face Recognition
- Entertainment: video games, virtual reality, training programs; human-robot interaction, human-computer interaction
- Smart cards: drivers' licenses, entitlement programs; immigration, national ID, passports, voter registration; welfare fraud
- Information security: TV parental control, personal device logon, desktop logon; application security, database security, file encryption; intranet security, internet access, medical records; secure trading terminals
- Law enforcement and surveillance: advanced video surveillance, CCTV control; portal control, postevent analysis; shoplifting, suspect tracking and investigation

Commercial and law enforcement applications of FRT range from static, controlled-format photographs to uncontrolled video images, posing a wide range of technical challenges and requiring an equally wide range of techniques from image processing, analysis, understanding, and pattern recognition. One can broadly classify FRT systems into two groups depending on whether they make use of static images or of video. Within these groups, significant differences exist, depending on the specific application. The differences are in terms of image quality, amount of background clutter (posing challenges to segmentation algorithms), variability of the images of a particular individual that must be recognized, availability of a well-defined recognition or matching criterion, and the nature, type, and amount of input from a user. A list of some commercial systems is given in Table II.

A general statement of the problem of machine recognition of faces can be formulated as follows: given still or video images of a scene, identify or verify one or more persons in the scene using a stored database of faces. Available collateral information such as race, age, gender, facial expression, or speech may be used in narrowing the search (enhancing recognition). The solution to the problem involves segmentation of faces (face detection) from cluttered scenes, feature extraction from the face regions, and recognition or verification (Figure 1). In identification problems, the input to the system is an unknown face, and the system reports back the determined identity from a database of known individuals, whereas in verification problems, the system needs to confirm or reject the claimed identity of the input face.
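The identification/verification distinction above can be made concrete with a small sketch. The feature vectors here could come from any of the representations surveyed later (eigenface projections, geometrical feature vectors, and so on); the Euclidean distance and the threshold-based decision are illustrative simplifications rather than a procedure prescribed by the survey.

```python
import numpy as np

def identify_closed_set(probe_code, gallery_codes):
    """Identification: return the gallery identity whose stored feature
    vector is closest to the probe's feature vector."""
    return min(gallery_codes, key=lambda pid: np.linalg.norm(gallery_codes[pid] - probe_code))

def verify_claim(probe_code, claimed_code, threshold):
    """Verification: accept or reject a claimed identity by comparing the
    match distance against an operating threshold."""
    return float(np.linalg.norm(probe_code - claimed_code)) <= threshold
```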
Table II. Available Commercial Face Recognition Systems (some of these Web sites may have changed or been removed). [The identification of any company, commercial product, or trade name does not imply endorsement or recommendation by the National Institute of Standards and Technology or any of the authors or their institutions.]
- FaceIt from Visionics
- Viisage Technology
- FaceVACS from Plettac
- FaceKey Corp. (http://www.facekey.com)
- Cognitec Systems (http://www.cognitec-systems.de)
- Keyware Technologies
- Passfaces from ID-arts
- ImageWare Software
- Eyematic Interfaces Inc.
- BioID sensor fusion
- Visionsphere Technologies
- Biometric Systems, Inc.
- FaceSnap Recorder (http://www.facesnap.de/htdocs/english/index2.html)
- SpotIt for face composite (http://spotit.itc.it/SpotIt.html)

Fig. 1. Configuration of a generic face recognition system.

Face perception is an important part of the capability of the human perception system and is a routine task for humans, while building a similar computer system is still an ongoing research area. The earliest work on face recognition can be traced back at least to the 1950s in psychology [Bruner and Tagiuri 1954] and to the 1960s in the engineering literature [Bledsoe 1964]. Some of the earliest studies include work on facial expression of emotions by Darwin [1972] (see also Ekman [1998]) and on facial profile-based biometrics by Galton [1888]. But research on automatic machine recognition of faces really started in the 1970s [Kelly 1970] and after the seminal work of Kanade [1973]. Over the past 30 years extensive research has been conducted by psychophysicists, neuroscientists, and engineers on various aspects of face recognition by humans and machines.

Psychophysicists and neuroscientists have been concerned with issues such as whether face perception is a dedicated process (this issue is still being debated in the psychology community [Biederman and Kalocsai 1998; Ellis 1986; Gauthier et al. 1999; Gauthier and Logothetis 2000]) and whether it is done holistically or by local feature analysis. Many of the hypotheses and theories put forward by researchers in these disciplines have been based on rather small sets of images. Nevertheless, many of the findings have important consequences for engineers who design algorithms and systems for machine recognition of faces. Section 2 will present a concise review of these findings.

Barring a few exceptions that use range data [Gordon 1991], the face recognition problem has been formulated as recognizing three-dimensional (3D) objects from two-dimensional (2D) images. (There have been recent advances on 3D face recognition in situations where range data acquired through structured light can be matched reliably [Bronstein et al. 2003].) Earlier approaches treated it as a 2D pattern recognition problem. As a result, during the early and mid-1970s, typical pattern classification techniques, which use measured attributes of features (e.g., the distances between important points) in faces or face profiles, were used [Bledsoe 1964; Kanade 1973; Kelly 1970]. During the 1980s, work on face recognition remained largely dormant. Since the early 1990s, research interest in FRT has grown significantly. One can attribute this to several reasons: an increase in interest in commercial opportunities; the availability of real-time hardware; and the increasing importance of surveillance-related applications.
Over the past 15 years, research has focused on how to make face recognition systems fully automatic by tackling problems such as localization of a face in a given image or video clip and extraction of features such as the eyes, mouth, etc. Meanwhile, significant advances have been made in the design of classifiers for successful face recognition. Among appearance-based holistic approaches, eigenfaces [Kirby and Sirovich 1990; Turk and Pentland 1991] and Fisherfaces [Belhumeur et al. 1997; Etemad and Chellappa 1997; Zhao et al. 1998] have proved to be effective in experiments with large databases. Feature-based graph matching approaches [Wiskott et al. 1997] have also been quite successful. Compared to holistic approaches, feature-based methods are less sensitive to variations in illumination and viewpoint and to inaccuracy in face localization. However, the feature extraction techniques needed for this type of approach are still not reliable or accurate enough [Cox et al. 1996]. For example, most eye localization techniques assume some geometric and textural models and do not work if the eye is closed. Section 3 will present a review of still-image-based face recognition.

During the past 5 to 8 years, much research has been concentrated on video-based face recognition. The still image problem has several inherent advantages and disadvantages. For applications such as drivers' licenses, due to the controlled nature of the image acquisition process, the segmentation problem is rather easy. However, if only a static picture of an airport scene is available, automatic location and segmentation of a face could pose serious challenges to any segmentation algorithm. On the other hand, if a video sequence is available, segmentation of a moving person can be more easily accomplished using motion as a cue. But the small size and low image quality of faces captured from video can significantly increase the difficulty of recognition. Video-based face recognition is reviewed in Section 4.

As we propose new algorithms and build more systems, measuring the performance of new and existing systems becomes very important. Systematic data collection and evaluation of face recognition systems is reviewed in Section 5.

Recognizing a 3D object from its 2D images poses many challenges. The illumination and pose problems are two prominent issues for appearance- or image-based approaches. Many approaches have been proposed to handle these issues, with the majority of them exploring domain knowledge. Details of these approaches are discussed in Section 6.
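As a pointer to what the appearance-based holistic approaches mentioned above involve, here is a minimal eigenfaces sketch in Python/NumPy. It assumes the training faces are already detected, aligned, and vectorized, and the number of components is an arbitrary placeholder; the resulting projection codes can be compared with nearest-neighbor or threshold rules such as those sketched earlier. Fisherfaces add a discriminant-analysis step that is not shown.

```python
import numpy as np

def train_eigenfaces(images, num_components=50):
    """`images` is an (n_samples, height*width) array of vectorized, aligned
    face images.  Returns the mean face and the leading principal
    components (the 'eigenfaces')."""
    X = np.asarray(images, dtype=float)
    mean = X.mean(axis=0)
    # Rows of Vt are the principal directions of the centered data.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:num_components]

def project(image, mean, eigenfaces):
    """Project a vectorized face onto the eigenface subspace."""
    return eigenfaces @ (np.asarray(image, dtype=float) - mean)
```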
In 1995, a review paper [Chellappa et al. 1995] gave a thorough survey of FRT at that time. (An earlier survey [Samal and Iyengar 1992] appeared in 1992.) At that time, video-based face recognition was still in a nascent stage. During the past 8 years, face recognition has received increased attention and has advanced technically. Many commercial systems for still face recognition are now available. Recently, significant research efforts have been focused on video-based face modeling/tracking, recognition, and system integration. New datasets have been created, and evaluations of recognition techniques using these databases have been carried out. It is not an overstatement to say that face recognition has become one of the most active applications of pattern recognition, image analysis and understanding. In this paper we provide a critical review of current developments in face recognition.

This paper is organized as follows. In Section 2 we briefly review issues that are relevant from a psychophysical point of view. Section 3 provides a detailed review of recent developments in face recognition techniques using still images. In Section 4 face recognition techniques based on video are reviewed. Data collection and performance evaluation of face recognition algorithms are addressed in Section 5 with descriptions of representative protocols. In Section 6 we discuss two important problems in face recognition that can be mathematically studied, namely lack of robustness to illumination and pose variations, and we review proposed methods of overcoming these limitations. Finally, a summary and conclusions are presented in Section 7.

2. PSYCHOPHYSICS/NEUROSCIENCE ISSUES RELEVANT TO FACE RECOGNITION

Human recognition processes utilize a broad spectrum of stimuli, obtained from many, if not all, of the senses (visual, auditory, olfactory, tactile, etc.). In many situations, contextual knowledge is also applied; for example, surroundings play an important role in recognizing faces in relation to where they are supposed to be located. It is futile to even attempt to develop a system using existing technology that will mimic the remarkable face recognition ability of humans. However, the human brain has its limitations in the total number of persons that it can accurately "remember." A key advantage of a computer system is its capacity to handle large numbers of face images. In most applications the images are available only in the form of single or multiple views of 2D intensity data, so that the inputs to computer face recognition algorithms are visual only. For this reason, the literature reviewed in this section is restricted to studies of human visual perception of faces.

Many studies in psychology and neuroscience have direct relevance to engineers interested in designing algorithms or systems for machine recognition of faces. For example, findings in psychology [Bruce 1988; Shepherd et al. 1981] about the relative importance of different facial features have been noted in the engineering literature [Etemad and Chellappa 1997]. On the other hand, machine systems provide tools for conducting studies in psychology and neuroscience [Hancock et al. 1998; Kalocsai et al. 1998]. For example, a possible engineering explanation of the bottom-lighting effects studied in Johnston et al. [1992] is as follows: when the actual lighting direction is opposite to the usually assumed direction, a shape-from-shading algorithm recovers incorrect structural information and hence makes recognition of faces harder.

A detailed review of relevant studies in psychophysics and neuroscience is beyond the scope of this paper. We only summarize findings that are potentially relevant to the design of face recognition systems; for details the reader is referred to the papers cited below. (Readers should be aware of the existence of diverse opinions on some of these issues. The opinions given here do not necessarily represent our views.) Issues that are of potential interest to designers include the following.

- Is face recognition a dedicated process?
[Biederman and Kalocsai 1998; Ellis 1986; Gauthier et al. 1999; Gauthier and Logothetis 2000]: It is traditionally believed that face recognition is a dedicated process different from other object recognition tasks. Evidence for the existence of a dedicated face processing system comes from several sources [Ellis 1986]. (a) Faces are more easily remembered by humans than other objects when presented in an upright orientation. (b) Prosopagnosia patients are unable to recognize previously familiar faces, but usually have no other profound agnosia. They recognize people by their voices, hair color, dress, etc. It should be noted that prosopagnosia patients recognize whether a given object is a face or not, but then have difficulty in identifying the face. Seven differences between face recognition and object recognition can be summarized [Biederman and Kalocsai 1998] based on empirical evidence: (1) configural effects (related to the choice of different types of machine recognition systems), (2) expertise, (3) differences verbalizable, (4) sensitivity to contrast polarity and illumination direction (related to the illumination problem in machine recognition systems), (5) metric variation, (6) rotation in depth (related to the pose variation problem in machine recognition systems), and (7) rotation in plane/inverted face. Contrary to the traditionally held belief, some recent findings in human neuropsychology and neuroimaging suggest that face recognition may not be unique. According to [Gauthier and Logothetis 2000], recent neuroimaging studies in humans indicate that level of categorization and expertise interact to produce the specification for faces in the middle fusiform gyrus. Hence it is possible that the encoding scheme used for faces may also be employed for other classes with similar properties. (On recognition of familiar vs. unfamiliar faces see Section 7.)

- Is face perception the result of holistic or feature analysis? [Bruce 1988; Bruce et al. 1998]: Both holistic and feature information are crucial for the perception and recognition of faces. Studies suggest the possibility of global descriptions serving as a front end for finer, feature-based perception. If dominant features are present, holistic descriptions may not be used. For example, in face recall studies, humans quickly focus on odd features such as big ears, a crooked nose, a staring eye, etc. One of the strongest pieces of evidence to support the view that face recognition involves more configural/holistic processing than other object recognition has been the face inversion effect, in which an inverted face is much harder to recognize than a normal face (first demonstrated in [Yin 1969]). An excellent example is given in [Bartlett and Searcy 1993] using the "Thatcher illusion" [Thompson 1980]. In this illusion, the eyes and mouth of an expressing face are excised and inverted, and the result looks grotesque in an upright face; however, when shown inverted, the face looks fairly normal in appearance, and the inversion of the internal features is not readily noticed.

- Ranking of significance of facial features [Bruce 1988; Shepherd et al. 1981]: Hair, face outline, eyes, and mouth (not necessarily in this
order) have been determined to be important for perceiving and remembering faces [Shepherd et al. 1981]. Several studies have shown that the nose plays an insignificant role; this may be due to the fact that almost all of these studies have been done using frontal images. In face recognition using profiles (which may be important in mugshot matching applications, where profiles can be extracted from side views), a distinctive nose shape could be more important than the eyes or mouth [Bruce 1988]. Another outcome of some studies is that both external and internal features are important in the recognition of previously presented but otherwise unfamiliar faces, but internal features are more dominant in the recognition of familiar faces. It has also been found that the upper part of the face is more useful for face recognition than the lower part [Shepherd et al. 1981]. The role of aesthetic attributes such as beauty, attractiveness, and/or pleasantness has also been studied, with the conclusion that the more attractive the faces are, the better is their recognition rate; the least attractive faces come next, followed by the midrange faces, in terms of ease of being recognized.

- Caricatures [Brennan 1985; Bruce 1988; Perkins 1975]: A caricature can be formally defined [Perkins 1975] as "a symbol that exaggerates measurements relative to any measure which varies from one person to another." Thus the length of a nose is a measure that varies from person to person, and could be useful as a symbol in caricaturing someone, but not the number of ears. A standard caricature algorithm [Brennan 1985] can be applied to different qualities of image data (line drawings and photographs). Caricatures of line drawings do not contain as much information as photographs, but they manage to capture the important characteristics of a face; experiments based on nonordinary faces comparing the usefulness of line-drawing caricatures and unexaggerated line drawings decidedly favor the former [Bruce 1988].

- Distinctiveness [Bruce et al. 1994]: Studies show that distinctive faces are better retained in memory and are recognized better and faster than typical faces. However, if a decision has to be made as to whether an object is a face or not, it takes longer to recognize an atypical face than a typical face. This may be explained by different mechanisms being used for detection and for identification.

- The role of spatial frequency analysis [Ginsburg 1978; Harmon 1973; Sergent 1986]: Earlier studies [Ginsburg 1978; Harmon 1973] concluded that information in low spatial frequency bands plays a dominant role in face recognition. Recent studies [Sergent 1986] have shown that, depending on the specific recognition task, the low, bandpass, and high-frequency components may play different roles. For example, gender classification can be successfully accomplished using low-frequency components only, while identification requires the use of high-frequency components [Sergent 1986]. Low-frequency components contribute to global description, while high-frequency components contribute to the finer details needed in identification.

- Viewpoint-invariant recognition? [Biederman 1987; Hill et al. 1997; Tarr and Bulthoff 1995]: Much work in visual object recognition (e.g., [Biederman 1987]) has been cast within a theoretical framework introduced in [Marr 1982] in which different views of objects are analyzed in a way which allows access to (largely) viewpoint-invariant descriptions. Recently, there has
been some debate about whether object recognition is viewpoint-invariant or not [Tarr and Bulthoff 1995]. Some experiments suggest that memory for faces is highly viewpoint-dependent. Generalization even from one profile viewpoint to another is poor, though generalization from one three-quarter view to the other is very good [Hill et al. 1997].

- Effect of lighting change [Bruce et al. 1998; Hill and Bruce 1996; Johnston et al. 1992]: It has long been informally observed that photographic negatives of faces are difficult to recognize. However, relatively little work has explored why it is so difficult to recognize negative images of faces. In [Johnston et al. 1992], experiments were conducted to explore whether difficulties with negative images and inverted images of faces arise because each of these manipulations reverses the apparent direction of lighting, rendering a top-lit image of a face apparently lit from below. It was demonstrated in [Johnston et al. 1992] that bottom lighting does indeed make it harder to identify familiar faces. In [Hill and Bruce 1996], the importance of top lighting for face recognition was demonstrated using a different task: matching surface images of faces to determine whether they were identical.

- Movement and face recognition [O'Toole et al. 2002; Bruce et al. 1998; Knight and Johnston 1997]: A recent study [Knight
Though fully automatic face recognition systems must perform all three subtasks, research on each subtask is critical.This is not only because the techniques used for the individual subtasks need to be im-proved,but also because they are critical in many different applications(Figure1). For example,face detection is needed to initialize face tracking,and extraction of facial features is needed for recognizing human emotion,which is in turn essential in human-computer interaction(HCI)sys-tems.Isolating the subtasks makes it eas-ier to assess and advance the state of the art of the component techniques.Earlier face detection techniques could only han-dle single or a few well-separated frontal faces in images with simple backgrounds, while state-of-the-art algorithms can de-tect faces and their poses in cluttered backgrounds[Gu et al.2001;Heisele et al. 2001;Schneiderman and Kanade2000;Vi-ola and Jones2001].Extensive research on the subtasks has been carried out and rel-evant surveys have appeared on,for exam-ple,the subtask of face detection[Hjelmas and Low2001;Yang et al.2002].In this section we survey the state of the art of face recognition in the engineering literature.For the sake of completeness, in Section3.1we provide a highlighted summary of research on face segmenta-tion/detection and feature extraction.Sec-tion3.2contains detailed reviews of recent work on intensity image-based face recog-nition and categorizes methods of recog-nition from intensity images.Section3.3 summarizes the status of face recognition and discusses open research issues.3.1.Key Steps Prior to Recognition:FaceDetection and Feature ExtractionThefirst step in any automatic face recognition systems is the detection of faces in images.Here we only provide a summary on this topic and highlight a few ACM Computing Surveys,Vol.35,No.4,December2003.。