Learning support vector machines for a multi-view face model
SVM Worked Example

In machine learning, Support Vector Machines (SVM) are widely used for classification and regression analysis. SVM aims to find the best hyperplane that separates data points into different classes, maximizing the margin between them. In this article, we will discuss an example to demonstrate the application of SVM.

Example Scenario:
Suppose we have a dataset containing information about different fruits. Each fruit is described by two features: sweetness (x-axis) and acidity (y-axis). The goal is to build an SVM model to classify fruits into two categories: apples and oranges. We will use the SVM algorithm to find the decision boundary that best separates these two classes.

Data Preprocessing:
Before training our SVM model, it is essential to preprocess the data. First, we need to collect a labelled dataset of fruits, where each fruit is labelled as either an apple or an orange. Then, we perform feature scaling to normalize both the sweetness and acidity values within a specific range, such as between 0 and 1. This step ensures that the features contribute equally to the SVM model.

Training the SVM Model:
After preprocessing the data, we can proceed with training the SVM model. The SVM algorithm aims to find the optimal hyperplane that maximizes the margin between the classes. In our fruit classification example, the SVM model will determine the hyperplane that best separates the apples from the oranges in feature space.

To find the optimal hyperplane, we need to choose a suitable kernel function. Commonly used kernel functions include the linear kernel, the polynomial kernel, and the radial basis function (RBF) kernel. The choice of kernel function depends on the nature of the data and the desired decision boundary.

Evaluating the SVM Model:
Once the SVM model is trained, we can evaluate its performance using evaluation metrics such as accuracy, precision, recall, and F1-score. These metrics provide insight into how well the SVM model can classify new, unseen fruits.

To evaluate the model, we split our dataset into a training set and a testing set. The training set is used to train the SVM model, while the testing set is used to assess its performance. By comparing the predicted labels with the actual labels of the testing set, we can calculate the evaluation metrics and determine the accuracy of our model.

Improving the Performance:
In some cases, SVM may not perform optimally due to factors like imbalanced data or overlapping classes. To enhance the performance of the SVM model, we can employ techniques such as data resampling, feature engineering, or using a different kernel function.

Conclusion:
Support Vector Machines (SVM) offer an effective approach for classification and regression tasks. In this example, we showed how SVM can be applied to classify fruits into apples and oranges based on their sweetness and acidity. By preprocessing the data, training the SVM model, and evaluating its performance, we can build a reliable fruit classification system. SVM can be further improved by applying various techniques to address specific challenges in different scenarios.

Remember, the key to success with SVM lies in understanding the dataset, selecting appropriate features, and choosing the right kernel function. With proper implementation and careful evaluation, SVM can be a powerful tool in machine learning.
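As a minimal sketch of this fruit example in Python with scikit-learn (an assumption: the original text names no library, and the sweetness/acidity values and labels below are purely illustrative):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import classification_report

# Illustrative fruit data: [sweetness, acidity]; label 0 = apple, 1 = orange
X = np.array([[7.0, 3.0], [6.5, 2.5], [8.0, 3.5], [6.0, 2.0],
              [5.0, 6.0], [4.5, 7.0], [5.5, 6.5], [4.0, 7.5]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# Scale both features into [0, 1] so they contribute equally
X_scaled = MinMaxScaler().fit_transform(X)

X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.25, random_state=0, stratify=y)

# RBF kernel here; a linear or polynomial kernel could be swapped in
clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(X_train, y_train)

# Accuracy, precision, recall and F1-score on the held-out fruits
print(classification_report(y_test, clf.predict(X_test)))
```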
Preface: Recently, through my work, I have come across many classic papers I had never heard of before. While marveling at how great these papers are, I also suddenly realized how narrow my own horizons were.
I looked online for a compilation of the classic papers in computer vision, but could not find one.
Disappointed, I decided to put one together myself, in the hope that it will be of some help to readers working in the CV field.
Since my own view of the field is fairly narrow, there are certainly many omissions; treat this simply as a starting point for discussion.
Before 1990
1990
1991
1992
1993
1994
1995
1996
1997
1998
1998 was a year in which classic papers on image processing and computer vision appeared in a burst.
Roughly from this year onward, a new trend emerged.
As competition intensified, good algorithms were published first at conferences to stake out the territory, and then extended into journal versions a year or two later.
1999
2000
At the turn of the century, survey papers of all kinds appeared.
2001
2002
2003
2004
2005
2006
2007
2008
2009
2010
2011
2012
A Brief Overview of the Supervised Machine Learning Training Workflow

The basic workflow of supervised machine learning generally includes data collection, data preprocessing, model selection, model training, and model evaluation.

Data collection, also known as data acquisition, is the process of gathering and measuring information on targeted variables from sources such as sensors, databases, and log files in an established, systematic fashion, which enables one to answer relevant questions, evaluate outcomes, and make predictions.

Data preprocessing involves cleaning, transforming, and standardizing the collected data to make it suitable for machine learning algorithms. This step typically includes handling missing data, dealing with outliers, feature selection, and feature transformation.

Model selection involves choosing an appropriate machine learning model based on the nature of the problem and the characteristics of the data, such as decision trees, support vector machines, or neural networks.

Model training uses the preprocessed data to fit the selected model so that it can accurately predict the outcomes of unseen data. A minimal end-to-end sketch of this workflow is given below.
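As a hedged illustration (not part of the original text), the following scikit-learn sketch strings these steps together on a synthetic dataset; the dataset, split ratio, and model choice are assumptions for demonstration only:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# "Data collection": a synthetic stand-in for data gathered from sensors or databases
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X[::17, 0] = np.nan  # inject a few missing values to exercise preprocessing

# Preprocessing (imputation + scaling) and the selected model in one pipeline
model = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),   # handle missing data
    ("scale", StandardScaler()),                  # standardize features
    ("svm", SVC(kernel="rbf")),                   # selected model
])

# Model training and evaluation on a held-out test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
model.fit(X_train, y_train)
print("test accuracy:", accuracy_score(y_test, model.predict(X_test)))
```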
Support vector machine:A tool for mapping mineral prospectivityRenguang Zuo a,n,Emmanuel John M.Carranza ba State Key Laboratory of Geological Processes and Mineral Resources,China University of Geosciences,Wuhan430074;Beijing100083,Chinab Department of Earth Systems Analysis,Faculty of Geo-Information Science and Earth Observation(ITC),University of Twente,Enschede,The Netherlandsa r t i c l e i n f oArticle history:Received17May2010Received in revised form3September2010Accepted25September2010Keywords:Supervised learning algorithmsKernel functionsWeights-of-evidenceTurbidite-hosted AuMeguma Terraina b s t r a c tIn this contribution,we describe an application of support vector machine(SVM),a supervised learningalgorithm,to mineral prospectivity mapping.The free R package e1071is used to construct a SVM withsigmoid kernel function to map prospectivity for Au deposits in western Meguma Terrain of Nova Scotia(Canada).The SVM classification accuracies of‘deposit’are100%,and the SVM classification accuracies ofthe‘non-deposit’are greater than85%.The SVM classifications of mineral prospectivity have5–9%lowertotal errors,13–14%higher false-positive errors and25–30%lower false-negative errors compared tothose of the WofE prediction.The prospective target areas predicted by both SVM and WofE reflect,nonetheless,controls of Au deposit occurrence in the study area by NE–SW trending anticlines andcontact zones between Goldenville and Halifax Formations.The results of the study indicate theusefulness of SVM as a tool for predictive mapping of mineral prospectivity.&2010Elsevier Ltd.All rights reserved.1.IntroductionMapping of mineral prospectivity is crucial in mineral resourcesexploration and mining.It involves integration of information fromdiverse geoscience datasets including geological data(e.g.,geologicalmap),geochemical data(e.g.,stream sediment geochemical data),geophysical data(e.g.,magnetic data)and remote sensing data(e.g.,multispectral satellite data).These sorts of data can be visualized,processed and analyzed with the support of computer and GIStechniques.Geocomputational techniques for mapping mineral pro-spectivity include weights of evidence(WofE)(Bonham-Carter et al.,1989),fuzzy WofE(Cheng and Agterberg,1999),logistic regression(Agterberg and Bonham-Carter,1999),fuzzy logic(FL)(Ping et al.,1991),evidential belief functions(EBF)(An et al.,1992;Carranza andHale,2003;Carranza et al.,2005),neural networks(NN)(Singer andKouda,1996;Porwal et al.,2003,2004),a‘wildcat’method(Carranza,2008,2010;Carranza and Hale,2002)and a hybrid method(e.g.,Porwalet al.,2006;Zuo et al.,2009).These techniques have been developed toquantify indices of occurrence of mineral deposit occurrence byintegrating multiple evidence layers.Some geocomputational techni-ques can be performed using popular software packages,such asArcWofE(a free ArcView extension)(Kemp et al.,1999),ArcSDM9.3(afree ArcGIS9.3extension)(Sawatzky et al.,2009),MI-SDM2.50(aMapInfo extension)(Avantra Geosystems,2006),GeoDAS(developedbased on MapObjects,which is an Environmental Research InstituteDevelopment Kit)(Cheng,2000).Other geocomputational techniques(e.g.,FL and NN)can be performed by using R and Matlab.Geocomputational techniques for mineral prospectivity map-ping can be categorized generally into two types–knowledge-driven and data-driven–according to the type of inferencemechanism considered(Bonham-Carter1994;Pan and Harris2000;Carranza2008).Knowledge-driven techniques,such as thosethat apply FL and EBF,are based on expert knowledge 
andexperience about spatial associations between mineral prospec-tivity criteria and mineral deposits of the type sought.On the otherhand,data-driven techniques,such as WofE and NN,are based onthe quantification of spatial associations between mineral pro-spectivity criteria and known occurrences of mineral deposits ofthe type sought.Additional,the mixing of knowledge-driven anddata-driven methods also is used for mapping of mineral prospec-tivity(e.g.,Porwal et al.,2006;Zuo et al.,2009).Every geocomputa-tional technique has advantages and disadvantages,and one or theother may be more appropriate for a given geologic environmentand exploration scenario(Harris et al.,2001).For example,one ofthe advantages of WofE is its simplicity,and straightforwardinterpretation of the weights(Pan and Harris,2000),but thismodel ignores the effects of possible correlations amongst inputpredictor patterns,which generally leads to biased prospectivitymaps by assuming conditional independence(Porwal et al.,2010).Comparisons between WofE and NN,NN and LR,WofE,NN and LRfor mineral prospectivity mapping can be found in Singer andKouda(1999),Harris and Pan(1999)and Harris et al.(2003),respectively.Mapping of mineral prospectivity is a classification process,because its product(i.e.,index of mineral deposit occurrence)forevery location is classified as either prospective or non-prospectiveaccording to certain combinations of weighted mineral prospec-tivity criteria.There are two types of classification techniques.Contents lists available at ScienceDirectjournal homepage:/locate/cageoComputers&Geosciences0098-3004/$-see front matter&2010Elsevier Ltd.All rights reserved.doi:10.1016/j.cageo.2010.09.014n Corresponding author.E-mail addresses:zrguang@,zrguang1981@(R.Zuo).Computers&Geosciences](]]]])]]]–]]]One type is known as supervised classification,which classifies mineral prospectivity of every location based on a training set of locations of known deposits and non-deposits and a set of evidential data layers.The other type is known as unsupervised classification, which classifies mineral prospectivity of every location based solely on feature statistics of individual evidential data layers.A support vector machine(SVM)is a model of algorithms for supervised classification(Vapnik,1995).Certain types of SVMs have been developed and applied successfully to text categorization, handwriting recognition,gene-function prediction,remote sensing classification and other studies(e.g.,Joachims1998;Huang et al.,2002;Cristianini and Scholkopf,2002;Guo et al.,2005; Kavzoglu and Colkesen,2009).An SVM performs classification by constructing an n-dimensional hyperplane in feature space that optimally separates evidential data of a predictor variable into two categories.In the parlance of SVM literature,a predictor variable is called an attribute whereas a transformed attribute that is used to define the hyperplane is called a feature.The task of choosing the most suitable representation of the target variable(e.g.,mineral prospectivity)is known as feature selection.A set of features that describes one case(i.e.,a row of predictor values)is called a feature vector.The feature vectors near the hyperplane are the support feature vectors.The goal of SVM modeling is tofind the optimal hyperplane that separates clusters of feature vectors in such a way that feature vectors representing one category of the target variable (e.g.,prospective)are on one side of the plane and feature vectors representing the other category of the target 
variable(e.g.,non-prospective)are on the other size of the plane.A good separation is achieved by the hyperplane that has the largest distance to the neighboring data points of both categories,since in general the larger the margin the better the generalization error of the classifier.In this paper,SVM is demonstrated as an alternative tool for integrating multiple evidential variables to map mineral prospectivity.2.Support vector machine algorithmsSupport vector machines are supervised learning algorithms, which are considered as heuristic algorithms,based on statistical learning theory(Vapnik,1995).The classical task of a SVM is binary (two-class)classification.Suppose we have a training set composed of l feature vectors x i A R n,where i(¼1,2,y,n)is the number of feature vectors in training samples.The class in which each sample is identified to belong is labeled y i,which is equal to1for one class or is equal toÀ1for the other class(i.e.y i A{À1,1})(Huang et al., 2002).If the two classes are linearly separable,then there exists a family of linear separators,also called separating hyperplanes, which satisfy the following set of equations(KavzogluandFig.1.Support vectors and optimum hyperplane for the binary case of linearly separable data sets.Table1Experimental data.yer A Layer B Layer C Layer D Target yer A Layer B Layer C Layer D Target1111112100000 2111112200000 3111112300000 4111112401000 5111112510000 6111112600000 7111112711100 8111112800000 9111012900000 10111013000000 11101113111100 12111013200000 13111013300000 14111013400000 15011013510000 16101013600000 17011013700000 18010113811100 19010112900000 20101014010000R.Zuo,E.J.M.Carranza/Computers&Geosciences](]]]])]]]–]]]2Colkesen,2009)(Fig.1):wx iþb Zþ1for y i¼þ1wx iþb rÀ1for y i¼À1ð1Þwhich is equivalent toy iðwx iþbÞZ1,i¼1,2,...,nð2ÞThe separating hyperplane can then be formalized as a decision functionfðxÞ¼sgnðwxþbÞð3Þwhere,sgn is a sign function,which is defined as follows:sgnðxÞ¼1,if x400,if x¼0À1,if x o08><>:ð4ÞThe two parameters of the separating hyperplane decision func-tion,w and b,can be obtained by solving the following optimization function:Minimize tðwÞ¼12J w J2ð5Þsubject toy Iððwx iÞþbÞZ1,i¼1,...,lð6ÞThe solution to this optimization problem is the saddle point of the Lagrange functionLðw,b,aÞ¼1J w J2ÀX li¼1a iðy iððx i wÞþbÞÀ1Þð7Þ@ @b Lðw,b,aÞ¼0@@wLðw,b,aÞ¼0ð8Þwhere a i is a Lagrange multiplier.The Lagrange function is minimized with respect to w and b and is maximized with respect to a grange multipliers a i are determined by the following optimization function:MaximizeX li¼1a iÀ12X li,j¼1a i a j y i y jðx i x jÞð9Þsubject toa i Z0,i¼1,...,l,andX li¼1a i y i¼0ð10ÞThe separating rule,based on the optimal hyperplane,is the following decision function:fðxÞ¼sgnX li¼1y i a iðxx iÞþb!ð11ÞMore details about SVM algorithms can be found in Vapnik(1995) and Tax and Duin(1999).3.Experiments with kernel functionsFor spatial geocomputational analysis of mineral exploration targets,the decision function in Eq.(3)is a kernel function.The choice of a kernel function(K)and its parameters for an SVM are crucial for obtaining good results.The kernel function can be usedTable2Errors of SVM classification using linear kernel functions.l Number ofsupportvectors Testingerror(non-deposit)(%)Testingerror(deposit)(%)Total error(%)0.2580.00.00.0180.00.00.0 1080.00.00.0 10080.00.00.0 100080.00.00.0Table3Errors of SVM classification using polynomial kernel functions when d¼3and r¼0. 
l Number ofsupportvectorsTestingerror(non-deposit)(%)Testingerror(deposit)(%)Total error(%)0.25120.00.00.0160.00.00.01060.00.00.010060.00.00.0 100060.00.00.0Table4Errors of SVM classification using polynomial kernel functions when l¼0.25,r¼0.d Number ofsupportvectorsTestingerror(non-deposit)(%)Testingerror(deposit)(%)Total error(%)11110.00.0 5.010290.00.00.0100230.045.022.5 1000200.090.045.0Table5Errors of SVM classification using polynomial kernel functions when l¼0.25and d¼3.r Number ofsupportvectorsTestingerror(non-deposit)(%)Testingerror(deposit)(%)Total error(%)0120.00.00.01100.00.00.01080.00.00.010080.00.00.0 100080.00.00.0Table6Errors of SVM classification using radial kernel functions.l Number ofsupportvectorsTestingerror(non-deposit)(%)Testingerror(deposit)(%)Total error(%)0.25140.00.00.01130.00.00.010130.00.00.0100130.00.00.0 1000130.00.00.0Table7Errors of SVM classification using sigmoid kernel functions when r¼0.l Number ofsupportvectorsTestingerror(non-deposit)(%)Testingerror(deposit)(%)Total error(%)0.25400.00.00.01400.035.017.510400.0 6.0 3.0100400.0 6.0 3.0 1000400.0 6.0 3.0R.Zuo,E.J.M.Carranza/Computers&Geosciences](]]]])]]]–]]]3to construct a non-linear decision boundary and to avoid expensive calculation of dot products in high-dimensional feature space.The four popular kernel functions are as follows:Linear:Kðx i,x jÞ¼l x i x j Polynomial of degree d:Kðx i,x jÞ¼ðl x i x jþrÞd,l40Radial basis functionðRBFÞ:Kðx i,x jÞ¼exp fÀl99x iÀx j992g,l40 Sigmoid:Kðx i,x jÞ¼tanhðl x i x jþrÞ,l40ð12ÞThe parameters l,r and d are referred to as kernel parameters. The parameter l serves as an inner product coefficient in the polynomial function.In the case of the RBF kernel(Eq.(12)),l determines the RBF width.In the sigmoid kernel,l serves as an inner product coefficient in the hyperbolic tangent function.The parameter r is used for kernels of polynomial and sigmoid types. 
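In standard notation, with kernel parameters λ, r and d as defined in the text, the four kernel functions of Eq. (12) are:

```latex
\begin{aligned}
\text{Linear:}\quad & K(\mathbf{x}_i,\mathbf{x}_j) = \lambda\,\mathbf{x}_i\cdot\mathbf{x}_j \\
\text{Polynomial of degree } d:\quad & K(\mathbf{x}_i,\mathbf{x}_j) = (\lambda\,\mathbf{x}_i\cdot\mathbf{x}_j + r)^d,\quad \lambda > 0 \\
\text{Radial basis function (RBF):}\quad & K(\mathbf{x}_i,\mathbf{x}_j) = \exp\{-\lambda\,\lVert\mathbf{x}_i-\mathbf{x}_j\rVert^2\},\quad \lambda > 0 \\
\text{Sigmoid:}\quad & K(\mathbf{x}_i,\mathbf{x}_j) = \tanh(\lambda\,\mathbf{x}_i\cdot\mathbf{x}_j + r),\quad \lambda > 0
\end{aligned}
```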
The parameter d is the degree of a polynomial function.We performed some experiments to explore the performance of the parameters used in a kernel function.The dataset used in the experiments(Table1),which are derived from the study area(see below),were compiled according to the requirementfor Fig.2.Simplified geological map in western Meguma Terrain of Nova Scotia,Canada(after,Chatterjee1983;Cheng,2008).Table8Errors of SVM classification using sigmoid kernel functions when l¼0.25.r Number ofSupportVectorsTestingerror(non-deposit)(%)Testingerror(deposit)(%)Total error(%)0400.00.00.01400.00.00.010400.00.00.0100400.00.00.01000400.00.00.0R.Zuo,E.J.M.Carranza/Computers&Geosciences](]]]])]]]–]]]4classification analysis.The e1071(Dimitriadou et al.,2010),a freeware R package,was used to construct a SVM.In e1071,the default values of l,r and d are1/(number of variables),0and3,respectively.From the study area,we used40geological feature vectors of four geoscience variables and a target variable for classification of mineral prospec-tivity(Table1).The target feature vector is either the‘non-deposit’class(or0)or the‘deposit’class(or1)representing whether mineral exploration target is absent or present,respectively.For‘deposit’locations,we used the20known Au deposits.For‘non-deposit’locations,we randomly selected them according to the following four criteria(Carranza et al.,2008):(i)non-deposit locations,in contrast to deposit locations,which tend to cluster and are thus non-random, must be random so that multivariate spatial data signatures are highly non-coherent;(ii)random non-deposit locations should be distal to any deposit location,because non-deposit locations proximal to deposit locations are likely to have similar multivariate spatial data signatures as the deposit locations and thus preclude achievement of desired results;(iii)distal and random non-deposit locations must have values for all the univariate geoscience spatial data;(iv)the number of distal and random non-deposit locations must be equaltoFig.3.Evidence layers used in mapping prospectivity for Au deposits(from Cheng,2008):(a)and(b)represent optimum proximity to anticline axes(2.5km)and contacts between Goldenville and Halifax formations(4km),respectively;(c)and(d)represent,respectively,background and anomaly maps obtained via S-Afiltering of thefirst principal component of As,Cu,Pb and Zn data.R.Zuo,E.J.M.Carranza/Computers&Geosciences](]]]])]]]–]]]5the number of deposit locations.We used point pattern analysis (Diggle,1983;2003;Boots and Getis,1988)to evaluate degrees of spatial randomness of sets of non-deposit locations and tofind distance from any deposit location and corresponding probability that one deposit location is situated next to another deposit location.In the study area,we found that the farthest distance between pairs of Au deposits is71km,indicating that within that distance from any deposit location in there is100%probability of another deposit location. 
However,few non-deposit locations can be selected beyond71km of the individual Au deposits in the study area.Instead,we selected random non-deposit locations beyond11km from any deposit location because within this distance from any deposit location there is90% probability of another deposit location.When using a linear kernel function and varying l from0.25to 1000,the number of support vectors and the testing errors for both ‘deposit’and‘non-deposit’do not vary(Table2).In this experiment the total error of classification is0.0%,indicating that the accuracy of classification is not sensitive to the choice of l.With a polynomial kernel function,we tested different values of l, d and r as follows.If d¼3,r¼0and l is increased from0.25to1000,the number of support vectors decreases from12to6,but the testing errors for‘deposit’and‘non-deposit’remain nil(Table3).If l¼0.25, r¼0and d is increased from1to1000,the number of support vectors firstly increases from11to29,then decreases from23to20,the testing error for‘non-deposit’decreases from10.0%to0.0%,whereas the testing error for‘deposit’increases from0.0%to90%(Table4). In this experiment,the total error of classification is minimum(0.0%) when d¼10(Table4).If l¼0.25,d¼3and r is increased from 0to1000,the number of support vectors decreases from12to8,but the testing errors for‘deposit’and‘non-deposit’remain nil(Table5).When using a radial kernel function and varying l from0.25to 1000,the number of support vectors decreases from14to13,but the testing errors of‘deposit’and‘non-deposit’remain nil(Table6).With a sigmoid kernel function,we experimented with different values of l and r as follows.If r¼0and l is increased from0.25to1000, the number of support vectors is40,the testing errors for‘non-deposit’do not change,but the testing error of‘deposit’increases from 0.0%to35.0%,then decreases to6.0%(Table7).In this experiment,the total error of classification is minimum at0.0%when l¼0.25 (Table7).If l¼0.25and r is increased from0to1000,the numbers of support vectors and the testing errors of‘deposit’and‘non-deposit’do not change and the total error remains nil(Table8).The results of the experiments demonstrate that,for the datasets in the study area,a linear kernel function,a polynomial kernel function with d¼3and r¼0,or l¼0.25,r¼0and d¼10,or l¼0.25and d¼3,a radial kernel function,and a sigmoid kernel function with r¼0and l¼0.25are optimal kernel functions.That is because the testing errors for‘deposit’and‘non-deposit’are0%in the SVM classifications(Tables2–8).Nevertheless,a sigmoid kernel with l¼0.25and r¼0,compared to all the other kernel functions,is the most optimal kernel function because it uses all the input support vectors for either‘deposit’or‘non-deposit’(Table1)and the training and testing errors for‘deposit’and‘non-deposit’are0% in the SVM classification(Tables7and8).4.Prospectivity mapping in the study areaThe study area is located in western Meguma Terrain of Nova Scotia,Canada.It measures about7780km2.The host rock of Au deposits in this area consists of Cambro-Ordovician low-middle grade metamorphosed sedimentary rocks and a suite of Devonian aluminous granitoid intrusions(Sangster,1990;Ryan and Ramsay, 1997).The metamorphosed sedimentary strata of the Meguma Group are the lower sand-dominatedflysch Goldenville Formation and the upper shalyflysch Halifax Formation occurring in the central part of the study area.The igneous rocks occur mostly in the northern part of the study area(Fig.2).In this area,20turbidite-hosted Au deposits and 
occurrences (Ryan and Ramsay,1997)are found in the Meguma Group, especially near the contact zones between Goldenville and Halifax Formations(Chatterjee,1983).The major Au mineralization-related geological features are the contact zones between Gold-enville and Halifax Formations,NE–SW trending anticline axes and NE–SW trending shear zones(Sangster,1990;Ryan and Ramsay, 1997).This dataset has been used to test many mineral prospec-tivity mapping algorithms(e.g.,Agterberg,1989;Cheng,2008). More details about the geological settings and datasets in this area can be found in Xu and Cheng(2001).We used four evidence layers(Fig.3)derived and used by Cheng (2008)for mapping prospectivity for Au deposits in the yers A and B represent optimum proximity to anticline axes(2.5km) and optimum proximity to contacts between Goldenville and Halifax Formations(4km),yers C and D represent variations in geochemical background and anomaly,respectively, as modeled by multifractalfilter mapping of thefirst principal component of As,Cu,Pb,and Zn data.Details of how the four evidence layers were obtained can be found in Cheng(2008).4.1.Training datasetThe application of SVM requires two subsets of training loca-tions:one training subset of‘deposit’locations representing presence of mineral deposits,and a training subset of‘non-deposit’locations representing absence of mineral deposits.The value of y i is1for‘deposits’andÀ1for‘non-deposits’.For‘deposit’locations, we used the20known Au deposits(the sixth column of Table1).For ‘non-deposit’locations(last column of Table1),we obtained two ‘non-deposit’datasets(Tables9and10)according to the above-described selection criteria(Carranza et al.,2008).We combined the‘deposits’dataset with each of the two‘non-deposit’datasets to obtain two training datasets.Each training dataset commonly contains20known Au deposits but contains different20randomly selected non-deposits(Fig.4).4.2.Application of SVMBy using the software e1071,separate SVMs both with sigmoid kernel with l¼0.25and r¼0were constructed using the twoTable9The value of each evidence layer occurring in‘non-deposit’dataset1.yer A Layer B Layer C Layer D100002000031110400005000061000700008000090100 100100 110000 120000 130000 140000 150000 160100 170000 180000 190100 200000R.Zuo,E.J.M.Carranza/Computers&Geosciences](]]]])]]]–]]] 6training datasets.With training dataset1,the classification accuracies for‘non-deposits’and‘deposits’are95%and100%, respectively;With training dataset2,the classification accuracies for‘non-deposits’and‘deposits’are85%and100%,respectively.The total classification accuracies using the two training datasets are97.5%and92.5%,respectively.The patterns of the predicted prospective target areas for Au deposits(Fig.5)are defined mainly by proximity to NE–SW trending anticlines and proximity to contact zones between Goldenville and Halifax Formations.This indicates that‘geology’is better than‘geochemistry’as evidence of prospectivity for Au deposits in this area.With training dataset1,the predicted prospective target areas occupy32.6%of the study area and contain100%of the known Au deposits(Fig.5a).With training dataset2,the predicted prospec-tive target areas occupy33.3%of the study area and contain95.0% of the known Au deposits(Fig.5b).In contrast,using the same datasets,the prospective target areas predicted via WofE occupy 19.3%of study area and contain70.0%of the known Au deposits (Cheng,2008).The error matrices for two SVM classifications show that the type1(false-positive)and 
type2(false-negative)errors based on training dataset1(Table11)and training dataset2(Table12)are 32.6%and0%,and33.3%and5%,respectively.The total errors for two SVM classifications are16.3%and19.15%based on training datasets1and2,respectively.In contrast,the type1and type2 errors for the WofE prediction are19.3%and30%(Table13), respectively,and the total error for the WofE prediction is24.65%.The results show that the total errors of the SVM classifications are5–9%lower than the total error of the WofE prediction.The 13–14%higher false-positive errors of the SVM classifications compared to that of the WofE prediction suggest that theSVMFig.4.The locations of‘deposit’and‘non-deposit’.Table10The value of each evidence layer occurring in‘non-deposit’dataset2.yer A Layer B Layer C Layer D110102000030000411105000060110710108000091000101110111000120010131000140000150000161000171000180010190010200000R.Zuo,E.J.M.Carranza/Computers&Geosciences](]]]])]]]–]]]7classifications result in larger prospective areas that may not contain undiscovered deposits.However,the 25–30%higher false-negative error of the WofE prediction compared to those of the SVM classifications suggest that the WofE analysis results in larger non-prospective areas that may contain undiscovered deposits.Certainly,in mineral exploration the intentions are notto miss undiscovered deposits (i.e.,avoid false-negative error)and to minimize exploration cost in areas that may not really contain undiscovered deposits (i.e.,keep false-positive error as low as possible).Thus,results suggest the superiority of the SVM classi-fications over the WofE prediction.5.ConclusionsNowadays,SVMs have become a popular geocomputational tool for spatial analysis.In this paper,we used an SVM algorithm to integrate multiple variables for mineral prospectivity mapping.The results obtained by two SVM applications demonstrate that prospective target areas for Au deposits are defined mainly by proximity to NE–SW trending anticlines and to contact zones between the Goldenville and Halifax Formations.In the study area,the SVM classifications of mineral prospectivity have 5–9%lower total errors,13–14%higher false-positive errors and 25–30%lower false-negative errors compared to those of the WofE prediction.These results indicate that SVM is a potentially useful tool for integrating multiple evidence layers in mineral prospectivity mapping.Table 11Error matrix for SVM classification using training dataset 1.Known All ‘deposits’All ‘non-deposits’TotalPrediction ‘Deposit’10032.6132.6‘Non-deposit’067.467.4Total100100200Type 1(false-positive)error ¼32.6.Type 2(false-negative)error ¼0.Total error ¼16.3.Note :Values in the matrix are percentages of ‘deposit’and ‘non-deposit’locations.Table 12Error matrix for SVM classification using training dataset 2.Known All ‘deposits’All ‘non-deposits’TotalPrediction ‘Deposits’9533.3128.3‘Non-deposits’566.771.4Total100100200Type 1(false-positive)error ¼33.3.Type 2(false-negative)error ¼5.Total error ¼19.15.Note :Values in the matrix are percentages of ‘deposit’and ‘non-deposit’locations.Table 13Error matrix for WofE prediction.Known All ‘deposits’All ‘non-deposits’TotalPrediction ‘Deposit’7019.389.3‘Non-deposit’3080.7110.7Total100100200Type 1(false-positive)error ¼19.3.Type 2(false-negative)error ¼30.Total error ¼24.65.Note :Values in the matrix are percentages of ‘deposit’and ‘non-deposit’locations.Fig.5.Prospective targets area for Au deposits delineated by SVM.(a)and (b)are obtained using training dataset 1and 
2, respectively.
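The study itself uses the free R package e1071; purely as an illustrative stand-in (an assumption, not the authors' code), an equivalent sigmoid-kernel SVM with λ = 0.25 and r = 0 can be sketched with scikit-learn, where `gamma` plays the role of λ and `coef0` the role of r. The evidence-layer values below are random placeholders; the real ones are listed in Table 1.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix

# Hypothetical stand-ins for the four evidence layers (A-D) at 20 'deposit'
# and 20 'non-deposit' training locations
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(40, 4)).astype(float)
y = np.array([1] * 20 + [-1] * 20)  # +1 = deposit, -1 = non-deposit

# Sigmoid kernel K(x_i, x_j) = tanh(lambda * x_i.x_j + r), lambda = 0.25, r = 0
svm = SVC(kernel="sigmoid", gamma=0.25, coef0=0.0, C=1.0)
svm.fit(X, y)

# Error matrix (type 1 = false positive, type 2 = false negative)
pred = svm.predict(X)
print(confusion_matrix(y, pred, labels=[1, -1]))
print("number of support vectors per class:", svm.n_support_)
```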
Support Vector Regression: Key Technical Terms

Support Vector Regression (SVR) is a powerful machine learning technique that extends the principles of Support Vector Machines (SVM) to tackle regression problems. Unlike SVMs, which are primarily used for classification, SVR models are adept at predicting continuous values.

SVR operates by finding the optimal hyperplane that best fits the data within a margin of error, known as the epsilon-tube. This tube encapsulates the data points, allowing for some degree of error, which is crucial for handling real-world data that may contain noise.

One of the key features of SVR is its ability to handle non-linear relationships through the use of kernel functions. These functions transform the input data into a higher-dimensional space where a linear regression can be applied, thus making SVR versatile for complex datasets.

Regularization is another important aspect of SVR, which helps prevent overfitting by controlling the model's complexity. The regularization parameter, often denoted as C, plays a pivotal role in balancing the trade-off between achieving a low error and maintaining model simplicity.

In practice, SVR models require careful tuning of parameters such as C, the kernel type, and the kernel parameters to achieve optimal performance. Cross-validation techniques are commonly used to find the best combination of these parameters for a given dataset.

SVR has been successfully applied in various fields, including finance for predicting stock prices, medicine for forecasting patient outcomes, and engineering for modeling complex systems. Its robustness and adaptability make it a valuable tool in the machine learning toolkit.

Despite its advantages, SVR can be computationally intensive, especially with large datasets, due to the quadratic programming problem it needs to solve. However, with the advancement of computational resources and optimization algorithms, SVR remains a viable option for regression tasks.
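A minimal scikit-learn sketch of these ideas, assuming a synthetic noisy signal (the kernel, C, and epsilon values below are illustrative defaults, not recommendations):

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic noisy data: y = sin(x) plus measurement noise
rng = np.random.default_rng(0)
X = np.linspace(0, 6, 100).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=100)

# The RBF kernel handles the non-linear relationship; C controls regularization,
# and epsilon sets the width of the epsilon-tube within which errors are ignored
svr = SVR(kernel="rbf", C=10.0, epsilon=0.1, gamma="scale")
svr.fit(X, y)

print("support vectors used:", len(svr.support_))
print("prediction at x = 1.5:", svr.predict([[1.5]])[0])
```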
A Tutorial Overview of Commonly Used Artificial Intelligence Algorithms

Artificial intelligence (AI) is a hot topic in today's technology sector, and algorithms are the core driving force behind AI technology.
In today's high-tech society, AI algorithms are widely applied in many fields, such as image recognition, speech recognition, and natural language processing.
This article introduces some commonly used AI algorithms and provides brief usage guides to help readers understand and apply them.
1. Machine Learning Algorithms
Machine learning is a method that lets computers learn from data and patterns. Several common machine learning algorithms are listed below; a short sketch comparing them on a toy dataset follows the list.
(1) Logistic regression: an algorithm for classification problems that predicts binary outcomes. It makes predictions by mapping the input data onto a range of probability values.
(2) Decision trees: an algorithm for both classification and regression problems. It classifies and predicts by choosing the best features and thresholds to build a tree structure.
(3) Support vector machines: an algorithm for classification and regression problems that separates the data into different classes by finding the optimal hyperplane.
(4) Random forests: an ensemble learning method that combines the predictions of multiple decision trees to improve accuracy and robustness.
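As an illustrative sketch (assuming scikit-learn and its built-in iris dataset, neither of which the original text mentions), the four classifiers can be trained and compared like this:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "decision tree": DecisionTreeClassifier(),
    "support vector machine": SVC(kernel="rbf"),
    "random forest": RandomForestClassifier(n_estimators=100),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    # score() reports mean accuracy on the held-out test split
    print(f"{name}: {model.score(X_test, y_test):.3f}")
```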
Usage tutorial: To use a machine learning algorithm, first collect and prepare the data for training and testing.
Then choose a suitable algorithm and model, fit the model to the data during training, and evaluate the model's performance on the test data.
Finally, tune and optimize the model according to the actual requirements.
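For the tuning step, one common approach (sketched here with scikit-learn as an assumption; the parameter grid is purely illustrative) is cross-validated grid search:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Search over kernel type and regularization strength with 5-fold cross-validation
param_grid = {"kernel": ["linear", "rbf"], "C": [0.1, 1, 10, 100]}
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X, y)

print("best parameters:", search.best_params_)
print("best cross-validated accuracy:", round(search.best_score_, 3))
```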
2. Deep Learning Algorithms
Deep learning is a special form of machine learning that mimics the structure and function of the neural networks in the human brain and can learn automatically from large amounts of data. Several commonly used deep learning algorithms are listed below.
(1) Convolutional neural networks (CNNs): widely used in image and video processing, they extract features at different levels and perform classification or regression.
(2) Recurrent neural networks (RNNs): suited to processing sequential data, they work well for tasks such as natural language processing and speech recognition.
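A minimal sketch of a convolutional network, assuming TensorFlow/Keras and 28x28 grayscale inputs (both are assumptions for illustration; the original text names no framework or dataset):

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# A small convolutional network for 28x28 grayscale images (e.g. handwritten digits)
model = models.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Conv2D(16, 3, activation="relu"),   # extract local image features
    layers.MaxPooling2D(),                     # downsample feature maps
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),    # 10-way classification
])

model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```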
Curve Fitting of the Franck-Hertz Experiment Based on Support Vector Machines

ZHOU Zhi-yu (1), MENG Qian (2)
(1. College of Physics, Hebei Normal University, Shijiazhuang 050024, China; 2. School of Computer Science and Technology, Jiangsu Normal University, Xuzhou 221116, China)

Abstract: The Franck-Hertz experiment is one of the important experiments in "modern physics experiments"; it produces a large amount of data and its data processing is complex. The support vector machine is a machine learning algorithm widely used in function approximation, pattern recognition, regression, and other fields. In this paper, the support vector machine algorithm is applied to fitting the Franck-Hertz experimental data. The procedure is simple, and the method is verified in a Python environment to have high fitting accuracy and good results. The support vector machine algorithm can also be applied to curve fitting in other physics experiments.

Keywords: support vector machine; curve fitting; Franck-Hertz experiment; Python

In 1998, Vapnik V. N. et al. [1] proposed a new machine learning method based on small samples and statistical learning theory, the support vector machine (SVM). Starting from a limited set of training samples, the method seeks the "optimal functional relationship" so that unknown outputs can be predicted as accurately as possible, and it can be applied to function approximation, pattern recognition, regression, and other fields.
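The excerpt above does not include the paper's code; as a hedged sketch of the kind of fit it describes, assuming scikit-learn and a synthetic Franck-Hertz-like current-voltage curve (the periodic-dip shape and all numbers below are illustrative, not experimental data):

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic stand-in for Franck-Hertz data: anode current vs. accelerating
# voltage, with dips roughly every 4.9 V plus measurement noise
rng = np.random.default_rng(1)
U = np.linspace(0, 60, 120).reshape(-1, 1)
I = (U.ravel() * 0.2 * (1 + 0.5 * np.sin(2 * np.pi * U.ravel() / 4.9))
     + rng.normal(scale=0.3, size=120))

# RBF-kernel SVR fits a smooth curve through the noisy measurements
svr = SVR(kernel="rbf", C=100.0, epsilon=0.2, gamma=0.5)
svr.fit(U, I)
I_fit = svr.predict(U)

print("RMS residual:", np.sqrt(np.mean((I_fit - I) ** 2)).round(3))
```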
Learning Support Vector Machines forA Multi-View Face ModelJeffrey Ng Sing Kwong and Shaogang Gong Department of Computer Science,Queen Mary and Westfield College,London E14NS,UKjeffng,sgg@AbstractSupport Vector Machines have shown great potential for learning clas-sification functions that can be applied to object recognition.In this work,we extend SVMs to model the appearance of human faces which undergononlinear change across multiple views.The approach uses inherent factorsin the nature of the input images and the SVM classification algorithm toperform both multi-view face detection and pose estimation.1IntroductionSupport Vector Machines(SVMs)have recently been shown to be effective learning mechanisms for object recognition.By defining hyperplanes in a high-dimensional fea-ture space,SVMs build complex decision boundaries to learn the distribution of a given data set.Their capabilities to learn a function approximation have been successfully ap-plied in thefield of handwritten digit recognition[5]and face detection[2].The hand-writing recognition task involved constrained two-dimensional variations in the input data for each recognition class.Osuna’s face detection experiment limited the operational pa-rameters of the SVM classifier to almost full-frontal views of human faces,with a small degree of tolerance to variations in the pose of detected faces.The3D pose of a face greatly influences the2D images captured by a camera.Three-dimensional head rotations perpendicular to the camera view plane introduce complex deformations into the appearance of the face.Changes in the lateral and vertical orien-tation,i.e.yaw and tilt,of a person’s head reveal more details of the3D structure of the head,as other details are occluded.The rotation of the reflective planes of a face can also cause largefluctuations in the local lighting conditions of captured images.Such transformations are highly nonlinear but the distribution of faces across poses have been shown to form smooth trajectories in low dimensional pose eigenspace[1].In this paper, we investigate both the problem of performing multi-view face detection and the task of using Support Vector Machines to learn a model of the face pose distribution.In addition, we extend SVMs to perform pose estimation by enriching support vectors with extra pose information.5032Support Vector MachinesSVMs are based on a generic learning framework that have exhibited useful potentials in resolving some computer vision problems[6,5,2,3,4].Let usfirst outline the basic concept of this approach to learning classification functions for object recognition.2.1Structural Risk MinimisationPrevious approaches to statistical learning have tended to be based onfinding functions to map vector-encoded data to their respective classes.The conventional minimisation of the empirical risk over training data does not however imply good generalisation to novel test data.Indeed,there could be a number of different functions which all give a good approximation to the training data set.It is nevertheless difficult to determine a function which best captures the true underlying structure of the data distribution.Structural Risk Minimisation(SRM)aims to address this problem and provides a well defined quantitative measure for the capacity of a learned function to generalise over unknown test data.Due to its relative simplicity,Vapnik-Chervonenkis(VC)dimension[6]in particular has been adopted as one of the more popular measures for such a capacity.By choosing a function with a low VC 
dimension and minimising its empirical error to a training data set,SRM can offer a guaranteed minimal bound on the test error.Perhaps the notion of VC dimension can be more clearly illustrated through hyper-plane classifiers.Given a data set x x,a hyperplane such asw x w(1) can be oriented across the input space to perform a binary classification task,minimising the empirical risk of a hyperplane decision function x sign w x.This is achieved by changing the normal vector w,also known as the weight vector.There is usually a margin on either side of the hyperplane between the two classes.The VC dimension of the decision function decreases,and therefore improves,with an increasing margin.To obtain a function with the smallest VC capacity and the optimal hyperplane, one has to maximise the margin:Maximise x x(2)Subject to and(3) The optimal hyperplane is mainly defined by the weight vector w which consists of all the data elements with non-zero Lagrange multipliers()in Functional(2),those elements lie on the margins of the hyperplane.They therefore define both the hyperplane and the boundaries of the two classes.The decision function of the optimal hyperplane is thus:x sign x x(4)2.2Support Vector Machines Using Kernel FunctionsA hyperplane classification function attempts tofit an optimal hyperplane between two classes in a training data set,which will inevitably fail in cases where the two classes are504not linearly separable in the input space.Therefore,a high dimensional mappingis used to cater for nonlinear cases.As both the objective function and the decision func-tion is expressed in terms of dot products of data vectors x,the potentially computa-tional intensive mapping does not need to be explicitly evaluated.A kernel function, x z,satisfying Mercer’s condition can be used as a substitute for x z which replaces x z[6].For noisy data sets where there is a large overlap between data classes,error variables are introduced to allow the output of the outliers to be locally corrected,constrain-ing the range of the Lagrange multipliers from0to.is a constant which acts as a penalty function,preventing outliers from affecting the optimal hyperplane.Therefore, the nonlinear objective function isMaximise x x(5)Subject to and(6) with corresponding decision function given byx sign x x(7)There are a number of kernel functions which have been found to provide good gener-alisation capabilities,e.g.polynomials.Here we explore the use of a Gaussian kernel function(analogous to RBF networks)as follows:Gaussian Kernel x y x y(8)3The Nature of Face Pose DistributionDetecting human faces across the view sphere involves the recognition of a whole spec-trum of very different face appearances.The pose of the head reveals some details about the3-dimensional structure of the face while masking others.Head rotations introduce nonlinear deformations in captured face images while the rotation can occur in two axes outside the view plane of the camera.The face’s main direction of light reflection also changes and affects the illumination conditions of the captured image.Ambient day-time lighting conditions in normal office environments are hardly symmetric for the top and bottom hemispheres of the face,while the bias towards the upper hemisphere is exacer-bated by ceiling-fixed light sources during the night.The view sphere provides a framework for analysing face pose distribution and for training support vector machines over the infinite number of possible pose angles of hu-man faces.For collecting training 
data,a3D iso-tracking machine can be used to capture human faces at preset yaw(lateral)and tilt(vertical)angles.The tracking mechanism can also provide semi-automatic segmentation facilities for cropping the face.The result is an array of accurately calibrated and cropped images as shown in Figure1.505Figure1:A sample view sphere image-array with calibrated elements varying horizon-tally from0to180yaw and vertically from60to120tilt.Figure2:Face rotation in depth forms a smooth trajectory in a3D pose eigenspace.Figure3:From left to right:The graphs show the PES trajectories for a set of10people rotating their heads from profile to profile,at60tilt,90tilt and120tilt respectively.A face rotating across views forms a smooth trajectory as can be seen in Figure2. In fact,faces form continuous manifolds across the view sphere in a Pose Eigen-Space (PES).It is plausible to suggest that head rotations describe a continuous function in PES.506This can be seen more clearly in Figure3.It can also be observed in Figure3that an emerging pattern exists for the vertical positioning(from the selected view angle)of the groups of trajectories.Considering that the two images on either sides are made up of the extreme tilt angles of the view sphere,the middle image indeed corresponds to the middle tilt band.The volume enclosed by the entire view sphere is more visible when the nodes of the sphere are plotted individually as in Figure4.The distribution appears to be a convex hull.Figure4:Counter-clockwise from the upper right image:Side,front and top views of the distribution of the face sphere,with the trajectory of the mean yaw clusters.The lower right image uses a special angle to show the direction of biggest variance of the yaw clusters(by the tangential lines)across the mean yaw positions.Given better correlation of the lateral bands of the face sphere,the whole distribution can be grouped into19different clusters according to their yaw orientation(0to180). We observed that the trajectory of the mean positions of the clusters,which are indeed their centroids in PES,structures the distribution across a main axis of variation.This notion is further supported by the tangentiality of the main axes of local variation inside the clusters across the mean trajectory as shown in the lower right picture in Figure4. 
The above observations strongly suggest that the convex hull is more akin to a“tube”,a volume function,through which data elements“flow”from one end to the other as their yaw angles increase from0to180.5074Learning a Face Model across Views using SVMs Support Vector Machines perform automatic feature extraction and enable the construc-tion of complex nonlinear decision boundaries for learning the distribution of a given data set.The learning process and the number of support vectors for a data set are determined in a principled way by only a few customisable parameters which define the character-istics of the learned function.In our case,the parameters are limited to two:,the penalty value for the Lagrange multipliers to distinguish between noisy data and,for determining the effective range of Gaussian Support Vectors.Effective values for the two parameters have already been reported for frontal view face detection[2].We adopt a semi-iterative approach for obtaining good examples of negative training data.The ideal negative images chosen by SVM training algorithms for negative sup-port vectors have been reported to be naturally occurring non-face patterns that possess a strong degree of similarity to a human face[2].Given the highly complex distribution of the view sphere described in the previous section,it is crucial tofind good examples of these to allow the training algorithm to construct accurate decision boundaries.It must be stressed that training is performed on masked vectors consisting of normalised pixel intensity values of face and non-face images of some300dimensions.PCA was only used for investigating the nature of the face-pose distribution.We extended a training process for frontal-view SVM face detection to use the im-ages of the view sphere.The process uses an iterative refinement methodology tofind important negative pattern examples in a database of big scenery pictures.This process is illustrated in Figure5.Although the resulting SVM did not show any potential for robustly detecting faces across views,its training process yielded a good database of neg-ative examples for training such a system.This shows that a single Support Vector Ma-chine cannot learn a unique model of the human face across all views.A multi-view face model must be broken down into component models which form better localised clusters in the distribution and therefore is easier for each SVM to learn a view-based subspace.Set of PositiveFace ImagesInitial Set of Negative Random-Noise ImagesSMO TrainingProcessPositive DataNegative DataFalse PositivesDatabase of various sceneriesfor multi-resolution subscanningAdded to the set of negative imagesFigure5:Boot-strapping technique for obtaining negative support vectors.Based on the nature of the face distribution in PES(Figure4),the view sphere is divided into smaller,more localised yaw segments as in Table1.The observed asymmetry of the view sphere distribution and the greater complexity of the left portion are reflected508into the selection of smaller segments for that region.Segment12345Yaw angles0-1020-4050-8090-130140-180No.of Elems140210280350350No.of Pos SVs107139176190203Table1:The division of the view sphere for learning multi-view SVMs.All the component SVMs were trained on the same global negative data set.The size of the negative training data is about6,000images and of those,the SVMs selected1,666 as negative support vectors,with only36shared between two or more component SVMs. 
This shows that the negative support vectors are well localised to the sub-space of each yaw segment.The modelling capabilities of the component SVMs and their tendency to overflow to the neighbouring segments corroborated with the previous observations of the structure of the distribution of the view sphere in pose eigenspace.In general,the component SVMs could detect faces at yaw angles of10on either side of their training ranges.In some cases,the overlap was as much as30.The observed phenomenon also shows that support vectors are localised in a composite distribution such as the view sphere.They can be used to detect either the whole distribution or smaller segments in that distribution.For face detection across the view sphere,the component SVMs can be arranged into a linear array to form a composite SVM classifier as follows:Composite SVM x sign x(9) where x is the decision function x for SVM number.The multi-view face model can also be applied to pose estimation across the view sphere.Figure4shows the correspondence of the yaw angles to the data elements’po-sition along the mean trajectory of the yaw clusters.A similar correspondence of the tilt angles to their“vertical position”from the selected viewing angles,with the variation lying approximately perpendicular to the mean yaw trajectory,can also be observed in Figure3.Support vectors in fact define the boundaries of respective classes and should there-fore lie on the“walls”of the“tube”.Knowing the correspondence between their position in input space to their pose orientation,nearest neighbour matching should enable esti-mation of the pose for each classified image.The pose estimation is retrieved at no extra computational cost to the calculation of the decision function and is illustrated in Figure6. 
5ExperimentsWe have applied the multi-view SVM-based face model to perform both multi-view face detection and pose estimation across views.First,we show the performance of the multi-view face detection system on training data given in Table2.50950Figure6:Top view of the face manifold across the pose eigenspace with pan information shown next to each support vector(dark circles).The pose orientation classification image (white circle)is retrieved from that of the closest support vector.Training subsets12345Full detection10097.794.792.582.7Multi-scaling10010010097.085.7Training subsets678910Full detection88.794.710099.297.0Multi-scaling99.297.710010098.4Table2:Face detection on training data across the view sphere,grouped by human sub-ject.The quality of alignment of the input images played an important role in the learning process.Most of the misclassified elements of the view sphere were correctly recognised after multi-scaling the images.Multi-scaling is performed on the input images with a bias in each of the four directions to correct misalignments of the face images.It is worth pointing out that our previous work reported that the variation of the view sphere distribution along the second principle component axis was highly related to the level of local lighting in the image[1].Using an overhead light source yields such an effect on the captured images.The lighting conditions must therefore help in the deter-mination of the tilt orientation of the faces.However,it makes down-facing poses very poorly illuminated and therefore,very difficult to detect by the system as shown in Fig-ure7.The Multi-View SVM face detector and pose estimator was tested over a number of test sequences of human subjects freely turning their heads around,with the ground-truths of the pose information measured for comparison.For test sequences A-D,the system has been coupled to the iso-tracker to test its classification and pose-estimation accuracy.In sequence E,the system is used to detect,track and estimate the pose of the human subject without the iso-tracker.Experiments on three subjects are given here for illustration:the subject with the worst detection results during training(test sequences A,B and E)and two novel subjects unknown to the training process(test sequences C and D).The latter were selected to test the generalisation capabilities of the system.510Figure 7:Misclassi fication in lower hemisphere of the view sphere (shown by -1,-1).Image multi-scaling is shown with white rectangles.Figure 8:Selected frames from an example sequence (E)of detected and tracked moving faces.The graphs also show the estimated face pose (in grey)over time and their cor-responding ground-truths (in black),measured by electro-magnetic sensors.The vertical lines indicate moments in time where no face was detected.6DiscussionIn this work,we have shown that a well structured distribution of a face training image data set allows a collection of view-based component Support Vector Machines to be locally trained on segments of that distribution.The outputs of the component Support Vector Machines can then be integrated into a composite SVM function,which effectively gives a generic face model across the entire view sphere.The model enables multi-view face detection across the view sphere in our case without any gap in the detection of faces at the “seams ”of the segments.The technique has also been extended to use the inherent511Test Sequence Detection Rate Mean Yaw Error Mean Tilt ErrorA10011.07 6.62B84.911.467 
6.32C82.913.577.29D99.68.738.67E99.28.908.21Table3:Test results of the multi-view face detector and pose estimator from a total of over1000images from a set of test sequences.structure of the data to perform pose estimation at no extra computational cost to the detection process.In particular,support vectors have been tagged with pose information to allow the retrieval of pose orientation by nearest neighbour matching to the support vectors.The results show that the support vectors obtained from the view sphere make good prototypes for pose estimation by nearest neighbour matching.The accuracy of the face alignment and orientation calibration of some of the training images were not perfect.A better training set could have better defined the decision surface and allow nearest neighbour matching to yield more accurate results.We believe that pose estimation can still be further refined by using high-dimensional mapping and the weighted decision function of the SVMs to perform nonlinear pose estimation. References[1]S.Gong,S.McKenna,and J.Collins.An investigation into face pose distributions.In IEEE Int.Conf.on Automatic Face and Gesture Recognition,pages265–270,Ver-mont,1996.[2]E.Osuna,R.Freund,and F.Girosi.Training support vector machines:an applicationto face detection.In CVPR,1997.[3]J.C Platt.Fast Training of Support Vector Machines using Sequential Minimal Opti-misation.Microsoft Research Technical Report MSR-TR-98-14,1998.[4]B.Sch¨o lkopf,C.Burges,and A.Sm¨o la.Advances in Kernel Methods-Support VectorLearning.MIT Press,1998.[5]B.Sch¨o lkopf,C.Burges,and V.Vapnik.Incorporating invariances in support vectorlearning machines.In International Conference on Artificial Neural Networks,1996.[6]V.Vapnik.The Nature of Statistical Learning Theory.Springer Verlag,New York,1995.512。
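As a schematic illustration of the composite classifier of Eq. (9) and the nearest-neighbour pose lookup described in the paper, the following Python/NumPy sketch is an assumption rather than the authors' implementation; `component_svms` is a hypothetical list of view-based classifiers exposing an sklearn-style `decision_function`, and `support_poses` is a hypothetical array of (yaw, tilt) tags attached to the support vectors:

```python
import numpy as np

def composite_detect(x, component_svms):
    """Composite SVM (Eq. 9): a face is detected if any view-based
    component SVM returns a positive decision value."""
    scores = [svm.decision_function([x])[0] for svm in component_svms]
    return int(np.sign(max(scores)))  # +1 = face, -1 = non-face

def estimate_pose(x, support_vectors, support_poses):
    """Pose estimation: return the (yaw, tilt) tag attached to the
    support vector nearest to the classified image vector x."""
    dists = np.linalg.norm(np.asarray(support_vectors) - np.asarray(x), axis=1)
    return support_poses[int(np.argmin(dists))]
```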