Matlab的第三方工具箱大全(强烈推荐)Matlab Toolboxes∙ADCPtools - acoustic doppler current profiler data processing∙AFDesign - designing analog and digital filters∙AIRES - automatic integration of reusable embedded software∙Air-Sea - air-sea flux estimates in oceanography∙Animation - developing scientific animations∙ARfit - estimation of parameters and eigenmodes of multivariate autoregressive methods∙ARMASA - power spectrum estimation∙AR-Toolkit - computer vision tracking∙Auditory - auditory models∙b4m - interval arithmetic∙Bayes Net - inference and learning for directed graphical models∙Binaural Modeling - calculating binaural cross-correlograms of sound∙Bode Step - design of control systems with maximized feedback∙Bootstrap - for resampling, hypothesis testing and confidence interval estimation ∙BrainStorm - MEG and EEG data visualization and processing∙BSTEX - equation viewer∙CALFEM - interactive program for teaching the finite element method∙Calibr - for calibrating CCD cameras∙Camera Calibration∙Captain - non-stationary time series analysis and forecasting∙CHMMBOX - for coupled hidden Markov modeling using maximum likelihood EM∙Classification - supervised and unsupervised classification algorithms∙CLOSID∙Cluster - for analysis of Gaussian mixture models for data set clustering ∙Clustering - cluster analysis∙ClusterPack - cluster analysis∙COLEA - speech analysis∙CompEcon - solving problems in economics and finance∙Complex - for estimating temporal and spatial signal complexities∙Computational Statistics∙Coral - seismic waveform analysis∙DACE - kriging approximations to computer models∙DAIHM - data assimilation in hydrological and hydrodynamic models∙Data Visualization∙DBT - radar array processing∙DDE-BIFTOOL - bifurcation analysis of delay differential equations∙Denoise - for removing noise from signals∙DiffMan - solving differential equations on manifolds∙Dimensional Analysis -∙DIPimage - scientific image processing∙Direct - Laplace transform inversion via the direct integration method ∙DirectSD - analysis and design of computer controlled systems with process-oriented models∙DMsuite - differentiation matrix suite∙DMTTEQ - design and test time domain equalizer design methods∙DrawFilt - drawing digital and analog filters∙DSFWAV - spline interpolation with Dean wave solutions∙DWT - discrete wavelet transforms∙EasyKrig∙Econometrics∙EEGLAB∙EigTool - graphical tool for nonsymmetric eigenproblems∙EMSC - separating light scattering and absorbance by extended multiplicative signal correction∙Engineering Vibration∙FastICA - fixed-point algorithm for ICA and projection pursuit∙FDC - flight dynamics and control∙FDtools - fractional delay filter design∙FlexICA - for independent components analysis∙FMBPC - fuzzy model-based predictive control∙ForWaRD - Fourier-wavelet regularized deconvolution∙FracLab - fractal analysis for signal processing∙FSBOX - stepwise forward and backward selection of features using linear regression∙GABLE - geometric algebra tutorial∙GAOT - genetic algorithm optimization∙Garch - estimating and diagnosing heteroskedasticity in time series models∙GCE Data - managing, analyzing and displaying data and metadata stored using the GCE data structure specification∙GCSV - growing cell structure visualization∙GEMANOVA - fitting multilinear ANOVA models∙Genetic Algorithm∙Geodetic - geodetic calculations∙GHSOM - growing hierarchical self-organizing map∙glmlab - general linear models∙GPIB - wrapper for GPIB library from National Instrument∙GTM - generative topographic mapping, a model for density 
modeling and data visualization∙GVF - gradient vector flow for finding 3-D object boundaries∙HFRadarmap - converts HF radar data from radial current vectors to total vectors ∙HFRC - importing, processing and manipulating HF radar data∙Hilbert - Hilbert transform by the rational eigenfunction expansion method∙HMM - hidden Markov models∙HMMBOX - for hidden Markov modeling using maximum likelihood EM∙HUTear - auditory modeling∙ICALAB - signal and image processing using ICA and higher order statistics∙Imputation - analysis of incomplete datasets∙IPEM - perception based musical analysisJMatLink - Matlab Java classesKalman - Bayesian Kalman filterKalman Filter - filtering, smoothing and parameter estimation (using EM) for linear dynamical systemsKALMTOOL - state estimation of nonlinear systemsKautz - Kautz filter designKrigingLDestimate - estimation of scaling exponentsLDPC - low density parity check codesLISQ - wavelet lifting scheme on quincunx gridsLKER - Laguerre kernel estimation toolLMAM-OLMAM - Levenberg Marquardt with Adaptive Momentum algorithm for training feedforward neural networksLow-Field NMR - for exponential fitting, phase correction of quadrature data and slicing LPSVM - Newton method for LP support vector machine for machine learning problems LSDPTOOL - robust control system design using the loop shaping design procedureLS-SVMlabLSVM - Lagrangian support vector machine for machine learning problemsLyngby - functional neuroimagingMARBOX - for multivariate autogressive modeling and cross-spectral estimation MatArray - analysis of microarray dataMatrix Computation - constructing test matrices, computing matrix factorizations, visualizing matrices, and direct search optimizationMCAT - Monte Carlo analysisMDP - Markov decision processesMESHPART - graph and mesh partioning methodsMILES - maximum likelihood fitting using ordinary least squares algorithmsMIMO - multidimensional code synthesisMissing - functions for handling missing data valuesM_Map - geographic mapping toolsMODCONS - multi-objective control system designMOEA - multi-objective evolutionary algorithmsMS - estimation of multiscaling exponentsMultiblock - analysis and regression on several data blocks simultaneouslyMultiscale Shape AnalysisMusic Analysis - feature extraction from raw audio signals for content-based music retrievalMWM - multifractal wavelet modelNetCDFNetlab - neural network algorithmsNiDAQ - data acquisition using the NiDAQ libraryNEDM - nonlinear economic dynamic modelsNMM - numerical methods in Matlab textNNCTRL - design and simulation of control systems based on neural networks NNSYSID - neural net based identification of nonlinear dynamic systemsNSVM - newton support vector machine for solving machine learning problems NURBS - non-uniform rational B-splinesN-way - analysis of multiway data with multilinear modelsOpenFEM - finite element developmentPCNN - pulse coupled neural networksPeruna - signal processing and analysisPhiVis - probabilistic hierarchical interactive visualization, i.e. 
functions for visual analysis of multivariate continuous dataPlanar Manipulator - simulation of n-DOF planar manipulatorsPRTools - pattern recognitionpsignifit - testing hyptheses about psychometric functionsPSVM - proximal support vector machine for solving machine learning problemsPsychophysics - vision researchPyrTools - multi-scale image processingRBF - radial basis function neural networksRBN - simulation of synchronous and asynchronous random boolean networks ReBEL - sigma-point Kalman filtersRegression - basic multivariate data analysis and regressionRegularization ToolsRegularization Tools XPRestore ToolsRobot - robotics functions, e.g. kinematics, dynamics and trajectory generation Robust Calibration - robust calibration in statsRRMT - rainfall-runoff modellingSAM - structure and motionSchwarz-Christoffel - computation of conformal maps to polygonally bounded regions SDH - smoothed data histogramSeaGrid - orthogonal grid makerSEA-MAT - oceanographic analysisSLS - sparse least squaresSolvOpt - solver for local optimization problemsSOM - self-organizing mapSOSTOOLS - solving sums of squares (SOS) optimization problemsSpatial and Geometric AnalysisSpatial RegressionSpatial StatisticsSpectral MethodsSPM - statistical parametric mappingSSVM - smooth support vector machine for solving machine learning problems STATBAG - for linear regression, feature selection, generation of data, and significance testingStatBox - statistical routinesStatistical Pattern Recognition - pattern recognition methodsStixbox - statisticsSVM - implements support vector machinesSVM ClassifierSymbolic Robot DynamicsTEMPLAR - wavelet-based template learning and pattern classificationTextClust - model-based document clusteringTextureSynth - analyzing and synthesizing visual texturesTfMin - continous 3-D minimum time orbit transfer around EarthTime-Frequency - analyzing non-stationary signals using time-frequency distributions Tree-Ring - tasks in tree-ring analysisTSA - uni- and multivariate, stationary and non-stationary time series analysis TSTOOL - nonlinear time series analysisT_Tide - harmonic analysis of tidesUTVtools - computing and modifying rank-revealing URV and UTV decompositionsUvi_Wave - wavelet analysisvarimax - orthogonal rotation of EOFsVBHMM - variation Bayesian hidden Markov modelsVBMFA - variational Bayesian mixtures of factor analyzersVMT - VRML Molecule Toolbox, for animating results from molecular dynamics experiments VOICEBOXVRMLplot - generates interactive VRML 2.0 graphs and animationsVSVtools - computing and modifying symmetric rank-revealing decompositionsWAFO - wave analysis for fatique and oceanographyWarpTB - frequency-warped signal processingWAVEKIT - wavelet analysisWaveLab - wavelet analysisWeeks - Laplace transform inversion via the Weeks methodWetCDF - NetCDF interfaceWHMT - wavelet-domain hidden Markov tree modelsWInHD - Wavelet-based inverse halftoning via deconvolutionWSCT - weighted sequences clustering toolkitXMLTree - XML parserYAADA - analyze single particle mass spectrum dataZMAP - quantitative seismicity analysis。
UNIVERSITY OF SOUTHAMPTONSupport Vector MachinesforClassification and RegressionbySteve R.GunnTechnical ReportFaculty of Engineering,Science and Mathematics School of Electronics and Computer Science10May1998ContentsNomenclature xi1Introduction11.1Statistical Learning Theory (2)1.1.1VC Dimension (3)1.1.2Structural Risk Minimisation (4)2Support Vector Classification52.1The Optimal Separating Hyperplane (5)2.1.1Linearly Separable Example (10)2.2The Generalised Optimal Separating Hyperplane (10)2.2.1Linearly Non-Separable Example (13)2.3Generalisation in High Dimensional Feature Space (14)2.3.1Polynomial Mapping Example (16)2.4Discussion (16)3Feature Space193.1Kernel Functions (19)3.1.1Polynomial (20)3.1.2Gaussian Radial Basis Function (20)3.1.3Exponential Radial Basis Function (20)3.1.4Multi-Layer Perceptron (20)3.1.5Fourier Series (21)3.1.6Splines (21)3.1.7B splines (21)3.1.8Additive Kernels (22)3.1.9Tensor Product (22)3.2Implicit vs.Explicit Bias (22)3.3Data Normalisation (23)3.4Kernel Selection (23)4Classification Example:IRIS data254.1Applications (28)5Support Vector Regression295.1Linear Regression (30)5.1.1 -insensitive Loss Function (30)5.1.2Quadratic Loss Function (31)iiiiv CONTENTS5.1.3Huber Loss Function (32)5.1.4Example (33)5.2Non Linear Regression (33)5.2.1Examples (34)5.2.2Comments (36)6Regression Example:Titanium Data396.1Applications (42)7Conclusions43A Implementation Issues45A.1Support Vector Classification (45)A.2Support Vector Regression (47)B MATLAB SVM Toolbox51 Bibliography53List of Figures1.1Modelling Errors (2)1.2VC Dimension Illustration (3)2.1Optimal Separating Hyperplane (5)2.2Canonical Hyperplanes (6)2.3Constraining the Canonical Hyperplanes (7)2.4Optimal Separating Hyperplane (10)2.5Generalised Optimal Separating Hyperplane (11)2.6Generalised Optimal Separating Hyperplane Example(C=1) (13)2.7Generalised Optimal Separating Hyperplane Example(C=105) (14)2.8Generalised Optimal Separating Hyperplane Example(C=10−8) (14)2.9Mapping the Input Space into a High Dimensional Feature Space (14)2.10Mapping input space into Polynomial Feature Space (16)3.1Comparison between Implicit and Explicit bias for a linear kernel (22)4.1Iris data set (25)4.2Separating Setosa with a linear SVC(C=∞) (26)4.3Separating Viginica with a polynomial SVM(degree2,C=∞) (26)4.4Separating Viginica with a polynomial SVM(degree10,C=∞) (26)4.5Separating Viginica with a Radial Basis Function SVM(σ=1.0,C=∞)274.6Separating Viginica with a polynomial SVM(degree2,C=10) (27)4.7The effect of C on the separation of Versilcolor with a linear spline SVM.285.1Loss Functions (29)5.2Linear regression (33)5.3Polynomial Regression (35)5.4Radial Basis Function Regression (35)5.5Spline Regression (36)5.6B-spline Regression (36)5.7Exponential RBF Regression (36)6.1Titanium Linear Spline Regression( =0.05,C=∞) (39)6.2Titanium B-Spline Regression( =0.05,C=∞) (40)6.3Titanium Gaussian RBF Regression( =0.05,σ=1.0,C=∞) (40)6.4Titanium Gaussian RBF Regression( =0.05,σ=0.3,C=∞) (40)6.5Titanium Exponential RBF Regression( =0.05,σ=1.0,C=∞) (41)6.6Titanium Fourier Regression( =0.05,degree3,C=∞) (41)6.7Titanium Linear Spline Regression( =0.05,C=10) (42)vvi LIST OF FIGURES6.8Titanium B-Spline Regression( =0.05,C=10) (42)List of Tables2.1Linearly Separable Classification Data (10)2.2Non-Linearly Separable Classification Data (13)5.1Regression Data (33)viiListingsA.1Support Vector Classification MATLAB Code (46)A.2Support Vector Regression MATLAB Code (48)ixNomenclature0Column vector of zeros(x)+The positive part of xC SVM 
misclassification tolerance parameterD DatasetK(x,x )Kernel functionR[f]Risk functionalR emp[f]Empirical Risk functionalxiChapter1IntroductionThe problem of empirical data modelling is germane to many engineering applications. In empirical data modelling a process of induction is used to build up a model of the system,from which it is hoped to deduce responses of the system that have yet to be ob-served.Ultimately the quantity and quality of the observations govern the performance of this empirical model.By its observational nature data obtained isfinite and sampled; typically this sampling is non-uniform and due to the high dimensional nature of the problem the data will form only a sparse distribution in the input space.Consequently the problem is nearly always ill posed(Poggio et al.,1985)in the sense of Hadamard (Hadamard,1923).Traditional neural network approaches have suffered difficulties with generalisation,producing models that can overfit the data.This is a consequence of the optimisation algorithms used for parameter selection and the statistical measures used to select the’best’model.The foundations of Support Vector Machines(SVM)have been developed by Vapnik(1995)and are gaining popularity due to many attractive features,and promising empirical performance.The formulation embodies the Struc-tural Risk Minimisation(SRM)principle,which has been shown to be superior,(Gunn et al.,1997),to traditional Empirical Risk Minimisation(ERM)principle,employed by conventional neural networks.SRM minimises an upper bound on the expected risk, as opposed to ERM that minimises the error on the training data.It is this difference which equips SVM with a greater ability to generalise,which is the goal in statistical learning.SVMs were developed to solve the classification problem,but recently they have been extended to the domain of regression problems(Vapnik et al.,1997).In the literature the terminology for SVMs can be slightly confusing.The term SVM is typ-ically used to describe classification with support vector methods and support vector regression is used to describe regression with support vector methods.In this report the term SVM will refer to both classification and regression methods,and the terms Support Vector Classification(SVC)and Support Vector Regression(SVR)will be used for specification.This section continues with a brief introduction to the structural risk12Chapter1Introductionminimisation principle.In Chapter2the SVM is introduced in the setting of classifica-tion,being both historical and more accessible.This leads onto mapping the input into a higher dimensional feature space by a suitable choice of kernel function.The report then considers the problem of regression.Illustrative examples re given to show the properties of the techniques.1.1Statistical Learning TheoryThis section is a very brief introduction to statistical learning theory.For a much more in depth look at statistical learning theory,see(Vapnik,1998).Figure1.1:Modelling ErrorsThe goal in modelling is to choose a model from the hypothesis space,which is closest (with respect to some error measure)to the underlying function in the target space. 
Errors in doing this arise from two cases:

Approximation Error is a consequence of the hypothesis space being smaller than the target space, and hence the underlying function may lie outside the hypothesis space. A poor choice of the model space will result in a large approximation error, and is referred to as model mismatch.

Estimation Error is the error due to the learning procedure which results in a technique selecting the non-optimal model from the hypothesis space.

Together these errors form the generalisation error. Ultimately we would like to find the function, f, which minimises the risk,

R[f] = \int_{X \times Y} L(y, f(x)) \, P(x, y) \, dx \, dy    (1.1)

However, P(x, y) is unknown. It is possible to find an approximation according to the empirical risk minimisation principle,

R_{emp}[f] = \frac{1}{l} \sum_{i=1}^{l} L\left(y_i, f(x_i)\right)    (1.2)

which minimises the empirical risk,

\hat{f}_{n,l}(x) = \arg\min_{f \in H_n} R_{emp}[f]    (1.3)

Empirical risk minimisation makes sense only if

\lim_{l \to \infty} R_{emp}[f] = R[f]    (1.4)

which is true from the law of large numbers. However, it must also satisfy

\lim_{l \to \infty} \min_{f \in H_n} R_{emp}[f] = \min_{f \in H_n} R[f]    (1.5)

which is only valid when H_n is 'small' enough. This condition is less intuitive and requires that the minima also converge. The following bound holds with probability 1 − δ,

R[f] \le R_{emp}[f] + \sqrt{ \frac{ h\left(\ln\frac{2l}{h} + 1\right) - \ln\frac{\delta}{4} }{ l } }    (1.6)

Remarkably, this expression for the expected risk is independent of the probability distribution.

1.1.1 VC Dimension

The VC dimension is a scalar value that measures the capacity of a set of functions.

Figure 1.2: VC Dimension Illustration

Definition 1.1 (Vapnik–Chervonenkis). The VC dimension of a set of functions is p if and only if there exists a set of points {x_i}_{i=1}^{p} such that these points can be separated in all 2^p possible configurations, and that no set {x_i}_{i=1}^{q} exists where q > p satisfying this property.

Figure 1.2 illustrates how three points in the plane can be shattered by the set of linear indicator functions whereas four points cannot. In this case the VC dimension is equal to the number of free parameters, but in general that is not the case; e.g. the function A sin(bx) has an infinite VC dimension (Vapnik, 1995). The set of linear indicator functions in n dimensional space has a VC dimension equal to n + 1.

1.1.2 Structural Risk Minimisation

Create a structure such that S_h is a hypothesis space of VC dimension h; then

S_1 \subset S_2 \subset \ldots \subset S_\infty    (1.7)

SRM consists in solving the following problem

\min_{S_h} \left[ R_{emp}[f] + \sqrt{ \frac{ h\left(\ln\frac{2l}{h} + 1\right) - \ln\frac{\delta}{4} }{ l } } \right]    (1.8)

If the underlying process being modelled is not deterministic the modelling problem becomes more exacting and consequently this chapter is restricted to deterministic processes. Multiple output problems can usually be reduced to a set of single output problems that may be considered independent. Hence it is appropriate to consider processes with multiple inputs from which it is desired to predict a single output.
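To make the bound of Equation 1.6 concrete, the short R sketch below evaluates the VC confidence term for a few sample sizes (R is used here because it is the language of the package-based SVM experiments later in this document). The values of l, h, δ and the empirical risk are illustrative assumptions, not figures taken from the report.

```r
# VC confidence term from Equation 1.6:
#   R[f] <= R_emp[f] + sqrt( (h*(log(2*l/h) + 1) - log(delta/4)) / l )
vc_confidence <- function(l, h, delta = 0.05) {
  sqrt((h * (log(2 * l / h) + 1) - log(delta / 4)) / l)
}

# Illustrative values (assumed, not taken from the report):
# a fixed empirical risk and increasing training set sizes.
l_values <- c(100, 1000, 10000)
h <- 10          # VC dimension of the hypothesis space
r_emp <- 0.10    # empirical risk on the training data

data.frame(l = l_values,
           confidence = vc_confidence(l_values, h),
           bound = r_emp + vc_confidence(l_values, h))
# The confidence term shrinks as l grows, so the bound on the expected
# risk approaches the empirical risk, which is the intuition behind SRM.
```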
Chapter 2 Support Vector Classification

The classification problem can be restricted to consideration of the two-class problem without loss of generality. In this problem the goal is to separate the two classes by a function which is induced from available examples. The goal is to produce a classifier that will work well on unseen examples, i.e. it generalises well. Consider the example in Figure 2.1. Here there are many possible linear classifiers that can separate the data, but there is only one that maximises the margin (maximises the distance between it and the nearest data point of each class). This linear classifier is termed the optimal separating hyperplane. Intuitively, we would expect this boundary to generalise well as opposed to the other possible boundaries.

Figure 2.1: Optimal Separating Hyperplane

2.1 The Optimal Separating Hyperplane

Consider the problem of separating the set of training vectors belonging to two separate classes,

D = \{ (x_1, y_1), \ldots, (x_l, y_l) \}, \quad x \in \mathbb{R}^n, \; y \in \{-1, 1\},    (2.1)

with a hyperplane,

\langle w, x \rangle + b = 0.    (2.2)

The set of vectors is said to be optimally separated by the hyperplane if it is separated without error and the distance between the closest vector to the hyperplane is maximal. There is some redundancy in Equation 2.2, and without loss of generality it is appropriate to consider a canonical hyperplane (Vapnik, 1995), where the parameters w, b are constrained by,

\min_i \left| \langle w, x_i \rangle + b \right| = 1.    (2.3)

This incisive constraint on the parameterisation is preferable to alternatives in simplifying the formulation of the problem. In words it states that: the norm of the weight vector should be equal to the inverse of the distance of the nearest point in the data set to the hyperplane. The idea is illustrated in Figure 2.2, where the distance from the nearest point to each hyperplane is shown.

Figure 2.2: Canonical Hyperplanes

A separating hyperplane in canonical form must satisfy the following constraints,

y_i \left( \langle w, x_i \rangle + b \right) \ge 1, \quad i = 1, \ldots, l.    (2.4)

The distance d(w, b; x) of a point x from the hyperplane (w, b) is,

d(w, b; x) = \frac{ \left| \langle w, x \rangle + b \right| }{ \| w \| }.    (2.5)

The optimal hyperplane is given by maximising the margin, ρ, subject to the constraints of Equation 2.4. The margin is given by,

\rho(w, b) = \min_{x_i : y_i = -1} d(w, b; x_i) + \min_{x_i : y_i = 1} d(w, b; x_i)
           = \min_{x_i : y_i = -1} \frac{\left| \langle w, x_i \rangle + b \right|}{\|w\|} + \min_{x_i : y_i = 1} \frac{\left| \langle w, x_i \rangle + b \right|}{\|w\|}
           = \frac{1}{\|w\|} \left( \min_{x_i : y_i = -1} \left| \langle w, x_i \rangle + b \right| + \min_{x_i : y_i = 1} \left| \langle w, x_i \rangle + b \right| \right)
           = \frac{2}{\|w\|}    (2.6)

Hence the hyperplane that optimally separates the data is the one that minimises

\Phi(w) = \frac{1}{2} \| w \|^2.    (2.7)

It is independent of b because provided Equation 2.4 is satisfied (i.e. it is a separating hyperplane) changing b will move it in the normal direction to itself. Accordingly the margin remains unchanged but the hyperplane is no longer optimal in that it will be nearer to one class than the other. To consider how minimising Equation 2.7 is equivalent to implementing the SRM principle, suppose that the following bound holds,

\| w \| < A.    (2.8)

Then from Equations 2.4 and 2.5,

d(w, b; x) \ge \frac{1}{A}.    (2.9)

Accordingly the hyperplanes cannot be nearer than 1/A to any of the data points and intuitively it can be seen in Figure 2.3 how this reduces the possible hyperplanes, and hence the capacity.

Figure 2.3: Constraining the Canonical Hyperplanes

The VC dimension, h, of the set of canonical hyperplanes in n dimensional space is bounded by,

h \le \min\left[ R^2 A^2, n \right] + 1,    (2.10)

where R is the radius of a hypersphere enclosing all the data points. Hence minimising Equation 2.7 is equivalent to minimising an upper bound on the VC dimension. The solution to the optimisation problem of Equation 2.7 under the constraints of Equation 2.4 is given by the saddle point of the Lagrange functional (Lagrangian) (Minoux, 1986),

\Phi(w, b, \alpha) = \frac{1}{2} \| w \|^2 - \sum_{i=1}^{l} \alpha_i \left( y_i \left[ \langle w, x_i \rangle + b \right] - 1 \right),    (2.11)

where α are the Lagrange multipliers. The Lagrangian has to be minimised with respect to w, b and maximised with respect to α ≥ 0. Classical Lagrangian duality enables the primal problem, Equation 2.11, to be transformed to its dual problem, which is easier to solve. The dual problem is given by,

\max_{\alpha} W(\alpha) = \max_{\alpha} \min_{w, b} \Phi(w, b, \alpha).    (2.12)

The minimum with respect to w and b of the Lagrangian, Φ, is given by,

\frac{\partial \Phi}{\partial b} = 0 \;\Rightarrow\; \sum_{i=1}^{l} \alpha_i y_i = 0, \qquad
\frac{\partial \Phi}{\partial w} = 0 \;\Rightarrow\; w = \sum_{i=1}^{l} \alpha_i y_i x_i.    (2.13)

Hence from Equations 2.11, 2.12 and 2.13, the dual problem is,

\max_{\alpha} W(\alpha) = \max_{\alpha} \; -\frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} \alpha_i \alpha_j y_i y_j \langle x_i, x_j \rangle + \sum_{k=1}^{l} \alpha_k,    (2.14)

and hence the solution to the problem is given by,

\alpha^{*} = \arg\min_{\alpha} \; \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} \alpha_i \alpha_j y_i y_j \langle x_i, x_j \rangle - \sum_{k=1}^{l} \alpha_k,    (2.15)

with constraints,

\alpha_i \ge 0, \quad i = 1, \ldots, l, \qquad \sum_{j=1}^{l} \alpha_j y_j = 0.    (2.16)
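As an illustration of the dual problem in Equations 2.15 and 2.16, the following R sketch solves it for a small two-dimensional training set with the quadprog package. The toy data, the tolerance for identifying support vectors and the small ridge added to the kernel matrix (so that the solver sees a strictly positive definite matrix) are assumptions made for this example; they are not part of the report.

```r
library(quadprog)  # provides solve.QP for quadratic programming

# Toy linearly separable data (assumed for illustration).
X <- matrix(c(1, 1,
              3, 3,
              1, 3,
              3, 1), ncol = 2, byrow = TRUE)
y <- c(-1, 1, 1, -1)
l <- length(y)

# Dual of Equation 2.15: minimise (1/2) a' Q a - 1' a
# with Q_ij = y_i y_j <x_i, x_j>, subject to sum(a_i y_i) = 0 and a_i >= 0.
Q <- (y %*% t(y)) * (X %*% t(X))
Q <- Q + diag(1e-8, l)        # small ridge so solve.QP sees a positive definite matrix

sol <- solve.QP(Dmat = Q,
                dvec = rep(1, l),
                Amat = cbind(y, diag(l)),  # first column: equality sum(a_i y_i) = 0
                bvec = rep(0, l + 1),
                meq  = 1)
alpha <- sol$solution

# Recover the hyperplane of Equation 2.17 from the multipliers.
w  <- colSums(alpha * y * X)
sv <- which(alpha > 1e-6)                  # support vectors have non-zero alpha
b  <- mean(y[sv] - X[sv, , drop = FALSE] %*% w)
alpha; w; b
```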
Solving Equation 2.15 with constraints Equation 2.16 determines the Lagrange multipliers, and the optimal separating hyperplane is given by,

w^{*} = \sum_{i=1}^{l} \alpha_i y_i x_i, \qquad b^{*} = -\frac{1}{2} \langle w^{*}, x_r + x_s \rangle,    (2.17)

where x_r and x_s are any support vector from each class satisfying,

\alpha_r, \alpha_s > 0, \quad y_r = -1, \quad y_s = 1.    (2.18)

The hard classifier is then,

f(x) = \mathrm{sgn}\left( \langle w^{*}, x \rangle + b \right).    (2.19)

Alternatively, a soft classifier may be used which linearly interpolates the margin,

f(x) = h\left( \langle w^{*}, x \rangle + b \right), \qquad h(z) = \begin{cases} -1 & z < -1 \\ z & -1 \le z \le 1 \\ +1 & z > 1 \end{cases}    (2.20)

This may be more appropriate than the hard classifier of Equation 2.19, because it produces a real valued output between −1 and 1 when the classifier is queried within the margin, where no training data resides. From the Kuhn-Tucker conditions,

\alpha_i \left( y_i \left[ \langle w, x_i \rangle + b \right] - 1 \right) = 0, \quad i = 1, \ldots, l,    (2.21)

and hence only the points x_i which satisfy,

y_i \left( \langle w, x_i \rangle + b \right) = 1    (2.22)

will have non-zero Lagrange multipliers. These points are termed Support Vectors (SV). If the data is linearly separable all the SV will lie on the margin and hence the number of SV can be very small. Consequently the hyperplane is determined by a small subset of the training set; the other points could be removed from the training set and recalculating the hyperplane would produce the same answer. Hence SVM can be used to summarise the information contained in a data set by the SV produced. If the data is linearly separable the following equality will hold,

\| w \|^2 = \sum_{i=1}^{l} \alpha_i = \sum_{i \in SVs} \alpha_i = \sum_{i \in SVs} \sum_{j \in SVs} \alpha_i \alpha_j y_i y_j \langle x_i, x_j \rangle.    (2.23)

Hence from Equation 2.10 the VC dimension of the classifier is bounded by,

h \le \min\left[ R^2 \sum_{i \in SVs} \alpha_i, \; n \right] + 1,    (2.24)

and if the training data, x, is normalised to lie in the unit hypersphere,

h \le 1 + \min\left[ \sum_{i \in SVs} \alpha_i, \; n \right].    (2.25)

2.1.1 Linearly Separable Example

To illustrate the method consider the training set in Table 2.1.

Table 2.1: Linearly Separable Classification Data
x1   x2    y
1    1    -1
3    3     1
1    3     1
3    1    -1
2    2.5   1
3    2.5  -1
4    3    -1

The SVC solution is shown in Figure 2.4, where the dotted lines describe the locus of the margin and the circled data points represent the SV, which all lie on the margin.

Figure 2.4: Optimal Separating Hyperplane

2.2 The Generalised Optimal Separating Hyperplane

So far the discussion has been restricted to the case where the training data is linearly separable. However, in general this will not be the case, Figure 2.5.

Figure 2.5: Generalised Optimal Separating Hyperplane

There are two approaches to generalising the problem, which are dependent upon prior knowledge of the problem and an estimate of the noise on the data. In the case where it is expected (or possibly even known) that a hyperplane can correctly separate the data, a method of
Alternatively a more complex function can be used to describe the boundary,as discussed in Chapter2.1.To enable the optimal separating hyperplane method to be generalised, Cortes and Vapnik(1995)introduced non-negative variables,ξi≥0,and a penalty function,Fσ(ξ)=iξσiσ>0,(2.26) where theξi are a measure of the misclassification errors.The optimisation problem is now posed so as to minimise the classification error as well as minimising the bound on the VC dimension of the classifier.The constraints of Equation2.4are modified for the non-separable case to,y iw,x i +b≥1−ξi,i=1,...,l.(2.27)whereξi≥0.The generalised optimal separating hyperplane is determined by the vector w,that minimises the functional,Φ(w,ξ)=12w 2+Ciξi,(2.28)(where C is a given value)subject to the constraints of Equation2.27.The solution to the optimisation problem of Equation2.28under the constraints of Equation2.27is given by the saddle point of the Lagrangian(Minoux,1986),Φ(w,b,α,ξ,β)=12 w 2+Ciξi−li=1αiy iw T x i+b−1+ξi−lj=1βiξi,(2.29)12Chapter2Support Vector Classificationwhereα,βare the Lagrange multipliers.The Lagrangian has to be minimised with respect to w,b,x and maximised with respect toα,β.As before,classical Lagrangian duality enables the primal problem,Equation2.29,to be transformed to its dual problem. The dual problem is given by,max αW(α,β)=maxα,βminw,b,ξΦ(w,b,α,ξ,β).(2.30)The minimum with respect to w,b andξof the Lagrangian,Φ,is given by,∂Φ∂b =0⇒li=1αi y i=0∂Φ∂w =0⇒w=li=1αi y i x i∂Φ∂ξ=0⇒αi+βi=C.(2.31) Hence from Equations2.29,2.30and2.31,the dual problem is,max αW(α)=maxα−12li=1lj=1αiαj y i y j x i,x j +lk=1αk,(2.32)and hence the solution to the problem is given by,α∗=arg minα12li=1lj=1αiαj y i y j x i,x j −lk=1αk,(2.33)with constraints,0≤αi≤C i=1,...,llj=1αj y j=0.(2.34)The solution to this minimisation problem is identical to the separable case except for a modification of the bounds of the Lagrange multipliers.The uncertain part of Cortes’s approach is that the coefficient C has to be determined.This parameter introduces additional capacity control within the classifier.C can be directly related to a regulari-sation parameter(Girosi,1997;Smola and Sch¨o lkopf,1998).Blanz et al.(1996)uses a value of C=5,but ultimately C must be chosen to reflect the knowledge of the noise on the data.This warrants further work,but a more practical discussion is given in Chapter4.Chapter2Support Vector Classification13x1x2y11-133113131-12 2.513 2.5-143-11.5 1.5112-1Table2.2:Non-Linearly Separable Classification Data2.2.1Linearly Non-Separable ExampleTwo additional data points are added to the separable data of Table2.1to produce a linearly non-separable data set,Table2.2.The resulting SVC is shown in Figure2.6,for C=1.The SV are no longer required to lie on the margin,as in Figure2.4,and the orientation of the hyperplane and the width of the margin are different.Figure2.6:Generalised Optimal Separating Hyperplane Example(C=1)In the limit,lim C→∞the solution converges towards the solution obtained by the optimal separating hyperplane(on this non-separable data),Figure2.7.In the limit,lim C→0the solution converges to one where the margin maximisation term dominates,Figure2.8.Beyond a certain point the Lagrange multipliers will all take on the value of C.There is now less emphasis on minimising the misclassification error, but purely on maximising the margin,producing a large width margin.Consequently as C decreases the width of the margin increases.The useful range of C lies between the point where all the Lagrange 
Multipliers are equal to C and when only one of them is just bounded by C.14Chapter2Support Vector ClassificationFigure2.7:Generalised Optimal Separating Hyperplane Example(C=105)Figure2.8:Generalised Optimal Separating Hyperplane Example(C=10−8)2.3Generalisation in High Dimensional Feature SpaceIn the case where a linear boundary is inappropriate the SVM can map the input vector, x,into a high dimensional feature space,z.By choosing a non-linear mapping a priori, the SVM constructs an optimal separating hyperplane in this higher dimensional space, Figure2.9.The idea exploits the method of Aizerman et al.(1964)which,enables the curse of dimensionality(Bellman,1961)to be addressed.Figure2.9:Mapping the Input Space into a High Dimensional Feature SpaceChapter2Support Vector Classification15There are some restrictions on the non-linear mapping that can be employed,see Chap-ter3,but it turns out,surprisingly,that most commonly employed functions are accept-able.Among acceptable mappings are polynomials,radial basis functions and certain sigmoid functions.The optimisation problem of Equation2.33becomes,α∗=arg minα12li=1lj=1αiαj y i y j K(x i,x j)−lk=1αk,(2.35)where K(x,x )is the kernel function performing the non-linear mapping into feature space,and the constraints are unchanged,0≤αi≤C i=1,...,llj=1αj y j=0.(2.36)Solving Equation2.35with constraints Equation2.36determines the Lagrange multipli-ers,and a hard classifier implementing the optimal separating hyperplane in the feature space is given by,f(x)=sgn(i∈SV sαi y i K(x i,x)+b)(2.37) wherew∗,x =li=1αi y i K(x i,x)b∗=−12li=1αi y i[K(x i,x r)+K(x i,x r)].(2.38)The bias is computed here using two support vectors,but can be computed using all the SV on the margin for stability(Vapnik et al.,1997).If the Kernel contains a bias term, the bias can be accommodated within the Kernel,and hence the classifier is simply,f(x)=sgn(i∈SV sαi K(x i,x))(2.39)Many employed kernels have a bias term and anyfinite Kernel can be made to have one(Girosi,1997).This simplifies the optimisation problem by removing the equality constraint of Equation2.36.Chapter3discusses the necessary conditions that must be satisfied by valid kernel functions.16Chapter2Support Vector Classification 2.3.1Polynomial Mapping ExampleConsider a polynomial kernel of the form,K(x,x )=( x,x +1)2,(2.40)which maps a two dimensional input vector into a six dimensional feature space.Apply-ing the non-linear SVC to the linearly non-separable training data of Table2.2,produces the classification illustrated in Figure2.10(C=∞).The margin is no longer of constant width due to the non-linear projection into the input space.The solution is in contrast to Figure2.7,in that the training data is now classified correctly.However,even though SVMs implement the SRM principle and hence can generalise well,a careful choice of the kernel function is necessary to produce a classification boundary that is topologically appropriate.It is always possible to map the input space into a dimension greater than the number of training points and produce a classifier with no classification errors on the training set.However,this will generalise badly.Figure2.10:Mapping input space into Polynomial Feature Space2.4DiscussionTypically the data will only be linearly separable in some,possibly very high dimensional feature space.It may not make sense to try and separate the data exactly,particularly when only afinite amount of training data is available which is potentially corrupted by noise.Hence in practice it will be 
necessary to employ the non-separable approach which places an upper bound on the Lagrange multipliers. This raises the question of how to determine the parameter C. It is similar to the problem in regularisation where the regularisation coefficient has to be determined, and it has been shown that the parameter C can be directly related to a regularisation parameter for certain kernels (Smola and Schölkopf, 1998). A process of cross-validation can be used to determine this parameter, although more efficient and potentially better methods are sought after. In removing the training patterns that are not support vectors, the solution is unchanged and hence a fast method for validation may be available when the support vectors are sparse.

Chapter 3 Feature Space

This chapter discusses the method that can be used to construct a mapping into a high dimensional feature space by the use of reproducing kernels. The idea of the kernel function is to enable operations to be performed in the input space rather than the potentially high dimensional feature space. Hence the inner product does not need to be evaluated in the feature space. This provides a way of addressing the curse of dimensionality. However, the computation is still critically dependent upon the number of training patterns and to provide a good data distribution for a high dimensional problem will generally require a large training set.

3.1 Kernel Functions

The following theory is based upon Reproducing Kernel Hilbert Spaces (RKHS) (Aronszajn, 1950; Girosi, 1997; Heckman, 1997; Wahba, 1990). An inner product in feature space has an equivalent kernel in input space,

K(x, x') = \langle \phi(x), \phi(x') \rangle,    (3.1)

provided certain conditions hold. If K is a symmetric positive definite function, which satisfies Mercer's Conditions,

K(x, x') = \sum_{m}^{\infty} a_m \phi_m(x) \phi_m(x'), \quad a_m \ge 0,    (3.2)

\int\!\!\int K(x, x') \, g(x) \, g(x') \, dx \, dx' > 0, \quad g \in L_2,    (3.3)

then the kernel represents a legitimate inner product in feature space. Valid functions that satisfy Mercer's conditions are now given, which unless stated are valid for all real x and x'.

3.1.1 Polynomial

A polynomial mapping is a popular method for non-linear modelling,

K(x, x') = \langle x, x' \rangle^{d},    (3.4)

K(x, x') = \left( \langle x, x' \rangle + 1 \right)^{d}.    (3.5)

The second kernel is usually preferable as it avoids problems with the hessian becoming zero.

3.1.2 Gaussian Radial Basis Function

Radial basis functions have received significant attention, most commonly with a Gaussian of the form,

K(x, x') = \exp\left( -\frac{ \| x - x' \|^2 }{ 2\sigma^2 } \right).    (3.6)

Classical techniques utilising radial basis functions employ some method of determining a subset of centres. Typically a method of clustering is first employed to select a subset of centres. An attractive feature of the SVM is that this selection is implicit, with each support vector contributing one local Gaussian function, centred at that data point. By further considerations it is possible to select the global basis function width, σ, using the SRM principle (Vapnik, 1995).

3.1.3 Exponential Radial Basis Function

A radial basis function of the form,

K(x, x') = \exp\left( -\frac{ \| x - x' \| }{ 2\sigma^2 } \right)    (3.7)

produces a piecewise linear solution which can be attractive when discontinuities are acceptable.

3.1.4 Multi-Layer Perceptron

The long established MLP, with a single hidden layer, also has a valid kernel representation,

K(x, x') = \tanh\left( \rho \langle x, x' \rangle + \varrho \right)    (3.8)

for certain values of the scale, ρ, and offset, ϱ, parameters. Here the SV correspond to the first layer and the Lagrange multipliers to the weights.
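The kernels of Sections 3.1.1–3.1.3 are straightforward to evaluate directly. The R sketch below defines the polynomial, Gaussian RBF and exponential RBF kernels of Equations 3.5–3.7 and, for the degree-2 polynomial kernel, checks that the kernel value matches an explicit inner product in the corresponding six-dimensional feature space (the mapping used in the polynomial example of Section 2.3.1). The two sample points are arbitrary assumptions.

```r
# Kernels from Equations 3.5-3.7 (x and y are numeric vectors).
k_poly <- function(x, y, d)     (sum(x * y) + 1)^d
k_rbf  <- function(x, y, sigma) exp(-sum((x - y)^2) / (2 * sigma^2))
k_erbf <- function(x, y, sigma) exp(-sqrt(sum((x - y)^2)) / (2 * sigma^2))

# Explicit feature map for the degree-2 polynomial kernel in two dimensions:
# ( <x,y> + 1 )^2 = <phi(x), phi(y)> with a six-dimensional phi.
phi <- function(x) c(1, sqrt(2) * x[1], sqrt(2) * x[2],
                     x[1]^2, x[2]^2, sqrt(2) * x[1] * x[2])

x <- c(0.5, -1.2)   # arbitrary example points
y <- c(2.0,  0.3)

k_poly(x, y, d = 2)      # kernel evaluated in input space
sum(phi(x) * phi(y))     # same value via the explicit feature space
k_rbf(x, y, sigma = 1.0)
k_erbf(x, y, sigma = 1.0)
```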
Support vector machine:A tool for mapping mineral prospectivityRenguang Zuo a,n,Emmanuel John M.Carranza ba State Key Laboratory of Geological Processes and Mineral Resources,China University of Geosciences,Wuhan430074;Beijing100083,Chinab Department of Earth Systems Analysis,Faculty of Geo-Information Science and Earth Observation(ITC),University of Twente,Enschede,The Netherlandsa r t i c l e i n f oArticle history:Received17May2010Received in revised form3September2010Accepted25September2010Keywords:Supervised learning algorithmsKernel functionsWeights-of-evidenceTurbidite-hosted AuMeguma Terraina b s t r a c tIn this contribution,we describe an application of support vector machine(SVM),a supervised learningalgorithm,to mineral prospectivity mapping.The free R package e1071is used to construct a SVM withsigmoid kernel function to map prospectivity for Au deposits in western Meguma Terrain of Nova Scotia(Canada).The SVM classification accuracies of‘deposit’are100%,and the SVM classification accuracies ofthe‘non-deposit’are greater than85%.The SVM classifications of mineral prospectivity have5–9%lowertotal errors,13–14%higher false-positive errors and25–30%lower false-negative errors compared tothose of the WofE prediction.The prospective target areas predicted by both SVM and WofE reflect,nonetheless,controls of Au deposit occurrence in the study area by NE–SW trending anticlines andcontact zones between Goldenville and Halifax Formations.The results of the study indicate theusefulness of SVM as a tool for predictive mapping of mineral prospectivity.&2010Elsevier Ltd.All rights reserved.1.IntroductionMapping of mineral prospectivity is crucial in mineral resourcesexploration and mining.It involves integration of information fromdiverse geoscience datasets including geological data(e.g.,geologicalmap),geochemical data(e.g.,stream sediment geochemical data),geophysical data(e.g.,magnetic data)and remote sensing data(e.g.,multispectral satellite data).These sorts of data can be visualized,processed and analyzed with the support of computer and GIStechniques.Geocomputational techniques for mapping mineral pro-spectivity include weights of evidence(WofE)(Bonham-Carter et al.,1989),fuzzy WofE(Cheng and Agterberg,1999),logistic regression(Agterberg and Bonham-Carter,1999),fuzzy logic(FL)(Ping et al.,1991),evidential belief functions(EBF)(An et al.,1992;Carranza andHale,2003;Carranza et al.,2005),neural networks(NN)(Singer andKouda,1996;Porwal et al.,2003,2004),a‘wildcat’method(Carranza,2008,2010;Carranza and Hale,2002)and a hybrid method(e.g.,Porwalet al.,2006;Zuo et al.,2009).These techniques have been developed toquantify indices of occurrence of mineral deposit occurrence byintegrating multiple evidence layers.Some geocomputational techni-ques can be performed using popular software packages,such asArcWofE(a free ArcView extension)(Kemp et al.,1999),ArcSDM9.3(afree ArcGIS9.3extension)(Sawatzky et al.,2009),MI-SDM2.50(aMapInfo extension)(Avantra Geosystems,2006),GeoDAS(developedbased on MapObjects,which is an Environmental Research InstituteDevelopment Kit)(Cheng,2000).Other geocomputational techniques(e.g.,FL and NN)can be performed by using R and Matlab.Geocomputational techniques for mineral prospectivity map-ping can be categorized generally into two types–knowledge-driven and data-driven–according to the type of inferencemechanism considered(Bonham-Carter1994;Pan and Harris2000;Carranza2008).Knowledge-driven techniques,such as thosethat apply FL and EBF,are based on expert knowledge 
andexperience about spatial associations between mineral prospec-tivity criteria and mineral deposits of the type sought.On the otherhand,data-driven techniques,such as WofE and NN,are based onthe quantification of spatial associations between mineral pro-spectivity criteria and known occurrences of mineral deposits ofthe type sought.Additional,the mixing of knowledge-driven anddata-driven methods also is used for mapping of mineral prospec-tivity(e.g.,Porwal et al.,2006;Zuo et al.,2009).Every geocomputa-tional technique has advantages and disadvantages,and one or theother may be more appropriate for a given geologic environmentand exploration scenario(Harris et al.,2001).For example,one ofthe advantages of WofE is its simplicity,and straightforwardinterpretation of the weights(Pan and Harris,2000),but thismodel ignores the effects of possible correlations amongst inputpredictor patterns,which generally leads to biased prospectivitymaps by assuming conditional independence(Porwal et al.,2010).Comparisons between WofE and NN,NN and LR,WofE,NN and LRfor mineral prospectivity mapping can be found in Singer andKouda(1999),Harris and Pan(1999)and Harris et al.(2003),respectively.Mapping of mineral prospectivity is a classification process,because its product(i.e.,index of mineral deposit occurrence)forevery location is classified as either prospective or non-prospectiveaccording to certain combinations of weighted mineral prospec-tivity criteria.There are two types of classification techniques.Contents lists available at ScienceDirectjournal homepage:/locate/cageoComputers&Geosciences0098-3004/$-see front matter&2010Elsevier Ltd.All rights reserved.doi:10.1016/j.cageo.2010.09.014n Corresponding author.E-mail addresses:zrguang@,zrguang1981@(R.Zuo).Computers&Geosciences](]]]])]]]–]]]One type is known as supervised classification,which classifies mineral prospectivity of every location based on a training set of locations of known deposits and non-deposits and a set of evidential data layers.The other type is known as unsupervised classification, which classifies mineral prospectivity of every location based solely on feature statistics of individual evidential data layers.A support vector machine(SVM)is a model of algorithms for supervised classification(Vapnik,1995).Certain types of SVMs have been developed and applied successfully to text categorization, handwriting recognition,gene-function prediction,remote sensing classification and other studies(e.g.,Joachims1998;Huang et al.,2002;Cristianini and Scholkopf,2002;Guo et al.,2005; Kavzoglu and Colkesen,2009).An SVM performs classification by constructing an n-dimensional hyperplane in feature space that optimally separates evidential data of a predictor variable into two categories.In the parlance of SVM literature,a predictor variable is called an attribute whereas a transformed attribute that is used to define the hyperplane is called a feature.The task of choosing the most suitable representation of the target variable(e.g.,mineral prospectivity)is known as feature selection.A set of features that describes one case(i.e.,a row of predictor values)is called a feature vector.The feature vectors near the hyperplane are the support feature vectors.The goal of SVM modeling is tofind the optimal hyperplane that separates clusters of feature vectors in such a way that feature vectors representing one category of the target variable (e.g.,prospective)are on one side of the plane and feature vectors representing the other category of the target 
variable (e.g., non-prospective) are on the other side of the plane. A good separation is achieved by the hyperplane that has the largest distance to the neighboring data points of both categories, since in general the larger the margin the better the generalization error of the classifier. In this paper, SVM is demonstrated as an alternative tool for integrating multiple evidential variables to map mineral prospectivity.

2. Support vector machine algorithms

Support vector machines are supervised learning algorithms, which are considered as heuristic algorithms, based on statistical learning theory (Vapnik, 1995). The classical task of a SVM is binary (two-class) classification. Suppose we have a training set composed of l feature vectors x_i ∈ R^n, where i (= 1, 2, …, n) is the number of feature vectors in training samples. The class in which each sample is identified to belong is labeled y_i, which is equal to 1 for one class or is equal to −1 for the other class (i.e. y_i ∈ {−1, 1}) (Huang et al., 2002). If the two classes are linearly separable, then there exists a family of linear separators, also called separating hyperplanes, which satisfy the following set of equations (Kavzoglu and Colkesen, 2009) (Fig. 1):

w x_i + b \ge +1 \quad \text{for } y_i = +1
w x_i + b \le -1 \quad \text{for } y_i = -1    (1)

which is equivalent to

y_i (w x_i + b) \ge 1, \quad i = 1, 2, \ldots, n    (2)

Fig. 1. Support vectors and optimum hyperplane for the binary case of linearly separable data sets.

The separating hyperplane can then be formalized as a decision function

f(x) = \mathrm{sgn}(w x + b)    (3)

where sgn is a sign function, which is defined as follows:

\mathrm{sgn}(x) = \begin{cases} 1 & \text{if } x > 0 \\ 0 & \text{if } x = 0 \\ -1 & \text{if } x < 0 \end{cases}    (4)

The two parameters of the separating hyperplane decision function, w and b, can be obtained by solving the following optimization function:

\text{Minimize } \tau(w) = \frac{1}{2} \| w \|^2    (5)

subject to

y_i \left( (w x_i) + b \right) \ge 1, \quad i = 1, \ldots, l    (6)

The solution to this optimization problem is the saddle point of the Lagrange function

L(w, b, \alpha) = \frac{1}{2} \| w \|^2 - \sum_{i=1}^{l} \alpha_i \left( y_i ((x_i w) + b) - 1 \right)    (7)

\frac{\partial}{\partial b} L(w, b, \alpha) = 0, \qquad \frac{\partial}{\partial w} L(w, b, \alpha) = 0    (8)

where α_i is a Lagrange multiplier. The Lagrange function is minimized with respect to w and b and is maximized with respect to α. Lagrange multipliers α_i are determined by the following optimization function:

\text{Maximize } \sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{l} \alpha_i \alpha_j y_i y_j (x_i x_j)    (9)

subject to

\alpha_i \ge 0, \; i = 1, \ldots, l, \quad \text{and} \quad \sum_{i=1}^{l} \alpha_i y_i = 0    (10)

The separating rule, based on the optimal hyperplane, is the following decision function:

f(x) = \mathrm{sgn}\left( \sum_{i=1}^{l} y_i \alpha_i (x x_i) + b \right)    (11)

Table 1. Experimental data.
No.  Layer A  Layer B  Layer C  Layer D  Target    No.  Layer A  Layer B  Layer C  Layer D  Target
 1      1        1        1        1       1        21      0        0        0        0       0
 2      1        1        1        1       1        22      0        0        0        0       0
 3      1        1        1        1       1        23      0        0        0        0       0
 4      1        1        1        1       1        24      0        1        0        0       0
 5      1        1        1        1       1        25      1        0        0        0       0
 6      1        1        1        1       1        26      0        0        0        0       0
 7      1        1        1        1       1        27      1        1        1        0       0
 8      1        1        1        1       1        28      0        0        0        0       0
 9      1        1        1        0       1        29      0        0        0        0       0
10      1        1        1        0       1        30      0        0        0        0       0
11      1        0        1        1       1        31      1        1        1        0       0
12      1        1        1        0       1        32      0        0        0        0       0
13      1        1        1        0       1        33      0        0        0        0       0
14      1        1        1        0       1        34      0        0        0        0       0
15      0        1        1        0       1        35      1        0        0        0       0
16      1        0        1        0       1        36      0        0        0        0       0
17      0        1        1        0       1        37      0        0        0        0       0
18      0        1        0        1       1        38      1        1        1        0       0
19      0        1        0        1       1        39      0        0        0        0       0
20      1        0        1        0       1        40      1        0        0        0       0

More details about SVM algorithms can be found in Vapnik (1995) and Tax and Duin (1999).
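As a sketch of how the decision function of Eq. (11) can be obtained in practice, the following R example fits a linear SVM with the e1071 package (the package used later in this paper) on a small artificial two-class data set and recovers w and b from the fitted support vectors. The data, the cost value and the variable names are assumptions for illustration; they are not the study-area data.

```r
library(e1071)

# Small artificial, linearly separable training set (assumed for illustration).
x <- matrix(c(4, 4,
              5, 4,
              4, 5,
              1, 1,
              2, 1,
              1, 2), ncol = 2, byrow = TRUE)
y <- factor(c(1, 1, 1, -1, -1, -1))

fit <- svm(x, y, type = "C-classification", kernel = "linear",
           cost = 10, scale = FALSE)

# w = sum_i alpha_i y_i x_i is stored as coefs (alpha_i * y_i) times the SVs,
# and the intercept b is -rho in e1071's parameterisation.
w <- t(fit$coefs) %*% fit$SV
b <- -fit$rho
w; b

# Predictions agree with sgn(w.x + b); +1 corresponds to the class that
# appears first in the training data (libsvm's convention).
predict(fit, x)
sign(x %*% t(w) + b)
```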
3. Experiments with kernel functions

For spatial geocomputational analysis of mineral exploration targets, the decision function in Eq. (3) is a kernel function. The choice of a kernel function (K) and its parameters for an SVM are crucial for obtaining good results. The kernel function can be used to construct a non-linear decision boundary and to avoid expensive calculation of dot products in high-dimensional feature space. The four popular kernel functions are as follows:

\text{Linear: } K(x_i, x_j) = \lambda \, x_i x_j
\text{Polynomial of degree } d: K(x_i, x_j) = (\lambda \, x_i x_j + r)^{d}, \quad \lambda > 0
\text{Radial basis function (RBF): } K(x_i, x_j) = \exp\{ -\lambda \, \| x_i - x_j \|^2 \}, \quad \lambda > 0
\text{Sigmoid: } K(x_i, x_j) = \tanh(\lambda \, x_i x_j + r), \quad \lambda > 0    (12)

Table 2. Errors of SVM classification using linear kernel functions.
λ      Number of support vectors   Testing error (non-deposit) (%)   Testing error (deposit) (%)   Total error (%)
0.25   8    0.0   0.0   0.0
1      8    0.0   0.0   0.0
10     8    0.0   0.0   0.0
100    8    0.0   0.0   0.0
1000   8    0.0   0.0   0.0

Table 3. Errors of SVM classification using polynomial kernel functions when d = 3 and r = 0.
λ      Number of support vectors   Testing error (non-deposit) (%)   Testing error (deposit) (%)   Total error (%)
0.25   12   0.0   0.0   0.0
1      6    0.0   0.0   0.0
10     6    0.0   0.0   0.0
100    6    0.0   0.0   0.0
1000   6    0.0   0.0   0.0

Table 4. Errors of SVM classification using polynomial kernel functions when λ = 0.25 and r = 0.
d      Number of support vectors   Testing error (non-deposit) (%)   Testing error (deposit) (%)   Total error (%)
1      11   10.0   0.0    5.0
10     29   0.0    0.0    0.0
100    23   0.0    45.0   22.5
1000   20   0.0    90.0   45.0

Table 5. Errors of SVM classification using polynomial kernel functions when λ = 0.25 and d = 3.
r      Number of support vectors   Testing error (non-deposit) (%)   Testing error (deposit) (%)   Total error (%)
0      12   0.0   0.0   0.0
1      10   0.0   0.0   0.0
10     8    0.0   0.0   0.0
100    8    0.0   0.0   0.0
1000   8    0.0   0.0   0.0

Table 6. Errors of SVM classification using radial kernel functions.
λ      Number of support vectors   Testing error (non-deposit) (%)   Testing error (deposit) (%)   Total error (%)
0.25   14   0.0   0.0   0.0
1      13   0.0   0.0   0.0
10     13   0.0   0.0   0.0
100    13   0.0   0.0   0.0
1000   13   0.0   0.0   0.0

Table 7. Errors of SVM classification using sigmoid kernel functions when r = 0.
λ      Number of support vectors   Testing error (non-deposit) (%)   Testing error (deposit) (%)   Total error (%)
0.25   40   0.0   0.0    0.0
1      40   0.0   35.0   17.5
10     40   0.0   6.0    3.0
100    40   0.0   6.0    3.0
1000   40   0.0   6.0    3.0

The parameters λ, r and d are referred to as kernel parameters. The parameter λ serves as an inner product coefficient in the polynomial function. In the case of the RBF kernel (Eq. (12)), λ determines the RBF width. In the sigmoid kernel, λ serves as an inner product coefficient in the hyperbolic tangent function. The parameter r is used for kernels of polynomial and sigmoid types.
The parameter d is the degree of a polynomial function.We performed some experiments to explore the performance of the parameters used in a kernel function.The dataset used in the experiments(Table1),which are derived from the study area(see below),were compiled according to the requirementfor Fig.2.Simplified geological map in western Meguma Terrain of Nova Scotia,Canada(after,Chatterjee1983;Cheng,2008).Table8Errors of SVM classification using sigmoid kernel functions when l¼0.25.r Number ofSupportVectorsTestingerror(non-deposit)(%)Testingerror(deposit)(%)Total error(%)0400.00.00.01400.00.00.010400.00.00.0100400.00.00.01000400.00.00.0R.Zuo,E.J.M.Carranza/Computers&Geosciences](]]]])]]]–]]]4classification analysis.The e1071(Dimitriadou et al.,2010),a freeware R package,was used to construct a SVM.In e1071,the default values of l,r and d are1/(number of variables),0and3,respectively.From the study area,we used40geological feature vectors of four geoscience variables and a target variable for classification of mineral prospec-tivity(Table1).The target feature vector is either the‘non-deposit’class(or0)or the‘deposit’class(or1)representing whether mineral exploration target is absent or present,respectively.For‘deposit’locations,we used the20known Au deposits.For‘non-deposit’locations,we randomly selected them according to the following four criteria(Carranza et al.,2008):(i)non-deposit locations,in contrast to deposit locations,which tend to cluster and are thus non-random, must be random so that multivariate spatial data signatures are highly non-coherent;(ii)random non-deposit locations should be distal to any deposit location,because non-deposit locations proximal to deposit locations are likely to have similar multivariate spatial data signatures as the deposit locations and thus preclude achievement of desired results;(iii)distal and random non-deposit locations must have values for all the univariate geoscience spatial data;(iv)the number of distal and random non-deposit locations must be equaltoFig.3.Evidence layers used in mapping prospectivity for Au deposits(from Cheng,2008):(a)and(b)represent optimum proximity to anticline axes(2.5km)and contacts between Goldenville and Halifax formations(4km),respectively;(c)and(d)represent,respectively,background and anomaly maps obtained via S-Afiltering of thefirst principal component of As,Cu,Pb and Zn data.R.Zuo,E.J.M.Carranza/Computers&Geosciences](]]]])]]]–]]]5the number of deposit locations.We used point pattern analysis (Diggle,1983;2003;Boots and Getis,1988)to evaluate degrees of spatial randomness of sets of non-deposit locations and tofind distance from any deposit location and corresponding probability that one deposit location is situated next to another deposit location.In the study area,we found that the farthest distance between pairs of Au deposits is71km,indicating that within that distance from any deposit location in there is100%probability of another deposit location. 
However,few non-deposit locations can be selected beyond71km of the individual Au deposits in the study area.Instead,we selected random non-deposit locations beyond11km from any deposit location because within this distance from any deposit location there is90% probability of another deposit location.When using a linear kernel function and varying l from0.25to 1000,the number of support vectors and the testing errors for both ‘deposit’and‘non-deposit’do not vary(Table2).In this experiment the total error of classification is0.0%,indicating that the accuracy of classification is not sensitive to the choice of l.With a polynomial kernel function,we tested different values of l, d and r as follows.If d¼3,r¼0and l is increased from0.25to1000,the number of support vectors decreases from12to6,but the testing errors for‘deposit’and‘non-deposit’remain nil(Table3).If l¼0.25, r¼0and d is increased from1to1000,the number of support vectors firstly increases from11to29,then decreases from23to20,the testing error for‘non-deposit’decreases from10.0%to0.0%,whereas the testing error for‘deposit’increases from0.0%to90%(Table4). In this experiment,the total error of classification is minimum(0.0%) when d¼10(Table4).If l¼0.25,d¼3and r is increased from 0to1000,the number of support vectors decreases from12to8,but the testing errors for‘deposit’and‘non-deposit’remain nil(Table5).When using a radial kernel function and varying l from0.25to 1000,the number of support vectors decreases from14to13,but the testing errors of‘deposit’and‘non-deposit’remain nil(Table6).With a sigmoid kernel function,we experimented with different values of l and r as follows.If r¼0and l is increased from0.25to1000, the number of support vectors is40,the testing errors for‘non-deposit’do not change,but the testing error of‘deposit’increases from 0.0%to35.0%,then decreases to6.0%(Table7).In this experiment,the total error of classification is minimum at0.0%when l¼0.25 (Table7).If l¼0.25and r is increased from0to1000,the numbers of support vectors and the testing errors of‘deposit’and‘non-deposit’do not change and the total error remains nil(Table8).The results of the experiments demonstrate that,for the datasets in the study area,a linear kernel function,a polynomial kernel function with d¼3and r¼0,or l¼0.25,r¼0and d¼10,or l¼0.25and d¼3,a radial kernel function,and a sigmoid kernel function with r¼0and l¼0.25are optimal kernel functions.That is because the testing errors for‘deposit’and‘non-deposit’are0%in the SVM classifications(Tables2–8).Nevertheless,a sigmoid kernel with l¼0.25and r¼0,compared to all the other kernel functions,is the most optimal kernel function because it uses all the input support vectors for either‘deposit’or‘non-deposit’(Table1)and the training and testing errors for‘deposit’and‘non-deposit’are0% in the SVM classification(Tables7and8).4.Prospectivity mapping in the study areaThe study area is located in western Meguma Terrain of Nova Scotia,Canada.It measures about7780km2.The host rock of Au deposits in this area consists of Cambro-Ordovician low-middle grade metamorphosed sedimentary rocks and a suite of Devonian aluminous granitoid intrusions(Sangster,1990;Ryan and Ramsay, 1997).The metamorphosed sedimentary strata of the Meguma Group are the lower sand-dominatedflysch Goldenville Formation and the upper shalyflysch Halifax Formation occurring in the central part of the study area.The igneous rocks occur mostly in the northern part of the study area(Fig.2).In this area,20turbidite-hosted Au deposits and 
occurrences (Ryan and Ramsay,1997)are found in the Meguma Group, especially near the contact zones between Goldenville and Halifax Formations(Chatterjee,1983).The major Au mineralization-related geological features are the contact zones between Gold-enville and Halifax Formations,NE–SW trending anticline axes and NE–SW trending shear zones(Sangster,1990;Ryan and Ramsay, 1997).This dataset has been used to test many mineral prospec-tivity mapping algorithms(e.g.,Agterberg,1989;Cheng,2008). More details about the geological settings and datasets in this area can be found in Xu and Cheng(2001).We used four evidence layers(Fig.3)derived and used by Cheng (2008)for mapping prospectivity for Au deposits in the yers A and B represent optimum proximity to anticline axes(2.5km) and optimum proximity to contacts between Goldenville and Halifax Formations(4km),yers C and D represent variations in geochemical background and anomaly,respectively, as modeled by multifractalfilter mapping of thefirst principal component of As,Cu,Pb,and Zn data.Details of how the four evidence layers were obtained can be found in Cheng(2008).4.1.Training datasetThe application of SVM requires two subsets of training loca-tions:one training subset of‘deposit’locations representing presence of mineral deposits,and a training subset of‘non-deposit’locations representing absence of mineral deposits.The value of y i is1for‘deposits’andÀ1for‘non-deposits’.For‘deposit’locations, we used the20known Au deposits(the sixth column of Table1).For ‘non-deposit’locations(last column of Table1),we obtained two ‘non-deposit’datasets(Tables9and10)according to the above-described selection criteria(Carranza et al.,2008).We combined the‘deposits’dataset with each of the two‘non-deposit’datasets to obtain two training datasets.Each training dataset commonly contains20known Au deposits but contains different20randomly selected non-deposits(Fig.4).4.2.Application of SVMBy using the software e1071,separate SVMs both with sigmoid kernel with l¼0.25and r¼0were constructed using the twoTable9The value of each evidence layer occurring in‘non-deposit’dataset1.yer A Layer B Layer C Layer D100002000031110400005000061000700008000090100 100100 110000 120000 130000 140000 150000 160100 170000 180000 190100 200000R.Zuo,E.J.M.Carranza/Computers&Geosciences](]]]])]]]–]]] 6training datasets.With training dataset1,the classification accuracies for‘non-deposits’and‘deposits’are95%and100%, respectively;With training dataset2,the classification accuracies for‘non-deposits’and‘deposits’are85%and100%,respectively.The total classification accuracies using the two training datasets are97.5%and92.5%,respectively.The patterns of the predicted prospective target areas for Au deposits(Fig.5)are defined mainly by proximity to NE–SW trending anticlines and proximity to contact zones between Goldenville and Halifax Formations.This indicates that‘geology’is better than‘geochemistry’as evidence of prospectivity for Au deposits in this area.With training dataset1,the predicted prospective target areas occupy32.6%of the study area and contain100%of the known Au deposits(Fig.5a).With training dataset2,the predicted prospec-tive target areas occupy33.3%of the study area and contain95.0% of the known Au deposits(Fig.5b).In contrast,using the same datasets,the prospective target areas predicted via WofE occupy 19.3%of study area and contain70.0%of the known Au deposits (Cheng,2008).The error matrices for two SVM classifications show that the type1(false-positive)and 
The error matrices for the two SVM classifications show that the type 1 (false-positive) and type 2 (false-negative) errors based on training dataset 1 (Table 11) and training dataset 2 (Table 12) are 32.6% and 0%, and 33.3% and 5%, respectively. The total errors for the two SVM classifications are 16.3% and 19.15% based on training datasets 1 and 2, respectively. In contrast, the type 1 and type 2 errors for the WofE prediction are 19.3% and 30% (Table 13), respectively, and the total error for the WofE prediction is 24.65%.

The results show that the total errors of the SVM classifications are 5–9% lower than the total error of the WofE prediction. The 13–14% higher false-positive errors of the SVM classifications compared to that of the WofE prediction suggest that the SVM classifications result in larger prospective areas that may not contain undiscovered deposits. However, the 25–30% higher false-negative error of the WofE prediction compared to those of the SVM classifications suggests that the WofE analysis results in larger non-prospective areas that may contain undiscovered deposits. Certainly, in mineral exploration the intentions are not to miss undiscovered deposits (i.e., avoid false-negative error) and to minimize exploration cost in areas that may not really contain undiscovered deposits (i.e., keep false-positive error as low as possible). Thus, the results suggest the superiority of the SVM classifications over the WofE prediction.

Fig. 4. The locations of 'deposit' and 'non-deposit'.

Table 10. The value of each evidence layer occurring in 'non-deposit' dataset 2.

No.  Layer A  Layer B  Layer C  Layer D
1    1        0        1        0
2    0        0        0        0
3    0        0        0        0
4    1        1        1        0
5    0        0        0        0
6    0        1        1        0
7    1        0        1        0
8    0        0        0        0
9    1        0        0        0
10   1        1        1        0
11   1        0        0        0
12   0        0        1        0
13   1        0        0        0
14   0        0        0        0
15   0        0        0        0
16   1        0        0        0
17   1        0        0        0
18   0        0        1        0
19   0        0        1        0
20   0        0        0        0

5. Conclusions

Nowadays, SVMs have become a popular geocomputational tool for spatial analysis. In this paper, we used an SVM algorithm to integrate multiple variables for mineral prospectivity mapping. The results obtained by the two SVM applications demonstrate that prospective target areas for Au deposits are defined mainly by proximity to NE–SW trending anticlines and to contact zones between the Goldenville and Halifax Formations. In the study area, the SVM classifications of mineral prospectivity have 5–9% lower total errors, 13–14% higher false-positive errors and 25–30% lower false-negative errors compared to those of the WofE prediction. These results indicate that SVM is a potentially useful tool for integrating multiple evidence layers in mineral prospectivity mapping.

Table 11. Error matrix for SVM classification using training dataset 1.

Prediction      Known: all 'deposits'   Known: all 'non-deposits'   Total
'Deposit'       100                     32.6                        132.6
'Non-deposit'   0                       67.4                        67.4
Total           100                     100                         200
Type 1 (false-positive) error = 32.6. Type 2 (false-negative) error = 0. Total error = 16.3.
Note: values in the matrix are percentages of 'deposit' and 'non-deposit' locations.

Table 12. Error matrix for SVM classification using training dataset 2.

Prediction      Known: all 'deposits'   Known: all 'non-deposits'   Total
'Deposits'      95                      33.3                        128.3
'Non-deposits'  5                       66.7                        71.7
Total           100                     100                         200
Type 1 (false-positive) error = 33.3. Type 2 (false-negative) error = 5. Total error = 19.15.
Note: values in the matrix are percentages of 'deposit' and 'non-deposit' locations.

Table 13. Error matrix for WofE prediction.

Prediction      Known: all 'deposits'   Known: all 'non-deposits'   Total
'Deposit'       70                      19.3                        89.3
'Non-deposit'   30                      80.7                        110.7
Total           100                     100                         200
Type 1 (false-positive) error = 19.3. Type 2 (false-negative) error = 30. Total error = 24.65.
Note: values in the matrix are percentages of 'deposit' and 'non-deposit' locations.
Fig. 5. Prospective target areas for Au deposits delineated by SVM. (a) and (b) are obtained using training datasets 1 and 2, respectively.
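The type 1, type 2 and total errors reported in Tables 11–13 can be recomputed from any set of predictions with a little bookkeeping. The sketch below, using made-up label vectors, shows one way to do so; here the total error is taken as the mean of the type 1 and type 2 errors, which matches the figures quoted above (e.g., 16.3 = (32.6 + 0)/2).

```python
# Derive the type 1 / type 2 error rates of an error matrix from predictions.
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([1, 1, 1, 1, 1, -1, -1, -1, -1, -1])   # illustrative labels
y_pred = np.array([1, 1, 1, 1, 1, 1, -1, -1, -1, -1])    # illustrative predictions

tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[-1, 1]).ravel()
type1 = fp / (fp + tn)          # non-deposits predicted as deposits
type2 = fn / (fn + tp)          # deposits predicted as non-deposits
print(f"type 1 error = {type1:.1%}, type 2 error = {type2:.1%}, "
      f"total error = {(type1 + type2) / 2:.1%}")
```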
Vector® seriesGroup-operated switches manual and motorized/disconnectswitchesOverviewDesigned for superior reliability and load-breaking performance, the Vector series of group-operated air-break switches includes 15/25 kV and 38 kV versions. Manufactured as a true unitized switch, the Vector series is completely factory assembled and adjusted to eliminate the need for tuning in the field. Utilizing Siemens innovative Saf-T-Gap interrupters for dependable loadinterruption, bolt-on style interrupters are supplied as standard with plug-in styleinterrupters available as an option for rapid change-out using a shotgun stick.VectorOperational reliability is enhanced byenclosing the operating mechanism inside the switch base for superior weather protection. Ease-of-operation is assisted with the use of environmentally sealed, oil impregnated sintered bronze bearings supporting a balanced-force operating mechanism. Galvanized steel, fiberglass, and aluminum tubular bases are available.2OverviewInstallation is made simple with the factory-installed “single-point lift bracket.” Provisions for dead-ending all three conductors is standard (on most models) along with the “pole hugger” design for improved aesthetics.All Vector series switches are available in pipe-operated versions, hookstick-operated gang versions and motorized versions with SCADA control.3Enclosed operating mechanismOperational reliability is assured by enclosing the operating mechanism inside the switch base to shield it from the environment. Two interposing rods balance the forces acting on the rotating insulator shafts to provide dependable push-pull switching.14123455DetailsHookstick-operated gangThe HOG – horizontalThe HOG – vertical* Weight does not include operating pipe.* Weight does not include operating pipe.6Pipe operatedDetailsPipe operated – horizontalPipe operated – vertical (cable riser style)* Weight does not include operating pipe.* Weight does not include operating pipe.7DetailsHookstick-operated gangThe HOG – symmetricalThe HOG – inverted* Weight does not include operating pipe.* Weight does not include operating pipe.* Weight does not include operating pipe.8Pipe operatedDetailsPipe operated – symmetricalPipe operated – inverted* Weight does not include operating pipe.* Weight does not include operating pipe.* Weight does not include operating pipe.9ConfigurationsThe Vector ® series is available in pipe-operated and hookstick-operated versions. For additional options or configurations, please contact yourSiemens representative.1410AutomationVersatility is key when it comes to operating a motorized group-operatedairbreak switch locally or remotely. 
Combining the field proven reliability of themanually operated Vector series with engineered controls, the Auto-Pak Vector(APV) enables applications such as auto-transfer and auto-sectionalizing.Optional voltage and current sensors can be mounted directly to the switch unitto provide a cost-effective line monitoring and fault detection solution.11Vector numbering systemPosition:12345*A B1C*Consult factory for custom configurations.12DetailsVector ratings:• 900/1,200 A continuous current• 630 A loadbreak• Momentarycurrent loadbreak• 25,000 A three second• 20,000 A one-time symmetricalfault close in• 15,000 A three-time symmetricalfault close in.Standard features:• Silver-to-silver hinge andjaw contacts• Dead-end bracket supplied asstandard (not available for verticalriser model)• E nclosed balanced bearingoperating mechanism• Pole-hugger operating shaft forimproved installation appearance• Four 7’ pipe sections, handle andlocking assembly and necessarycouplings and guides• Single-point lift bracket• All components packaged inone crate• 630 A bolt-on Saf-T-Gapinterrupters. Plug-in style availableas an option.• Copper-bronze live parts• Tinned terminal pads.3EK4 (APS) surge arrester3EK8 (APS) surge arrester131415Siemens Industry, Inc.99 Bolton Sullivan DriveHeber Springs, Arkansas 72543For more information, including service or parts,please contact our Customer Support Center.Phone: 1-800-333-7421/disconnectswitchesOrder No.: E50001-F630-A167-01-76USPrinted in U.S.A.© 2019 Siemens Industry, Inc.The technical data presented in this document is based on an actual case or on as designed parameters, and therefore should not be relied upon for any specific application and does not constitute a performance guarantee for any projects. Actual results are dependent on variable conditions. Accordingly, Siemens does not make representations, warranties, or assurances as to the accuracy, currency or completeness of the content contained herein. If requested, we will provide specific technical data or specifications with respect to any customer‘s particular applications. Our company is constantly involved in engineering and development. For that reason, we reserve the right to modify, at any time, the technology and product specifications contained herein.。
Legal informationCopyright and License© Copyright 2019 HP Development Company, L.P.Reproduction, adaptation, or translation without prior written permission is prohibited, except as allowedunder the copyright laws.The information contained herein is subject to change without notice.The only warranties for HP products and services are set forth in the express warranty statementsaccompanying such products and services. Nothing herein should be construed as constituting anadditional warranty. HP shall not be liable for technical or editorial errors or omissions contained herein.Edition 1, 10/2019Trademark CreditsAdobe®, Adobe Photoshop®, Acrobat®, and PostScript® are trademarks of Adobe Systems Incorporated.Apple and the Apple logo are trademarks of Apple Inc., registered in the U.S. and other countries.macOS is a trademark of Apple Inc., registered in the U.S. and other countries.AirPrint is a trademark of Apple Inc., registered in the U.S. and other countries.Google™ is a trademark of Google Inc.Microsoft®, Windows®, Windows® XP, and Windows Vista® are U.S. registered trademarks of MicrosoftCorporation.UNIX® is a registered trademark of The Open Group.iiiT able of contents1 Printer overview (1)Warning icons (1)Potential shock hazard (2)Printer views (2)Printer front view (2)Printer back view (4)Interface ports (4)Control-panel view (5)How to use the touchscreen control panel (7)Printer specifications (8)T echnical specifications (8)Supported operating systems (11)Mobile printing solutions (12)Printer dimensions (13)Power consumption, electrical specifications, and acoustic emissions (15)Operating-environment range (15)Printer hardware setup and software installation (16)2 Paper trays (17)Introduction (17)Load paper to Tray 1 (multipurpose tray) (17)Load Tray 1 (multipurpose tray) (18)Tray 1 paper orientation (19)Use alternative letterhead mode (24)Enable Alternative Letterhead Mode by using the printer control-panel menus (24)Load paper to Tray 2 (24)Load Tray 2 (24)Tray 2 paper orientation (26)Use alternative letterhead mode (29)Enable Alternative Letterhead Mode by using the printer control-panel menus (29)Load paper to the 550-sheet paper tray (30)Load paper to the 550-sheet paper tray (30)550-sheet paper tray paper orientation (32)Use alternative letterhead mode (35)Enable Alternative Letterhead Mode by using the printer control-panel menus (35)ivLoad paper to the 2 x 550-sheet paper trays (36)Load paper to the 2 x 550-sheet paper trays (36)2 x 550-sheet paper tray paper orientation (38)Use alternative letterhead mode (41)Enable Alternative Letterhead Mode by using the printer control-panel menus (41)Load paper to the 2,700-sheet high-capacity input paper trays (41)Load paper to the 2,700-sheet high-capacity input paper trays (41)2,700-sheet HCI paper tray paper orientation (43)Use alternative letterhead mode (45)Enable Alternative Letterhead Mode by using the printer control-panel menus (45)Load and print envelopes (46)Print envelopes (46)Envelope orientation (46)Load and print labels (47)Manually feed labels (47)Label orientation (48)3 Supplies, accessories, and parts (49)Order supplies, accessories, and parts (49)Ordering (49)Supplies and accessories (50)Maintenance/long-life consumables (51)Customer self-repair parts (51)Dynamic security (52)Configure the HP toner-cartridge-protection supply settings (53)Introduction (53)Enable or disable the Cartridge Policy feature (53)Use the printer control panel to enable the Cartridge Policy feature (54)Use the printer control panel to disable 
the Cartridge Policy feature (54)Use the HP Embedded Web Server (EWS) to enable the Cartridge Policy feature (54)Use the HP Embedded Web Server (EWS) to disable the Cartridge Policy feature (55)Troubleshoot Cartridge Policy control panel error messages (55)Enable or disable the Cartridge Protection feature (55)Use the printer control panel to enable the Cartridge Protection feature (56)Use the printer control panel to disable the Cartridge Protection feature (56)Use the HP Embedded Web Server (EWS) to enable the Cartridge Protection feature (56)Use the HP Embedded Web Server (EWS) to disable the Cartridge Protection feature (57)Troubleshoot Cartridge Protection control panel error messages (57)Replace the toner cartridges (58)T oner-cartridge information (58)Remove and replace the cartridges (59)Replace the imaging drums (62)Imaging drum information (62)Remove and replace the imaging drums (63)Replace the toner-collection unit (66)T oner-collection unit information (66)vRemove and replace the toner-collection unit (67)Replace the staple cartridge (M776zs model only) (70)Staple cartridge information (70)Remove and replace the staple cartridge (71)4 Print (73)Print tasks (Windows) (73)How to print (Windows) (73)Automatically print on both sides (Windows) (74)Manually print on both sides (Windows) (74)Print multiple pages per sheet (Windows) (75)Select the paper type (Windows) (75)Additional print tasks (76)Print tasks (macOS) (77)How to print (macOS) (77)Automatically print on both sides (macOS) (77)Manually print on both sides (macOS) (77)Print multiple pages per sheet (macOS) (78)Select the paper type (macOS) (78)Additional print tasks (79)Store print jobs on the printer to print later or print privately (79)Introduction (79)Create a stored job (Windows) (79)Create a stored job (macOS) (80)Print a stored job (81)Delete a stored job (81)Delete a job that is stored on the printer (81)Change the job storage limit (82)Information sent to printer for Job Accounting purposes (82)Mobile printing (82)Introduction (82)Wi-Fi, Wi-Fi Direct Print, NFC, and BLE printing (82)Enable wireless printing (83)Change the Wi-Fi Direct name (83)HP ePrint via email (83)AirPrint (84)Android embedded printing (85)Print from a USB flash drive (85)Enable the USB port for printing (85)Method one: Enable the USB port from the printer control panel (85)Method two: Enable the USB port from the HP Embedded Web Server (network-connectedprinters only) (85)Print USB documents (86)Print using high-speed USB 2.0 port (wired) (86)Method one: Enable the high-speed USB 2.0 port from the printer control panel menus (86)Method two: Enable the high-speed USB 2.0 port from the HP Embedded Web Server (network-connected printers only) (87)vi5 Copy (88)Make a copy (88)Copy on both sides (duplex) (90)Additional copy tasks (92)6 Scan (93)Set up Scan to Email (93)Introduction (93)Before you begin (93)Step one: Access the HP Embedded Web Server (EWS) (94)Step two: Configure the Network Identification settings (95)Step three: Configure the Send to Email feature (96)Method one: Basic configuration using the Email Setup Wizard (96)Method two: Advanced configuration using the Email Setup (100)Step four: Configure the Quick Sets (optional) (104)Step five: Set up Send to Email to use Office 365 Outlook (optional) (105)Introduction (105)Configure the outgoing email server (SMTP) to send an email from an Office 365 Outlookaccount (105)Set up Scan to Network Folder (108)Introduction (108)Before you begin (108)Step one: Access the HP Embedded Web 
Server (EWS) (108)Step two: Set up Scan to Network Folder (109)Method one: Use the Scan to Network Folder Wizard (109)Method two: Use Scan to Network Folder Setup (110)Step one: Begin the configuration (110)Step two: Configure the Scan to Network Folder settings (111)Step three: Complete the configuration (118)Set up Scan to SharePoint (118)Introduction (118)Before you begin (118)Step one: Access the HP Embedded Web Server (EWS) (118)Step two: Enable Scan to SharePoint and create a Scan to SharePoint Quick Set (119)Scan a file directly to a SharePoint site (121)Quick Set scan settings and options for Scan to SharePoint (122)Set up Scan to USB Drive (123)Introduction (124)Step one: Access the HP Embedded Web Server (EWS) (124)Step two: Enable Scan to USB Drive (124)Step three: Configure the Quick Sets (optional) (125)Default scan settings for Scan to USB Drive setup (126)Default file settings for Save to USB setup (126)Scan to email (127)Introduction (127)Scan to email (127)Scan to job storage (129)viiIntroduction (129)Scan to job storage on the printer (130)Print from job storage on the printer (132)Scan to network folder (132)Introduction (132)Scan to network folder (132)Scan to SharePoint (134)Introduction (134)Scan to SharePoint (134)Scan to USB drive (136)Introduction (136)Scan to USB drive (136)Use HP JetAdvantage business solutions (138)Additional scan tasks (138)7 Fax (140)Set up fax (140)Introduction (140)Set up fax by using the printer control panel (140)Change fax configurations (141)Fax dialing settings (141)General fax send settings (142)Fax receive settings (143)Send a fax (144)Additional fax tasks (146)8 Manage the printer (147)Advanced configuration with the HP Embedded Web Server (EWS) (147)Introduction (147)How to access the HP Embedded Web Server (EWS) (148)HP Embedded Web Server features (149)Information tab (149)General tab (149)Copy/Print tab (150)Scan/Digital Send tab (151)Fax tab (152)Supplies tab (153)Troubleshooting tab (153)Security tab (153)HP Web Services tab (154)Networking tab (154)Other Links list (156)Configure IP network settings (157)Printer sharing disclaimer (157)View or change network settings (157)Rename the printer on a network (157)viiiManually configure IPv4 TCP/IP parameters from the control panel (158)Manually configure IPv6 TCP/IP parameters from the control panel (158)Link speed and duplex settings (159)Printer security features (160)Introduction (160)Security statements (160)Assign an administrator password (160)Use the HP Embedded Web Server (EWS) to set the password (160)Provide user access credentials at the printer control panel (161)IP Security (161)Encryption support: HP High Performance Secure Hard Disks (161)Lock the formatter (161)Energy-conservation settings (161)Set the sleep timer and configure the printer to use 1 watt or less of power (161)Set the sleep schedule (162)Set the idle settings (162)HP Web Jetadmin (163)Software and firmware updates (163)9 Solve problems (164)Customer support (164)Control panel help system (165)Reset factory settings (165)Introduction (165)Method one: Reset factory settings from the printer control panel (165)Method two: Reset factory settings from the HP Embedded Web Server (network-connectedprinters only) (166)A “Cartridge is low” or “Cartridge is very low” message displays on the printer control panel (166)Change the “Very Low” settings (166)Change the “Very Low” settings at the control panel (166)For printers with fax capability (167)Order supplies (167)Printer does not pick up paper or misfeeds 
(167)Introduction (167)The printer does not pick up paper (167)The printer picks up multiple sheets of paper (171)The document feeder jams, skews, or picks up multiple sheets of paper (174)Clear paper jams (174)Introduction (174)Paper jam locations (174)Auto-navigation for clearing paper jams (175)Experiencing frequent or recurring paper jams? (175)Clear paper jams in the document feeder - 31.13.yz (176)Clear paper jams in Tray 1 (13.A1) (177)Clear paper jams in Tray 2 (13.A2) (182)Clear paper jams in the fuser (13.B9, 13.B2, 13.FF) (188)ixClear paper jams in the duplex area (13.D3) (194)Clear paper jams in the 550-sheet trays (13.A3, 13.A4) (199)Clear paper jams in the 2 x 550 paper trays (13.A4, 13.A5) (206)Clear paper jams in the 2,700-sheet high-capacity input paper trays (13.A3, 13.A4, 13.A5, 13.A7) (213)Resolving color print quality problems (220)Introduction (220)Troubleshoot print quality (221)Update the printer firmware (221)Print from a different software program (221)Check the paper-type setting for the print job (221)Check the paper type setting on the printer (221)Check the paper type setting (Windows) (221)Check the paper type setting (macOS) (222)Check toner-cartridge status (222)Step one: Print the Supplies Status Page (222)Step two: Check supplies status (222)Print a cleaning page (222)Visually inspect the toner cartridge or cartridges (223)Check paper and the printing environment (223)Step one: Use paper that meets HP specifications (223)Step two: Check the environment (223)Step three: Set the individual tray alignment (224)Try a different print driver (224)Troubleshoot color quality (225)Calibrate the printer to align the colors (225)Troubleshoot image defects (225)Improve copy image quality (233)Check the scanner glass for dirt and smudges (233)Calibrate the scanner (234)Check the paper settings (235)Check the paper selection options (235)Check the image-adjustment settings (235)Optimize copy quality for text or pictures (236)Edge-to-edge copying (236)Improve scan image quality (236)Check the scanner glass for dirt and smudges (237)Check the resolution settings (238)Check the color settings (238)Check the image-adjustment settings (239)Optimize scan quality for text or pictures (239)Check the output-quality settings (240)Improve fax image quality (240)Check the scanner glass for dirt and smudges (240)Check the send-fax resolution settings (242)Check the image-adjustment settings (242)Optimize fax quality for text or pictures (242)Check the error-correction setting (243)xSend to a different fax machine (243)Check the sender's fax machine (243)Solve wired network problems (244)Introduction (244)Poor physical connection (244)The computer is unable to communicate with the printer (244)The printer is using incorrect link and duplex settings for the network (245)New software programs might be causing compatibility problems (245)The computer or workstation might be set up incorrectly (245)The printer is disabled, or other network settings are incorrect (245)Solve wireless network problems (245)Introduction (245)Wireless connectivity checklist (245)The printer does not print after the wireless configuration completes (246)The printer does not print, and the computer has a third-party firewall installed (246)The wireless connection does not work after moving the wireless router or printer (247)Cannot connect more computers to the wireless printer (247)The wireless printer loses communication when connected to a VPN (247)The network does not appear in the wireless networks list 
(247)The wireless network is not functioning (247)Reduce interference on a wireless network (248)Solve fax problems (248)Checklist for solving fax problems (248)What type of phone line are you using? (249)Are you using a surge-protection device? (249)Are you using a phone company voice-messaging service or an answering machine? (249)Does your phone line have a call-waiting feature? (249)Check fax accessory status (249)General fax problems (250)The fax failed to send (250)No fax address book button displays (250)Not able to locate the Fax settings in HP Web Jetadmin (250)The header is appended to the top of the page when the overlay option is enabled (251)A mix of names and numbers is in the recipients box (251)A one-page fax prints as two pages (251)A document stops in the document feeder in the middle of faxing (251)The volume for sounds coming from the fax accessory is too high or too low (251)Index (252)xiPrinter overview1Review the location of features on the printer, the physical and technical specifications of the printer,and where to locate setup information.For video assistance, see /videos/LaserJet.The following information is correct at the time of publication. For current information, see /support/colorljM776MFP.For more information:HP's all-inclusive help for the printer includes the following information:●Install and configure●Learn and use●Solve problems●Download software and firmware updates●Join support forums●Find warranty and regulatory informationWarning iconsUse caution if you see a warning icon on your HP printer, as indicated in the icon definitions.●Caution: Electric shock●Caution: Hot surface●Caution: Keep body parts away from moving partsPrinter overview1●Caution: Sharp edge in close proximity●WarningPotential shock hazardReview this important safety information.●Read and understand these safety statements to avoid an electrical shock hazard.●Always follow basic safety precautions when using this product to reduce risk of injury from fire orelectric shock.●Read and understand all instructions in the user guide.●Observe all warnings and instructions marked on the product.●Use only a grounded electrical outlet when connecting the product to a power source. If you do notknow whether the outlet is grounded, check with a qualified electrician.●Do not touch the contacts on any of the sockets on the product. Replace damaged cordsimmediately.●Unplug this product from wall outlets before cleaning.●Do not install or use this product near water or when you are wet.●Install the product securely on a stable surface.●Install the product in a protected location where no one can step on or trip over the power cord.Printer viewsIdentify certain parts of the printer and the control panel.Printer front viewLocate features on the front of the printer.2Chapter 1 Printer overviewPrinter front view3Printer back viewLocate features on the back of the printer.Interface portsLocate the interface ports on the printer formatter. 
4Chapter 1 Printer overviewControl-panel viewThe control panel provides access to the printer features and indicates the current status of the printer.NOTE:Tilt the control panel for easier viewing.The Home screen provides access to the printer features and indicates the current status of the printer.screens.NOTE:The features that appear on the Home screen can vary, depending on the printerconfiguration.Control-panel view5Figure 1-1Control-panel view?i 12:42 PM6Chapter 1 Printer overviewHow to use the touchscreen control panelPerform the following actions to use the printer touchscreen control panel.T ouchT ouch an item on the screen to select that item or open that menu. Also, when scrolling T ouch the Settings icon to open the Settings app.How to use the touchscreen control panel 7SwipeT ouch the screen and then move your finger horizontally to scroll the screen sideways.Swipe until the Settings app displays.Printer specificationsDetermine the specifications for your printer model.IMPORTANT:The following specifications are correct at the time of publication, but they are subject to change. For current information, see /support/colorljM776MFP .T echnical specificationsReview the printer technical specifications.Product numbers for each model ●M776dn - #T3U55A ●Flow M776z - #3WT91A ●Flow M776zs - #T3U56APaper handling specificationsPaper handling features Tray 1 (100-sheet capacity)Included Included Included Tray 2 (550-sheet capacity)IncludedIncludedIncluded8Chapter 1 Printer overview550-sheet paper trayOptional Included Not included NOTE:The M776dn models accept one optional550-sheet tray.Optional Included Included2 x 550-sheet paper tray and standNOTE:The M776dn models accept one optional550-sheet tray that may be installed on top of thestand.Optional Not included Not included2,700-sheet high-capacity input (HCI) paper trayand standNOTE:The M776dn models accept one optional550-sheet tray that may be installed on top of theoptional printer stand.Printer standOptional Not included Not included NOTE:The M776dn models accept one optional550-sheet tray that may be installed on top of theoptional printer stand.Inner finisher accessory Not included Not included Included Automatic duplex printing Included IncludedIncludedIncluded Included Included10/100/1000 Ethernet LAN connection with IPv4and IPv6Hi-Speed USB 2.0Included Included IncludedIncluded Included IncludedEasy-access USB port for printing from a USBflash drive or upgrading the firmwareIncluded Included Included Hardware Integration Pocket for connectingaccessory and third-party devicesHP Internal USB Ports Optional Optional OptionalOptional Optional OptionalHP Jetdirect 2900nw Print Server accessory forWi-Fi connectivity and an additional Ethernet portOptional IncludedIncludedHP Jetdirect 3100w accessory for Wi-Fi, BLE, NFC,and proximity badge readingPrints 45 pages per minute (ppm) on Letter-sizepaper and 46 ppm on A4-size paperEasy-access USB printing for printing from a USBIncluded Included Includedflash driveT echnical specifications9Included Included Included Store jobs in the printer memory to print later orprint privatelyScans 100 pages per minute (ppm) on A4 andIncluded Included Included letter-size paper one-sidedIncluded Included Included 200-page document feeder with dual-headscanning for single-pass duplex copying andscanningNot included Included Included HP EveryPage T echnologies including ultrasonicmulti-feed detectionNot included Included Included Embedded optical character recognition (OCR)provides the ability to 
convert printed pages intotext that can be edited or searched using acomputerIncluded Included Included SMART Label feature provides paper-edgedetection for automatic page croppingIncluded Included Included Automatic page orientation for pages that haveat least 100 characters of textIncluded Automatic tone adjustment sets contrast,Included Includedbrightness, and background removal for eachpageIncluded Included Includedfolders on a networkIncludedSend documents to SharePoint®Included IncludedIncluded Included Included NOTE:Memory reported on the configurationpage will change from 2.5 GB to 3 GB with theoptional 1 GB SODIMM installed.Mass storage: 500 GB hard disk drive Included Included IncludedSecurity: HP Trusted Platform Module (TPM)Included Included IncludedT ouchscreen control panel Included Included IncludedRetractable keyboard Not included Included Included 10Chapter 1 Printer overviewFax Optional Included IncludedSupported operating systemsUse the following information to ensure printer compatibility with your computer operating system.Linux: For information and print drivers for Linux, go to /go/linuxprinting.UNIX: For information and print drivers for UNIX®, go to /go/unixmodelscripts.The following information applies to the printer-specific Windows HP PCL 6 print drivers, HP print driversfor macOS, and to the software installer.Windows: Download HP Easy Start from /LaserJet to install the HP print driver. Or, go tothe printer-support website for this printer: /support/colorljM776MFP to download the printdriver or the software installer to install the HP print driver.macOS: Mac computers are supported with this printer. Download HP Easy Start either from /LaserJet or from the Printer Support page, and then use HP Easy Start to install the HP print driver.1.Go to /LaserJet.2.Follow the steps provided to download the printer software.Windows 7, 32-bit and 64-bit The “HP PCL 6” printer-specific print driver is installed for this operating system aspart of the software installation.Windows 8.1, 32-bit and 64-bit The “HP PCL-6” V4 printer-specific print driver is installed for this operating systemas part of the software installation.Windows 10, 32-bit and 64-bit The “HP PCL-6” V4 printer-specific print driver is installed for this operating systemas part of the software installation.Windows Server 2008 R2, SP 1, 64-bit The PCL 6 printer-specific print driver is available for download from the printer-support website. Download the driver, and then use the Microsoft Add Printer tool toinstall it.Windows Server 2012, 64-bit The PCL 6 printer-specific print driver is available for download from the printer-support website. Download the driver, and then use the Microsoft Add Printer tool toinstall it.Windows Server 2012 R2, 64-bit The PCL 6 printer-specific print driver is available for download from the printer-support website. Download the driver, and then use the Microsoft Add Printer tool toinstall it.Windows Server 2016, 64-bit The PCL 6 printer-specific print driver is available for download from the printer-support website. Download the driver, and then use the Microsoft Add Printer tool toinstall it.Windows Server 2019, 64-bit The PCL 6 printer-specific print driver is available for download from the printer-support website. 
Download the driver, and then use the Microsoft Add Printer tool toinstall it.Supported operating systems11macOS 10.13 High Sierra, macOS 10.14 MojaveDownload HP Easy Start from /LaserJet , and then use it to install the print driver.NOTE:Supported operating systems can change.NOTE:For a current list of supported operating systems and HP’s all-inclusive help for the printer, go to /support/colorljM776MFP .NOTE:For details on client and server operating systems and for HP UPD driver support for this printer, go to /go/upd . Under Additional information , click Specifications .●Internet connection●Dedicated USB 1.1 or 2.0 connection or a network connection● 2 GB of available hard-disk space ●1 GB RAM (32-bit) or2 GB RAM (64-bit)●Internet connection●Dedicated USB 1.1 or 2.0 connection or a network connection●1.5 GB of available hard-disk spaceNOTE:The Windows software installer installs the HP Smart Device Agent Base service. The file size is less than 100 kb. Its only function is to check for printers connected via USB hourly. No data is collected. If a USB printer is found, it then tries to locate a JetAdvantage Management Connector (JAMc) instance on the network. If a JAMc is found, the HP Smart Device Agent Base is securelyupgraded to a full Smart Device Agent from JAMc, which will then allow printed pages to be accounted for in a Managed Print Services (MPS) account. The driver-only web packs downloaded from for the printer and installed through the Add Printer wizard do not install this service.T o uninstall the service, open the Control Panel , select Programs or Programs and Features , and then select Add/Remove Programs or Uninstall a Programto remove the service. The file name isHPSmartDeviceAgentBase.Mobile printing solutionsHP offers multiple mobile printing solutions to enable easy printing to an HP printer from a laptop, tablet, smartphone, or other mobile device.T o see the full list and to determine the best choice, go to /go/MobilePrinting .NOTE:Update the printer firmware to ensure all mobile printing capabilities are supported.●Wi-Fi Direct (wireless models only, with HP Jetdirect 3100w BLE/NFC/Wireless accessory installed)●HP ePrint via email (Requires HP Web Services to be enabled and the printer to be registered with HP Connected)●HP Smart app ●Google Cloud Print12Chapter 1 Printer overview。
Reference 2: Model selection approaches for non-linear system identification: a review. X. Hong, R.J. Mitchell, S. Chen, C.J. Harris, K. Li and G.W. Irwin. International Journal of Systems Science, 2008, 39(10): 925–946.

A review of model selection methods for nonlinear system identification

Abstract: Over the past 20 years, research on nonlinear system identification from finite sets of observational data has matured considerably. Because existing linear learning algorithms can be reused while convergence conditions are still satisfied, the most intensively studied and widely used approach is the identification of linear-in-the-parameters nonlinear models, a class of models with universal approximation capability. This paper reviews model selection methods for linear-in-the-parameters nonlinear models. The most fundamental problem in nonlinear system identification is to identify, from the observed data, the smallest model with the best generalization performance. Several important concepts for achieving good model generalization in various nonlinear system identification algorithms are surveyed, including Bayesian parameter regularization and model selection criteria based on cross-validation and experimental design. A significant advance in machine learning is considered to be the development of kernel methods based on the structural risk minimization principle, namely the support vector machine. Modeling algorithms based on convex optimization are also covered, including support vector regression, input selection algorithms and online system identification algorithms.

1 Introduction

In control engineering, system identification refers to building a mathematical description of the dynamic behavior of a system or process from measured data, so that its response to future inputs can be predicted accurately. System identification involves two important sub-problems: (1) determining a model structure that describes the functional relationship between the system's input and output variables; and (2) estimating the model parameters within the selected or derived model structure. The initial, natural idea is to use linear difference equations in the input and output observations. Early research focused on linear time-invariant systems; more recent work on linear identification considers continuous-time system identification, subspace identification and errors-in-the-variables methods. An important measure of model quality is the fitting accuracy with which the unknown process is approximated.
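As a minimal illustration of linear-in-the-parameters identification, the sketch below simulates a simple nonlinear process, builds a regression matrix from candidate model terms, and estimates the parameters by ordinary least squares. The model terms and the simulated "true" system are assumptions chosen only to show the structure of the problem.

```python
# Linear-in-the-parameters identification: nonlinear in the data, linear in the
# parameters, so ordinary least squares applies.
import numpy as np

rng = np.random.default_rng(0)
u = rng.uniform(-1, 1, 200)                      # input sequence
y = np.zeros(200)
for k in range(1, 200):                          # simulated process to identify
    y[k] = (0.5 * y[k - 1] + 0.8 * u[k - 1] - 0.3 * y[k - 1] ** 2
            + 0.01 * rng.standard_normal())

# Regression matrix with candidate terms y(k-1), u(k-1), y(k-1)^2
Phi = np.column_stack([y[:-1], u[:-1], y[:-1] ** 2])
theta, *_ = np.linalg.lstsq(Phi, y[1:], rcond=None)
print("estimated parameters:", theta)            # close to [0.5, 0.8, -0.3]
```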
Matlab的第三方工具箱大全(按住CTRL点击连接就可以到达每个工具箱的主页面来下载了)Matlab Toolboxes∙ADCPtools - acoustic doppler current profiler data processing∙AFDesign - designing analog and digital filters∙AIRES - automatic integration of reusable embedded software∙Air-Sea - air-sea flux estimates in oceanography∙Animation - developing scientific animations∙ARfit - estimation of parameters and eigenmodes of multivariate autoregressive methods∙ARMASA - power spectrum estimation∙AR-Toolkit - computer vision tracking∙Auditory - auditory models∙b4m - interval arithmetic∙Bayes Net - inference and learning for directed graphical models∙Binaural Modeling - calculating binaural cross-correlograms of sound∙Bode Step - design of control systems with maximized feedback∙Bootstrap - for resampling, hypothesis testing and confidence interval estimation ∙BrainStorm - MEG and EEG data visualization and processing∙BSTEX - equation viewer∙CALFEM - interactive program for teaching the finite element method∙Calibr - for calibrating CCD cameras∙Camera Calibration∙Captain - non-stationary time series analysis and forecasting∙CHMMBOX - for coupled hidden Markov modeling using max imum likelihood EM ∙Classification - supervised and unsupervised classification algorithms∙CLOSID∙Cluster - for analysis of Gaussian mixture models for data set clustering∙Clustering - cluster analysis∙ClusterPack - cluster analysis∙COLEA - speech analysis∙CompEcon - solving problems in economics and finance∙Complex - for estimating temporal and spatial signal complexities∙Computational Statistics∙Coral - seismic waveform analysis∙DACE - kriging approximations to computer models∙DAIHM - data assimilation in hydrological and hydrodynamic models∙Data Visualization∙DBT - radar array processing∙DDE-BIFTOOL - bifurcation analysis of delay differential equations∙Denoise - for removing noise from signals∙DiffMan - solv ing differential equations on manifolds∙Dimensional Analysis -∙DIPimage - scientific image processing∙Direct - Laplace transform inversion via the direct integration method∙DirectSD - analysis and design of computer controlled systems with process-oriented models∙DMsuite - differentiation matrix suite∙DMTTEQ - design and test time domain equalizer design methods∙DrawFilt - drawing digital and analog filters∙DSFWAV - spline interpolation with Dean wave solutions∙DWT - discrete wavelet transforms∙EasyKrig∙Econometrics∙EEGLAB∙EigTool - graphical tool for nonsymmetric eigenproblems∙EMSC - separating light scattering and absorbance by extended multiplicative signal correction∙Engineering Vibration∙FastICA - fixed-point algorithm for ICA and projection pursuit∙FDC - flight dynamics and control∙FDtools - fractional delay filter design∙FlexICA - for independent components analysis∙FMBPC - fuzzy model-based predictive control∙ForWaRD - Fourier-wavelet regularized deconvolution∙FracLab - fractal analysis for signal processing∙FSBOX - stepwise forward and backward selection of features using linear regression∙GABLE - geometric algebra tutorial∙GAOT - genetic algorithm optimization∙Garch - estimating and diagnosing heteroskedasticity in time series models∙GCE Data - managing, analyzing and displaying data and metadata stored using the GCE data structure specification∙GCSV - growing cell structure visualization∙GEMANOVA - fitting multilinear ANOVA models∙Genetic Algorithm∙Geodetic - geodetic calculations∙GHSOM - growing hierarchical self-organizing map∙glmlab - general linear models∙GPIB - wrapper for GPIB library from National Instrument∙GTM - generative topographic 
mapping, a model for density modeling and data visualization∙GVF - gradient vector flow for finding 3-D object boundaries∙HFRadarmap - converts HF radar data from radial current vectors to total vectors ∙HFRC - importing, processing and manipulating HF radar data∙Hilbert - Hilbert transform by the rational eigenfunction expansion method∙HMM - hidden Markov models∙HMMBOX - for hidden Markov modeling using maximum likelihood EM∙HUTear - auditory modeling∙ICALAB - signal and image processing using ICA and higher order statistics∙Imputation - analysis of incomplete datasets∙IPEM - perception based musical analysisJMatLink - Matlab Java classesKalman - Bayesian Kalman filterKalman Filter - filtering, smoothing and parameter estimation (using EM) for linear dynamical systemsKALMTOOL - state estimation of nonlinear systemsKautz - Kautz filter designKrigingLDestimate - estimation of scaling exponentsLDPC - low density parity check codesLISQ - wavelet lifting scheme on quincunx gridsLKER - Laguerre kernel estimation toolLMAM-OLMAM - Levenberg Marquardt with Adaptive Momentum algorithm for training feedforward neural networksLow-Field NMR - for exponential fitting, phase correction of quadrature data and slicing LPSVM - Newton method for LP support vector machine for machine learning problems LSDPTOOL - robust control system design using the loop shaping design procedure LS-SVMlabLSVM - Lagrangian support vector machine for machine learning problemsLyngby - functional neuroimagingMARBOX - for multivariate autogressive modeling and cross-spectral estimation MatArray - analysis of microarray dataMatrix Computation- constructing test matrices, computing matrix factorizations, visualizing matrices, and direct search optimizationMCAT - Monte Carlo analysisMDP - Markov decision processesMESHPART - graph and mesh partioning methodsMILES - maximum likelihood fitting using ordinary least squares algorithmsMIMO - multidimensional code synthesisMissing - functions for handling missing data valuesM_Map - geographic mapping toolsMODCONS - multi-objective control system designMOEA - multi-objective evolutionary algorithmsMS - estimation of multiscaling exponentsMultiblock - analysis and regression on several data blocks simultaneously Multiscale Shape AnalysisMusic Analysis - feature extraction from raw audio signals for content-based music retrievalMWM - multifractal wavelet modelNetCDFNetlab - neural network algorithmsNiDAQ - data acquisition using the NiDAQ libraryNEDM - nonlinear economic dynamic modelsNMM - numerical methods in Matlab textNNCTRL - design and simulation of control systems based on neural networks NNSYSID - neural net based identification of nonlinear dynamic systemsNSVM - newton support vector machine for solv ing machine learning problems NURBS - non-uniform rational B-splinesN-way - analysis of multiway data with multilinear modelsOpenFEM - finite element developmentPCNN - pulse coupled neural networksPeruna - signal processing and analysisPhiVis- probabilistic hierarchical interactive visualization, i.e. 
functions for visual analysis of multivariate continuous dataPlanar Manipulator - simulation of n-DOF planar manipulatorsPRT ools - pattern recognitionpsignifit - testing hyptheses about psychometric functionsPSVM - proximal support vector machine for solving machine learning problems Psychophysics - vision researchPyrTools - multi-scale image processingRBF - radial basis function neural networksRBN - simulation of synchronous and asynchronous random boolean networks ReBEL - sigma-point Kalman filtersRegression - basic multivariate data analysis and regressionRegularization ToolsRegularization Tools XPRestore ToolsRobot - robotics functions, e.g. kinematics, dynamics and trajectory generation Robust Calibration - robust calibration in statsRRMT - rainfall-runoff modellingSAM - structure and motionSchwarz-Christoffel - computation of conformal maps to polygonally bounded regions SDH - smoothed data histogramSeaGrid - orthogonal grid makerSEA-MAT - oceanographic analysisSLS - sparse least squaresSolvOpt - solver for local optimization problemsSOM - self-organizing mapSOSTOOLS - solving sums of squares (SOS) optimization problemsSpatial and Geometric AnalysisSpatial RegressionSpatial StatisticsSpectral MethodsSPM - statistical parametric mappingSSVM - smooth support vector machine for solving machine learning problems STATBAG - for linear regression, feature selection, generation of data, and significance testingStatBox - statistical routinesStatistical Pattern Recognition - pattern recognition methodsStixbox - statisticsSVM - implements support vector machinesSVM ClassifierSymbolic Robot DynamicsTEMPLAR - wavelet-based template learning and pattern classificationTextClust - model-based document clusteringTextureSynth - analyzing and synthesizing visual texturesTfMin - continous 3-D minimum time orbit transfer around EarthTime-Frequency - analyzing non-stationary signals using time-frequency distributions Tree-Ring - tasks in tree-ring analysisTSA - uni- and multivariate, stationary and non-stationary time series analysisTSTOOL - nonlinear time series analysisT_Tide - harmonic analysis of tidesUTVtools - computing and modifying rank-revealing URV and UTV decompositions Uvi_Wave - wavelet analysisvarimax - orthogonal rotation of EOFsVBHMM - variation Bayesian hidden Markov modelsVBMFA - variational Bayesian mixtures of factor analyzersVMT- VRML Molecule Toolbox, for animating results from molecular dynamics experimentsVOICEBOXVRMLplot - generates interactive VRML 2.0 graphs and animationsVSVtools - computing and modifying symmetric rank-revealing decompositions WAFO - wave analysis for fatique and oceanographyWarpTB - frequency-warped signal processingWAVEKIT - wavelet analysisWaveLab - wavelet analysisWeeks - Laplace transform inversion via the Weeks methodWetCDF - NetCDF interfaceWHMT - wavelet-domain hidden Markov tree modelsWInHD - Wavelet-based inverse halftoning via deconvolutionWSCT - weighted sequences clustering toolkitXMLTree - XML parserYAADA - analyze single particle mass spectrum dataZMAP - quantitative seismicity analysis。
scikit-learn user guide

Scikit-learn is a powerful Python machine learning library that provides a wide variety of machine learning algorithms and tools to help developers build efficient and accurate machine learning models. This guide introduces Scikit-learn's basic features and usage to help readers get started quickly and make full use of the library.

I. Installation and environment setup

Scikit-learn depends on the NumPy and SciPy libraries, so these dependencies must be installed before use. After installation, you can check that Scikit-learn is installed correctly with the following commands:

```python
import sklearn
print(sklearn.__version__)
```

II. Data preprocessing

Before starting machine learning with Scikit-learn, we usually need to preprocess the raw data. This includes steps such as data cleaning, feature selection, feature scaling and data splitting.

1. Data cleaning

Data cleaning means removing invalid or incomplete samples from the raw data. Scikit-learn provides several ways to handle missing data, for example filling missing values with the mean, or estimating them with a nearest-neighbour algorithm.
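For example, a minimal sketch of both options mentioned above (mean imputation and nearest-neighbour estimation); the toy array is illustrative:

```python
import numpy as np
from sklearn.impute import SimpleImputer, KNNImputer

X = np.array([[1.0, 2.0], [np.nan, 3.0], [7.0, 6.0]])
print(SimpleImputer(strategy="mean").fit_transform(X))   # fill with column mean
print(KNNImputer(n_neighbors=2).fit_transform(X))        # estimate from neighbours
```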
2. Feature selection

Feature selection means choosing the most relevant features from the raw data and removing redundant ones. Scikit-learn provides several feature selection methods, including variance thresholding, correlation coefficients and principal component analysis (PCA).
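A minimal sketch of two of these approaches, variance thresholding and PCA, on a toy array:

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold
from sklearn.decomposition import PCA

X = np.array([[0.0, 2.0, 1.0], [0.0, 1.0, 3.0], [0.0, 4.0, 2.0], [0.0, 3.0, 4.0]])
X_sel = VarianceThreshold(threshold=0.0).fit_transform(X)   # drops the constant column
X_pca = PCA(n_components=2).fit_transform(X_sel)            # project onto 2 components
print(X_sel.shape, X_pca.shape)
```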
3. Feature scaling

Feature scaling normalizes the features of the raw data to remove differences in scale between different features. Scikit-learn provides several feature scaling methods, such as standardization and normalization.
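A minimal sketch comparing standardization (zero mean, unit variance) with min–max normalization (rescaling to [0, 1]) on a toy array:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler

X = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 400.0]])
print(StandardScaler().fit_transform(X))   # standardization
print(MinMaxScaler().fit_transform(X))     # min-max normalization
```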
4. Data splitting

In machine learning, we usually split the dataset into a training set and a test set. Scikit-learn provides convenient utilities for this; for example, the train_test_split function splits a dataset into training and test sets in a specified ratio.
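For example, a minimal sketch holding out 20% of a toy dataset as a test set:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X, y = np.arange(20).reshape(10, 2), np.arange(10)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print(X_train.shape, X_test.shape)   # (8, 2) (2, 2)
```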
III. Machine learning algorithms

Scikit-learn offers many machine learning algorithms, covering classification, regression, clustering, dimensionality reduction and more. Some of the commonly used algorithms and how to use them are introduced below.
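As a minimal illustration of the common fit/score pattern these algorithms share, here is a support vector classifier trained on the built-in iris dataset; the dataset and parameter choices are for illustration only:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```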
英语情感分类模型Introduction:Sentiment classification is a crucial aspect of natural language processing (NLP) that involves categorizing textinto predefined sentiment classes, typically positive, negative, and neutral. This technique is widely used in various fields such as market research, customer service, and social media analysis to gauge public opinion or customer satisfaction.Objective:The objective of an English sentiment classification model is to automatically determine the sentiment expressed in English text. This can be particularly useful for businesses to understand customer feedback and for researchers to analyze social trends.Methodology:1. Data Collection: Gather a large dataset of English text labeled with sentiments. This could include product reviews, social media posts, or movie reviews.2. Preprocessing: Clean and prepare the data for analysis. This step may involve removing noise, such as special characters, and normalizing the text through techniques like lowercasing, stemming, or lemmatization.3. Feature Extraction: Convert textual data into a formatthat can be analyzed by machine learning algorithms. Common techniques include bag-of-words, TF-IDF (Term Frequency-Inverse Document Frequency), and word embeddings.4. Model Selection: Choose a suitable machine learning model for sentiment classification. Options range from traditional models like Naive Bayes and Support Vector Machines (SVM) to more complex deep learning models like Recurrent Neural Networks (RNN) and Transformers.5. Training: Train the selected model using the preprocessed and feature-extracted data. This involves feeding the data into the model and adjusting the model's parameters to minimize prediction errors.6. Evaluation: Assess the model's performance using evaluation metrics such as accuracy, precision, recall, and F1 score. It's also important to test the model on unseen data to ensure its generalizability.7. Optimization: Fine-tune the model by adjusting hyperparameters, using techniques like cross-validation, or employing more sophisticated algorithms to improve performance.8. Deployment: Once the model is trained and evaluated, it can be deployed as part of a larger system to classify real-time or batch data.Applications:- Customer Feedback Analysis: Classify customer reviews tounderstand satisfaction levels and identify areas for improvement.- Social Media Monitoring: Monitor public sentiment towards brands, products, or events on social media platforms.- Market Research: Analyze consumer opinions to guide marketing strategies and product development.Challenges:- Sarcasm and Irony: Detecting sentiments in sarcastic or ironic statements can be challenging for sentiment classification models.- Contextual Understanding: The model must understand the context to accurately classify sentiments, especially in cases where the same word or phrase can have different sentiments depending on the situation.- Multilingual Data: For models that are not specifically designed for English, handling multilingual data can lead to misclassification if the model is not trained on a diverse dataset.Conclusion:English sentiment classification models are powerful toolsfor understanding the emotional tone of text data. By following a systematic approach to model development and addressing the unique challenges of sentiment analysis, these models can provide valuable insights into public opinion and customer satisfaction.。
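A minimal sketch of the methodology above, using a TF-IDF bag-of-words representation and a linear SVM; the tiny labelled corpus is illustrative only, and a real system would be trained on a much larger dataset:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

texts = ["I love this product", "Terrible service, very disappointed",
         "Absolutely fantastic experience", "Worst purchase I have ever made",
         "Pretty good overall", "Not worth the money"]
labels = ["positive", "negative", "positive", "negative", "positive", "negative"]

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.33, random_state=0, stratify=labels)

# Preprocessing + feature extraction + model in one pipeline
model = make_pipeline(TfidfVectorizer(lowercase=True, stop_words="english"),
                      LinearSVC())
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))
```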
The SVM-RFE feature selection algorithm

SVM-RFE (Support Vector Machine Recursive Feature Elimination) is a feature selection algorithm that combines the power of Support Vector Machines (SVM) and recursive feature elimination. It is commonly used in machine learning and data mining tasks to identify the most relevant features in a dataset.

The algorithm works by iteratively training an SVM model on the dataset and eliminating the least important features based on their weights or coefficients. In each iteration, the features with the smallest weights are removed, and the SVM model is retrained on the reduced feature set. This process continues until a specified number of features remains.

One of the advantages of SVM-RFE is that it takes into account the interdependencies among features and selects a subset of features that collectively provide the best predictive performance. By eliminating irrelevant or redundant features, SVM-RFE can improve the efficiency and interpretability of the model, as well as reduce overfitting.

For example, suppose we have a dataset with 100 features and we want to select the top 10 most important features using SVM-RFE. We start by training an SVM model on all 100 features and obtaining the weights for each feature. We then eliminate the 10 features with the smallest weights and retrain the SVM model on the remaining 90 features. This process is repeated until only 10 features are left.

SVM-RFE can be applied to various types of machine learning problems, such as classification, regression, and clustering. It has been successfully used in many domains, including bioinformatics, image analysis, and text mining. The algorithm is known for its robustness and ability to handle high-dimensional datasets.
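A minimal sketch of SVM-RFE using scikit-learn's RFE with a linear SVM as the estimator; mirroring the example above, ten features are removed per iteration until ten remain. The synthetic dataset is an illustrative assumption:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=100, n_informative=10,
                           random_state=0)
# Linear SVM supplies the feature weights; RFE drops the 10 lowest-weighted
# features per iteration until 10 remain.
selector = RFE(estimator=SVC(kernel="linear"), n_features_to_select=10, step=10)
selector.fit(X, y)
print("selected feature indices:",
      [i for i, keep in enumerate(selector.support_) if keep])
```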
The general workflow for building a supervised learning model

Supervised learning is a type of machine learning where the algorithm learns from labeled training data. The general process of building a supervised learning model involves several key steps, illustrated in the sketch after this list.

1. Data Collection: The first step in building a supervised learning model is to collect relevant data. This data should be labeled, meaning that each data point is associated with a corresponding target variable.

2. Data Preprocessing: Once the data is collected, it needs to be preprocessed to ensure that it is in a suitable format for the model. This may involve cleaning the data, handling missing values, and encoding categorical variables.

3. Feature Selection: In supervised learning, it is important to select the most relevant features that will help the model make accurate predictions. This step involves identifying the features that have the most impact on the target variable.

4. Model Selection: The next step is to select a suitable model for the data. There are various supervised learning algorithms to choose from, such as linear regression, logistic regression, decision trees, and support vector machines.

5. Model Training: Once the model is selected, it needs to be trained on the labeled training data. During training, the model learns the relationship between the input features and the target variable.

6. Model Evaluation: After the model is trained, it needs to be evaluated to assess its performance. This involves testing the model on a separate dataset and measuring metrics such as accuracy, precision, recall, and F1 score.

7. Model Tuning: If the model's performance is not satisfactory, it may need to be fine-tuned by adjusting hyperparameters or trying different algorithms.

By following these steps, a supervised learning model can be effectively built and deployed for making predictions on new, unseen data.
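A compact sketch of steps 4–7 (model selection, training, evaluation and tuning) on a built-in dataset; the dataset and the logistic-regression model are illustrative choices, not part of the workflow description above:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.metrics import accuracy_score, f1_score

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=5000).fit(X_train, y_train)        # training
pred = model.predict(X_test)                                           # evaluation
print("accuracy:", accuracy_score(y_test, pred), "F1:", f1_score(y_test, pred))

tuned = GridSearchCV(LogisticRegression(max_iter=5000),                # tuning
                     {"C": [0.01, 0.1, 1.0, 10.0]}, cv=5).fit(X_train, y_train)
print("best C:", tuned.best_params_["C"])
```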
Computer Knowledge and Technology (电脑知识与技术), Artificial Intelligence and Recognition Technology column (column editor: Tang Yidong), Vol. 6, No. 28, October 2010.

A Survey of Model Parameter Selection Methods for Support Vector Machines

FU Yang (1), LI Kun-lun (2)
(1. Information Engineering College, Nanchang University, Nanchang 330031, China; 2. Science and Technology College, Nanchang University, Nanchang 330029, China)

Abstract: Support vector machines are one of the hot research topics in machine learning and data mining. As a technology that is not yet fully mature, they still have many shortcomings, one of which is the lack of a unified standard and theory for model parameter selection. In practical use, the parameters that strongly influence SVM performance include the penalty factor C and the choice of kernel function and its parameters. This paper first analyzes the influence of the model parameters on SVM performance, then introduces, analyzes and objectively evaluates several commonly used model parameter selection methods, and finally summarizes the current state of SVM model parameter selection methods and discusses their development trends.

Keywords: support vector machine; model parameter selection; penalty factor; kernel function; kernel parameter

CLC number: TP181    Document code: A    Article ID: 1009-3044(2010)28-8081-02

The support vector machine (SVM) is a machine learning method developed on the basis of statistical learning theory. It was first proposed by Vapnik and co-workers at the 1992 conference on computational learning theory; its main content was essentially completed by 1995, and it is still under active development [1].
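A minimal sketch of the parameter selection problem the paper surveys: choosing the penalty factor C and the RBF kernel parameter gamma by cross-validated grid search. The dataset and the candidate grids are illustrative assumptions:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
param_grid = {"svc__C": [0.1, 1, 10, 100],          # penalty factor C
              "svc__gamma": [1e-3, 1e-2, 1e-1, 1]}  # RBF kernel parameter
search = GridSearchCV(make_pipeline(StandardScaler(), SVC(kernel="rbf")),
                      param_grid, cv=5)
search.fit(X, y)
print("best parameters:", search.best_params_)
print("cross-validated accuracy:", round(search.best_score_, 3))
```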
MCOSMOS by Mitutoyo is a proprietary metrology suite of inter-related modules and dedicated expansion modules for Microsoft Windows 7, 8 and 8.1 operating systems (32 or 64-bit). Available in 37 countries and 12 different languages, MCOSMOS is the world's standard in metrology software.Supported by MiCAT (Mitutoyo Intelligent Computer Aided Technology), your Mitutoyo CMM is streamlined with intuitive user interfaces that provide a familiar look and feel to operate multiple modules. They work together seamlessly to put reliable metrology at you fingertips throughout the entire production process.MCOSMOS allows integration among a series of applications, improving the efficiency of your CMM and the productivity of your quality control functions. Specific expansion modules are available dedicated toGEOPAK or for specific applications such as GEAR measurement, airfoil analysis, reverse engineering and integrating CAD with metrology.—M itutoyo C ontrolled O pen S ystems for M odularO peration S upportMitutoyo is the world’s largest provider of measurement and inspection solutions, with the most complete, capable line of machines, systems, sensors and software. With a presence in more than 100 countries, Mitutoyo is the international precision measurement technology leader.MiCAT-MCOSMOS is based on expertise acquired from around the globe,providing the assurance that you are employing best practices for managing your dimensional metrology equipment (DME).With computer technology laboratories in the United States, Europe and Asia,Mitutoyo employs highly qualified specialists who are devoted to the development of one common software platform.The modular system allows the capability to tailor the measuring software with only the specific modules needed to meet requirements. The measurement results may be displayed, printed and archived with numerous built-in and user-defined formats.Software packages and expansion modules to meet your metrology production requirements.GEOPAK (Geometry online/offline modules)Includes: support for high-speed nominal scanning (known path) with scanning probes (optional), user-defined dialogs, parametric programming with the use of variable substitution and user-defined reporting.CAT-1000P (Online/offline programming module)Software package featuresGEOPAK (basic geometry module) provides an easy graphical console with tool bars and windows, which can be personal-ized to the operator’s preference.GEOPAK provides visual tools and a graphically enhanced display providing step-by-step screen wizards. 
The design allows operators of all experience levels to easily and quickly create efficient measurement routines.The basic level software includes the flexibility for advanced tools demanded by the most experienced operators, such as looping, formula calculations or expressions that use variables, libraries of day-to-day sub-routines and conditional statements which add logic for a variety of applications.GEOPAK CNCmcosmos–1GEOPAK’s program tree is simple to read and edit.The program tree can be expanded or collapsed to correspond with the operator's level or part application requirements.With a double click on the function line, an easy dialog box will appear; for example, a hole that is threaded may not repeat if the probe path does not follow thread pitch.With CNC control, the dialog box allows a pitch value to program the probe path to follow the thread pitch.Ease of use forentry level to expert(Figure 1) Auto Circle Measurement by the Circular Travel FunctionArc Movement of Probe Circular Slot MeasurementDirect Measurement of Tapped HoleGEOPAK-Flexible ReportingUser-defined reportsPart Security & Management Included with GEOPAK is a built-in module thatfully controls the machine and the access to yourparts. The Part Manager displays the part listthat may be stored locally on the DME computeror via the LAN to a company network drive.Within the parts list the operator can attach thesetup instruction documents, header informationfor part traceability and images for visualCAT-1000 facilitates the programming of measurement tasks during the GEOPAK learn mode. All data for measuring parts and tolerance evaluations are taken accurately from the CAD model via pointing device (mouse, trackball, etc.) selection. The same principles apply for programming probe paths (clearance and measurement), while at the same time, using the embedded nominal information from the CAD model for tolerance comparison.CAT-1000P CAD InterfaceCAT-1000 utilizes Spatial’s prominent 3D ACIS ® Modeler, which is used in more than 350 customer applications and 2 million seats worldwide.CAT-1000 fully supports and reads PMI (Product & Manufacturing Information), which is embedded in the model for datum alignment, GD&T (Geometric Dimensioning and Tolerancing). Spatial’s 3D InterOp delivers the highest quality data exchange between CAD formats, enabling superior CAD file translation. The comprehensive suite of translators provides import/export for all applications, including ACIS, CGM and Parasolid-based applications. 3D InterOp is embedded in many of today’s leading design, engineering and manufacturing applications.CATIA V5, SolidWorks, NX Siemens (Unigraphics), Parasolids, AutoDesk Inventor, Pro-Engineer and IGES or VDAFS exchange formats are available options.Standard with CAT-1000 is ACIS (*.sat) and STEP AP203, which are both licensed copies from Spatial InterOp.Product Manufacturing InformationVirtual Offline OptionalDual Screen Optional mcosmos–2CMM System ManagerCMM System Manager allows you to create a virtualrepresentation of your CMM. New Mitutoyo models orlegacy models, such as the FN series and BHN series, canbe selected based on machine measuring size. If you havemultiple machines, they can also be added for simulation orpart placement purposes. 
CAD models can be placed andcompared to the true working volume of the machine andindexing probe swivel access.CAT-1000S3D surface analysis creates grid patterns to verify a surfacewith a single-click tool that calculates a collision-free probepath to measure a grid of surface points.Manual CMMs allow the operator to move the probe manually,and the point will appear in real-time. Probe compensationis determined from the center of the stylus and the shortestdistance to the CAD model to eliminate cosine error.as a gradient surface.Cone-shaped icons are used to show the surface vectordirection of the material from the nominal and tolerancedeviation.SCANPAK reports the deviations from the nominal profile to determine if the measured contour is within the tolerance zone.The SCANPAK Best-fit can provide the information required to correct the part. Best-fit is most effective when the reference alignment is not accurately defined to provide tool-setting information.The graphical report template provided with SCANPAK allows the oper-ator to make comments or notes and uses Protocol Designer for making special reports.SCANPAK-CNCmcosmos–3SCANPAK constructs from the measured contourcircles and/or line elements based on the user-specified tolerance amount calculated from the elements form deviation.Calculate elements automatically construct best-fitgeometry (lines/circles) from a contour. This information can be used as reference for the basic dimensions given for the profile call-out or even assist in reverse engineering for tool paths.The contour compared to the nominal “AS IS." The areas shown in red fail to meet the profile requirement.Calculate elements automaticallyKnown path scanningCNC Gasket Scan provides an easy-to-use tool for complex planes requiring a specific measuring path on the gasket surface.The measuring path can be created by nodes or offset from the CAD model.MCOSMOS-3 maximizes your throughput when using a scanning probe such as the SP25 or MPP.micat PlannerAutomatic measurement program generation softwareMiCAT Planner is Mitutoyo’s latest software development for fast and efficient CMM part programming. Operation of MiCAT Planner is easy and intuitive. Programs are made with a few mouse clicks in minutes instead of hours or days.WORKFLOW:1) Load design model 2) Select target CMM 3) Part placement via virtual alignment 4) Measurement program creation 5) Translate to Geopak MCOSMOSMeasurement PlanThe measurement plan is synchronized with the 3D view and the program view. This allows you to select a feature in any of the views (plan view, 3D view, program view) and the feature is highlighted in the other views simultaneously. Reordering the feature measurements is possible by dragging and dropping of the features in the plan view. Users can select a feature, characteristic, or a point set in the plan MiCAT Planner toolbar is workflow based.Rules EditorThe Rules Editor allows users to create rules to define measurement approaches, such as number of points per feature, sensor type, fitting method and automatic sensor selection.Design Model Support:• Siemens NX w/PMI • CATIA v5 w/PMI • PRO/E w/PMI • ACIS (SAT)DME 2DME 3The Pure DMISPAK module is a powerful bi-directional program exchange for legacy DMIS-based software.This module converts the native DMIS file (*.dmi) to GEOPAK. 
New programs from MCOSMOS also can be exported to a DMI format so non-Mitutoyo brand CMMs can read and execute on their legacy DMIS-based software.Pure DMISPAKMeasurLink ® acquires the measurements real-time as the CMM runs the GEOPAK Part Program.Data storage can be local or networked to a SQL Server.The software manages data from all types of devices from handheld tools to CMM and supports non-Mitutoyo products.expansion modules formcosmos3D surface generation software is used with Patch Scanning Generator to easily create high-accuracy surface design, using common B-Rep entities or STL file output, which are ideal for reverse engineering.SurfaceDeveloperMeshed 3D SurfaceSTL Mesh Piston Head – Reverse EngineeredPointsexpansion modules formcosmosWith MCOSMOS-3, MAFIS analyzes the SCANPAK profile that is measured and outputs the evaluation results of the desired parameters.Unlike other airfoil analysis modules, which operate outside of the CMM software as a separate package, MAFIS works by com-bining GEOPAK and SCANPAK to generate airfoil measurements.MAFIS calculates the camber line, leading/trailing edge, twist and much more. Blade analysis is easy for beginner operation and does not require an expert to implement.MAFIS (Mitutoyo Airfoil Inspection Software)(Requires SCANPAK)GEARPAKsensor systems test equipmentand seismometers Digital scale and Dro systemssmall tool Instrumentsand Data managementMitutoyo America Corporationone Number to serve You Better1-888-mItUtoYo (1-888-648-8869)M3 Solution Centers:aurora, Illinois (Headquarters)Boston, massachusettsHuntersville, North carolinamason, ohioPlymouth, michigancity of Industry, californiaBirmingham, alabamarenton, washingtonHouston, texas2.5M 0815-07 • Printed in USA • Sept. 2015©215mitutoyoamericacorporationFind additional product literatureand our product catalogNote: All information regarding our products, and in particular the illustrations, drawings, dimensional and performancedata contained in this printed matter as well as other technical data are to be regarded as approximate average values. Wetherefore reserve the right to make changes to the corresponding designs. The stated standards, similar technical regulations,descriptions and illustrations of the products were valid at the time of printing. In addition, the latest applicable version of ourGeneral Trading Conditions will apply. Only quotations submitted by ourselves may be regarded as definitive. Specificationsare subject to change without notice.Mitutoyo products are subject to US Export Administration Regulations (EAR). Re-export or relocation of our products mayrequire prior approval by an appropriate governing authority.Trademarks and RegistrationsDesignations used by companies to distinguish their products are often claimed as trademarks. In all instances where MitutoyoAmerica Corporation is aware of a claim, the product names appear in initial capital or all capital letters. The appropriatecompanies should be contacted for more complete trademark and registration information.。
Modeling wine preferences by data mining from physicochemical propertiesPaulo Cortez a,∗Ant´o nio Cerdeira b Fernando Almeida bTelmo Matos b Jos´e Reis a,ba Department of Information Systems/R&D Centre Algoritmi,University ofMinho,4800-058Guimar˜a es,Portugalb Viticulture Commission of the Vinho Verde region(CVRVV),4050-501Porto,PortugalAbstractWe propose a data mining approach to predict human wine taste preferences that is based on easily available analytical tests at the certification step.A large dataset (when compared to other studies in this domain)is considered,with white and red vinho verde samples(from Portugal).Three regression techniques were applied,un-der a computationally efficient procedure that performs simultaneous variable and model selection.The support vector machine achieved promising results,outper-forming the multiple regression and neural network methods.Such model is useful to support the oenologist wine tasting evaluations and improve wine production. Furthermore,similar techniques can help in target marketing by modeling consumer tastes from niche markets.Key words:Sensory preferences,Regression,Variable selection,Model selection, Support vector machines,Neural networksPreprint submitted to Elsevier22May20091IntroductionOnce viewed as a luxury good,nowadays wine is increasingly enjoyed by a wider range of consumers.Portugal is a top ten wine exporting country with 3.17%of the market share in2005[11].Exports of its vinho verde wine(from the northwest region)have increased by36%from1997to2007[8].To support its growth,the wine industry is investing in new technologies for both wine making and selling processes.Wine certification and quality assessment are key elements within this context.Certification prevents the illegal adulteration of wines(to safeguard human health)and assures quality for the wine market. Quality evaluation is often part of the certification process and can be used to improve wine making(by identifying the most influential factors)and to stratify wines such as premium brands(useful for setting prices).Wine certification is generally assessed by physicochemical and sensory tests [10].Physicochemical laboratory tests routinely used to characterize wine in-clude determination of density,alcohol or pH values,while sensory tests rely mainly on human experts.It should be stressed that taste is the least un-derstood of the human senses[25],thus wine classification is a difficult task. 
Moreover,the relationships between the physicochemical and sensory analysis are complex and still not fully understood[20].Advances in information technologies have made it possible to collect,store and process massive,often highly complex datasets.All this data hold valu-able information such as trends and patterns,which can be used to improve∗Corresponding author.E-mail pcortez@dsi.uminho.pt;tel.:+351253510313;fax: +351253510300.2decision making and optimize chances of success[28].Data mining(DM)tech-niques[33]aim at extracting high-level knowledge from raw data.There are several DM algorithms,each one with its own advantages.When modeling con-tinuous data,the linear/multiple regression(MR)is the classic approach.The backpropagation algorithm wasfirst introduced in1974[32]and later popular-ized in1986[23].Since then,neural networks(NNs)have become increasingly used.More recently,support vector machines(SVMs)have also been proposed [4][26].Due to their higherflexibility and nonlinear learning capabilities,both NNs and SVMs are gaining an attention within the DMfield,often attaining high predictive performances[16][17].SVMs present theoretical advantages over NNs,such as the absence of local minima in the learning phase.In effect, the SVM was recently considered one of the most influential DM algorithms [34].While the MR model is easier to interpret,it is still possible to extract knowledge from NNs and SVMs,given in terms of input variable importance [18][7].When applying these DM methods,variable and model selection are critical issues.Variable selection[14]is useful to discard irrelevant inputs,leading to simpler models that are easier to interpret and that usually give better plex models may overfit the data,losing the capability to generalize,while a model that is too simple will present limited learning capabilities.Indeed,both NN and SVM have hyperparameters that need to be adjusted[16],such as the number of NN hidden nodes or the SVM kernel parameter,in order to get good predictive accuracy(see Section2.3).The use of decision support systems by the wine industry is mainly focused on the wine production phase[12].Despite the potential of DM techniques to predict wine quality based on physicochemical data,their use is rather scarce3and mostly considers small datasets.For example,in1991the“Wine”dataset was donated into the UCI repository[1].The data contain178examples with measurements of13chemical constituents(e.g.alcohol,Mg)and the goal is to classify three cultivars from Italy.This dataset is very easy to discriminate and has been mainly used as a benchmark for new DM classifiers.In1997[27], a NN fed with15input variables(e.g.Zn and Mg levels)was used to predict six geographic wine origins.The data included170samples from Germany and a100%predictive rate was reported.In2001[30],NNs were used to classify three sensory attributes(e.g.sweetness)of Californian wine,based on grape maturity levels and chemical analysis(e.g.titrable acidity).Only 36examples were used and a6%error was achieved.Several physicochemical parameters(e.g.alcohol,density)were used in[20]to characterize56samples of Italian wine.Yet,the authors argued that mapping these parameters with a sensory taste panel is a very difficult task and instead they used a NN fed with data taken from an electronic tongue.More recently,mineral characterization (e.g.Zn and Mg)was used to discriminate54samples into two red wine classes[21].A probabilistic NN was adopted,attaining95%accuracy.As a powerful learning tool,SVM has 
outperformed NN in several applications, such as predicting meat preferences[7].Yet,in thefield of wine quality only one application has been reported,where spectral measurements from147 bottles were successfully used to predict3categories of rice wine age[35].In this paper,we present a case study for modeling taste preferences based on analytical data that are easily available at the wine certification step.Build-ing such model is valuable not only for certification entities but also wine producers and even consumers.It can be used to support the oenologist wine evaluations,potentially improving the quality and speed of their decisions.4Moreover,measuring the impact of the physicochemical tests in thefinal wine quality is useful for improving the production process.Furthermore,it can help in target marketing[24],i.e.by applying similar techniques to model the consumers preferences of niche and/or profitable markets.The main contributions of this work are:•We present a novel method that performs simultaneous variable and model selection for NN and SVM techniques.The variable selection is based on sensitivity analysis[18],which is a computationally efficient method that measures input relevance and guides the variable selection process.Also,we propose a parsimony search method to select the best SVM kernel parameter with a low computational effort.•We test such approach in a real-world application,the prediction of vinho verde wine(from the Minho region of Portugal)taste preferences,showing its impact in this domain.In contrast with previous studies,a large dataset is considered,with a total of4898white and1599red samples.Wine pref-erences are modeled under a regression approach,which preserves the order of the grades,and we show how the definition of the tolerance concept is useful for accessing different performance levels.We believe that this inte-grated approach is valuable to support applications where ranked sensory preferences are required,for example in wine or meat quality assurance.The paper is organized as follows:Section2presents the wine data,DM mod-els and variable selection approach;in Section3,the experimental design is described and the obtained results are analyzed;finally,conclusions are drawn in Section4.52Materials and methods2.1Wine dataThis study will consider vinho verde,a unique product from the Minho(north-west)region of Portugal.Medium in alcohol,is it particularly appreciated due to its freshness(specially in the summer).This wine accounts for15%of the total Portuguese production[8],and around10%is exported,mostly white wine.In this work,we will analyze the two most common variants,white and red(ros´e is also produced),from the demarcated region of vinho verde.The data were collected from May/2004to February/2007using only protected designation of origin samples that were tested at the official certification en-tity(CVRVV).The CVRVV is an inter-professional organization with the goal of improving the quality and marketing of vinho verde.The data were recorded by a computerized system(iLab),which automatically manages the process of wine sample testing from producer requests to laboratory and sen-sory analysis.Each entry denotes a given test(analytical or sensory)and the final database was exported into a single sheet(.csv).During the preprocessing stage,the database was transformed in order to include a distinct wine sample(with all tests)per row.To avoid discarding examples,only the most common physicochemical tests were selected.Since the red and white tastes 
are quite different, the analysis will be performed separately, thus two datasets¹ were built with 1599 red and 4898 white examples. Table 1 presents the physicochemical statistics per dataset. Regarding the preferences, each sample was evaluated by a minimum of three sensory assessors (using blind tastes), which graded the wine in a scale that ranges from 0 (very bad) to 10 (excellent). The final sensory score is given by the median of these evaluations. Fig. 1 plots the histograms of the target variables, denoting a typical normal shape distribution (i.e. with more normal grades than extreme ones). [¹ The datasets are available at: http://www3.dsi.uminho.pt/pcortez/wine/]

[insert Table 1 and Fig. 1 around here]

2.2 Data mining approach and evaluation

We will adopt a regression approach, which preserves the order of the preferences. For instance, if the true grade is 3, then a model that predicts 4 is better than one that predicts 7. A regression dataset $D$ is made up of $k \in \{1, \ldots, N\}$ examples, each mapping an input vector with $I$ input variables $(x_{k1}, \ldots, x_{kI})$ to a given target $y_k$. The regression performance is commonly measured by an error metric, such as the mean absolute deviation (MAD) [33]:

$MAD = \sum_{i=1}^{N} |y_i - \hat{y}_i| / N$   (1)

where $\hat{y}_k$ is the predicted value for the $k$-th input pattern. The regression error characteristic (REC) curve [2] is also used to compare regression models, with the ideal model presenting an area of 1.0. The curve plots the absolute error tolerance $T$ (x-axis) versus the percentage of points correctly predicted within the tolerance, i.e. the accuracy (y-axis).

The confusion matrix is often used for classification analysis, where a $C \times C$ matrix ($C$ is the number of classes) is created by matching the predicted values (in columns) with the desired classes (in rows). For an ordered output, the predicted class is given by $p_i = y_i$ if $|y_i - \hat{y}_i| \le T$, else $p_i = y'_i$, where $y'_i$ denotes the class closest to $\hat{y}_i$, given that $y'_i \neq y_i$. From the matrix, several metrics can be used to assess the overall classification performance, such as the accuracy and precision (i.e. the predicted column accuracies) [33].

The holdout validation is commonly used to estimate the generalization capability of a model [19]. This method randomly partitions the data into training and test subsets. The former subset is used to fit the model (typically with 2/3 of the data), while the latter (with the remaining 1/3) is used to compute the estimate. A more robust estimation procedure is the k-fold cross-validation [9], where the data is divided into k partitions of equal size. One subset is tested each time and the remaining data are used for fitting the model. The process is repeated sequentially until all subsets have been tested. Therefore, under this scheme, all data are used for training and testing. However, this method requires around k times more computation, since k models are fitted.

2.3 Data mining methods

We will adopt the most common NN type, the multilayer perceptron, where neurons are grouped into layers and connected by feedforward links [3]. For regression tasks, this NN architecture is often based on one hidden layer of $H$ hidden nodes with a logistic activation function and one output node with a linear function [16]:

$\hat{y} = w_{o,0} + \sum_{j=I+1}^{o-1} \frac{w_{o,j}}{1 + \exp\left(-\sum_{i=1}^{I} x_i w_{j,i} - w_{j,0}\right)}$   (2)

where $w_{j,i}$ denotes the weight of the connection from node $i$ to $j$ and $o$ the output node. The performance is sensitive to the topology choice ($H$). A NN with $H = 0$ is equivalent to the MR model. By increasing $H$, more complex mappings can be performed, yet an excess value of $H$ will overfit the data, leading to generalization loss. A computationally efficient method to set $H$ is
to search through the range $\{0, 1, 2, 3, \ldots, H_{max}\}$ (i.e. from the simplest NN to more complex ones). For each $H$ value, a NN is trained and its generalization estimate is measured (e.g. over a validation sample). The process is stopped when the generalization estimate decreases or when $H$ reaches the maximum value ($H_{max}$).

In SVM regression [26], the input $x$ is transformed into a high $m$-dimensional feature space by using a nonlinear mapping ($\phi$) that does not need to be explicitly known but that depends on a kernel function ($K$). The aim of a SVM is to find the best linear separating hyperplane, tolerating a small error ($\epsilon$) when fitting the data, in the feature space:

$\hat{y} = w_0 + \sum_{i=1}^{m} w_i \phi_i(x)$   (3)

The $\epsilon$-insensitive loss function sets an insensitive tube around the residuals and the tiny errors within the tube are discarded (Fig. 2).

[insert Fig. 2 around here]

We will adopt the popular gaussian kernel, which presents fewer parameters than other kernels (e.g. polynomial) [31]: $K(x, x') = \exp(-\gamma \|x - x'\|^2), \gamma > 0$. Under this setup, the SVM performance is affected by three parameters: $\gamma$, $\epsilon$ and $C$ (a trade-off between fitting the errors and the flatness of the mapping). To reduce the search space, the first two values will be set using the heuristics [5]: $C = 3$ (for a standardized output) and $\epsilon = \hat{\sigma}/\sqrt{N}$, where $\hat{\sigma} = 1.5/N \times \sum_{i=1}^{N} (y_i - \hat{y}_i)^2$ and $\hat{y}$ is the value predicted by a 3-nearest neighbor algorithm. The kernel parameter ($\gamma$) produces the highest impact in the SVM performance, with values that are too large or too small leading to poor predictions. A practical method to set $\gamma$ is to start the search from one of the extremes and then search towards the middle of the range while the predictive estimate increases [31].

2.4 Variable and Model Selection

Sensitivity analysis [18] is a simple procedure that is applied after the training phase and analyzes the model responses when the inputs are changed. Originally proposed for NNs, this sensitivity method can also be applied to other algorithms, such as SVM [7]. Let $\hat{y}_{a_j}$ denote the output obtained by holding all input variables at their average values except $x_a$, which varies through its entire range with $j \in \{1, \ldots, L\}$ levels. If a given input variable ($x_a \in \{x_1, \ldots, x_I\}$) is relevant then it should produce a high variance ($V_a$). Thus, its relative importance ($R_a$) can be given by:

$V_a = \sum_{j=1}^{L} (\hat{y}_{a_j} - \overline{\hat{y}_a})^2 / (L - 1)$   (4)
$R_a = V_a / \sum_{i=1}^{I} V_i \times 100\ (\%)$

In this work, the $R_a$ values will be used to measure the importance of the inputs and also to discard irrelevant inputs, guiding the variable selection algorithm. We will adopt the popular backward selection, which starts with all variables and iteratively deletes one input until a stopping criterion is met [14]. Yet, we guide the variable deletion (at each step) by the sensitivity analysis, in a variant that allows a reduction of the computational effort by a factor of $I$ (when compared to the standard backward procedure) and that in [18] has outperformed other methods (e.g. backward and genetic algorithms). Similarly to [36], the variable and model selection will be performed simultaneously, i.e.
in each backward iteration several models are searched,with the one that presents the best generalization estimate selected.For a given DM method, the overall procedure is depicted bellow:(1)Start with all F={x1,...,x I}input variables.(2)If there is a hyperparameter P∈{P1,...,P k}to tune(e.g.NN or SVM),start with P1and go through the remaining range until the generalization estimate pute the generalization estimate of the model by using an internal validation method.For instance,if the holdout method is used,the available data are further split into training(tofit the model) and validation sets(to get the predictive estimate).(3)Afterfitting the model,compute the relative importances(R i)of all x i∈F variables and delete from F the least relevant input.Go to step4ifthe stopping criterion is met,otherwise return to step2.(4)Select the best F(and P in case of NN or SVM)values,i.e.,the inputvariables and model that provide the best predictive estimates.Finally, retrain this configuration with all available data.3Empirical resultsThe R environment[22]is an open source,multiple platform(e.g.Windows, Linux)and high-level matrix programming language for statistical and data analysis.All experiments reported in this work were written in R and con-ducted in a Linux server,with an Intel dual core processor.In particular,we adopted the RMiner[6],a library for the R tool that facilitates the use of DM techniques in classification and regression tasks.11Beforefitting the models,the data wasfirst standardized to a zero mean and one standard deviation[16].RMiner uses the efficient BFGS algorithm to train the NNs(nnet R package),while the SVMfit is based on the Sequential Minimal Optimization implementation provided by LIBSVM(kernlab pack-age).We adopted the default R suggestions[29].The only exception are the hyperparameters(H andγ),which will be set using the procedure described in the previous section and with the search ranges of H∈{0,1,...,11}[36] andγ∈{23,21,...,2−15}[31].While the maximum number of searches is 12/10,in practice the parsimony approach(step2of Section2.4)will reduce this number substantially.Regarding the variable selection,we set the estimation metric to the MAD value(Equation1),as advised in[31].To reduce the computational effort, we adopted the simpler2/3and1/3holdout split as the internal valida-tion method.The sensitivity analysis parameter was set to L=5,i.e.x a∈{−1.0,−0.5,...,1.0}for a standardized input.As a reasonable balance be-tween the pressure towards simpler models and the increase of computational search,the stopping criterion was set to2iterations without any improvement or when only one input is available.To evaluate the selected models,we adopted20runs of the more robust5-fold cross-validation,in a total of20×5=100experiments for each tested config-uration.Statistical confidence will be given by the t-student test at the95% confidence level[13].The results are summarized in Table2.The test set errors are shown in terms of the mean and confidence intervals.Three met-rics are present:MAD,the classification accuracy for different tolerances(i.e. 
T=0.25,0.5and1.0)and Kappa(T=0.5).The selected models are described in terms of the average number of inputs(I)and hyperparameter value(H or12γ).The last row shows the total computational time required in seconds.[insert Table2and Fig.3around here]For both tasks and all error metrics,the SVM is the best choice.The differences are higher for small tolerances and in particular for the white wine(e.g.for T=0.25,the SVM accuracy is almost two times better when compared to other methods).This effect is clearly visible when plotting the full REC curves (Fig.3).The Kappa statistic[33]measures the accuracy when compared with a random classifier(which presents a Kappa value of0%).The higher the statistic,the more accurate the result.The most practical tolerance values are T=0.5and T=1.0.The former tolerance rounds the regression response into the nearest class,while the latter accepts a response that is correct within one of the two closest classes(e.g.a3.1value can be interpreted as grade3 or4but not2or5).For T=0.5,the SVM accuracy improvement is3.3 pp for red wine(6.2pp for Kappa),a value that increases to12.0pp for the white task(20.4pp for Kappa).The NN is quite similar to MR in the red wine modeling,thus similar performances were achieved.For the white data,a more complex NN model(H=2.1)was selected,slightly outperforming the MR results.Regarding the variable selection,the average number of deleted inputs ranges from0.9to1.8,showing that most of the physicochemical tests used are relevant.In terms of computational effort,the SVM is the most expensive method,particularly for the larger white dataset.A detailed analysis of the SVM classification results is presented by the average confusion matrixes for T=0.5(Table3).To simplify the visualization,the3 and9grade predictions were omitted,since these were always empty.Most of the values are close to the diagonals(in bold),denoting a goodfit by the model.13The true predictive accuracy for each class is given by the precision metric (e.g.for the grade4and white wine,precision T=0.5=19/(19+7+4)=63.3%). This statistic is important in practice,since in a real deployment setting the actual values are unknown and all predictions within a given column would be treated the same.For a tolerance of0.5,the SVM red wine accuracies are around57.7to67.5%in the intermediate grades(5to7)and very low (0%/20%)for the extreme classes(3,8and4),which are less frequent(Fig.1).In general,the white data results are better:60.3/63.3%for classes6and 4,67.8/72.6%for grades7and5,and a surprising85.5%for the class8(the exception are the3and9extremes with0%,not shown in the table).When the tolerance is increased(T=1.0),high accuracies ranging from81.9to 100%are attained for both wine types and classes4to8.[insert Table3and Fig.4around here]The average SVM relative importance plots(R a values)of the analytical tests are shown in Fig.4.It should be noted that the whole11inputs are shown, since in each simulation different sets of variables can be selected.In several cases,the obtained results confirm the oenological theory.For instance,an increase in the alcohol(4th and2nd most relevant factor)tends to result in a higher quality wine.Also,the rankings are different within each wine type. 
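As a rough illustration of the search procedure used above (the sensitivity-guided backward variable selection of Section 2.4 combined with a parsimony-style search over the kernel parameter $\gamma$), here is a hedged sketch. The synthetic data, fixed C, simple holdout split and stopping rule are illustrative simplifications and not a reproduction of the paper's R/RMiner implementation.

```python
# Hedged sketch of simultaneous variable and model selection for an RBF SVR:
# gamma chosen by a parsimony-style search on a holdout MAD estimate, the
# least relevant input deleted at each step via sensitivity analysis (Eq. 4).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

rng = np.random.RandomState(0)
X = rng.normal(size=(600, 8))                      # 8 hypothetical inputs
y = 3 * X[:, 0] - 2 * X[:, 1] + X[:, 2] + 0.1 * rng.normal(size=600)

X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=1 / 3, random_state=0)
GAMMAS = [2.0 ** k for k in range(3, -16, -2)]     # 2^3, 2^1, ..., 2^-15

def holdout_mad(feats, gamma):
    """Fit an RBF SVR on the training split; return (MAD on validation, model)."""
    m = SVR(kernel="rbf", gamma=gamma, C=3.0).fit(X_tr[:, feats], y_tr)
    return np.mean(np.abs(y_val - m.predict(X_val[:, feats]))), m

def relative_importance(model, feats, L=5):
    """Eq. (4)-style sensitivity: vary one input over L levels, others at their mean."""
    base = X_tr[:, feats].mean(axis=0)
    V = []
    for a in range(len(feats)):
        grid = np.tile(base, (L, 1))
        grid[:, a] = np.linspace(X_tr[:, feats[a]].min(), X_tr[:, feats[a]].max(), L)
        V.append(model.predict(grid).var(ddof=1))
    V = np.array(V)
    return V / V.sum()

feats, best = list(range(X.shape[1])), None
while len(feats) > 1:
    best_err, best_model = None, None
    for g in GAMMAS:                               # parsimony search over gamma
        err, m = holdout_mad(feats, g)
        if best_err is not None and err > best_err:
            break                                  # stop once the estimate worsens
        best_err, best_model = err, m
    if best is None or best_err < best[0]:
        best = (best_err, list(feats))
    R = relative_importance(best_model, feats)     # sensitivity-guided deletion
    feats.pop(int(np.argmin(R)))                   # drop the least relevant input

print("best holdout MAD %.3f with inputs %s" % (best[0], best[1]))
```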
For instance,the citric acid and residual sugar levels are more important in white wine,where the equilibrium between the freshness and sweet taste is more appreciated.Moreover,the volatile acidity has a negative impact,since acetic acid is the key ingredient in vinegar.The most intriguing result is the high importance of sulphates,rankedfirst for both cases.Oenologically this result could be very interesting.An increase in sulphates might be related to the fermenting nutrition,which is very important to improve the wine aroma.144Conclusions and implicationsIn recent years,the interest in wine has increased,leading to growth of the wine industry.As a consequence,companies are investing in new technolo-gies to improve wine production and selling.Quality certification is a crucial step for both processes and is currently largely dependent on wine tasting by human experts.This work aims at the prediction of wine preferences from objective analytical tests that are available at the certification step.A large dataset(with4898white and1599red entries)was considered,including vinho verde samples from the northwest region of Portugal.This case study was ad-dressed by two regression tasks,where each wine type preference is modeled in a continuous scale,from0(very bad)to10(excellent).This approach pre-serves the order of the classes,allowing the evaluation of distinct accuracies, according to the degree of error tolerance(T)that is accepted.Due to advances in the data mining(DM)field,it is possible to extract knowl-edge from raw data.Indeed,powerful techniques such as neural networks (NNs)and more recently support vector machines(SVMs)are emerging.While being moreflexible models(i.e.no a priori restriction is imposed),the per-formance depends on a correct setting of hyperparameters(e.g.number of hidden nodes of the NN architecture or SVM kernel parameter).On the other hand,the multiple regression(MR)is easier to interpret than NN/SVM,with most of the NN/SVM applications considering their models as black boxes. Another relevant aspect is variable selection,which leads to simpler models while often improving the predictive performance.In this study,we present an integrated and computationally efficient approach to deal with these issues. 
Sensitivity analysis is used to extract knowledge from the NN/SVM models,15given in terms of relative importance of the inputs.Simultaneous variable and model selection scheme is also proposed,where the variable selection is guided by sensitivity analysis and the model selection is based on parsimony search that starts from a reasonable value and is stopped when the generalization estimate decreases.Encouraging results were achieved,with the SVM model providing the best performances,outperforming the NN and MR techniques,particularly for white vinho verde wine,which is the most common type.When admitting only the correct classified classes(T=0.5),the overall accuracies are62.4% (red)and64.6%(white).It should be noted that the datasets contain six/seven classes(from3to8/9).These accuracies are much better than the ones ex-pected by a random classifier.The performance is substantially improved when the tolerance is set to accept responses that are correct within the one of the two nearest classes(T=1.0),obtaining a global accuracy of89.0%(red)and 86.8%(white).In particular,for both tasks the majority of the classes present an individual accuracy(precision)higher than90%.The superiority of SVM over NN is probably due to the differences in the train-ing phase.The SVM algorithm guarantees an optimumfit,while NN training may fall into a local minimum.Also,the SVM cost function(Fig.2)gives a linear penalty to large errors.In contrast,the NN algorithm minimizes the sum of squared errors.Thus,the SVM is expected to be less sensitive to outliers and this effect results in a higher accuracy for low error tolerances.As argued in[15],it is difficult to compare DM methods in a fair way,with data analysts tending to favor models that they know better.We adopted the default sug-gestions of the R tool[29],except for the hyperparameters(which were set using a grid search).Since the default settings are more commonly used,this16seems a reasonable assumption for the comparison.Nevertheless,different NN results could be achieved if different hidden node and/or minimization cost functions were used.Under the tested setup,the SVM algorithm provided the best results while requiring more computation.Yet,the SVMfitting can still be achieved within a reasonable time with current processors.For example, one run of the5-fold cross-validation testing takes around26minutes for the larger white dataset,which covers a three-year collection period.The result of this work is important for the wine industry.At the certification phase and by Portuguese law,the sensory analysis has to be performed by hu-man tasters.Yet,the evaluations are based in the experience and knowledge of the experts,which are prone to subjective factors.The proposed data-driven approach is based on objective tests and thus it can be integrated into a decision support system,aiding the speed and quality of the oenologist per-formance.For instance,the expert could repeat the tasting only if her/his grade is far from the one predicted by the DM model.In effect,within this domain the T=1.0distance is accepted as a good quality control process and, as shown in this study,high accuracies were achieved for this tolerance.The model could also be used to improve the training of oenology students.Fur-thermore,the relative importance of the inputs brought interesting insights regarding the impact of the analytical tests.Since some variables can be con-trolled in the production process this information can be used to improve the wine quality.For instance,alcohol 
concentration can be increased or decreased by monitoring the grape sugar concentration prior to the harvest.Also,the residual sugar in wine could be raised by suspending the sugar fermentation carried out by yeasts.Moreover,the volatile acidity produced during the malo-lactic fermentation in red wine depends on the lactic bacteria control activity.17。
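The error metrics used throughout this evaluation, the MAD of Eq. (1) and the accuracy within an absolute tolerance T (one point of the REC curve), are straightforward to compute. Below is a minimal sketch with made-up grade vectors, not data from the study.

```python
# Minimal sketch of the metrics defined in Section 2.2: mean absolute
# deviation (MAD) and the accuracy within an absolute error tolerance T.
# Arrays are illustrative placeholders.
import numpy as np

def mad(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.mean(np.abs(y_true - y_pred))

def accuracy_within_tolerance(y_true, y_pred, T):
    """Fraction of predictions with absolute error <= T."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.mean(np.abs(y_true - y_pred) <= T)

y_true = np.array([5, 6, 6, 7, 4])                 # sensory grades (0-10 scale)
y_pred = np.array([5.2, 5.4, 6.8, 6.9, 4.9])
print("MAD =", mad(y_true, y_pred))
for T in (0.25, 0.5, 1.0):
    print("acc(T=%.2f) =" % T, accuracy_within_tolerance(y_true, y_pred, T))
```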
MaxPlusMaxPlus is a powerful software tool specifically developed for data analysis and visualization. With its intuitive interface and comprehensive capabilities, MaxPlus offers a wide range of features for users to explore and extract valuable insights from their data. This document provides an overview of MaxPlus and highlights its key features.Features1. Data Import and OrganizationMaxPlus allows users to easily import data from various sources such as CSV files, Excel spreadsheets, and databases. The software supports a wide range of data formats and provides flexible options for data handling. Users can organize their data by creating projects and datasets, making it convenient to manage and manipulate large amounts of data.2. Data Cleaning and PreprocessingData cleaning and preprocessing are essential steps in any data analysis project. MaxPlus provides a variety of tools to handle missing data, outliers, and inconsistencies. Users can perform tasks such as data imputation, outlier detection, and data standardization, enabling them to prepare their data for analysis effectively.3. Exploratory Data Analysis (EDA)EDA is an essential phase in data analysis that helps understand the underlying patterns and relationships in the data. MaxPlus offers a wide range of visualization techniques like scatter plots, histograms, and box plots to explore the data. Users can also perform statistical tests and generate insightful summary statistics to gain a comprehensive understanding of the data.4. Machine LearningMaxPlus incorporates powerful machine learning algorithms for predictive modeling and pattern recognition. Users can apply popular machine learning techniques such as decision trees, support vector machines, and neural networks to build and evaluate models. The software also includes automated model selection and hyperparameter optimization tools, making it easier for users to find the best model for their data.5. Advanced AnalyticsIn addition to traditional data analysis techniques, MaxPlus offers advanced analytics capabilities such as time series analysis, regression analysis, and clustering. These tools allow users to uncover complex relationships and discover hidden insights in their data. MaxPlus provides a wide range of statistical tests and modeling techniques to facilitate advanced data analysis.6. Visualizations and ReportingVisualizations play a crucial role in communicating insights effectively. MaxPlus provides a variety of visualization options, including charts, graphs, and interactive dashboards. Users can customize the appearance and layout of visualizations to create compelling and informative presentations. The software also supports exporting visualizations and reports in various formats for sharing with others.Benefits1. User-Friendly InterfaceMaxPlus offers a user-friendly and intuitive interface, making it easy for users of all skill levels to navigate and utilize its features. The software provides a seamless experience for performing data analysis tasks, allowing users to focus on extracting insights from their data rather than struggling with complex tools.2. Comprehensive FunctionalityWith its wide range of features and capabilities, MaxPlus provides a comprehensive solution for data analysis and visualization. Users can perform various tasks, including data cleaning, exploratory analysis, machine learning, and advanced analytics, all within a single software package. 
The ability to perform multiple tasks in one tool streamlines the data analysis process and saves time for users.3. Efficient Data HandlingMaxPlus is built with efficiency in mind, allowing users to handle large datasets without sacrificing performance. The software utilizes advanced algorithms and optimization techniques to ensure fast and reliable data processing. Users can easily import, manipulate, and analyze massive amounts of data without experiencing any lag or slowdowns.4. Flexibility and CustomizationMaxPlus provides a wide range of options for customization and flexibility. Users can personaliz e the software’s settings, layouts, and visualizations to suit their preferences and requirements. The ability to tailor the software to individual needs enhances the user experience and allows for more effective and meaningful analysis.5. Collaboration and SharingMaxPlus supports collaboration and sharing among team members. Users can work on projects together, share datasets and models, and provide comments and feedback. The software also enables exporting visualizations and reports in various formats, making it easy to share insights with colleagues or stakeholders.ConclusionMaxPlus is a versatile and powerful data analysis and visualization software. With its extensive range of features and capabilities, MaxPlus provides users with a comprehensive solution for handling and analyzing data. Whether it is datacleaning, exploratory analysis, machine learning, or advanced analytics, MaxPlus offers intuitive tools and a user-friendly interface to support efficient and meaningful analysis. By using MaxPlus, users can unlock valuable insights from their data and make data-driven decisions with confidence.。
Olivier Chapelle, Vladimir Vapnik
AT&T Research Labs, 100 Schulz drive, Red Bank, NJ 07701
chapelle,vlad@

Abstract

New functionals for parameter (model) selection of Support Vector Machines are introduced based on the concepts of the span of support vectors and rescaling of the feature space. It is shown that using these functionals, one can both predict the best choice of parameters of the model and the relative quality of performance for any value of parameter.

1 Introduction

Support Vector Machines (SVMs) implement the following idea: they map input vectors into a high dimensional feature space, where a maximal margin hyperplane is constructed [6]. It was shown that when training data are separable, the error rate for SVMs can be characterized by

$T = R^2 / M^2$   (1)

where $R$ is the radius of the smallest sphere containing the training data and $M$ is the margin (the distance between the hyperplane and the closest training vector in feature space). This functional estimates the VC dimension of hyperplanes separating data with a given margin.

To perform the mapping and to calculate $R$ and $M$ in the SVM technique, one uses a positive definite kernel $K$ which specifies an inner product in feature space. An example of such a kernel is the Radial Basis Function (RBF),

$K(x, x') = \exp\left(-\|x - x'\|^2 / (2\sigma^2)\right).$

This kernel has a free parameter $\sigma$ and, more generally, most kernels require some parameters to be set. When treating noisy data with SVMs, another parameter, penalizing the training errors, also needs to be set. The problem of choosing the values of these parameters which minimize the expectation of test error is called the model selection problem.

It was shown that the parameter of the kernel that minimizes functional (1) provides a good choice for the model: the minimum for this functional coincides with the minimum of the test error [1]. However, the shapes of these curves can be different.

In this article we introduce refined functionals that not only specify the best choice of parameters (both the parameter of the kernel and the parameter penalizing training error), but also produce curves which better reflect the actual error rate.

The paper is organized as follows. Section 2 describes the basics of SVMs, section 3 introduces a new functional based on the concept of the span of support vectors, section 4 considers the idea of rescaling data in feature space and section 5 discusses experiments of model selection with these functionals.

2 Support Vector Learning

We introduce some standard notation for SVMs; for a complete description, see [6]. Let $(x_1, y_1), \ldots, (x_\ell, y_\ell)$ be a set of training examples, each belonging to a class labeled by $y_i \in \{-1, 1\}$. The decision function given by a SVM is:

$f(x) = \mathrm{sign}\left( \sum_{i=1}^{\ell} \alpha_i^0 y_i K(x_i, x) + b \right)$   (2)

where the coefficients $\alpha_i^0$ are obtained by maximizing the following functional:

$W(\alpha) = \sum_{i=1}^{\ell} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{\ell} \alpha_i \alpha_j y_i y_j K(x_i, x_j)$   (3)

subject to $\sum_{i=1}^{\ell} \alpha_i y_i = 0$ and $0 \le \alpha_i \le C$, $i = 1, \ldots, \ell$.

We now provide an analysis of the number of errors made by the leave-one-out procedure.
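Before moving on, here is a hedged sketch of how functional (1) can be estimated for a trained RBF-kernel SVM. For simplicity the radius R is approximated by the largest feature-space distance to the data centroid rather than by solving the exact smallest-enclosing-sphere problem used in the paper, and the dataset and γ grid are illustrative assumptions.

```python
# Hedged sketch: estimating the R^2/M^2 functional (1) for an RBF-kernel SVM.
# R is approximated by the largest feature-space distance to the centroid
# (the paper solves the exact smallest-sphere problem); data are illustrative.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

def r2_over_m2(X, y, gamma):
    clf = SVC(kernel="rbf", gamma=gamma, C=1e6).fit(X, y)  # large C ~ separable case
    K = rbf_kernel(X, X, gamma=gamma)
    # squared margin: M^2 = 1 / ||w||^2, with ||w||^2 from the dual coefficients
    sv, dc = clf.support_, clf.dual_coef_.ravel()           # dc_i = y_i * alpha_i
    w_norm2 = dc @ K[np.ix_(sv, sv)] @ dc
    # approximate R^2 by the max squared distance to the feature-space centroid
    r2 = np.max(np.diag(K) - 2 * K.mean(axis=1) + K.mean())
    return r2 * w_norm2                                      # = R^2 / M^2

for gamma in (0.001, 0.01, 0.1, 1.0):
    print("gamma=%g  R^2/M^2 ~ %.1f" % (gamma, r2_over_m2(X, y, gamma)))
```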
For this purpose, we introduce a new concept, called the span of support vectors [7].

3.2 Span of support vectors

Since the results presented in this section do not depend on the feature space, we will consider, without any loss of generality, linear SVMs, i.e. $K(x_i, x_j) = x_i \cdot x_j$. Suppose that $\alpha^0 = (\alpha_1^0, \ldots, \alpha_\ell^0)$ is the solution of the optimization problem (3). For any fixed support vector $x_p$ we define the set $\Lambda_p$ as constrained linear combinations of the support vectors of the first category (those with $0 < \alpha_i^0 < C$):

$\Lambda_p = \left\{ \sum_{i \neq p,\ 0 < \alpha_i^0 < C} \lambda_i x_i \ : \ \sum_{i \neq p} \lambda_i = 1,\ \ 0 \le \alpha_i^0 + y_i y_p \lambda_i \alpha_p^0 \le C \right\}$   (4)

Note that $\lambda_i$ can be less than 0.

We also define the quantity $S_p$, which we call the span of the support vector $x_p$, as the minimum distance between $x_p$ and this set (see figure 1):

$S_p^2 = d^2(x_p, \Lambda_p) = \min_{x \in \Lambda_p} \|x_p - x\|^2$   (5)

[Figure 1: Three support vectors; the set $\Lambda_1$ is shown as the semi-opened dashed line.]

It was shown in [7] that the set $\Lambda_p$ is not empty and that $S_p \le D_{SV}$, where $D_{SV}$ is the diameter of the smallest sphere containing the support vectors. Intuitively, the smaller $S_p$ is, the less likely the leave-one-out procedure is to make an error on the vector $x_p$. Formally, the following theorem holds:

Theorem 1 [7] If in the leave-one-out procedure a support vector $x_p$ corresponding to $\alpha_p^0$ is recognized incorrectly, then the following inequality holds: $\alpha_p^0 S_p^2 \ge 1$.

This theorem implies that in the separable case ($C = \infty$), the number of errors made by the leave-one-out procedure is bounded as follows: $L \le \sum_p \alpha_p^0 S_p^2 \le D_{SV}^2 / M^2$, because $\sum_p \alpha_p^0 = 1/M^2$ [6]. This is already an improvement compared to functional (1), since $D_{SV} \le 2R$. But depending on the geometry of the support vectors, the value of the span $S_p$ can be much less than the diameter of the support vectors and can even be equal to zero.

We can go further under the assumption that the set of support vectors does not change during the leave-one-out procedure, which leads us to the following theorem:

Theorem 2 If the sets of support vectors of first and second categories remain the same during the leave-one-out procedure, then for any support vector $x_p$ the following equality holds: $y_p \left( f^0(x_p) - f^p(x_p) \right) = \alpha_p^0 S_p^2$, where $f^0$ and $f^p$ are the decision function (2) given by the SVM trained respectively on the whole training set and after the point $x_p$ has been removed.

The proof of the theorem follows the one of Theorem 1 in [7]. The assumption that the set of support vectors does not change during the leave-one-out procedure is obviously not satisfied in most cases. Nevertheless, the proportion of points which violate this assumption is usually small compared to the number of support vectors. In this case, Theorem 2 provides a good approximation of the result of the leave-one-out procedure, as pointed out by the experiments (see Section 5.1, figure 2).

As already noticed in [1], the larger $\alpha_p^0$ is, the more "important" the support vector $x_p$ is in the decision function. Thus, it is not surprising that removing a point causes a change in the decision function proportional to its Lagrange multiplier. The same kind of result as Theorem 2 has also been derived in [2], where for SVMs without threshold, the following inequality has been derived: $y_p \left( f^0(x_p) - f^p(x_p) \right) \le \alpha_p^0 K(x_p, x_p)$. The span takes into account the geometry of the support vectors in order to get a precise notion of how "important" a given point is.

The previous theorem enables us to compute the number of errors made by the leave-one-out procedure:

Corollary 1 Under the assumption of Theorem 2, the test error prediction given by the leave-one-out procedure is

$t = \frac{1}{\ell}\, \mathrm{Card}\left\{ p : \alpha_p^0 S_p^2 \ge y_p f^0(x_p) \right\}$   (6)

Note that points which are not support vectors are correctly classified by the leave-one-out procedure. Therefore $\ell\, t$ defines the number of errors of the leave-one-out procedure on the entire training set.

Under the assumption in Theorem 2, the box constraints in the definition of (4) can be removed. Moreover, if we consider only hyperplanes passing through the origin, the $\sum_{i \neq p} \lambda_i = 1$
constraint can also be removed. Therefore, under those assumptions, the computation of the span is an unconstrained minimization of a quadratic form and can be done analytically. For support vectors of the first category, this leads to the closed form $S_p^2 = 1 / (K_{SV}^{-1})_{pp}$, where $K_{SV}$ is the matrix of dot products between support vectors of the first category. A similar result has also been obtained in [3].

In Section 5, we use the span-rule (6) for model selection in both separable and non-separable cases.

4 Rescaling

As we already mentioned, functional (1) bounds the VC dimension of a linear margin classifier. This bound is tight when the data almost "fill" the surface of the sphere enclosing the training data, but when the data lie on a flat ellipsoid, this bound is poor since the radius of the sphere takes into account only the components with the largest deviations. The idea we present here is to make a rescaling of our data in feature space such that the radius of the sphere stays constant but the margin increases, and then apply this bound to our rescaled data and hyperplane.

Let us first consider linear SVMs, i.e. without any mapping in a high dimensional space. The rescaling can be achieved by computing the covariance matrix of our data and rescaling according to its eigenvalues. Suppose our data are centered and let $\psi_1, \ldots, \psi_n$ be the normalized eigenvectors of the covariance matrix of our data. We can then compute the smallest enclosing box containing our data, centered at the origin and whose edges are parallel to the $\psi_k$. This box is an approximation of the smallest enclosing ellipsoid. The length of the edge in the direction $\psi_k$ is $\eta_k$. The rescaling consists of the diagonal transformation

$D: x \mapsto \tilde{x} = \sum_k \frac{(x \cdot \psi_k)}{\eta_k}\, \psi_k.$

Let us consider $\tilde{x} = D x$ and $\tilde{w} = D^{-1} w$. The decision function is not changed under this transformation, since $\tilde{w} \cdot \tilde{x} = w \cdot x$, and the data fill a box of side length 1. Thus, in functional (1), we replace $R$ by 1 and $w$ by $\tilde{w}$. Since we rescaled our data in a box, we actually estimated the radius of the enclosing ball using the $\ell_\infty$-norm instead of the classical $\ell_2$-norm. Further theoretical work needs to be done to justify this change of norm.

In the non-linear case, note that even if we map our data into a high dimensional feature space, they lie in the linear subspace spanned by these data. Thus, if the number of training data $\ell$ is not too large, we can work in this subspace of dimension at most $\ell$. For this purpose, one can use the tools of kernel PCA [5]: if $V$ is the matrix of normalized eigenvectors of the Gram matrix and $\lambda_k$ its eigenvalues, the dot product $(x \cdot \psi_k)$ is replaced by its kernel PCA counterpart computed from $V$, the eigenvalues and the kernel evaluations. Thus, we can still achieve the diagonal transformation and finally functional (1) becomes the corresponding rescaled ratio $\tilde{R}^2 / \tilde{M}^2 = \|\tilde{w}\|^2$ (since $\tilde{R} = 1$).

5 Experiments

To check these new methods, we performed two series of experiments. One concerns the choice of $\sigma$, the width of the RBF kernel, on a linearly separable database, the postal database. This dataset consists of 7291 handwritten digits of size 16x16 with a test set of 2007 examples. Following [4], we split the training set in 23 subsets of 317 training examples. Our task consists of separating digits 0 to 4 from 5 to 9. Error bars in figures 2a and 4 are standard deviations over the 23 trials. In another experiment, we try to choose the optimal value of $C$ in a noisy database, the breast-cancer database¹. The dataset has been split randomly 100 times into a training set containing 200 examples and a test set containing 77 examples. Section 5.1 describes experiments of model selection using the span-rule (6), both in the separable case and in the non-separable one, while Section 5.2 shows VC bounds for model selection in the separable case both with and without rescaling.

5.1 Model selection using the span-rule

In this section, we use the prediction of
test error derived from the span-rule (6) for model selection. Figure 2a shows the test error and the prediction given by the span for different values of the width $\sigma$ of the RBF kernel on the postal database. Figure 2b plots the same functions for different values of $C$ on the breast-cancer database. We can see that the method predicts the correct value of the minimum. Moreover, the prediction is very accurate and the curves are almost identical.

[Figure 2: Test error and its prediction using the span-rule (6); (a) choice of $\sigma$ in the postal database.]

The computation of the span-rule (6) involves computing the span (5) for every support vector. Note, however, that we are interested in the inequality $\alpha_p^0 S_p^2 \ge y_p f^0(x_p)$, rather than the exact value of the span. Thus, while minimizing over $x \in \Lambda_p$, if we find a point $x^*$ such that $\alpha_p^0\, d^2(x_p, x^*) < y_p f^0(x_p)$, we can stop the minimization because this point will be correctly classified by the leave-one-out procedure.

Figure 3 compares the time required to (a) train the SVM on the postal database, (b) compute the estimate of the leave-one-out procedure given by the span-rule (6) and (c) compute exactly the leave-one-out procedure. In order to have a fair comparison, we optimized the computation of the leave-one-out procedure in the following way: for every support vector $x_p$, we take as starting point for the minimization (3) involved to compute $f^p$ (the decision function after having removed the point $x_p$), the solution $\alpha^0$ obtained on the whole training set, the reason being that these two solutions are usually close to each other. The results show that the span-rule estimate is obtained in a small fraction of the time required by the exact leave-one-out procedure, which makes it very attractive.

[Figure 3: Comparison of time required for SVM training, computation of span and leave-one-out on the postal database.]

5.2 VC dimension with rescaling

In this section, we perform model selection on the postal database using functional (1) and its rescaled version. Figure 4a shows the values of the classical bound for different values of $\sigma$. This bound predicts the correct value for the minimum, but does not reflect the actual test error. This is easily understandable since for large values of $\sigma$, the data in input space tend to be mapped into a very flat ellipsoid in feature space, a fact which is not taken into account [4]. Figure 4b shows that by performing a rescaling of our data, we manage to have a much tighter bound and this curve reflects the actual test error, given in figure 2a.

[Figure 4: Bound on the VC dimension for different values of $\sigma$ on the postal database. The shape of the curve with rescaling is very similar to the test error in figure 2.]

6 Conclusion

In this paper, we introduced two new techniques of model selection for SVMs. One is based on the span, the other is based on rescaling of the data in feature space. We demonstrated that using these techniques, one can both predict optimal values for the parameters of the model and evaluate relative performances for different values of the parameters. These functionals can also lead to new learning techniques as they establish that generalization ability is not only due to margin.

Acknowledgments

The authors would like to thank Jason Weston and Patrick Haffner for helpful discussions and comments.

References

[1] C. J. C. Burges. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2(2):121–167, 1998.
[2] T. S. Jaakkola and D. Haussler. Probabilistic kernel regression models. In Proceedings of the 1999 Conference on AI and Statistics, 1999.
[3] M. Opper and O. Winther. Gaussian process classification and SVM: Mean field results and leave-one-out estimator. In Advances in Large Margin Classifiers. MIT Press, 1999. To appear.
[4] B. Schölkopf, J. Shawe-Taylor, A. J. Smola, and R. C. Williamson. Generalization bounds
via eigenvalues of the Gram matrix. Submitted to COLT99, February 1999.
[5] B. Schölkopf, A. Smola, and K.-R. Müller. Kernel principal component analysis. In Artificial Neural Networks — ICANN'97, pages 583–588, Berlin, 1997. Springer Lecture Notes in Computer Science, Vol. 1327.
[6] V. Vapnik. Statistical Learning Theory. Wiley, New York, 1998.
[7] V. Vapnik and O. Chapelle. Bounds on error expectation for SVM. Neural Computation, 1999. Submitted.
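As a closing illustration, here is a hedged sketch of the span-rule (6) under the simplifying assumptions discussed in Section 3.2 (only support vectors of the first category, threshold and box constraints ignored), where the span takes the closed form $S_p^2 = 1/(K_{SV}^{-1})_{pp}$. The dataset and parameter grid are placeholders, and this is not the authors' original implementation.

```python
# Hedged sketch of the span-rule (6) leave-one-out estimate under the
# simplifying assumptions of Section 3.2, using the closed-form span
# S_p^2 = 1 / (K_SV^{-1})_pp.  Data and gamma grid are illustrative.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=10, random_state=1)
y = 2 * y - 1                                    # labels in {-1, +1}

def span_rule_estimate(X, y, gamma, C=10.0):
    clf = SVC(kernel="rbf", gamma=gamma, C=C).fit(X, y)
    sv = clf.support_
    alpha = np.abs(clf.dual_coef_.ravel())       # alpha_p^0
    K_sv = rbf_kernel(X[sv], X[sv], gamma=gamma)
    # closed-form span (small ridge added for numerical stability)
    S2 = 1.0 / np.diag(np.linalg.inv(K_sv + 1e-10 * np.eye(len(sv))))
    f0 = clf.decision_function(X[sv])            # f^0(x_p)
    loo_errors = np.sum(alpha * S2 >= y[sv] * f0)
    return loo_errors / len(X)                   # predicted test error

for gamma in (0.001, 0.01, 0.1, 1.0):
    print("gamma=%g  span-rule LOO estimate: %.3f"
          % (gamma, span_rule_estimate(X, y, gamma)))
```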