
Speeding Up Multi-class SVM Evaluation by PCA and Feature Selection


Hansheng Lei, Venu Govindaraju

CUBS, Center for Unified Biometrics and Sensors

State University of New York at Buffalo

Amherst, NY 14260

Email: {hlei,govind}@cse.buffalo.edu

Abstract

Support Vector Machine (SVM) is a state-of-the-art learning machine that has been very fruitful not only in pattern recognition, but also in data mining areas such as feature selection on microarray data, novelty detection, and the scalability of algorithms. SVM has been extensively and successfully applied to feature selection for genetic diagnosis. In this paper, we do the contrary, i.e., we use the fruits achieved in the applications of SVM to feature selection to improve SVM itself. By reducing redundant and non-discriminative features, the computational time of SVM is greatly reduced and thus the evaluation speeds up. We propose combining Principal Component Analysis (PCA) and Recursive Feature Elimination (RFE) into multi-class SVM. We found that SVM is invariant under the PCA transform, which qualifies PCA as a desirable dimension reduction method for SVM. On the other hand, RFE is a suitable feature selection method for binary SVM. However, RFE requires many iterations, and each iteration needs to train the SVM once. This makes RFE infeasible for multi-class SVM without PCA dimension reduction, especially when the training set is large. Therefore, combining PCA with RFE is necessary. Our experiments on the benchmark database MNIST and other commonly-used datasets show that PCA and RFE can speed up the evaluation of SVM by an order of 10 while maintaining comparable accuracy.

1  Introduction

The Support Vector Machine (SVM) was originally designed for the binary classification problem [1]. It separates two classes with maximum margin. The margin is described by Support Vectors (SV), which are determined by solving a Quadratic Programming (QP) optimization problem. The training of SVM, dominated by the QP optimization, used to be very slow and lacked scalability. A lot of effort has been devoted to cracking the QP problem and enhancing its scalability [17, 13, 14]. The bottleneck lies in the kernel matrix. Suppose we have N data points for training; then the size of the kernel matrix will be N×N. When N is more than thousands (say, N=5000), the kernel matrix is too big to stay in the memory of a common personal computer. This had been a challenge for SVM until Sequential Minimal Optimization (SMO) was invented [14]. The space complexity of SVM training is dramatically brought down to O(1) by SMO. Thus, the training problem was almost solved, although more powerful solutions may still be lurking. With the support of SMO, the great scalability of SVM has demonstrated its promising potential in data mining areas [19].

In the past decade, SVM has been widely applied in pattern recognition as well as data mining with fruitful results. However, SVM itself also needs improvement in both training and testing (evaluation).

A lot of work has been done to improve SVM training, and SMO can be considered the state-of-the-art solution for it. Comparatively, only a few efforts have been directed at the evaluation side of SVM [2, 4, 10].

In this paper, we propose a method for SVM evaluation enhancement via Principal Component Analysis (PCA) and Recursive Feature Elimination (RFE). PCA is an orthogonal transformation of the coordinate system that preserves the Euclidean distance of the original points (each point is considered as a vector of features or components). By the PCA transform, the energy of the points is concentrated into the first few components. This leads to dimension reduction. Feature selection has been heavily studied, especially for the purpose of gene selection on microarray data. The common situation in gene-related classification problems is that there are thousands of genes but no more than hundreds of samples, i.e., the number of dimensions is much larger than the number of samples. In this condition, the problem of overfitting arises. Among those genes, which of them are discriminative? Finding the minimum subset of genes that interact can help cancer diagnosis. RFE in the context of SVM has achieved excellent results on feature selection [5]. Here, we do the contrary, i.e., we use the fruits of the application of SVM to feature selection to improve SVM itself.

The rest of this paper is organized as follows. After the introduction, we briefly discuss the background of SVM, PCA and RFE as well as some related works in §2. Then, we prove that SVM is invariant under PCA and describe how to incorporate PCA and RFE into SVM to speed up SVM evaluation in §3. Experimental results on benchmark datasets are reported in §4. Finally, the conclusion is drawn in §5.

2  Background and Related Works

In this section, we discuss the basic concepts of SVM and how RFE is incorporated into SVM for feature selection on gene expressions. In addition, PCA is also introduced. We prove that SVM is invariant under the PCA transformation and then propose combining PCA and RFE safely to improve SVM evaluation.

2.1 Support Vector Machines (SVM)  The basic form of an SVM classifier can be expressed as:

(2.1)  g(x) = w·φ(x) + b,

where the input vector x ∈ R^n, w is a normal vector of a separating hyper-plane in the feature space produced by the mapping φ(x): R^n → R^n' (φ(x) can be linear or non-linear, and n' can be finite or infinite), and b is a bias. Since SVM was originally designed for two-class classification, the sign of g(x) tells whether vector x belongs to class 1 or class -1.

Given a set of training samples x_i ∈ R^n, i = 1,···,N, and corresponding labels y_i ∈ {−1,+1}, the separating hyper-plane (described by w) is determined by minimizing the structural risk instead of the empirical error. Minimizing the structural risk is equivalent to seeking the optimal margin between the two classes. The width of the margin is 2/√(w·w) = 2/‖w‖₂. Adding a trade-off between structural risk and generalization, the training of SVM is defined as a constrained optimization problem:

(2.2)  min_{w,b}  (1/2) w·w + C Σ_{i=1}^N ξ_i
       subject to  y_i (w·φ(x_i) + b) ≥ 1 − ξ_i,  ξ_i ≥ 0,  ∀i,

where the parameter C is the trade-off.

The solution to (2.2) reduces to a QP optimization problem:

(2.3)  max_a  Σ_{i=1}^N α_i − (1/2) a^T H a
       subject to  0 ≤ α_i ≤ C, ∀i,   Σ_{i=1}^N y_i α_i = 0,

where a = [α_1,···,α_N]^T, and H is an N×N matrix, called the kernel matrix, with each element H(i,j) = y_i y_j φ(x_i)·φ(x_j).

Solving the QP problem yields:

(2.4)  w = Σ_{i=1}^N α_i y_i φ(x_i),

(2.5)  b = y_i − Σ_{j=1}^N α_j y_j φ(x_j)·φ(x_i),  ∀i.

Each training sample x_i is associated with a Lagrange coefficient α_i. Those samples whose coefficient α_i is nonzero are called Support Vectors (SV). Only a small portion of the training samples become SVs (say, 3%).

Substituting eq. (2.4) into (2.1), we have the formal expression of the SVM classifier:

(2.6)  g(x) = Σ_{i=1}^N α_i y_i φ(x_i)·φ(x) + b = Σ_{i=1}^N α_i y_i K(x_i, x) + b,

where K is a kernel function: K(x_i,x_j) = φ(x_i)·φ(x_j). With kernel functions, we do not have to know φ(x) explicitly. The most commonly-used kernel functions are: 1) the Linear kernel, i.e., φ(x_i) = x_i, thus K(x_i,x_j) = x_i·x_j = x_i^T x_j; 2) the Polynomial kernel, i.e., K(x_i,x_j) = (x_i·x_j + c)^d, where c and d are positive constants; 3) the Gaussian Radial Basis Function (RBF) kernel, i.e., K(x_i,x_j) = exp(−‖x_i − x_j‖² / (2σ²)). If the kernel used is linear, the SVM is called a linear SVM; otherwise, a non-linear SVM.
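As a concrete illustration of eq. (2.6) and the three kernels above, here is a small numpy sketch; the support vectors, coefficients and bias below are made-up toy values, not a trained model:

# Illustrative numpy sketch of the SVM decision function g(x) in eq. (2.6)
# and the three kernels listed above; SVs, alphas and b are toy values.
import numpy as np

def linear_kernel(xi, xj):
    return xi @ xj                                   # K(xi,xj) = xi . xj

def poly_kernel(xi, xj, c=1.0, d=3):
    return (xi @ xj + c) ** d                        # K(xi,xj) = (xi . xj + c)^d

def rbf_kernel(xi, xj, sigma=1.0):
    return np.exp(-np.sum((xi - xj) ** 2) / (2 * sigma ** 2))

def decision(x, SVs, alphas, ys, b, kernel=rbf_kernel):
    """g(x) = sum_i alpha_i * y_i * K(x_i, x) + b over the support vectors."""
    return sum(a * y * kernel(sv, x) for a, y, sv in zip(alphas, ys, SVs)) + b

# toy usage: two "support vectors" with coefficients and labels
SVs = np.array([[0.0, 1.0], [1.0, 0.0]])
alphas, ys, b = np.array([0.7, 0.7]), np.array([+1, -1]), 0.0
print(np.sign(decision(np.array([0.2, 0.9]), SVs, alphas, ys, b)))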

The QP problem (2.3) involves a matrix H whose number of elements equals the square of the number of training samples. If there are more than 5000 training samples, H will not fit into a 128-Megabyte memory (assuming each element is an 8-byte double, 5000² × 8 bytes is already about 200 MB). Thus, the QP problem becomes intractable via standard QP techniques. There were many efforts to meet this challenge [17, 13], and finally SMO gave an effective solution [14] in 1999. SMO breaks the large QP problem into a series of smallest possible QP problems. Solving those small QP problems analytically needs only O(1) memory. In this way, the scalability of SVM is enhanced dramatically. Since the training problem was cracked by SMO, the focus of effort has shifted to the evaluation of SVM [2, 10]: improving accuracy and evaluation speed. Increasing accuracy is quite challenging, while the evaluation speed still has much room for improvement. To enhance the SVM evaluation speed, we should refer back to eq. (2.6), from which the following possible ways to achieve a speedup can be identified:

1. Reduce the number of SVs directly. w is described by a linear combination of SVs, and to obtain g(x), x needs an inner product with every SV. Thus, reducing the total number of SVs directly reduces the computational time of SVM in the test phase. Burges et al. proposed such a method [2]. It tries to find a w′ that approximates w as closely as possible in the feature space. Similarly to w, w′ is expressed by a list of vectors (called the reduced set) associated with corresponding coefficients (α′_i). However, the method for determining the reduced set is computationally very expensive. Exploring this direction further, Downs et al. found a method to identify and discard unnecessary SVs (those SVs that linearly depend on other SVs) while leaving the SVM decision unchanged [4]. A reduction in SVs as high as 40.96% was reported therein.

2. Reduce the size of the quadratic program and thus reduce the number of SVs indirectly. A method called RSVM (Reduced Support Vector Machines) was proposed by Lee et al. [10]. It preselects a subset of training samples as SVs and solves a smaller QP. The authors reported that RSVM needs much less computational time and memory than the standard SVM. A comparative study of RSVM and SVM by Lin et al. [11] showed that the standard SVM possesses higher generalization ability, while RSVM may be suitable for very large training problems or those in which a large portion of the training samples become SVs.

3. Reduce the number of vector components. Suppose the length of the vectors is originally n; can we reduce the length to p ≪ n by removing non-discriminative components or features? To the best of our knowledge, very few efforts, if any, have been made in this direction. Since there are mature techniques for dimension reduction and feature selection, we propose to make use of the successful techniques gained from the applications of SVM to feature selection to improve SVM itself, especially multi-class SVM.

2.2 Recursive Feature Elimination (RFE)  RFE is an iterative procedure to remove non-discriminative features [6] in a binary classification problem. The framework of RFE consists of three steps:

1. Train the classifier;

2. Compute the ranking of all features with a certain criterion in terms of their contribution to classification;

3. Remove the feature with the lowest ranking. Go to step 1 until no features remain.

Note that in step 3, only one feature is eliminated at a time. It may be more efficient to remove several features at a time, but at the expense of possible performance degradation. There are many feature selection methods besides RFE. However, in the context of SVM, RFE has been proved to be one of the most suitable feature selection methods by extensive experiments [6]. The outline of RFE in SVM is as follows:

Algorithm 1  SVM-RFE

Inputs: Training samples X0 = [x1, x2, ···, xN] and class labels Y = [y1, y2, ···, yN].
Outputs: feature ranking list r.
1: s = [1, 2, ···, n];                    /* surviving features */
2: r = [];                                /* feature ranking list */
3: while s ≠ [] do
4:   X = X0(s, :);                        /* only use surviving features of every sample */
5:   Train a linear SVM on X and Y and obtain w;
6:   c_i = w_i², ∀i;                      /* weight of the i-th feature in s */
7:   f = argmin_i(c_i);                   /* index of the lowest-ranked feature */
8:   r = [s(f), r];                       /* update the ranking list */
9:   s = s(1:f−1, f+1:length(s));         /* eliminate the lowest-ranked feature */
10: end while                             /* end of algorithm */
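For illustration, a minimal Python sketch of Algorithm 1 follows, with scikit-learn's LinearSVC standing in for the linear SVM trainer; the function name svm_rfe and the step parameter (which generalizes lines 7-9 to eliminate several features at a time) are our own additions, not part of the original algorithm:

# Sketch of SVM-RFE (Algorithm 1); LinearSVC stands in for the linear SVM.
import numpy as np
from sklearn.svm import LinearSVC

def svm_rfe(X0, Y, step=1):
    """X0: (N, n) samples, Y: (N,) binary labels. Returns feature indices
    ranked from most to least discriminative."""
    s = list(range(X0.shape[1]))          # surviving features
    r = []                                # ranking list, best features end up first
    while s:
        X = X0[:, s]                      # keep only surviving features
        w = LinearSVC(C=1.0).fit(X, Y).coef_.ravel()   # train linear SVM, get w
        c = w ** 2                        # ranking criterion c_i = w_i^2
        worst = np.argsort(c)[:step]      # lowest-ranked feature(s)
        for f in sorted(worst, reverse=True):
            r.insert(0, s.pop(f))         # prepend the eliminated feature to r
    return r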

Note that the ranking criterion for the discriminability of features is based on w, the normal vector of the separating hyper-plane, as shown in line 6. The idea is that a feature may be considered discriminative if it significantly influences the width of the margin of the SVM. Recall that SVM seeks the maximum margin width, and the width is 2/‖w‖₂ = 2/√(Σ_{i=1}^n w_i²). So, if a feature with a large w_i² is removed, the change of the margin is also large; thus, this feature is very important.

Also note that in line 5, a linear SVM is usually used in gene selection applications. According to our experience, linear SVM works better than non-linear SVM in gene selection because the gene expression samples tend to be linearly separable. However, in regular domains, like handwritten character or face classification using images as samples, non-linear SVM is usually better.

RFE requires many iterations. If we have n features in total and want to choose the top p features, the number of iterations is n−p if only one feature is eliminated at a time. More than one feature can be removed at a time by modifying lines 7-9 in the algorithm above, at the expense of possible performance degradation. Usually half of the features can be eliminated at a time on microarray gene data until the number of features comes down to no more than 100 (then features should be eliminated with caution).

2.3 Principal Component Analysis (PCA)  It is well known that the training of SVM is very slow compared with other classifiers. RFE works smoothly in gene selection problems, because there are usually no more than hundreds of training samples of gene expressions in that situation. When we come back to conventional classification problems, the training set usually has a large size (say, tens of thousands of samples). In this case, it is desirable to reduce the computational time of each training run as well as the total number of iterations. We use PCA to preprocess the data and perform dimension reduction before RFE. Given a set of centered vectors x_i ∈ R^n, i = 1,···,N, Σ_{i=1}^N x_i = 0, PCA diagonalizes the scatter matrix:

(2.7)  S = (1/N) Σ_{i=1}^N x_i x_i^T.

To do this, the eigen problem has to be solved:

(2.8)  S v = λ v,

where the eigenvalues λ ≥ 0 and the eigenvectors v ∈ R^n. The eigenvectors v_i, i = 1,···,n, are called the Principal Components. Any pair of principal components is orthogonal, i.e., v_i^T v_j = 0 (i ≠ j), and the eigenvectors are normalized, i.e., v_i^T v_i = 1. These components span a new orthogonal coordinate system with each component as an axis. The original vector x_i gets its coordinates in the new system by projecting it onto every axis (with V = [v_1,···,v_n], the projection of x_i onto the new coordinate system is x_i′ = V^T x_i). The projected vector x_i′ has the same dimension n as x_i, since there are n eigenvectors altogether.
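A short numpy sketch of the computation in eqs. (2.7)-(2.8) and the projection x_i′ = V^T x_i may make this concrete; the helper names pca_fit and pca_transform are ours, and centering is included for completeness since the derivation above assumes centered vectors:

# Sketch of PCA as described by eqs. (2.7)-(2.8): diagonalize the scatter
# matrix of centered samples, then project onto the top-P eigenvectors.
import numpy as np

def pca_fit(X0):
    """X0: (n, N) matrix with one sample per column. Returns (V, mean) with
    eigenvectors sorted by decreasing eigenvalue in the columns of V."""
    mean = X0.mean(axis=1, keepdims=True)
    Xc = X0 - mean                          # center the samples
    S = (Xc @ Xc.T) / X0.shape[1]           # scatter matrix, eq. (2.7)
    lam, V = np.linalg.eigh(S)              # eigenpairs S v = lambda v, eq. (2.8)
    order = np.argsort(lam)[::-1]           # largest eigenvalues first
    return V[:, order], mean

def pca_transform(V, mean, X0, P):
    """Project samples onto the first P principal components: x' = V^T x."""
    return V[:, :P].T @ (X0 - mean)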

The benefit of projecting x_i into the PCA space is that the eigenvectors associated with large eigenvalues are more principal, i.e., the projection values onto those v_i are larger and thus more important. The less important components can be removed, and thus the dimension of the space is safely reduced. How many components should be eliminated depends on the application. PCA is a suitable dimension reduction method for SVM classifiers because SVM is invariant under the PCA transform. We will give the proof in the next section.

3  Speeding Up SVM by PCA and RFE

We speed up SVM evaluation via PCA for dimension reduction and RFE for feature selection. RFE has been combined with linear SVM in gene selection applications. Here, we incorporate RFE into the standard SVM (linear or non-linear) to eliminate the non-discriminative features and thus enhance SVM in the test phase. One motivation is that for conventional classification problems, like handwriting recognition and face detection, the input feature vectors are the pixels of images (if we do not use predefined features). For a 28×28 image, the vector will be 784 long. Reducing the length of every vector while maintaining the accuracy is of practical interest. Another motivation is that not all features are discriminative, especially in multi-class problems. For example, in handwritten digit recognition, when we try to distinguish '4' and '9', usually only the upper part (closed or open) is necessary to tell them apart. The other parts or features are actually of little use here. An efficient implementation method of multi-class SVM is 'One-vs-One' (OVO) [8]. OVO is constructed by training binary SVMs between pairwise classes. Thus, an OVO model consists of M(M−1)/2 binary SVMs for an M-class problem. Each binary SVM classifier casts one vote for its favored class, and finally the class with the maximum number of votes wins. There are other multi-class SVM implementation methods besides OVO, such as 'One-vs-All' [18, 16] and DAG SVM [15]. None of them outperforms OVO significantly, if at all, but most of them decompose the multi-class problem into a series of binary problems. Therefore, the concept of RFE, originally for two-class problems, is also applicable in the multi-class case. Some features are discriminative for one pair of classes, but may not be useful for another pair. We do not have to use a fixed number of features for every binary SVM classifier. Using PCA and RFE, we propose the following Feature Reduced Support Vector Machines (FR-SVM) algorithms in the framework of OVO.

Algorithm 2  Training of FR-SVM

Inputs: Training samples X0 = [x1, x2, ···, xN]; class labels Y = [y1, y2, ···, yN]; M (number of classes); P (number of chosen most principal components); and F (number of chosen top ranked features).
Outputs: Trained multi-class SVM classifier OVO; [V, D, P] (the parameters for the PCA transformation).
1: [V, D] = PCA(X0);                      /* eigenvectors & eigenvalues */
2: X = PCATransform(V, D, X0, P);         /* dimension reduction by PCA to P components */
3: for i = 1 to M do
4:   for j = i+1 to M do
5:     C1 = X(:, find(Y == i));           /* data of class i */
6:     C2 = X(:, find(Y == j));           /* data of class j */
7:     r = SVM-RFE(C1, C2);               /* ranking list */
8:     Fc ← the F top ranked features from r;
9:     C1 = C1(Fc, :);                    /* only the F selected components */
10:    C2 = C2(Fc, :);
11:    Binary SVM model ← Train SVM on C1 & C2;
12:    OVO{i}{j} ← {Binary SVM model, Fc};  /* save the model and selected features */
13:  end for
14: end for                               /* end of algorithm */
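A condensed Python sketch of Algorithm 2 is given below for illustration; it reuses the pca_fit, pca_transform and svm_rfe sketches above and uses scikit-learn's SVC for the binary RBF classifiers (all helper names and the default C and gamma values are ours, not the paper's):

# Sketch of FR-SVM training (Algorithm 2): PCA to P components, then for each
# pair of classes run SVM-RFE, keep the top-F features, and train a binary SVM.
import numpy as np
from sklearn.svm import SVC

def train_fr_svm(X0, Y, M, P, F, C=1.0, gamma=0.01):
    """X0: (n, N) samples in columns, Y: (N,) numpy array of labels in {1..M}.
    Returns the OVO model dictionary plus the PCA parameters (V, mean, P)."""
    V, mean = pca_fit(X0)
    X = pca_transform(V, mean, X0, P)              # dimension reduction to P
    ovo = {}
    for i in range(1, M + 1):
        for j in range(i + 1, M + 1):
            C1 = X[:, Y == i]                      # data of class i
            C2 = X[:, Y == j]                      # data of class j
            data = np.hstack([C1, C2]).T           # samples as rows for sklearn
            labels = np.r_[np.full(C1.shape[1], i), np.full(C2.shape[1], j)]
            r = svm_rfe(data, labels)              # feature ranking for this pair
            Fc = r[:F]                             # top-F features for this pair
            model = SVC(C=C, gamma=gamma).fit(data[:, Fc], labels)
            ovo[(i, j)] = (model, Fc)              # save model + selected features
    return ovo, (V, mean, P)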

Algorithm 3  Evaluation of FR-SVM

Inputs: Evaluation samples X0 = [x1, x2, ···, xN]; trained multi-class SVM classifier OVO; M (number of classes); [V, D, P] (the parameters for the PCA transformation).
Outputs: Labels Y = [y1, y2, ···, yN].
1: X = PCATransform(V, D, X0, P);
2: for k = 1 to N do
3:   x = X(:, k);                         /* one test sample */
4:   Votes = zeros(1, M);                 /* votes for each class */
5:   for i = 1 to M do
6:     for j = i+1 to M do
7:       {Binary SVM, Fc} ← OVO{i}{j};
8:       x′ = x(Fc);                      /* only the selected components */
9:       Label ← SVM(x′);                 /* binary SVM evaluation */
10:      Votes(Label) = Votes(Label) + 1;
11:    end for
12:  end for
13:  Y(k) = find(Votes == max(Votes));    /* the class with the maximum votes wins */
14: end for                               /* end of algorithm */
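A matching sketch of Algorithm 3: the test samples are PCA-transformed with the training-time parameters, and each pairwise classifier votes using only its own selected features (the sketch is vectorized over test samples rather than looping per sample as in lines 2-14):

# Sketch of FR-SVM evaluation (Algorithm 3): PCA-transform the test samples,
# then collect one vote per pairwise classifier and return the majority class.
import numpy as np

def evaluate_fr_svm(X0, ovo, M, pca_params):
    """X0: (n, N) test samples in columns. Returns predicted labels (N,)."""
    V, mean, P = pca_params
    X = pca_transform(V, mean, X0, P)              # same PCA transform as training
    N = X.shape[1]
    votes = np.zeros((N, M + 1), dtype=int)        # columns 1..M count the classes
    for (i, j), (model, Fc) in ovo.items():
        pred = model.predict(X[Fc, :].T)           # only the selected components
        for k, label in enumerate(pred):
            votes[k, int(label)] += 1              # one vote per binary classifier
    return votes[:, 1:].argmax(axis=1) + 1         # class with maximum votes wins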

Algorithms 2 and 3 are the training and evaluation of FR-SVM, respectively. Note that for every binary SVM, we apply RFE to obtain the top F most discriminative features (components). Then, in evaluation, we only use those selected features. The F features may differ across different pairs of classes. Instead of predefining the number of features F, one might want to determine an optimal F automatically by cross validation. This is a choice, but it is too time-consuming when the training set or the number of classes is large. To make FR-SVM more feasible, we let F be user-defined. Similarly, FR-SVM leaves P (the number of chosen principal components) to the users.

We use PCA for dimension reduction because SVM is invariant under the PCA transformation. We state this as a theorem and prove it.

Theorem 3.1. The evaluation result of SVM with linear, polynomial and RBF kernels is invariant if the input vectors are transformed by PCA.

Proof. Suppose V = [v_1, v_2, ···, v_n], where v_i is an eigenvector. The PCA transformation of a vector x is V^T x. Recall the optimization problem (2.3). If the kernel matrix H does not change, then the optimization does not change (under the same constraints). Since H(i,j) = y_i y_j K(x_i,x_j), all we need for the proof of invariance is K(V^T x_i, V^T x_j) = K(x_i, x_j). Note that all eigenvectors are normalized, i.e., v_i^T v_i = 1, ∀i, and mutually orthogonal, i.e., v_i^T v_j = 0 (i ≠ j). Therefore, we have V^T V = V V^T = I, where I is the n×n identity matrix.

Now, for the linear case, K(V^T x_i, V^T x_j) = V^T x_i · V^T x_j = (V^T x_i)^T V^T x_j = x_i^T (V V^T) x_j = x_i^T I x_j = x_i^T x_j = K(x_i, x_j). Similarly, we can prove the polynomial case.

For the RBF kernel, it is enough to show ‖V^T x_i − V^T x_j‖² = ‖x_i − x_j‖². Expanding the Euclidean norm, we have ‖V^T x_i − V^T x_j‖² = ‖V^T x_i‖² + ‖V^T x_j‖² − 2(V^T x_i)^T (V^T x_j) = ‖x_i‖² + ‖x_j‖² − 2 x_i^T x_j = ‖x_i − x_j‖².
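The invariance is also easy to check numerically; a small sketch with random data and a random orthonormal matrix V follows (any orthonormal V, such as the full PCA eigenvector matrix, behaves the same):

# Numerical check of Theorem 3.1: linear, polynomial and RBF kernel values are
# unchanged when both inputs are transformed by an orthonormal matrix V.
import numpy as np

rng = np.random.default_rng(0)
xi, xj = rng.normal(size=5), rng.normal(size=5)
V, _ = np.linalg.qr(rng.normal(size=(5, 5)))       # random orthonormal V (V^T V = I)

def rbf(a, b, sigma=1.0):
    return np.exp(-np.sum((a - b) ** 2) / (2 * sigma ** 2))

for K in (lambda a, b: a @ b,                      # linear
          lambda a, b: (a @ b + 1.0) ** 3,         # polynomial
          rbf):                                    # RBF
    assert np.isclose(K(xi, xj), K(V.T @ xi, V.T @ xj))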

Backed by the theorem above, we can safely use PCA to preprocess the training samples and reduce their dimensions. Of course, this preprocessing is optional; if the original dimension of the samples is not high (say, below 50), we do not have to carry out the PCA transformation. Our recommendation to use PCA to reduce dimension before RFE is due to two concerns: one is that RFE needs many iterations, and the number of iterations is directly related to the dimension (i.e., the number of original features), so dimension reduction leads to a reduction of RFE iterations; the other is that SVM training is quite slow, and dimension reduction saves the computational time of each training iteration. We will see how PCA and RFE contribute to the speedup of SVM through experiments in the next section.

4  Experiments

The main tasks of the experiments are to: 1) test the accuracy of FR-SVM. For every classifier, one of the most important measures is its accuracy. Since our goal is to enhance the SVM evaluation speed, we should be careful not to jeopardize the performance of SVM; 2) observe how much speedup we can achieve without a negative influence on the performance of SVM.

Three datasets were used in our experiments. Their descriptions are summarized in Table 1. The Iris is one of the most classical datasets for testing classification, available from [7]. It has 150 samples, with 50 in each of the three classes. We used the first 35 samples from every class for training and the remaining 15 for testing. The MNIST database [9] contains 10 classes of handwritten digits (0-9). There are 60,000 samples for training and 10,000 samples for testing. The digits have been size-normalized and centered in a fixed-size image (28×28). It is a benchmark database for machine learning techniques and pattern recognition methods.

Table 1: Description of the multi-class datasets used in the experiments.

Name     # of Training Samples   # of Testing Samples   # of Classes   # of Attributes
Iris     105                     45                     3              4
MNIST    60000                   10000                  10             784
Isolet   6238                    1559                   26             617

The third dataset, Isolet, was generated from spoken letters [7]. We chose it because it has 26 classes, which is reasonably high among publicly available datasets. The number of classes of the three datasets varies from 3 to 26, and the number of samples varies from hundreds (Iris) to tens of thousands (MNIST). We hope the typicality of the datasets makes the experimental results convincing.

The software package we used was the OSU SVM Classifier Matlab Toolbox [12], which is based on the LIBSVM software [3]. On each dataset, we trained a multi-class OVO SVM. The kernel we chose was the RBF kernel, since it has been widely observed that the RBF kernel usually outperforms other kernels, such as the linear and polynomial ones. The regularizing parameters C and σ were determined via cross validation on the training set: the validation performance was measured by training on 70% of the training set and testing on the remaining 30%, and the C and σ that led to the best accuracy were selected. We did not scale the original samples to the range [-1, 1], because we found that doing so did not help much. Our FR-SVM was also trained with exactly the same parameters (C and σ) and conditions as OVO SVM, except that we varied two additional parameters, P (the number of principal components) and F (the number of top ranked features), to observe the performance. We compared the performance of OVO SVM and FR-SVM in three main aspects: classification accuracy, training speed, and testing speed.
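For illustration, the following sketch mimics that 70/30 validation split with scikit-learn's SVC; the candidate grids and the gamma = 1/(2σ²) parameterization are our own assumptions, not the grids used in the paper:

# Sketch of the 70/30 validation used to pick the RBF parameters C and sigma;
# the candidate grids below are illustrative only.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

def select_c_sigma(X, y, Cs=(1, 5, 10, 100, 500), sigmas=(1, 20, 200)):
    """Train on 70% of the training set, validate on the other 30%."""
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)
    best = (None, None, -1.0)
    for C in Cs:
        for sigma in sigmas:
            gamma = 1.0 / (2.0 * sigma ** 2)       # sklearn's gamma vs. sigma
            acc = SVC(C=C, gamma=gamma).fit(X_tr, y_tr).score(X_val, y_val)
            if acc > best[2]:
                best = (C, sigma, acc)
    return best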

4.1 PCA Dimension Reduction  The first experiment we did was to vary P and let F = P, which means PCA is applied alone without feature selection. Since PCA reduces the dimensions before SVM training and testing, PCA is able to enhance both the training and the testing speed. Table 2 summarizes the results on the Iris dataset. The first line, marked with ∗, means no PCA transformation, i.e., it shows the results of the standard OVO SVM. The execution time of PCA is separated from the regular SVM training time; the former is shown in the PCA column and the latter in the Training column. Therefore, the total training time of FR-SVM is actually the sum of the PCA and SVM training times.

Table 2: The results of FR-SVM on the Iris (without RFE feature selection). P is the number of principal components chosen. The line marked ∗ gives the results of OVO SVM. C = 500 and σ = 200 for both OVO SVM and FR-SVM.

P (F=P)   Accuracy   PCA (secs)   Training (secs)   Testing (secs)
∗         100%       NA           0                 0
4         100%       0.016        0                 0
3         100%       0.015        0                 0
2         97.78%     0.015        0                 0
1         97.78%     0.015        0                 0

Since the Iris is very small, the contribution of PCA to the training and testing times is not obvious (all 0); only the PCA transformation itself costs some time. However, the interesting observation is that the accuracy remains perfect when the dimension is reduced by 0 (P=4) or 1 (P=3), and even a single component (P=1) still yields 97.78% accuracy.

The results on the MNIST and Isolet are shown in Tables 3 and 4, respectively. It is surprising that PCA dimension reduction enhances the accuracy of SVM on the MNIST dataset with a proper value of P. When P=50, the accuracy of FR-SVM is 98.30%, while that of OVO SVM is 97.90%. Although the enhancement is not significant, it is still encouraging. When P=25, a speedup of 11.5 (278.9/24.3) in testing and 6.8 (4471.6/(228.6+441.4)) in training is achieved. The speedups are as expected, since the dimensions of the samples are reduced before SVM training and testing. The results on the Isolet are similar, as shown in Table 4. The difference from MNIST is that the accuracy of FR-SVM decreases as P decreases, but not significantly before P = 100. The computational time of training and testing is reduced dramatically by PCA dimension reduction. An interesting observation on Isolet is that the training time is reduced from 249.4 secs to 86.5 secs with only the PCA transformation but no dimension reduction (when P=617). The reason is unknown and can perhaps be traced to the implementation of the SVM optimization toolbox. From Tables 2, 3 and 4, we can see that the accuracy of SVM remains the same under the PCA transformation without dimension reduction (when P equals the number of original dimensions). This confirms our Theorem 3.1.

4.2 RFE Feature Selection  The second experiment we did was to vary F without the PCA transformation. Since the feature elimination procedure requires many iterations, we fixed the number of features eliminated at each step empirically: 1 on Iris, and 20 on MNIST and Isolet.

Table 3: The results of FR-SVM on the MNIST (without RFE feature selection). P is the number of principal components chosen. The line marked ∗ gives the results of OVO SVM. C = 5 and σ = 20 for both OVO SVM and FR-SVM.

P (F=P)   Accuracy   PCA (secs)   Training (secs)   Testing (secs)
∗         97.90%     NA           4471.6            278.9
784       97.90%     258.6        4467.4            275.7
500       97.84%     255.8        4230.7            250.1
300       97.58%     252.4        3800.8            237.2
200       97.92%     243.6        3650.2            231.9
150       97.96%     237.5        2499.2            175.7
100       98.20%     234.3        1529.6            102.2
50        98.30%     230.1        628.8             48.8
25        97.94%     228.6        441.4             24.3
10        92.76%     228.0        128.7             12.8

Table 4: The results of FR-SVM on the Isolet (without RFE feature selection). P is the number of principal components chosen. The line marked ∗ gives the results of OVO SVM. C = 10 and σ = 200 for both OVO SVM and FR-SVM.

P (F=P)   Accuracy   PCA (secs)   Training (secs)   Testing (secs)
∗         96.92%     NA           249.4             36.7
617       96.92%     21.2         86.5              36.6
400       96.86%     19.7         66.1              24.5
300       96.86%     19.3         57.9              18.4
200       96.60%     18.6         43.5              12.7
150       96.54%     18.3         38.3              9.3
100       96.28%     18.0         33.4              6.4
50        95.51%     18.0         24.7              3.6
25        93.39%     18.0         20.3              2.3
10        81.21%     17.4         12.1              2.0

Table 5: The results of FR-SVM on the Iris (without PCA dimension reduction). F is the number of top ranked features chosen for each pair of classes. C = 500 and σ = 200 for both OVO SVM and FR-SVM.

F    Accuracy   RFE (secs)   Training (secs)   Testing (secs)
4    100%       0            0                 0
3    100%       0.016        0                 0
2    100%       0.016        0                 0
1    100%       0.016        0                 0

The accuracy, the RFE time, the training time and the testing time are reported in Tables 5, 6 and 7 for the three datasets, respectively. The first line of each table actually shows the results of OVO SVM, because no feature is eliminated. As in the previous experiment, the execution time of RFE is separated from the regular SVM training time; the total training time of FR-SVM here is actually the sum of the RFE time and the SVM training time.

On the Iris, only RFE requires some time, since the dataset is very small. The interesting point is that only one feature is enough to distinguish a pair of Iris classes (when F=1, the accuracy is still 100%), as shown in Table 5.

On the MNIST, the accuracy steadily decreases as F decreases, but not significantly. Compared with PCA (as shown in Table 3), RFE seems not as reliable as PCA on MNIST in terms of accuracy, while the speedup gained in testing is quite close. In addition, we can see that RFE is very computationally expensive because of its recursive iterations. To eliminate 684 features (letting F=100), the number of iterations is 34 (684/20, with 20 features eliminated at a time), which takes 13.7 hours (49432 secs). Without PCA, the procedure of RFE is painfully long. From the results on the Isolet, shown in Table 7, we have similar observations.

4.3 Combination of PCA and RFE  The third experiment was to see how the combination of PCA and RFE contributes to multi-class SVM classification. We chose the minimum P from the first experiment that keeps the accuracy as high as that of the standard SVM. Then we chose F as small as possible while still giving a comparable accuracy. The parameters C and σ remained the same as in the previous experiments. Table 8 summarizes the results. The training speedup is calculated as (training time of OVO SVM) / (training time of FR-SVM). Note that the training time of FR-SVM is actually the sum of the PCA, RFE, and SVM training times. Similarly, the evaluation speedup is (testing time of OVO SVM) / (testing time of FR-SVM).

Table 6: The results of FR-SVM on the MNIST (without PCA dimension reduction). F is the number of top ranked features chosen for each pair of classes. C = 5 and σ = 20 for both OVO SVM and FR-SVM.

F     Accuracy   RFE (secs)   Training (secs)   Testing (secs)
784   97.90%     0            4471.6            278.9
500   97.87%     35306        3179.7            251.1
300   97.82%     45120        2535.3            243.6
200   97.74%     47280        1094.4            221.0
150   97.40%     49344        648.0             114.1
100   96.62%     49432        398.7             110.6
50    94.56%     49440        286.2             47.1
25    89.86%     49446        208.8             25.5
10    75.60%     52126        171.9             13.5

Table 7: The results of FR-SVM on the Isolet (without PCA dimension reduction). F is the number of top ranked features chosen for each pair of classes. C = 10 and σ = 200 for both OVO SVM and FR-SVM.

F     Accuracy   RFE (secs)   Training (secs)   Testing (secs)
617   96.92%     0            249.4             36.7
400   96.98%     758.4        80.2              22.3
300   96.86%     868.6        50.0              15.7
200   96.60%     981.1        30.0              12.4
150   96.73%     1065.3       23.6              11.5
100   96.28%     1164.0       12.8              5.3
50    96.09%     1284.8       4.6               3.1
25    95.2%      1301.8       3.9               3.0
10    93.14%     1314.7       2.6               2.8

Table 8: Combination of PCA and RFE. P is the number of principal components chosen. F is the number of top ranked features chosen for each pair of classes. Speedup is FR-SVM vs. OVO SVM.

Dataset   P     F    Accuracy   Training Speedup   Testing Speedup
Iris      3     3    100%       1                  1.3
MNIST     50    40   98.14%     3.90               10.9
Isolet    200   60   96.28%     0.98               11.1

On the Iris dataset, the training speedup of FR-SVM over OVO SVM is 1 because both execution times are negligible. On the MNIST, FR-SVM achieved a speedup in both training and testing. The gain in training is due to the dimension reduction by PCA; the gain in testing is due to the combined contribution of PCA and RFE. On the Isolet, the training of FR-SVM is slightly slower than OVO SVM because of RFE (with a training speedup of 0.98). However, in testing, we again see a significant enhancement by an order of over 10 while the accuracy is still comparable. When P=50 and F=40, the accuracy of FR-SVM on MNIST is 98.14%, higher than that of OVO SVM (97.90%). When P=200 and F=60, the accuracy of FR-SVM on Isolet is 96.28%, slightly lower than that of OVO SVM (96.92%). To sum up, PCA and RFE can significantly enhance the evaluation speed of the standard SVM with proper settings of P and F while maintaining comparable accuracy.

5  Conclusion

Incorporating both PCA and RFE into the standard SVM, we propose FR-SVM for efficient multi-class classification. PCA and RFE reduce the dimensions and select the most discriminative features. By choosing a proper number of principal components and a proper number of top ranked features for each pair of classes, a significant enhancement in evaluation speed can be achieved while comparable accuracy is maintained.

References

[1] B. Boser, I. Guyon, and V. Vapnik. A training algorithm for optimal margin classifiers. In D. Haussler, editor, 5th Annual ACM Workshop on COLT, pages 144-152, 1992.

[2] C. Burges and B. Schölkopf. Improving speed and accuracy of support vector learning machines. In Advances in Kernel Methods: Support Vector Learning, pages 375-381, Cambridge, MA, 1997. MIT Press.

[3] C. Chang and C. Lin. LIBSVM: a library for support vector machines. http://www.csie.ntu.edu.tw/~cjlin/libsvm/, 2001.

[4] T. Downs, K. Gates, and A. Masters. Exact simplification of support vector solutions. Journal of Machine Learning Research, vol. 2:293-297, 2001.

[5] I. Guyon and A. Elisseeff. An introduction to variable and feature selection. Journal of Machine Learning Research, vol. 3:1157-1182, 2003.

[6] I. Guyon, J. Weston, S. Barnhill, and V. Vapnik. Gene selection for cancer classification using support vector machines. Machine Learning, vol. 46:389-422, 2002.

[7] S. Hettich and S. Bay. The UCI KDD archive. http://kdd.ics.uci.edu, 1999.

[8] U. Kreßel. Pairwise classification and support vector machines. In Advances in Kernel Methods: Support Vector Learning, pages 255-268, Cambridge, MA, 1999. MIT Press.

[9] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, vol. 86:2278-2324, Nov 1998.

[10] Y. Lee and O. Mangasarian. RSVM: Reduced support vector machines. In The First SIAM International Conference on Data Mining, 2001.

[11] K.-M. Lin and C.-J. Lin. A study on reduced support vector machines. IEEE Transactions on Neural Networks, vol. 14:1449-1559, 2003.

[12] J. Ma, Y. Zhao, and S. Ahalt. OSU SVM Classifier Matlab Toolbox, 2002.

[13] E. Osuna, R. Freund, and F. Girosi. Improved training algorithm for support vector machines. In IEEE Workshop on Neural Networks for Signal Processing, 1997.

[14] J. Platt. Fast training of support vector machines using sequential minimal optimization. In Advances in Kernel Methods - Support Vector Learning, pages 185-208, Cambridge, MA, 1999. MIT Press.

[15] J. Platt, N. Cristianini, and J. Shawe-Taylor. Large margin DAGs for multiclass classification. In Advances in Neural Information Processing Systems, volume 12, pages 547-553, 2000.

[16] R. Rifkin and A. Klautau. In defense of one-vs-all classification. Journal of Machine Learning Research, vol. 5:101-141, 2004.

[17] V. Vapnik. Estimation of Dependences Based on Empirical Data. Springer-Verlag, New York, 1982.

[18] V. Vapnik. Statistical Learning Theory. Wiley, New York, 1998.

[19] H. Yu. Data Mining via Support Vector Machines: Scalability, Applicability, and Interpretability. PhD thesis, University of Illinois at Urbana-Champaign, May 2004.
