
How Boosting the Margin Can Also Boost Classifier Complexity

Lev Reyzin (lev.reyzin@yale.edu), Yale University, Department of Computer Science, 51 Prospect Street, New Haven, CT 06520, USA

Robert E. Schapire (schapire@cs.princeton.edu), Princeton University, Department of Computer Science, 35 Olden Street, Princeton, NJ 08540, USA

Abstract

Boosting methods are known not to usually overfit training data even as the size of the generated classifiers becomes large. Schapire et al. attempted to explain this phenomenon in terms of the margins the classifier achieves on training examples. Later, however, Breiman cast serious doubt on this explanation by introducing a boosting algorithm, arc-gv, that can generate a higher margins distribution than AdaBoost and yet performs worse. In this paper, we take a close look at Breiman's compelling but puzzling results. Although we can reproduce his main finding, we find that the poorer performance of arc-gv can be explained by the increased complexity of the base classifiers it uses, an explanation supported by our experiments and entirely consistent with the margins theory. Thus, we find maximizing the margins is desirable, but not necessarily at the expense of other factors, especially base-classifier complexity.

1. Introduction

The AdaBoost boosting algorithm (Freund & Schapire, 1997) and most of its relatives produce classifiers that classify by voting the weighted predictions of a set of base classifiers which are generated in a series of rounds. Thus, the size (and hence, naively, the apparent complexity) of the final combined classifier used by such algorithms increases with each new round of boosting. Therefore, according to Occam's razor (Blumer et al., 1987), the principle that less complex classifiers should perform better, boosting should suffer from overfitting; that is, with many rounds of boosting, the test error should increase as the final classifier becomes overly complex. Nevertheless, it has been observed by various authors (Breiman, 1998; Drucker & Cortes, 1996; Quinlan, 1996) that boosting often tends to be resistant to this kind of overfitting, apparently in defiance of Occam's razor. That is, the test error of AdaBoost often tends to decrease well after the training error is zero, and does not increase even after a very large number of rounds.¹

¹ However, in some of these cases, the test error has been observed to increase slightly after an extremely large number of rounds (Grove & Schuurmans, 1998).

Appearing in Proceedings of the 23rd International Conference on Machine Learning, Pittsburgh, PA, 2006. Copyright 2006 by the author(s)/owner(s).

Schapire et al. (1998) attempted to explain AdaBoost's tendency not to overfit in terms of the margins of the training examples, where the margin is a quantity that can be interpreted as measuring the confidence in the prediction of the combined classifier. Giving both theoretical and empirical evidence, they argued that with more rounds of boosting, AdaBoost is able to increase the margins, and hence the confidence, in the predictions that are made on the training examples, and that this increase in confidence translates into better performance on test data, even if the boosting algorithm is run for many rounds.

Although Schapire et al. backed up their arguments with both theory and experiments, Breiman (1999) soon thereafter presented experiments that raised important questions about the margins explanation. Following the logic of the margins theory, Breiman attempted to design a better boosting algorithm, called arc-gv, that would provably maximize the minimum margin of any training example. He then ran experiments comparing the performance of arc-gv and AdaBoost using CART decision trees pruned to a fixed number of nodes as base classifiers. He found that arc-gv did indeed produce uniformly higher margins than AdaBoost. However, contrary to what was apparently predicted by the margins theory, he found that his new algorithm arc-gv performed worse on test data than AdaBoost in almost every case. Breiman concluded rather convincingly that his experiments put the margins explanation into serious doubt and that a new understanding is needed.

In this paper, we take a close look at these compelling experiments to try to determine if they do in fact contradict the margins theory. In fact, the theory that was presented by Schapire et al. states that the generalization error of the final combined classifier can be upper bounded by a function that depends not only on the margins of the training examples, but also on the number of training examples and the complexity of the base classifiers (where complexity might, for instance, be measured by VC-dimension or description length). Breiman was well aware of this dependence on the complexity of the base classifiers and attempted to control for this factor in his experiments by always choosing decision trees of a fixed size. However, in our experiments, we find that there still remain important differences between the trees chosen by AdaBoost and arc-gv. Specifically, we find that the trees produced using arc-gv are considerably deeper, both in terms of maximum and average depth of the leaves. Intuitively, such deep trees are more prone to overfitting, and indeed, it is clear that the space of decision trees of a given size is much more greatly constrained when a bound is placed on the depth of the leaves. Furthermore, we find experimentally that the deeper trees generated by arc-gv are measurably more prone to overfitting than those of AdaBoost. The use of depth as a measure of tree complexity was also suggested in the work of Mason, Bartlett and Golea (2002), who worked on finding more refined ways of measuring the complexity of a decision tree besides its overall size. Thus, we argue that the trees found by arc-gv have topologies that are more complex in terms of their tendency to lead to overfitting, and that this increase in complexity accounts for arc-gv's inferior performance on test data, an argument that is consistent with the margins theory.

We then consider the use of other base classifiers, such as decision stumps, whose complexity can be more tightly controlled. We again compare the performance of AdaBoost and arc-gv, and again find that AdaBoost is superior, despite the fact that base classifiers of equivalent complexity are being used, and despite the fact that arc-gv tends to obtain a higher minimum margin than AdaBoost. Nevertheless, on close inspection, we see that the bounds presented by Schapire et al. are in terms of the entire distribution of margins, not just the minimum margin. When this overall margin distribution is examined, we find that although arc-gv obtains a higher minimum margin, the margin distribution as a whole is very much higher for AdaBoost. Thus, again, these experiments do not appear to contradict the margins theory.

In sum, our experiments explore the complex interplay between margins, base classifier complexity and sample size that helps to determine how well a classifier performs. We believe that understanding this interaction better might help us to design better algorithms. In a sense, our results confirm Breiman's point that maximizing margins is not enough; we also need to think about the other factors, especially base classifier complexity, and how that can be driven up by an over-aggressive attempt to increase the margins. Our results also explore the interplay between minimum margin and the overall margins distribution, as seen in the way that arc-gv only increases the minimum margin, but AdaBoost sometimes seems to do a better job with the overall distribution.

Our paper focuses only on Breiman's arc-gv algorithm for maximizing margins, although others have been proposed, for instance, by Rätsch and Warmuth (2002), Grove and Schuurmans (1998) and Rudin, Schapire and Daubechies (2004). Moreover, Mason, Bartlett and Golea (2004) were able to show how the direct optimization of margins could indeed lead to improved performance. We also focus only on the theoretical bounds of Schapire et al., although these have been greatly improved, for instance, by Koltchinskii and Panchenko (2002). Overviews on boosting are given by Schapire (2002) and Meir and Rätsch (2003).

After reviewing the margins theory in Section 2, we begin our study in Section 3 with experiments intended to replicate those of Breiman. In Section 4, we then present evidence that arc-gv produces higher margins by using more complex base classifiers and that its poorer performance is consistent with the margins theory. In Section 5, we try to control the complexity of the base classifiers but find that this prevents arc-gv from having a uniformly higher margins distribution.

2. Algorithms and Theory

Boosting algorithms combine moderately inaccurate prediction rules and take their weighted majority vote to form a single classifier. On each round, a boosting algorithm generates a new prediction rule to use and then places more weight on the examples classified incorrectly. Hence, boosting constantly focuses on classifying correctly the examples that are the hardest to classify.

Given: (x_1, y_1), ..., (x_m, y_m) where x_i ∈ X, y_i ∈ Y = {−1, +1}.

Initialize D_1(i) = 1/m.

For t = 1, ..., T:

• Train base learner using distribution D_t.
• Get base classifier h_t : X → {−1, +1}.
• Choose α_t ∈ R.
• Update:

    D_{t+1}(i) = \frac{D_t(i)\exp(-\alpha_t y_i h_t(x_i))}{Z_t}

  where Z_t is a normalization factor (chosen so that D_{t+1} will be a distribution).

Output the final classifier:

    H(x) = \mathrm{sign}\left( \sum_{t=1}^{T} \alpha_t h_t(x) \right)

Figure 1. A generic algorithm equivalent to both AdaBoost and arc-gv, depending on how α_t is selected.

Figure 1 presents a generic algorithm that is equivalent to both AdaBoost and arc-gv, depending on the choice of α_t. Specifically, AdaBoost sets α_t to be

    \alpha_t = \frac{1}{2} \ln\left( \frac{1 + \gamma_t}{1 - \gamma_t} \right)

where γ_t is the so-called edge of h_t:

    \gamma_t = \sum_i D_t(i)\, y_i\, h_t(x_i),

which is linearly related to h_t's weighted error.

AdaBoost greedily minimizes a bound on the training error of the final classifier. In particular, as shown by Schapire and Singer (1999), its training error is bounded by ∏_t Z_t, so, on each round, it chooses h_t and sets α_t to minimize Z_t, the normalizing factor.

Freund and Schapire (1997) derived an early bound on the generalization error of boosting, showing that

    \Pr[H(x) \ne y] \le \hat{\Pr}[H(x) \ne y] + \tilde{O}\left( \sqrt{\frac{Td}{m}} \right),

where Pr[·] denotes probability over the distribution that was assumed to have generated the training examples, P̂r[·] denotes the empirical probability on the training sample, and d is the VC-dimension of the space of all possible base classifiers. However, this bound becomes very weak as the number of rounds T increases, and predicts that AdaBoost will quickly overfit with only a moderate number of rounds. Early experiments (Breiman, 1998; Drucker & Cortes, 1996; Quinlan, 1996), however, showed just the opposite, namely, that AdaBoost tends not to overfit.

Schapire et al. (1998) attempted to explain why boosting often does not overfit using the concept of margins on the training examples. The margin of example (x, y) depends on the votes h_t(x) with weights α_t of all the hypotheses:

    \mathrm{margin}(x, y) = \frac{ y \sum_t \alpha_t h_t(x) }{ \sum_t \alpha_t }.

The magnitude of the margin represents the strength of agreement of the base classifiers, and its sign indicates whether the combined vote produces a correct prediction. Using the margins, Schapire et al. proved a bound not dependent on the number of boosting rounds. They showed that for any θ, the generalization error is at most

    \hat{\Pr}[\mathrm{margin}(x, y) \le \theta] + \tilde{O}\left( \sqrt{\frac{d}{m\theta^2}} \right). \qquad (1)

Notice that this margins bound depends most heavily on the margins near the bottom of the distribution, since having generally high smallest margins allows θ to be small without the empirical term P̂r[margin(x, y) ≤ θ] getting too large.
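For concreteness, the normalized margins and the empirical term of bound (1) can be computed from a boosting run as in the short sketch below. The helper names are our own; alphas and hs stand for the per-round weights α_t and base classifiers h_t produced by a boosting run, with each h callable on an array of examples and returning −1/+1 predictions.

    import numpy as np

    def margins(alphas, hs, X, y):
        """Normalized margin y * sum_t alpha_t h_t(x) / sum_t alpha_t for every example."""
        votes = sum(a * h(X) for a, h in zip(alphas, hs))
        return y * votes / sum(alphas)

    def empirical_margin_term(margin_values, theta):
        """Empirical probability Pr_hat[margin(x, y) <= theta] appearing in bound (1)."""
        return float(np.mean(margin_values <= theta))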

Following this logic, Breiman (1999) designed arc-gv to greedily maximize the minimum margin. Arc-gv follows the same algorithm as AdaBoost, except for setting α_t differently:

    \alpha_t = \frac{1}{2} \log\left( \frac{1 + \gamma_t}{1 - \gamma_t} \right) - \frac{1}{2} \log\left( \frac{1 + \rho_t}{1 - \rho_t} \right)

where ρ_t is the minimum margin over all training examples of the combined classifier up to the current round:

    \rho_t = \min_i \frac{ y_i \sum_{s=1}^{t-1} \alpha_s h_s(x_i) }{ \sum_{s=1}^{t-1} \alpha_s }.

(It is understood that ρ_1 = 0.)
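Putting the pieces together, the following Python sketch (our own illustration, not code from the paper) implements the generic loop of Figure 1 with a pluggable rule for α_t, along with the AdaBoost and arc-gv choices just described. It assumes labels coded as −1/+1, a base learner train_base(X, y, D) that returns a classifier callable on X, and |γ_t| < 1 and |ρ_t| < 1 so that the logarithms are finite.

    import numpy as np

    def alpha_adaboost(gamma, rho):
        # AdaBoost: alpha_t = (1/2) ln((1 + gamma_t) / (1 - gamma_t)); rho_t is ignored
        return 0.5 * np.log((1.0 + gamma) / (1.0 - gamma))

    def alpha_arcgv(gamma, rho):
        # arc-gv: subtract the same transform applied to the minimum margin rho_t so far
        return (0.5 * np.log((1.0 + gamma) / (1.0 - gamma))
                - 0.5 * np.log((1.0 + rho) / (1.0 - rho)))

    def boost(X, y, train_base, choose_alpha, T):
        """Generic boosting loop of Figure 1; choose_alpha is alpha_adaboost or alpha_arcgv."""
        m = len(y)
        D = np.full(m, 1.0 / m)                  # D_1(i) = 1/m
        F = np.zeros(m)                          # running weighted vote sum_s alpha_s h_s(x_i)
        alphas, hs = [], []
        for t in range(T):
            h = train_base(X, y, D)              # base classifier h_t trained on distribution D_t
            pred = h(X)                          # predictions in {-1, +1}
            gamma = float(np.sum(D * y * pred))  # edge of h_t
            # minimum margin of the combined classifier over rounds 1..t-1 (0 on the first round)
            rho = 0.0 if not alphas else float(np.min(y * F) / sum(alphas))
            alpha = choose_alpha(gamma, rho)
            D *= np.exp(-alpha * y * pred)       # unnormalized D_{t+1}
            D /= D.sum()                         # divide by the normalization factor Z_t
            F += alpha * pred
            alphas.append(alpha)
            hs.append(h)

        def H(Xnew):                             # final classifier: sign of the weighted vote
            return np.sign(sum(a * h_(Xnew) for a, h_ in zip(alphas, hs)))

        return H, hs, alphas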

Arc-gv has the property² that its minimum margin converges to the largest possible minimum margin, provided that the edges are sufficiently large, as will be the case if the base classifier with largest edge is selected on every round. Thus, the margin theory would appear to predict that arc-gv's performance should be better than AdaBoost's, although, as we explore here, there are other factors at play.

² Meir and Rätsch (2003) claimed they can only prove this property when taking ρ_t to be the maximum minimum margin over all previous rounds in the equation above; we nevertheless decided to use Breiman's original formulation of arc-gv.

Table 1. Dataset sizes for training and test.

                cancer   ion   ocr17   ocr49   splice
    training       630   315    1000    1000     1000
    test            69    36    5000    5000     2175

3. Breiman's Experiments

Breiman (1999) showed that it is possible for arc-gv to produce a higher margins distribution and yet perform worse. He ran AdaBoost and arc-gv for 100 rounds using pruned CART decision trees as base classifiers. Each such tree was created by generating the full CART tree and pruning it to the best (i.e., minimum weighted error) k-leaf subtree. Breiman's most compelling results were for trees of size k = 16, where he found that the margins distributions are uniformly higher for arc-gv than for AdaBoost.

We begin our study by replicating his results. However, unlike Breiman, we did not see the margins of arc-gv being significantly higher until we ran the algorithms for 500 rounds. Since Breiman's critique of the margins theory is strongest when the difference in the margins distributions is clear, we focus only on the 500-round case with k = 16.

We considered the following datasets: breast cancer, ionosphere, splice, ocr17, and ocr49, all available from the UCI repository. These include the same natural datasets as Breiman, except the sonar dataset, since it only includes 208 data points and thereby produces high variance in experiments. The splice dataset was modified to collapse the two splice categories into one to create binary-labeled data. Also, ocr17 and ocr49 contain randomly chosen subsets of the NIST database of handwritten digits consisting only of the digits 1 and 7, and 4 and 9 (respectively); in addition, the images have been scaled down to 14×14 pixels, each with only four intensity levels. Table 1 shows the number of training and test examples used in each. The stark differences in the training and test sizes among the datasets occur because we used the same random splits Breiman used for ionosphere and breast cancer, but the additional datasets we used had many more data points, which allowed us to use larger sets for test data.

Figure 2. Cumulative margins for AdaBoost and arc-gv for the breast cancer dataset after 500 rounds of boosting. [Plot: cumulative frequency versus margin, curves "AdaBoost_bc" and "Arc-gv_bc".]

Table 2. Test errors, averaged over 10 trials, of AdaBoost and arc-gv, run for 500 rounds using CART decision trees pruned to 16 leaf nodes as base classifiers.

                cancer   ion   ocr17   ocr49   splice
    AdaBoost      2.46  3.46    0.96    2.04     3.18
    arc-gv        3.04  7.69    1.76    2.38     3.45

In running our experiments, we followed Breiman's technique of choosing an independently random subset for training data of sizes specified in the table. All experiments were repeated on ten random partitions of the data, and, in most cases, the results were averaged.
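As a rough sketch of how such an experiment could be wired up with an off-the-shelf tree learner, the base learner below grows a 16-leaf tree on the current boosting distribution via sample weights. This uses scikit-learn's DecisionTreeClassifier as a stand-in for CART and only approximates Breiman's grow-then-prune procedure; it is our own illustration, not the authors' code, and it assumes labels coded as −1/+1.

    from sklearn.tree import DecisionTreeClassifier

    def train_tree_16(X, y, D):
        # Fit a tree limited to 16 leaves on the weighted sample. Growing with
        # max_leaf_nodes approximates, but is not identical to, pruning a full
        # CART tree back to its best 16-leaf subtree.
        tree = DecisionTreeClassifier(max_leaf_nodes=16)
        tree.fit(X, y, sample_weight=D)
        return lambda Xnew: tree.predict(Xnew)

Passing train_tree_16 together with alpha_adaboost or alpha_arcgv to the generic loop sketched in Section 2 mirrors the structure of these experiments, though not necessarily Breiman's exact pruning or numbers.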

Figure 2 shows the cumulative margins distribution after 500 rounds for both AdaBoost and arc-gv on the breast cancer dataset. As observed by Breiman, arc-gv does indeed produce higher margins. These distributions are representative of the margins distributions for the rest of the datasets.
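The cumulative distributions plotted in Figures 2 and 7 are simply empirical CDFs of the margin values; given the output of the margins helper from Section 2, they can be produced with a couple of lines (our own plotting-free sketch).

    import numpy as np

    def cumulative_margin_curve(margin_values):
        # Returns (x, y) where y[i] is the fraction of training examples with margin <= x[i].
        xs = np.sort(np.asarray(margin_values))
        ys = np.arange(1, len(xs) + 1) / len(xs)
        return xs, ys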

Table 2 shows the test errors for each algorithm. Again, in conformity with Breiman, we see that the test errors of arc-gv are indeed higher than those of AdaBoost.

To further visualize what is happening during the running of these algorithms, we plotted both the test error and minimum margin as a function of the number of rounds in Figures 3 and 4. These results seem to be in direct contradiction to the margins theory.

Table 3. Test errors, minimum margins, and tree depths, averaged over 10 trials, of AdaBoost and arc-gv, run for 500 rounds using CART decision trees pruned to 16 leaf nodes as base classifiers. (For 100 rounds, we also saw arc-gv producing deeper trees on average.)

                         test error          minimum margin        tree depth
                      arc-gv  AdaBoost      arc-gv  AdaBoost      arc-gv  AdaBoost
    breast cancer       3.04      2.46        0.64      0.61        9.71      7.86
    ionosphere          7.69      3.46        0.97      0.77        8.89      7.23
    ocr17               1.76      0.96        0.95      0.88        7.47      7.41
    ocr49               2.38      2.04        0.53      0.49        7.39      6.70
    splice              3.45      3.18        0.46      0.42        7.12      6.67

Figure 3. Test errors for AdaBoost and arc-gv for the ocr49 dataset as a function of the number of rounds of boosting. [Plot: test error versus round, curves "AdaBoost_ocr49" and "Arc-gv_ocr49".]

4. Tree Complexity

Can these results be reconciled with the margins explanation? In fact, according to Eq. (1), there are factors other than the minimum margin that need to be considered. Specifically, the generalization error of the combined classifier depends on the margins it generates, on the size of the training sample, and on the complexity of the base classifiers. Since the size of the sample is the same for both arc-gv and AdaBoost, after recording the margins, we should examine the complexity of the base classifiers.

How can we measure the complexity of a decision tree? The most obvious measure is the number of leaves in the tree, which, like Breiman, we are already controlling by always selecting trees with exactly 16 leaves. However, even among all trees of fixed size, we claim that there remain important topological differences that affect the tendency of the trees to overfit.

Figure 4. Minimum margins for AdaBoost and arc-gv for the ocr49 dataset as a function of the number of rounds of boosting. [Plot: minimum margin versus round, curves "AdaBoost_ocr49_margins" and "Arc-gv_ocr49_margins".]

In particular, deeper trees make predictions based on a longer sequence of tests and therefore intuitively tend to be more specialized than shallow trees, and thus more likely to overfit.

In fact, arc-gv generates significantly deeper trees than AdaBoost. Table 3 shows the average depths of the trees (measured by the maximum depth of any leaf) in addition to the minimum margin and error rates for each algorithm. We also measured the running average of the tree complexity of both algorithms as the number of rounds increased. The pattern in Figure 5 is representative of the results for most datasets. In this figure, we can see that at the beginning of boosting, the depths of the trees generated by AdaBoost converge downward to a value, while the depths of the trees generated by arc-gv continue to increase for about 200 rounds before leveling off to a higher value. It is evident that while arc-gv has higher margins and higher error, it also produces, on average, deeper trees.
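The depth statistics reported in Table 3 can be computed by walking a fitted tree; the helper below (our own sketch, written against scikit-learn's tree structure rather than CART) returns both the maximum and the average leaf depth.

    import numpy as np

    def leaf_depths(fitted_tree):
        """Max and mean depth of the leaves of a fitted sklearn DecisionTreeClassifier."""
        left = fitted_tree.tree_.children_left
        right = fitted_tree.tree_.children_right
        depths, stack = [], [(0, 0)]           # (node id, depth), starting from the root
        while stack:
            node, d = stack.pop()
            if left[node] == -1:               # -1 marks a leaf in sklearn's tree arrays
                depths.append(d)
            else:
                stack.append((left[node], d + 1))
                stack.append((right[node], d + 1))
        return max(depths), float(np.mean(depths))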

Figure 5. Cumulative average of decision tree depth for AdaBoost and arc-gv for the breast cancer set for 500 rounds of boosting. [Plot: cumulative average tree depth versus round, curves "AdaBoost_bc" and "Arc-gv_bc".]

Table 4. Percent test and training errors per generated tree, and their differences, averaged over all CART decision trees generated in 500 rounds of boosting, over 10 trials.

                   AdaBoost                arc-gv
              test   train  diff      test   train  diff
    cancer    13.2     9.7   3.5      10.4     6.3   4.1
    ion       19.8    10.9   8.9      12.5     2.6   9.9
    ocr17      5.6     3.7   1.9       2.6     0.6   2.0
    ocr49     24.8    21.1   3.7      21.9    17.8   4.1
    splice    27.7    23.4   4.3      23.9    19.2   4.7

Referring back to the bound in Eq. (1), we can upper bound the VC-dimension d of a finite space of base classifiers H by lg |H|. Thus, measuring complexity is essentially a matter of counting how many trees there are of bounded depth. Clearly, the more tightly bounded is the depth, the more constrained is the space of allowable trees, and the smaller will be the complexity measure lg |H|. This can be seen in Figure 6, which shows the number of 16-leaf tree topologies of depth at most d, as a function of d. So we are claiming that a possible explanation for the poorer performance of arc-gv despite its higher margins is that it achieves them by choosing from a greater set of base classifiers. By the bound in Eq. (1), we can see that the higher depths of arc-gv trees can be affecting the generalization error even if the margins explanation holds.

Figure 6. The number of tree topologies of depth at most d, as a function of d. [Plot: number of 16-leaf tree topologies versus maximum depth d, for d from 2 to 16.]
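The count plotted in Figure 6 can be reproduced with a short memoized recursion, under our reading that a "topology" is the shape of a rooted binary tree with exactly 16 leaves (left and right children distinguished); this is our own sketch, not the computation used for the figure.

    from functools import lru_cache

    @lru_cache(maxsize=None)
    def count_topologies(leaves, max_depth):
        """Number of rooted binary tree shapes with the given leaf count and depth <= max_depth."""
        if leaves == 1:
            return 1                  # a single leaf has depth 0
        if max_depth == 0:
            return 0                  # no room left to split
        return sum(count_topologies(k, max_depth - 1) * count_topologies(leaves - k, max_depth - 1)
                   for k in range(1, leaves))

    # e.g. the curve of Figure 6: 16-leaf topologies of depth at most d
    curve = {d: count_topologies(16, d) for d in range(4, 16)}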

In general, we expect the difference between training and test errors to be greater when classifiers are selected from a larger or more complex class. Thus, as a further indication that the deeper trees generated by arc-gv are more likely to cause overfitting, we can directly measure the difference between the test error and (unweighted) training error for the trees generated by each algorithm. In Table 4, we can see that this difference is substantially higher for arc-gv than for AdaBoost in each of the datasets. This adds to the evidence that arc-gv is producing higher margins by using trees which are more complex in the sense that they have a greater tendency to overfit.³

³ While it is curious that the test errors of the individual trees generated by arc-gv are on average lower than those for AdaBoost, it does not necessarily follow that the combination of trees generated by arc-gv should perform better than that produced by AdaBoost. For example, as will be seen in Section 5, decision stumps can work quite well as base classifiers while individually having quite large test and training errors.

The margins explanation basically says that when all other factors are equal, higher margins result in lower error. Given, however, that arc-gv tends to choose trees from a larger class, its higher test error no longer qualitatively contradicts the margin theory.

5. Controlling Classifier Complexity

Knowing that arc-gv should produce a higher minimum margin in the limit, and observing that with CART trees, arc-gv produces a uniformly higher distribution than AdaBoost, we wished to fix the complexity of the classifiers both algorithms produce.

Table 5. Test errors, minimum margins, and average margins, averaged over 100 trials, of AdaBoost and arc-gv, run for 100 rounds using decision stumps as weak learners.

                         test error          minimum margin        average margin
                      arc-gv  AdaBoost      arc-gv  AdaBoost      arc-gv  AdaBoost
    cancer              4.15      4.29        -.01      -.06         .07       .27
    ionosphere         10.27      9.58         .01       .03         .09       .20
    ocr17               1.12      1.10         .03       .06         .14       .36
    ocr49               6.38      6.28        -.02      -.07         .05       .20
    splice              7.22      6.79        -.01      -.07         .06       .21

Figure 7. Cumulative margins for AdaBoost and arc-gv for the breast cancer dataset after 100 rounds of boosting on decision stumps. [Plot: cumulative frequency versus margin, curves "AdaBoost_bc" and "Arc-gv_bc".]

The margins theory tells us that if arc-gv still continued to produce higher margins, it should also perform better. However, if we could see that arc-gv, with some class of weak learners, gets higher margins without generating higher depth trees and still performs worse, it would put the margins theory into serious doubt. A natural class to look at is decision stumps, which are commonly used as base classifiers in boosting and all have the same complexity by most any measure. Yet, looking at sample margins distributions that AdaBoost and arc-gv generate, in Figure 7, we can see that while arc-gv usually does have a larger minimum margin, it does not have a higher margins distribution overall. In fact, if we look at the average margins, AdaBoost's are uniformly higher, and once again AdaBoost on average performs better than arc-gv. These results are in Table 5.⁴

⁴ If arc-gv and AdaBoost run for more rounds, their margins distributions begin to converge, as do their test errors. Hence, in the few data sets where AdaBoost has slightly higher minimum margin after 100 rounds, this difference disappears when boosting is run longer.
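A decision-stump base learner for the same generic loop can be obtained, for instance, by capping the tree depth at one; as before, this is a scikit-learn-based illustration of ours rather than the authors' implementation.

    from sklearn.tree import DecisionTreeClassifier

    def train_stump(X, y, D):
        # A decision stump is a one-level tree: a single threshold test on one feature.
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=D)
        return lambda Xnew: stump.predict(Xnew)

With the margins helper from Section 2, the minimum and mean of margins(alphas, hs, X, y) give the minimum and average margins compared in Table 5.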

Table 6. Percent test and training errors per generated stump, and their differences, averaged over all decision stumps generated in 100 rounds of boosting, over 10 trials.

                   AdaBoost                arc-gv
              test   train  diff      test   train  diff
    cancer    40.7    40.5   0.2      41.8    41.7   0.1
    ion       42.4    41.4   1.0      42.7    41.8   0.9
    ocr17     34.1    33.9   0.2      34.5    34.2   0.2
    ocr49     42.4    42.0   0.4      43.3    42.9   0.4
    splice    42.5    41.9   0.6      43.2    42.7   0.5

This result is both surprising and insightful. We would have expected arc-gv to have uniformly higher margins once more, but this time have lower test error. Yet, it seems that in the case where arc-gv could not produce more complex trees, it sacrificed on the margins distribution as a whole to have an optimal minimum margin in the limit. Knowing this, the margins theory would no longer predict arc-gv to perform better, and it does not.

This is because the margins bound, in Eq. (1), depends on setting θ to be as low as possible while keeping the probability of being less than θ low. So if the margins of AdaBoost overtake the margins of arc-gv at the lower cumulative frequencies, then the theory would predict AdaBoost to perform better. This is exactly what happens.
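One way to see this tradeoff numerically is to scan θ and evaluate the two terms of bound (1) on a sample of margins. The proxy below ignores constants and logarithmic factors, so it is only meaningful for qualitatively comparing two margin distributions; it is an illustration of ours, not an exact evaluation of the bound.

    import numpy as np

    def bound_proxy(margin_values, d, m, thetas):
        # Empirical term Pr_hat[margin <= theta] plus the sqrt(d / (m * theta^2)) complexity
        # term of (1), up to constants and log factors.
        return [float(np.mean(margin_values <= t)) + np.sqrt(d / (m * t * t)) for t in thetas]

Whichever margin sample attains the smaller minimum of this proxy over θ > 0 is the one the bound favors.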

For comparison to Table 4, we give in Table 6 the differences between the test and training errors of individual decision stumps generated in 100 rounds of AdaBoost and arc-gv. Consistent with theory, the differences in these test and training errors for individual stumps are much smaller than they are for CART trees, reflecting the lower complexity or tendency to overfit of stumps compared to trees. Since these differences are nearly identical for AdaBoost and arc-gv, this also suggests that the stumps generated by the two algorithms are roughly of the same complexity.

6. Discussion

In this paper, we have shown an alternative explanation for arc-gv's poorer performance that is consistent with the margins theory. We can see that while having higher margins is desirable, we must pay attention to other factors that can also influence the generalization error of the classifier.

Our experiments with decision stumps show us that it may be fruitful to consider boosting algorithms that greedily maximize the average or median margin rather than the minimum one. Such an algorithm may outperform both AdaBoost and arc-gv.

Finally, we leave open an interesting question. We have tried to keep complexity constant using base classifiers other than decision stumps, and in every instance we have seen AdaBoost generate higher average margins. Is there a base classifier that has constant complexity, with which arc-gv will have an overall higher margins distribution than AdaBoost? If such a base learner exists, it would be a good test of the margins explanation to see whether arc-gv would have lower error than AdaBoost, as we predict. However, it is also possible that unless arc-gv "cheats" on complexity, it cannot generate overall higher margins than AdaBoost.

Acknowledgments

This material is based upon work supported by the National Science Foundation under grant numbers CCR-0325463 and IIS-0325500. We also thank Cynthia Rudin for helpful discussions.

References

Blumer, A., Ehrenfeucht, A., Haussler, D., & Warmuth, M. K. (1987). Occam's razor. Information Processing Letters, 24, 377–380.

Breiman, L. (1998). Arcing classifiers. The Annals of Statistics, 26, 801–849.

Breiman, L. (1999). Prediction games and arcing classifiers. Neural Computation, 11, 1493–1517.

Drucker, H., & Cortes, C. (1996). Boosting decision trees. Advances in Neural Information Processing Systems 8 (pp. 479–485).

Freund, Y., & Schapire, R. E. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55, 119–139.

Grove, A. J., & Schuurmans, D. (1998). Boosting in the limit: Maximizing the margin of learned ensembles. Proceedings of the Fifteenth National Conference on Artificial Intelligence.

Koltchinskii, V., & Panchenko, D. (2002). Empirical margin distributions and bounding the generalization error of combined classifiers. The Annals of Statistics, 30.

Mason, L., Bartlett, P. L., & Golea, M. (2002). Generalization error of combined classifiers. Journal of Computer and System Sciences, 65, 415–438.

Meir, R., & Rätsch, G. (2003). An introduction to boosting and leveraging. In S. Mendelson and A. Smola (Eds.), Advanced lectures on machine learning (LNAI 2600), 119–184. Springer.

Quinlan, J. R. (1996). Bagging, boosting, and C4.5. Proceedings of the Thirteenth National Conference on Artificial Intelligence (pp. 725–730).

Rätsch, G., & Warmuth, M. (2002). Maximizing the margin with boosting. 15th Annual Conference on Computational Learning Theory (pp. 334–350).

Rudin, C., Schapire, R. E., & Daubechies, I. (2004). Boosting based on a smooth margin. 17th Annual Conference on Learning Theory (pp. 502–517).

Schapire, R. E. (2002). The boosting approach to machine learning: An overview. Nonlinear Estimation and Classification. Springer.

Schapire, R. E., Freund, Y., Bartlett, P., & Lee, W. S. (1998). Boosting the margin: A new explanation for the effectiveness of voting methods. The Annals of Statistics, 26, 1651–1686.

Schapire, R. E., & Singer, Y. (1999). Improved boosting algorithms using confidence-rated predictions. Machine Learning, 37, 297–336.
