Toward a Method of Selecting Among Computational Models of Cognition
Mark A. Pitt, In Jae Myung, and Shaobo Zhang

The Ohio State University

The question of how one should decide among competing explanations of data is at the heart of the scientific enterprise. Computational models of cognition are increasingly being advanced as explanations of behavior. The success of this line of inquiry depends on the development of robust methods to guide the evaluation and selection of these models. This article introduces a method of selecting among mathematical models of cognition known as minimum description length, which provides an intuitive and theoretically well-grounded understanding of why one model should be chosen. A central but elusive concept in model selection, complexity, can also be derived with the method. The adequacy of the method is demonstrated in 3 areas of cognitive modeling: psychophysics, information integration, and categorization.

How should one choose among competing theoretical explanations of data? This question is at the heart of the scientific enterprise, regardless of whether verbal models are being tested in an experimental setting or computational models are being evaluated in simulations. A number of criteria have been proposed to assist in this endeavor, summarized nicely by Jacobs and Grainger (1994). They include (a) plausibility (are the assumptions of the model biologically and psychologically plausible?); (b) explanatory adequacy (is the theoretical explanation reasonable and consistent with what is known?); (c) interpretability (do the model and its parts, e.g., parameters, make sense? are they understandable?); (d) descriptive adequacy (does the model provide a good description of the observed data?); (e) generalizability (does the model predict well the characteristics of data that will be observed in the future?); and (f) complexity (does the model capture the phenomenon in the least complex, i.e., simplest, possible manner?). The relative importance of these criteria may vary with the types of models being compared. For example, verbal models are likely to be scrutinized on the first three criteria just as much as the last three to thoroughly evaluate the soundness of the models and their assumptions. Computational models, on the other hand, may have already satisfied the first three criteria to a certain level of acceptability earlier in their evolution, leaving the last three criteria to be the primary ones on which they are evaluated. This emphasis on the latter three can be seen in the development of quantitative methods designed to compare models on these criteria. These methods are the topic of this article.

In the last two decades, interest in mathematical models of cognition and other psychological processes has increased tremendously. We view this as a positive sign for the discipline, for it suggests that this method of inquiry holds considerable promise. Among other things, a mathematical instantiation of a theory provides a test bed in which researchers can examine the detailed interactions of a model's parts with a level of precision that is not possible with verbal models. Furthermore, through systematic evaluation of its behavior, an accurate assessment of a model's viability can be obtained.

The goal of modeling is to infer the structural and functional properties of a cognitive process from behavioral data that were thought to have been generated by that process. At its most basic level, then, a mathematical model is a set of assumptions about the structure and functioning of the process. The adequacy of a model is first assessed by measuring its ability to reproduce human data. If it does so reasonably well, then the next step is to compare its performance with competing models.

It is imperative that the model selection method that is used to select among competing models accurately measures how well each model approximates the mental process. Above all else, the method must be valid. Otherwise, the purpose of modeling is undermined. One runs the risk of choosing a model that in actuality is a poor approximation of the underlying process of interest, leading researchers astray. The potential severity of this problem should make it clear that sound methodology is not only integral to but also necessary for theoretical advancement. In short, model selection methods must be as sophisticated and robust as the models themselves.

In this article, we introduce a new quantitative method of model selection. It is theoretically well grounded and provides a clear understanding of why one model should be chosen over another. The purpose of the article is to provide a good conceptual understanding of the problem of model selection and the solution being advocated. Consequently, only the most important (and new) technical advances are discussed. A more thorough treatment of the mathematics can be found in other sources (Myung, Balasubramanian, & Pitt, 2000; Myung, Kim, & Pitt, 2000; Myung & Pitt, 1997, 1998). After introducing the problem of model selection and identifying model complexity as a key property of a model that must be considered by any selection method, we introduce an intuitive statistical tool that assists in understanding and measuring complexity. Next, we develop a quantitative measure of complexity within the mathematics of differential geometry and show how it is incorporated into a powerful model selection method known as minimum description length (MDL). Finally, application examples of MDL and the complexity measure are provided by comparing models in three areas of cognitive psychology: psychophysics, information integration, and categorization.

Author note: Mark A. Pitt, In Jae Myung, and Shaobo Zhang, Department of Psychology, The Ohio State University. Portions of this work were presented at the 40th annual meeting of the Psychonomic Society, Los Angeles, California, November 18–22, 1999, and at the 31st and 32nd annual meetings of the Society for Mathematical Psychology, Nashville, Tennessee (August 6–9, 1998) and Santa Cruz, California (July 29–August 1, 1999), respectively. We thank D. Bamber, R. Golden, and Andrew Hanson for their valuable comments and attention to detail in reading earlier versions of this article. Mark A. Pitt and In Jae Myung contributed equally to the article, so order of authorship should be viewed as arbitrary. The section Three Application Examples is based on Shaobo Zhang's doctoral dissertation submitted to the Department of Psychology at The Ohio State University. In Jae Myung and Mark A. Pitt were supported by National Institute of Mental Health Grant MH57472. Correspondence concerning this article should be addressed to Mark A. Pitt or In Jae Myung, Department of Psychology, The Ohio State University, 1885 Neil Avenue Mall, Columbus, Ohio 43210-1222. E-mail: pitt.2@osu.edu or myung.1@osu.edu

Psychological Review, 2002, Vol. 109, No. 3, 472–491. Copyright 2002 by the American Psychological Association, Inc. 0033-295X/02/$5.00 DOI: 10.1037//0033-295X.109.3.472

Generalizability Instead of Goodness of Fit

Model selection in psychology has largely been limited to a single criterion to measure the accuracy with which a set of models describes a mental process: goodness of fit (GOF). The model that fits a particular set of observed data the best (i.e., accounts for the most variance) is considered superior because it is presumed to approximate most closely the mental process that generated the data. Typical measures of GOF include the root mean squared error (RMSE), which is the square root of the sum of squared deviations between observed and predicted data divided by the number of data points fitted, and the maximum likelihood, which is the probability of obtaining the observed data maximized with respect to the model's parameter values. GOF as a selection criterion is attractive because it appears to measure exactly what one wants to know: How well does the model mimic human behavior? In addition, the GOF measure is easy to calculate.

GOF is a necessary and important component of model selection: Data are the only link to the underlying cognitive process, so a model's ability to describe the output from this process must be considered in model selection. However, model selection based solely on GOF can lead to erroneous results and the choice of an inferior model. Just because a model fits data well does not necessarily imply that the regularity one seeks to capture in the data is well approximated by the model (Roberts & Pashler, 2000). Properties of the model itself can enable it to provide a good fit to the data for reasons that have nothing to do with the model's approximation to the cognitive process (Myung, 2000). Two of these properties are the number of parameters in the model and its functional form (i.e., the way in which the model's parameters and data are combined in the model equation). Together they contribute to a model's complexity, which refers to the flexibility inherent in a model that enables it to fit diverse patterns of data (see Footnote 1). The following simulation example demonstrates the independent contribution of these two properties to GOF.

Three models were compared on their ability to fit data. Model M1 (defined in Table 1) generated the data to fit, and is therefore considered the "true" model. Model M2 differed from M1 only in having one additional parameter, two instead of one; note that their functional forms are the same. Model M3 had the same number of parameters as M2, but a different functional form (a is an exponent of x rather than an additive component). Parameters were chosen for each of the three models to give the best fit to 1,000 randomly generated samples of data from the model M1. Each model's mean fit to the samples is shown in the first row of Table 1 along with the percentage of time that particular model provided a better fit than its two competitors. As can be seen, M2 and M3, with one more parameter than M1, always provided a better fit to the data than M1. Because the data were generated by M1, M2 and M3 must have overfitted the data beyond what is necessary to capture the underlying regularity. Otherwise, one would have expected M1 to fit its own data best at least some of the time. After all, M1 generated the data! The improved fit of M2 and M3 occurred because the extra parameter, b, in these two models enabled them to absorb random error (i.e., nonsystematic variation) in the data. Absorption of these random fluctuations is the only means by which M2 and M3 could have fitted the data better than the true model, M1. Note also that M3 provided a better fit than M2. This improvement in fit must be due to functional form rather than the number of parameters, because these two models differ only in how the data (x) and the two parameters (a and b) are combined in the model equation.

This example demonstrates clearly that GOF alone is inadequate as a model selection criterion. Because a model's complexity is not evaluated by the method, the model capable of absorbing the most variation in the data, regardless of its source, will be chosen. Frequently this will be the most complex model. The simulation also highlights the point that model selection is particularly difficult in psychology, and in other social sciences, precisely because random error is present in the data. Although this "noise" can be minimized (in the experimental design), it cannot be eliminated, so in any given data set, variation due to the cognitive process and variation due to random error are entangled, posing a significant obstacle to identifying the best model.

To get around this problem, model selection must be based, instead, on a different criterion: that of generalizability.

1 Cutting, Bruno, Brady, and Moore (1992) used the term scope, which is similar to our definition of complexity. They proposed to measure the scope by assessing a model's ability "to account for all possible data functions, where those functions are generated by a reasonably large sample of random data sets" (p. 364).

Table 1
Goodness of Fit and Generalizability Measures of Three Models Differing in Complexity

                   M1 (true model)   M2            M3
Goodness of fit    2.68 (0%)         2.49 (31%)    2.41 (69%)
Generalizability   2.99 (52%)        3.08 (28%)    3.14 (20%)

Note. Each cell contains the average root mean squared error of the fit of each model to the data and the percentage of samples (out of 1,000) in which that particular model fitted the data best (in parentheses). The three models were as follows: M1: y = ln(x + a) + error; M2: y = b ln(x + a) + error; M3: y = bx^a + error. The error was normally distributed, with M = 0 and SD = 3. Samples were generated from model M1 using a = 1 on the same 6 points for x, which ranged from 1 to 6 in increments of 1.


The goal of generalizability is to predict the statistics of new, as yet unseen, samples generated by the mental process being studied. The rationale underlying the criterion is that the model should be chosen that fits all samples best, not the model that provides the best fit to one particular sample. Only when this condition is met can one be sure a model is accurately capturing the underlying process, not also the idiosyncrasies (i.e., random error) of a particular data sample. More formally, generalizability can be defined in terms of a discrepancy function that measures the expected error in predicting future data given the model of interest (Linhart & Zucchini, 1986; also see their work for a discussion of the theoretical underpinnings of generalizability).

The results of a second simulation illustrate the superiority of generalizability as a model selection criterion. After each of the data samples was fitted in the first simulation, the parameters of the three models were fixed, and generalizability was assessed by fitting the models to another 1,000 samples of data generated from M1. The average fits are shown in the second row of Table 1. As can be seen, poor generalizability is the cost of overfitting a specific sample of data. Not only are average fits now worse for M2 and M3 than for M1, but these two models provided the best fit to the second sample much less often than M1. Generalizability should be preferred over GOF because it does a better job of capturing the general trend in the data and ignoring random variation.
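The logic of the two simulations is easy to reproduce. The sketch below is a minimal version under the assumptions stated in the Table 1 note (a = 1, error SD = 3, x = 1, ..., 6): it fits the three models by least squares and scores them both on the fitted sample (GOF) and on a fresh sample with parameters held fixed (generalizability). Exact numbers will differ from Table 1 because the random samples differ.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
x = np.arange(1.0, 7.0)                       # x = 1..6, as in the Table 1 note

# The three models; M1 (parameter a only) is the true, data-generating model
def m1(p): return np.log(x + p[0])
def m2(p): return p[1] * np.log(x + p[0])
def m3(p): return p[1] * x ** p[0]

def rmse(pred, y):
    return np.sqrt(np.mean((y - pred) ** 2))

def safe_rmse(p, model, y):
    # Guard against invalid parameter values (e.g., log of a nonpositive number)
    with np.errstate(all="ignore"):
        pred = model(p)
    return rmse(pred, y) if np.all(np.isfinite(pred)) else 1e6

def fit(model, y, k):
    res = minimize(lambda p: safe_rmse(p, model, y), x0=np.ones(k),
                   method="Nelder-Mead")
    return res.x, res.fun

models = [(m1, 1), (m2, 2), (m3, 2)]
gof = np.zeros((1000, 3)); gen = np.zeros((1000, 3))
for i in range(1000):
    y_cal = np.log(x + 1.0) + rng.normal(0, 3, x.size)   # sample from M1
    y_val = np.log(x + 1.0) + rng.normal(0, 3, x.size)   # fresh sample from M1
    for j, (model, k) in enumerate(models):
        p_hat, gof[i, j] = fit(model, y_cal, k)
        gen[i, j] = safe_rmse(p_hat, model, y_val)       # parameters held fixed

for label, table in [("GOF", gof), ("Generalizability", gen)]:
    pct_best = np.bincount(table.argmin(axis=1), minlength=3) / 10.0
    print(label, "mean RMSE:", table.mean(axis=0).round(2), "% best:", pct_best)
```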

The difference between these two selection criteria is shown in Figure 1. Dots in the panel represent observed data points. Lines are the functions generated by two models varying in complexity. The simpler model (thick line) captures the general trend in the data. If new data points (+) are added to the sample, fit will remain similar. The more complex model (thin line) not only captures the general trend in the data, but also captures many of the idiosyncrasies of each observation in the data set, which will cause fit to drop when additional observations are added to the sample. Generalizability would favor the simple model, which fits with our intuitions. GOF, on the other hand, would favor the complex model.

The goal of model selection, then, should be to maximize generalizability. This turns out to be quite difficult in practice, because the relationship between complexity and generalizability is not as straightforward as that between complexity and GOF. These differences are illustrated in Figure 2. Model complexity is represented along the horizontal axis and any fit index on the vertical axis, where larger values indicate a better fit (e.g., percent variance accounted for), with the two functions representing the two selection criteria. As was demonstrated in the first simulation, as complexity increases, so does GOF. Generalizability will also increase positively with complexity, but only up to the point where the model is sufficiently complex to capture the regularities in the data caused by the cognitive process being modeled. Any additional complexity will cause a drop in generalizability, because after that point the model will begin to capture random error, not just the underlying process. The difference between the GOF and generalizability curves represents the amount of overfitting that can occur. Only by taking complexity into account can a selection method accurately measure a model's generalizability. The task before the modeling community has been to develop an accurate and complete measure of model complexity, being sensitive not only to the number of parameters in the model but also to its functional form.

Another way to interpret the preceding discussion is that the trademark of a good model is its ability to satisfy the two opposing selection pressures of GOF and complexity, with the end result being good generalizability. These two pressures can be thought of as the two edges of Occam's razor: A model must be complex enough to capture the underlying regularity yet simple enough to avoid overfitting the data sample and thus losing generalizability. In this regard, model selection methods should be evaluated on their success in implementing Occam's razor. The selection method that we introduce in this article, MDL, achieves this goal. Before we describe this method, we review prior approaches to model selection.

Prior Approaches to Model Selection

We begin this section with a formal definition of a model. From a statistical standpoint, data are a sample generated from a true but unknown probability distribution, which is the regularity underlying the data. A statistical model is defined as a collection of probability distributions defined on experimental data and indexed by the model's parameter vector, whose values range over the parameter space of the model. If the model contains as a special case the probability distribution that generated the data (i.e., the "true" model), then the model is said to be correctly specified; otherwise it is misspecified. Formally, define y = (y1, ..., yN) as a vector of values of the dependent variable and θ = (θ1, ..., θk) as the parameter vector of the model, and let f(y|θ) be the likelihood function as a function of the parameter θ. N is the number of observations and k is the number of parameters. Often it is possible to write y as a sum of a deterministic component plus random error:

$$y = g(\theta, x) + e. \tag{1}$$

In the equation, x = (x1, ..., xN) is a vector of an independent variable x, and e = (e1, ..., eN) is the random error vector from a probability distribution with a mean of zero.

Figure 1. Illustration of the trade-off between goodness of fit and generalizability. An observed data set (dots) was fitted to a simple model (thick line) and a complex model (thin line). New observations are shown by the plus symbol.


Quite often the mean function g(θ, x) itself is taken to define a mathematical model. However, the specification of the error distribution must be included in the definition of a model. Additional parameters may be introduced in the model to specify the shape of the error distribution (e.g., normal). Often its shape is determined by the experimental task or design. For example, consider a recognition memory experiment in which the participant is required to respond "old" or "new" to a set of pictures presented across a series of n independent trials, with the number of correct responses recorded as the dependent variable. Suppose that a two-parameter model assumes that the probability of a correct response follows a logistic function of the time lag (x_i), for condition i (i = 1, ..., N), between initial exposure and recognition test, in the form of g(θ1, θ2, x_i) = [1 + θ1 exp(−θ2 x_i)]^(−1). In this case, the dependent variable y_i will be binomially distributed with probability g(θ1, θ2, x_i) and the number of binomial trials n, so the shape of the error function is completely specified by the experimental task.
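To make the point concrete, a minimal sketch of the binomial likelihood for this hypothetical logistic model is given below; the lag values, trial count, and data are invented for illustration.

```python
import numpy as np
from scipy.stats import binom

lags = np.array([1.0, 2.0, 4.0, 8.0, 16.0])   # hypothetical lag conditions x_i
n_trials = 20                                  # recognition trials per condition

def p_correct(theta1, theta2, lag):
    # Logistic mean function: g(theta1, theta2, x) = [1 + theta1 exp(-theta2 x)]^(-1)
    return 1.0 / (1.0 + theta1 * np.exp(-theta2 * lag))

def log_likelihood(theta1, theta2, correct_counts):
    # y_i ~ Binomial(n_trials, g(...)): the design, not the modeler, fixes the
    # shape of the error distribution, so no extra distributional parameters
    p = p_correct(theta1, theta2, lags)
    return binom.logpmf(correct_counts, n_trials, p).sum()

y = np.array([16, 17, 19, 19, 20])             # made-up correct-response counts
print(log_likelihood(0.3, 0.5, y))
```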

Six representative selection methods currently in use are shown in Table 2. They are the Akaike information criterion (AIC; Akaike, 1973), the Bayesian information criterion (BIC; Schwarz, 1978), the root mean squared deviation (RMSD), the information-theoretic measure of complexity (ICOMP; Bozdogan, 1990), cross-validation (CV; Stone, 1974), and Bayesian model selection (BMS; Kass & Raftery, 1995; Myung & Pitt, 1997). Each of these methods assesses a model's generalizability by combining a measure of GOF with a measure of complexity. Each prescribes that the model that minimizes the given criterion be chosen. That is, the smaller the criterion value of a model, the better the model generalizes (see Footnote 2). A fuller discussion of these methods can be found in Myung, Forster, and Browne (2000; see also Linhart & Zucchini, 1986).

AIC and BIC are the two most commonly used selection methods. The first term, −2 ln f(y|θ̂), is a maximum likelihood measure of GOF, and the second term, involving k, is a measure of complexity that is sensitive to the number of parameters in the model. As the number of parameters increases, so does the criterion. In BIC, the rate of increase is modified by the log of the sample size, n (see Footnote 3). RMSD uses RMSE as the measure of GOF and also takes into account the number of parameters through k (see Footnote 4). These three measures, AIC, BIC, and RMSD, are all sensitive only to one aspect of complexity, the number of parameters, but insensitive to functional form. This is clearly inadequate because, as demonstrated in Table 1, the functional form of a model influences generalizability. ICOMP is an improvement on this shortcoming. Its second and third terms together represent a complexity measure that takes into account the effects of parameter sensitivity through trace(Σ) and parameter interdependence through det(Σ), which, according to Li, Lewandowski, and DeBrunner (1996), are two principal components of the functional form that contribute to model complexity. However, ICOMP is also problematic because it is not invariant under reparameterization of the model, in particular under nonlinear forms of reparameterization (see Footnote 5).
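As a minimal sketch of how the AIC and BIC rows of Table 2 are applied in practice (the log-likelihood values below are placeholders):

```python
import numpy as np

def aic(max_log_lik, k):
    # AIC = -2 ln f(y | theta_hat) + 2k
    return -2.0 * max_log_lik + 2.0 * k

def bic(max_log_lik, k, n):
    # BIC = -2 ln f(y | theta_hat) + k ln n
    return -2.0 * max_log_lik + k * np.log(n)

# Hypothetical comparison: model A (2 parameters) vs. model B (4 parameters)
llA, llB, n = -123.4, -120.9, 60
print("AIC:", aic(llA, 2), aic(llB, 4))   # smaller values are preferred
print("BIC:", bic(llA, 2, n), bic(llB, 4, n))
```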

2 The model selection methods discussed in the present article do not require the assumption that the models being compared are correct or nested. (A model is said to be correct if there is a parameter value of the model that yields the probability distribution that has generated the observed data sample. A model is said to be nested within another model if the former can be reduced to a special case of the latter by setting one or more of its parameters to fixed values.) On the other hand, the generalized likelihood ratio test based on the G² or chi-square statistics (e.g., Bishop, Fienberg, & Holland, 1975, pp. 125–127), which are often used to compare two models, assumes that the models are nested and, further, that the reduced model is correct. When these assumptions are met, both types of selection methods should perform similarly. However, the methods should not be viewed as interchangeable because their goals differ. The selection methods presented in this article were designed to identify the model that generalizes best in some defined sense. The generalized likelihood ratio test, in contrast, is a null hypothesis significance test in which the hypothesis that the reduced model is correct is tested given a prescribed level of the Type 1 error rate (i.e., α). Accordingly, the model chosen under this test may not necessarily be the one that generalizes best.

3 Sample size refers to the number of independent data samples (more accurately, errors, i.e., the e_i) drawn from the same probability distribution. Data size is the number of observed data points that are being fitted to evaluate a model and that may come from different probability distributions, although from the same probability family. Often, the sample size is equal to the data size. A case in point is a linear regression model, y_i = θx_i + e_i (i = 1, ..., N), where e_i ~ N(0, σ²). Note that the errors, e_i, are independent and identically distributed according to the normal probability distribution with mean zero and variance σ². On the other hand, if it is assumed that each e_i is normally distributed with zero mean but with a different value of the variance, that is, e_i ~ N(0, σ_i²) (i = 1, ..., N), then the sample size, n, will now be equal to 1 whereas the data size, N, remains unchanged.

4 The RMSD defined in Table 2 differs from the RMSD that has often been used in the psychological literature (e.g., Friedman, Massaro, Kitzis, & Cohen, 1995), where it is defined as RMSD = √(SSE/N), in which (N − k) is replaced by N, and therefore does not take into account the number of parameters. This form of RMSD is nothing more than RMSE. As such, it is not appropriate to use as a method of model selection, especially when comparing models that differ in the number of parameters.

5 Reparameterization refers to transforming the parameters of a model so that it becomes another, behaviorally equivalent, model. For example, a one-parameter exponential model with normal error, y = e^(θx) + N(0, σ²), is a reparameterization of another model: y = η^x + N(0, σ²). The latter is obtained from the former by defining a new parameter, η, as η = e^θ. Whenever two models are related to each other through reparameterization, they become equivalent in the sense that both will fit any given data set identically, albeit with different parameter values. Statistically speaking, they are indistinguishable from one another.

Figure 2. Illustration of the relationship between goodness of fit and generalizability as a function of model complexity (Myung & Pitt, 2001). From Stevens' Handbook of Experimental Psychology (p. 449, Figure 11.4), by J. Wixted (Editor), 2001, New York: Wiley. Copyright 2001 by Wiley. Adapted with permission.


In CV, the observed data are divided into two subsamples of equal sizes, calibration and validation. The former is used to estimate the best-fitting parameter values of a model. The parameters are then fixed to these values and used by the model to fit the validation sample, yielding the model's CV index. CV is an easy-to-use, heuristic method of estimating a model's generalizability (for a brief tutorial, see Myung & Pitt, 2001). The emphasis on generalizability makes it reasonable to suppose that CV somehow takes into account the effects of functional form. If, how, and how well it does this is not clear, however.
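A minimal sketch of this split-half procedure, assuming generic `fit` and `neg_log_lik` routines supplied by the modeler (both names are placeholders):

```python
import numpy as np

def cv_index(y, x, fit, neg_log_lik, rng):
    """Split-half cross-validation: -ln f(y_val | theta_hat_cal)."""
    idx = rng.permutation(len(y))
    cal, val = idx[: len(y) // 2], idx[len(y) // 2:]
    theta_hat = fit(y[cal], x[cal])                  # calibrate on one half
    return neg_log_lik(theta_hat, y[val], x[val])    # validate on the other
```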

BMS is a model selection method motivated from Bayesian inference. As such, the method chooses models based on the posterior probability of a model given the data. Calculation of the posterior probability requires the specification of the parameter prior density, π(θ), creating the possibility that model selection will depend on the choice of the prior density. As with CV, complexity in BMS is elusive. The integral form of the measure indicates that BMS takes into account functional form and the number of parameters, but how this is achieved is not entirely clear. It can be shown that BIC performs equivalently to BMS as a large sample approximation.

It is important to note that these selection criteria are themselves sample estimates of a true but unknown population parameter (i.e., generalizability in the population), and thus their values can change from sample to sample. Under the model selection procedure described above, however, one is forced to choose one model no matter how small the difference is among models, even when the models are virtually equivalent in their approximation of the underlying regularity. One solution to this dilemma is to conduct a statistical test, before applying the model selection procedure, to decide if two given models provide equally good descriptions of the underlying process. Golden (2000) proposed such a methodology, in which one can determine whether a subset of models are equally good approximations of the cognitive process (see Footnote 6). If the number of comparisons is not small, however, it can be difficult to control experiment-wise error.

The preceding selection methods represent important progress in tackling the model selection problem. All have shortcomings that limit their usefulness to various degrees. The complexity measure in AIC, BIC, and RMSD is incomplete, and the other three are either not invariant under reparameterization (ICOMP) or lack a clear complexity measure (CV, BMS). The remainder of this article is devoted to the development and testing of a model selection approach that overcomes these limitations. We begin by showing that differential geometry provides a theoretically well-justified and intuitive framework for understanding complexity and model selection in general.

Model Complexity: A Distributional Approach

We begin the discussion of complexity with a graphical definition of the term, intended to clarify what it means for a model to be complex. Depicted in the top panel in Figure 3 is the set of all data patterns that are possible given a particular experimental design. Every point in this multidimensional data space represents a particular data pattern in terms of a probability distribution, such as the shape of a frequency distribution of response times. All models occupy a section, or multiple sections, of data space, being able to fit a subset of the possible data patterns that could be observed. It is equally appropriate to think of data space as the universe of all models under consideration, because every model will occupy a region of this space, large or small.

6 This is a null hypothesis significance test, which, as an extension of Wilks's generalized likelihood ratio test, tests the null hypothesis that all models under consideration fit the data equally well. This test, unlike the generalized likelihood ratio test, is applicable to comparing non-nested and misspecified models for a wide range of discrepancy functions, including the ones with penalty terms, such as AIC, BIC, and MDL. In the standard model selection procedure using these criteria, one is forced to decide between two models under comparison. This test allows for a third decision: that both models are equally good or that there is not enough evidence yet for choosing one model over the other.

Table 2
Six Prior Model Selection Methods

Selection method                                        Criterion equation
Akaike information criterion (AIC)                      $AIC = -2 \ln f(y|\hat{\theta}) + 2k$
Bayesian information criterion (BIC)                    $BIC = -2 \ln f(y|\hat{\theta}) + k \ln n$
Root mean squared deviation (RMSD)                      $RMSD = \sqrt{SSE/(N - k)}$
Information-theoretic measure of complexity (ICOMP)     $ICOMP = -\ln f(y|\hat{\theta}) + \frac{k}{2}\ln\frac{\mathrm{trace}[\Sigma(\hat{\theta})]}{k} - \frac{1}{2}\ln\det(\Sigma(\hat{\theta}))$
Cross-validation (CV)                                   $CV = -\ln f(y_{Val}|\hat{\theta}_{Cal})$
Bayesian model selection (BMS)                          $BMS = -\ln \int f(y|\theta)\,\pi(\theta)\,d\theta$

Note. y = data sample of size n; θ̂ = parameter value that maximizes the likelihood function f(y|θ); k = number of parameters; SSE = sum of the squared deviations between observed and predicted data; N = the number of data points fitted; Σ = covariance matrix of the parameter estimates; y_Val = validation sample of observed data; θ̂_Cal = maximum likelihood parameter estimate for a calibration sample; ln = the natural logarithm of base e; π(θ) = the prior probability density function of the parameter.



The amount of space occupied by a model is positively related to its complexity. A simple model (M_a) will occupy a small region of data space because it assumes a specific structure in the data, which will manifest itself as a relatively narrow range of similar data patterns. This idea is illustrated in the lower left panel. When one of these few patterns occurs, the model will fit the data well; otherwise, it will fit poorly. Simple models are easily falsifiable, requiring a small minimum number of data points outside of their region of data space to disprove the model. In contrast, a complex model (M_b) will occupy a larger portion of data space. Complex models do not assume a single structure in the data. Rather, the structure changes as a function of the parameter values of the model. A slight change in a parameter's value can have a dramatic change in the model's structure. Such chameleon-like behavior enables complex models to be finely tuned to fit a wide range of data patterns. This is illustrated in the lower right panel. Overly complex models are of questionable worth because their ability to fit such a diverse set of data patterns can make them difficult to falsify. In general, a complex model is one with many parameters and a (powerful) nonlinear equation for combining them. Complexity is dichotomized in this example for illustrative purposes only. It is more accurate to think of it as a continuum, as depicted in Figure 2.

Response Surface Analysis

Although the examples in Figure 3 are hypothetical, the graphical depiction of mathematical models in this way is not merely illustrative. Response surface analysis (RSA) is a statistical tool that, as in Figure 3, yields graphical representations of models for comparing their relative complexities. In addition, it serves as an informative starting point for the derivation of an elegant quantitative measure of complexity.

RSA is a method for studying geometric relations among responses generated by a mathematical model, often used in nonlinear regression (Bates & Watts, 1988). For a model with k parameters and N observations, the response surface is defined as a k-dimensional surface, formed by all possible response vectors that the model can describe. The response surface is embedded in an N-dimensional data space, which is the set of all possible response vectors that could be generated independently of a model. The response surface is a hyperplane for a linear model but may be curved when the model is nonlinear. The effects of model complexity on model fit are easily visible when models are compared in the space of response surfaces. This is shown in the following example. See Myung, Kim, and Pitt (2000) for a more detailed discussion.

Consider the following one-parameter power model:

$$y = t^{-\theta} \quad (\text{power model}), \tag{2}$$

where y is the response probability (e.g., proportion correct), t is a presentation or retention interval greater than 1, and θ (> 0) is a parameter. Suppose that y is measured at two different time intervals, t1 and t2. Given two fixed values of t1 and t2, the response surface is a line or a curve in a two-dimensional data space composed of (y_t1, y_t2), created by plotting the y values at t1 against the corresponding y values at t2 for the full range of the parameter θ, similar to phase plots in dynamical systems research (Kelso, 1995). In essence, a model is represented graphically as a plot of y_t1 versus y_t2 in data space. For example, for the parameter θ = 1, the y value at t1 = 2 is obtained as y_t1 = (t1)^(−θ) = (2)^(−1) = 0.500. Similarly, the y value at t2 = 8 is obtained as y_t2 = (t2)^(−θ) = (8)^(−1) = 0.125. These two values are then represented as a single point (0.500, 0.125) on the (y_t1, y_t2) plane. Additional points are obtained by varying the full range of the parameter (i.e., 0 < θ < ∞) to form a continuous curve, which is called the response curve of a model, shown in the middle panel of Figure 4. The equation that describes this relationship can be derived analytically as follows:

$$y_{t_2} = y_{t_1}^{\ln t_2 / \ln t_1}. \tag{3}$$

Note that the parameter θ has been removed from the equation.

The model is now parameter free, having been redefined as the relationship between two y values instead of a parameter and a y value. Each point on the response curve describes the relationship between two y values that are themselves described perfectly by a power function. Similarly, the response curves for the following one-parameter models can be obtained and are graphed in the adjacent panels in Figure 4:

$$y = 1 - \theta t \quad (\text{linear model}),$$

$$y = [1.102^{-\theta}\sin(5\pi\theta t/12) + 1]/2 \quad (\text{black-hole model}). \tag{4}$$

RSA provides two valuable insights into model complexity. First, RSA makes the meaning of complexity tangible. The response curve of a model represents a complete visual description of the model (i.e., all of the data patterns it can describe). The curve is the model. Any point that falls on the curve can be perfectly fit by the model. Thus, RSA clearly reveals what patterns of data a model can describe and what patterns it cannot. For example, the response curve of the linear model reveals that the model can describe only those (y2, y1) points satisfying the equation y2 = 4y1 − 3 (0.75 ≤ y1 ≤ 1), no others.

Figure 3. The top panel depicts regions in data space occupied by two models, M_a (simple model) and M_b (complex model), with the range of data patterns that can be generated by each model in the lower panels.



Second, the contributions of functional form to model complexity become evident when models are compared in RSA space. All three models in Figure 4 have one parameter, but their response curves differ greatly, indicating that their functional forms must also differ. This observation leads to an intuitive measure of model complexity: Given that the response surface of a model represents the collection of all possible data patterns that the model can describe, one could define a natural complexity measure as the total length of the model's response curve. For example, for the three response curves in Figure 4, one can conclude that the black-hole model is most complex with its line length of 25.74, followed by the power model (length = 1.50), and then the linear model (length = 1.03). Dunn (2000) presents another RSA-based complexity measure.
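These lengths can be checked numerically by tracing each response curve on a dense parameter grid and summing distances between successive points. In the sketch below, t1 = 2 and t2 = 8 as in Figure 4; note that the black-hole equation is our reconstruction of Equation 4, so its exact form should be treated as an assumption.

```python
import numpy as np

t1, t2 = 2.0, 8.0
theta = np.linspace(1e-4, 60.0, 2_000_000)     # dense grid; range truncated

def curve_length(y1, y2):
    # Polygonal approximation to the length of the response curve
    return np.sum(np.hypot(np.diff(y1), np.diff(y2)))

# Power model: y = t^(-theta)
print("power    :", curve_length(t1 ** -theta, t2 ** -theta))

# Linear model: y = 1 - theta * t, with theta restricted so both y values
# stay in [0, 1] (i.e., 0 <= theta <= 1/t2)
th = np.linspace(0.0, 1.0 / t2, 100_000)
print("linear   :", curve_length(1 - th * t1, 1 - th * t2))

# Black-hole model (reconstructed): y = [1.102^(-theta) sin(5 pi theta t / 12) + 1] / 2
def blackhole(t):
    return (1.102 ** -theta * np.sin(5 * np.pi * theta * t / 12) + 1) / 2
print("blackhole:", curve_length(blackhole(t1), blackhole(t2)))
```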

Despite the possible ways of quantifying complexity within RSA, any such measure would be incomplete because it would not take into account the stochastic nature of the process underlying the data. That is, RSA ignores random variation in the data. The response curves in Figure 4 depict the three models without an error term. Recall that data represent a sample from an unknown probability distribution, the shape of which must be specified by the model. A complete measure of complexity must take into account the distributional characteristics of a model (e), not only those of the mean function, that is, g(θ, x) in Equation 1. Only the latter is considered in RSA. Thus, any RSA metric would yield only an approximate measure of complexity. To incorporate random error into a complexity measure requires that RSA be extended into a space of probability distributions, to which we now turn.

Differential Geometric Approach to Model Complexity

In this section we show that differential geometry, a branch of mathematics, provides a theoretically well-justified and intuitive measure of model complexity. A more technically rigorous presentation of the topic can be found in Myung, Balasubramanian, and Pitt (2000).

Within differential geometry, a model forms a geometric object known as a Riemannian manifold that is embedded in the space of all probability distributions (Amari, 1983, 1985; Rao, 1945). As in the data space depicted in Figure 3, every distribution is a point in this space, and the collection of points created by varying the parameters of the model gives rise to a hypervolume in which similar distributions are mapped to nearby points, as illustrated in Figure 5.

Earlier, we defined complexity as that characteristic of a model that enables it to fit a wide range of data patterns. In a geometric context, this translates into an inherent characteristic of a model that enables it to describe a wide range of probability distributions. Models that are able to describe more distributions should be more complex. Model complexity would therefore seem to be related to the number of probability distributions that a model can generate. This intuition immediately runs into trouble: The number of all such distributions is uncountably infinite, making the value indeterminable. Or is it?

Given that not all distributions are equally similar to one another, one solution is to count only distinguishable distributions. That is, if two or more probability distributions on a model's manifold are sufficiently similar to one another to be statistically indistinguishable, they are counted as one distribution, with a cluster of such distributions occupying a local neighborhood on the manifold. This procedure yields a countably infinite set of distinguishable distributions, the size of which is a natural measure of complexity. More precisely, two probability distributions should be considered indistinguishable if one is mistaken for the other even in the presence of an infinite amount of data. A measure of volume that counts only distinguishable distributions must be devised to achieve this goal.

Figure 4. Response curves of three one-parameter models that have the same number of parameters but differ in functional form, each obtained for t1 = 2 and t2 = 8.

Figure 5. The space of probability distributions forms a manifold on which "similar" distributions are mapped to nearby points.



The following mental exercise shows how this can be done. Draw data from one distribution, which is indexed by a specific parameter, say θ_p, in the model, and ask how well one can guess whether the data came from θ_p rather than from a nearby θ_q. The ability to distinguish between these distributions increases with the amount of available data. However, it can be shown that for any fixed amount of data there is a little ellipsoid around θ_p where the probability of error in the guessing game is large. In other words, within this ellipsoid, distributions are not very distinguishable in the statistical sense. To count distinguishable distributions, one should then tile the model manifold with such ellipsoids, counting one distribution for each ellipsoid. This procedure turns the manifold into an ellipsoid-covered lattice with a distinguishable distribution at each lattice point. Then the limit of infinite sample size should be taken so that the ellipsoids of indistinguishability shrink and the associated lattice becomes finer, forming a continuum in the limit. Taking this limit recovers a continuum measure that counts only distinguishable distributions. When this computation is carried out, the number of distinguishable distributions turns out to be equal to dθ {det[I(θ)]}^(1/2), where I(θ) is the Fisher information matrix of a sample of size 1, det(I) is the determinant of the matrix I, and dθ is the infinitesimal parameter volume (see Footnote 7 for a definition of the Fisher information matrix; see also Schervish, 1995).

The number of all distinguishable probability distributions that a model can generate or describe is obtained by integrating dθ {det[I(θ)]}^(1/2) over the entire parameter manifold as follows:

$$V_M = \int d\theta\,\sqrt{\det I(\theta)}, \tag{5}$$

where the subscript M denotes a particular model under consideration. This measure is known as the Riemannian volume in differential geometry. A highly desirable property of the volume measure is that it is invariant under reparameterization. This property is an outgrowth of models being represented as manifolds in the space of all probability distributions. In this context, the parameters of a model simply index the collection of distributions a model describes. The choice of the parameters themselves is irrelevant. The manifold is the model, which will never change, regardless of how the model is specified in an equation (see Equation 10 and accompanying text).

The Riemannian volume makes good sense as a complexity measure. Because complexity is related to the volume of a model in the space of probability distributions, the measure of volume should count only different, or distinguishable, distributions, and not the coordinate volume (∫ dθ) of the manifold. The Riemannian volume, therefore, is a direct function of the number of distinguishable distributions that a model can generate, with a complex model generating more distributions than a simple model.

Relation to RSA

The differential geometric approach to model complexity is similar to RSA in that a mathematical model is viewed as a geometric shape embedded in a hyperdimensional space, albeit different spaces (probability distributions vs. response vectors). This correspondence is not accidental, because the differential geometric approach is a logical extension of RSA. To understand the connection between the two, think of model selection as an inference game: The goal is to determine, out of a set of probability distributions that index data patterns, which model is most likely to have generated a data sample drawn from an unknown probability distribution. Referring back to Equation 1, the main yardstick used in this selection process is the likelihood function, f(y|θ). The value of the likelihood function depends upon not only the mean function, g(θ, x), but also the distributional characteristics of the error term (e). Any justifiable measure of complexity should take into account these two factors. RSA considers only the first term, whereas the differential geometric approach considers both.

To see how the two approaches are related quantitatively, consider the response curve of a one-parameter model, such as the power model in Figure 4 (middle panel). The RSA measure of complexity in this model is the total length of the response curve, which in essence measures the "number" of data points along the curve. In the differential geometric approach, counting is carried out with the additional knowledge of the local distinguishability of data points along the curve. This difference is illustrated in Figure 6 for the one-parameter power model. The response curve is split into segments of different lengths, with the points within each segment being statistically indistinguishable. Note that distinguishability is not uniform along the curve. Points in the middle region are less distinguishable than those at either end. In fact, for any one-parameter model of observed data that follows a binomial probability distribution, one can derive formal expressions for these two measures of complexity as follows (see Appendix A for a complete derivation):

Figure 6. The power model's response curve from Figure 4 divided into local regions of indistinguishability (i.e., the points within each region are statistically indistinguishable).


$$\text{RSA: length } L_M = \int d\theta\,\sqrt{\sum_{q=1}^{N}\left(\frac{dg(\theta, x_q)}{d\theta}\right)^{2}};$$

$$\text{Differential geometry: volume } V_M = \int d\theta\,\sqrt{\sum_{q=1}^{N}\frac{1}{g(\theta, x_q)[1 - g(\theta, x_q)]}\left(\frac{dg(\theta, x_q)}{d\theta}\right)^{2}}. \tag{6}$$

In the equations, it is assumed that the observed data, y_q, are distributed binomially, Bin[n, g(θ, x_q)], of sample size n and probability g(θ, x_q). Note that the two measures are identical except for the additional term, 1/{g(θ, x_q)[1 − g(θ, x_q)]}, in the differential geometric complexity measure. This extra term takes into account local distinguishability and is equal to det[I(θ)] in Equation 5.
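Both Equation 6 measures can be approximated numerically. The sketch below does so for the power model at the Figure 4 design points; the parameter grid and its truncation at θ = 30 are arbitrary choices, so the printed values are approximations.

```python
import numpy as np

x = np.array([2.0, 8.0])                  # design points (t1 = 2, t2 = 8)
theta = np.linspace(1e-3, 30.0, 1_000_000)
dtheta = theta[1] - theta[0]

g = x[None, :] ** -theta[:, None]         # mean function g(theta, x_q) = x_q^(-theta)
dg = -np.log(x)[None, :] * g              # analytic derivative dg/dtheta

# RSA length: integrate sqrt(sum_q (dg/dtheta)^2) over theta
L_M = np.sum(np.sqrt(np.sum(dg ** 2, axis=1))) * dtheta

# Geometric volume: same integrand, weighted by 1 / {g (1 - g)} per Equation 6
V_M = np.sum(np.sqrt(np.sum(dg ** 2 / (g * (1 - g)), axis=1))) * dtheta
print("RSA length L_M:", L_M, "  Riemannian volume V_M:", V_M)
```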

MDL Method of Model Selection

Thus far in the article we have introduced a measure of model complexity. Although it is useful for comparing the relative complexities of models, as will be shown below, by itself the measure is insufficient as a model selection method. What is missing is a measure of how well the model fits the data (i.e., a measure of GOF). MDL, a model selection method from algorithmic coding theory in computer science (Grunwald, 2000; Rissanen, 1983, 1996), combines both of these measures.

The MDL approach to model selection was developed within the domain of information theory, where the goal of model selection is to choose the model that permits the greatest compression of data in its description. The assumption underlying the approach is that regularities or patterns in data imply redundancy. The more the data can be compressed by extracting this redundancy, the more we learn about the underlying regularities governing the cognitive process of interest. The full form of the measure is shown below. The first term is the GOF measure, and the second and third together form the intrinsic complexity of the model (Rissanen, 1996).

$$\mathrm{MDL} = -\ln f(y|\hat{\theta}) + \frac{k}{2}\ln\left(\frac{n}{2\pi}\right) + \ln\int d\theta\,\sqrt{\det I(\theta)}, \tag{7}$$

where y = (y1, ..., yn) is a data sample of size n, θ̂ is the maximum likelihood parameter estimate, ln is the natural logarithm of base e, and I(θ) is the Fisher information matrix defined earlier (see Footnote 7). The integration of the third term is taken over the parameter space defined by the model.

As with prior selection methods, MDL prescribes that the model that minimizes the criterion should be chosen, the assumption being that such a model has extracted the most redundancy (i.e., regularity) in the data and thus should generalize best. In practice, the criterion represents the shortest length of computer code (measured in bits of information) necessary to describe the data given the model. The shorter the code, the greater the amount of regularity in the data that the model uncovered. The soundness of MDL as a model selection criterion has been well documented by Li and Vitanyi (1997), who showed that there is a close relationship between minimizing MDL and achieving good generalizability. From a decision-theoretic perspective, MDL selects the one model, among a set of competing models, that minimizes the expected error in predicting future data, in which the prediction error is measured using a logarithmic discrepancy function (Rissanen, 1999; Yamanishi, 1998).
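A sketch of Equation 7 for the one-parameter power model with binomial data, reusing the Fisher-information integrand from Equation 6; the data, trial count, and integration range below are illustrative assumptions rather than values from the article.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import binom

x = np.array([2.0, 8.0])                   # design points
n = 50                                      # binomial trials per point (assumed)
y = np.array([14, 3])                       # made-up success counts

def neg_log_lik(th):
    # First MDL term: -ln f(y | theta), minimized over theta
    return -binom.logpmf(y, n, x ** -th).sum()

fit = minimize_scalar(neg_log_lik, bounds=(1e-3, 10.0), method="bounded")

# Third MDL term: ln of the integral of sqrt(det I(theta)), using the
# Fisher information of a sample of size 1 (Equation 6 weighting)
theta = np.linspace(1e-3, 30.0, 1_000_000)
G = x[None, :] ** -theta[:, None]
dG = -np.log(x)[None, :] * G
root_det_I = np.sqrt(np.sum(dG ** 2 / (G * (1 - G)), axis=1))
volume = np.sum(root_det_I) * (theta[1] - theta[0])

k = 1
mdl = fit.fun + 0.5 * k * np.log(n / (2 * np.pi)) + np.log(volume)
print("GOF term:", fit.fun, " MDL:", mdl)
```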

It turns out that minimization of MDL corresponds to maximization of the posterior probability within the Bayesian statistics framework (i.e., BMS). Balasubramanian (1997) showed that the MDL criterion can be derived as a finite series of terms in an asymptotic expansion of the Bayesian posterior probability of a model given the data for a special form of the parameter prior density. This connection between the two suggests that choosing the model that gives the shortest description of the observed data is essentially equivalent to choosing the model that is most likely "true" in the sense of probability theory (see Theorem 1 of Vitanyi & Li, 2000).

The theoretical link between BMS and MDL also suggests that they may perform similarly in practice. Barron and Cover (1991) showed that BMS and MDL are asymptotically equivalent given large sample sizes; that is, both will converge to the true model if the true model is correctly specified. On the other hand, if models are misspecified and sample size is relatively small, they can yield disparate results, especially depending on the form of the parameter prior density used in the calculation of BMS.

Despite these similarities, MDL has at least one advantage over BMS: The complexity measure is well understood. As mentioned above, complexity and GOF are not easily disentangled in the integral form of BMS (Table 2). In contrast, a clear understanding of the complexity term in MDL is provided by its counterpart in differential geometry, the geometric complexity measure. This is described in detail in the following section.

The latter two terms of the MDL criterion (Equation 7) readily lend themselves to a differential geometric interpretation, which is related to the Riemannian volume measure presented earlier. Conceptually, model selection using MDL proceeds by choosing the model that best approximates the true model by counting the number of distinguishable distributions that come close to the true model. Proximity to the true model is assessed by f(y|θ). Within the differential geometric approach, this corresponds to a volume measure in the space of probability distributions. The following volume, under the assumption of large sample size, is shown to be a valid measure of proximity (Balasubramanian, 1997; Myung, Balasubramanian, & Pitt, 2000): C_M = (2π/n)^(k/2) h(θ̂), where k is the number of parameters in the model and h(θ̂) is a data-dependent factor that goes to 1 as n grows large (some additional conditions are required; see Balasubramanian, 1997). Essentially, C_M represents the Riemannian volume of a small ellipsoid around θ̂, within which the probability of the data, f(y|θ), is appreciable. As such, it measures the number of distinguishable distributions that come close to the truth, as measured by predicting the data y with relatively high probability.

7 The Fisher information matrix I(θ) of the MDL criterion is defined as I_ij(θ) = −(1/n) E(∂² ln f(y|θ)/∂θ_i ∂θ_j) (i, j = 1, ..., k) for the data vector y = (y1, ..., yn), where the y_q are sample values of random variables Y_q (q = 1, ..., n; see, e.g., Rissanen, 1996, Equation 7). Further, if the Y_q are independently and identically distributed, the above I(θ) reduces to the Fisher information matrix of sample size n = 1, that is, I_ij(θ) = −E(∂² ln f(y_q|θ)/∂θ_i ∂θ_j) (i, j = 1, ..., k) for any q.



However, C_M alone is not an adequate measure of proximity because the total number of distinguishable distributions of a model (V_M, the Riemannian volume, Equation 5) must also be considered. Inclusion of this additional measure leads to a volume ratio, V_M/C_M, which penalizes models for having an unnecessarily large number of distinguishable distributions (V_M) or having relatively few distinguishable distributions close to the truth (C_M). Taking the log of this ratio gives

$$\ln\left(\frac{V_M}{C_M}\right) = \frac{k}{2}\ln\left(\frac{n}{2\pi}\right) + \ln\int d\theta\,\sqrt{\det I(\theta)} - \ln h(\hat{\theta}). \tag{8}$$

The first and second terms are independent of the true distribution as well as the data, and therefore represent an intrinsic property of the model. Together they will be called the geometric complexity of the model, and they are invariant under reparameterization of the model. As sample size n increases, the third term, which is data dependent, becomes negligible. When this occurs, the geometric complexity is equal to the complexity penalty in the MDL criterion in Equation 7.

It is also worth noting that the first term of the geometric complexity measure increases logarithmically with sample size n, whereas the second term is independent of n. An implication of this is that as n grows large, the effects of complexity due to functional form, reflected through I(θ), will gradually diminish compared with those due to the number of parameters (k). Thus, functional form effects will have their greatest impact on model selection when sample size is small. Because small samples are the norm in experiments in much of cognitive psychology, it is imperative that the selection method be sensitive to this property of a model.

Differential geometry provides many valuable insights into model complexity and model selection. One is a new explication of MDL. The MDL selection criterion can be rewritten as follows:

$$\mathrm{MDL} = -\ln\left(\frac{f(y|\hat{\theta})}{V_M/C_M}\right) = -\ln\left(\text{``normalized } f(y|\hat{\theta})\text{''}\right). \tag{9}$$

This reinterpretation provides a clearer picture of what MDL does in model selection. It selects the model that gives the highest value of the maximum likelihood per the relative ratio of distinguishable distributions (V_M/C_M). We might call this the normalized maximum likelihood. From this perspective, the better model is the one with many distinguishable distributions close to the truth but few distinguishable distributions overall.

Perhaps the most important insight provided by differential geometry is an intuitive understanding of the meaning of complexity in MDL: It measures the minus log of the volume of the distinguishable distributions in a model relative to those close to the truth. In this regard, the size of a model manifold in the space of distributions is what matters when measuring complexity. A model's functional form and its number of parameters can be misleading indicators of complexity because they are simply the apparatus by which a collection of distributions defined by the model is indexed. The geometric approach to complexity presented here makes it clear that neither the parameterization nor the specific functional form used in indexing is relevant so long as the same collection of distributions is catalogued on the model manifold. For example, the following two models, although assuming different functional forms, are equivalent and equally complex in the geometric sense:

$$\text{Model A: } y = \exp(\theta_1 x_1 + \theta_2 x_2) + \text{error},$$

$$\text{Model B: } y = \eta_1^{x_1}\,\eta_2^{x_2} + \text{error}, \tag{10}$$

where the error has zero mean and follows the same distribution for both models. Here, the parameters of Model A are related to the parameters of Model B through η_i = exp(θ_i), i = 1, 2.
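A quick numerical check of this equivalence, with arbitrary parameter and predictor values:

```python
import numpy as np

theta = np.array([0.4, -1.2])                 # Model A parameters (arbitrary)
eta = np.exp(theta)                           # Model B parameters: eta_i = exp(theta_i)
x1 = np.array([1.0, 2.0, 3.0])
x2 = np.array([0.5, 1.5, 2.5])

yA = np.exp(theta[0] * x1 + theta[1] * x2)    # Model A mean predictions
yB = eta[0] ** x1 * eta[1] ** x2              # Model B mean predictions
print(np.allclose(yA, yB))                    # True: same collection of distributions
```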

Three Application Examples

Geometric complexity and MDL constitute a powerful pair of model evaluation tools. When used together in model testing, a deeper understanding of the relationship between models can be gained. The first measure enables one to assess the relative complexities of the set of models under consideration. The second builds on the first by suggesting which model is preferable given the data in hand. The following simulations demonstrate the application of these methods in three areas of cognitive modeling: psychophysics, information integration, and categorization. In each example, two competing models with the same number of parameters but different functional forms were fitted to data sets generated by each of these models (human data were not used). Of interest is the ability of each selection method to recover the model that generated the data. A good selection method should be able to discriminate between data generated by a model from those generated by another model. That is, it should be able to "see through" the random variation in the data sample and accurately infer whether the model being tested generated the data it is being fit to. Errors are a sign of overgeneralization and reveal a bias in the selection method, which could be toward either the more complex or the simpler model. The ideal pattern of data is one in which each model generalizes best only to data generated by itself, not to data generated by the competing model. In the 2 × 2 sections of Tables 3–5, this corresponds to a mean selection criterion measure that is lowest in the upper left and lower right quadrants, with perfect recovery rates (100%) in these cells as well.

Four selection methods were compared: AIC, ICOMP, CV, and MDL. Given the close relationship between MDL and BMS, the latter was not included in the comparison. BIC and RMSD were also not included because of their equivalence to AIC in the present testing conditions. AIC can be expressed with BIC as a term in the equation: AIC = BIC + k(2 − ln n). Consequently, both methods will yield the same results when models with the same number of parameters (i.e., equal k) are compared. RMSD will generally yield the same outcome as well.⁸ A fuller discussion of the three simulations can be found in Zhang (1999).

⁸ When comparing among models with the same number of parameters, model selection under RMSD will be the same as that under AIC and BIC when errors are normally distributed and have equal variances. This is because in such cases the sum of squares error in RMSD is related to the likelihood function in AIC (and BIC) as SSE ∝ −ln f(y|θ), and hence minimization of SSE is equivalent to maximization of the likelihood function. For non-normal or unequal-variance errors, no such relationship exists between SSE and the likelihood function. Therefore, model selection under RMSD will differ from that under AIC and BIC. It has been our experience that RMSD performs worse, never better, than these two methods under such conditions. This appears to be due to RMSD's insensitivity to unequal variances.


Psychophysics

Models of psychophysics (Roberts, 1979) were developed to describe the relationship between physical dimensions (e.g., light intensity) and their psychological counterparts (e.g., brightness). Two of the most influential have been Stevens's power model and Fechner's logarithmic model:

Stevens's model: Y = aX^b + error,
Fechner's model: Y = a ln(X + b) + error.   (11)

In both models, error is assumed to be normally distributed with a mean of 0 and a standard deviation of 0.3. Data samples were generated from each model using fixed parameter values. Each model was then fitted to both data samples using each of the four selection criteria. The results are displayed in Table 3. Shown are the mean criterion values and the percentage of samples (out of 1,000) by which the specified model bested its competitor.

Under AIC, Stevens's model was always selected when fitting its own data (100% vs. 0%) but was selected about equally often as Fechner's model when the data were generated by Fechner's model (47% vs. 53%). This asymmetry demonstrates that AIC overestimated the generalizability of Stevens's model relative to Fechner's model. That is, Stevens's model was shown to generalize better to data generated by Fechner's model than Fechner's model itself. CV and ICOMP performed no better, with model recovery rates for CV comparable to those of AIC but considerably worse for ICOMP.

Because both models have the same number of parameters, the failure of these three selection methods can be attributed to a complexity term that does not adequately incorporate functional form. When MDL was used as the selection method, the model recovery rates were perfect for both models, demonstrating the superiority of the method's complexity measure. (An example of how to calculate MDL and the geometric complexity measure for these two models is provided in Appendix B.)

Calculation of the geometric complexities of the two models reinforces and further elucidates the MDL findings. Stevens's model is more complex than Fechner's, with the complexity difference being equal to 5.52.⁹ Given the logarithmic relationship between geometric complexity and the number of distinguishable distributions, this means that for every distribution for which Fechner's model can account, Stevens's model can describe about e^5.52 ≈ 250 distributions. These results clearly demonstrate that appropriately accounting for the complexity of a model is essential to model selection. Furthermore, the geometric complexity results validate a long-held suspicion regarding the source of the superior data-fitting abilities of Stevens's model (Townsend, 1975).
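The model-recovery logic behind Table 3 can be sketched in a few lines of Python. The sketch below is ours, not the authors' code: it assumes NumPy and SciPy, generates one sample from Stevens's model with the parameter values given in the Table 3 note, fits both models by maximum likelihood (least squares, given the normal error), and compares MDL values using the model-specific complexity terms 8.00 and 2.48 derived in Appendix B (the shared constant cancels):

    import numpy as np
    from scipy.optimize import curve_fit

    rng = np.random.default_rng(0)
    x = np.arange(1.0, 7.0)     # the six stimulus values X = 1, ..., 6
    sigma = 0.3                 # error standard deviation (Table 3 note)

    def stevens(x, a, b):       # Y = a * X^b
        return a * x ** b

    def fechner(x, a, b):       # Y = a * ln(X + b)
        return a * np.log(x + b)

    def neg_log_lik(resid):     # -ln f(y|theta) for normal error, known sigma
        return (resid ** 2).sum() / (2 * sigma ** 2) \
            + resid.size * np.log(sigma * np.sqrt(2 * np.pi))

    # One sample from Stevens's model with a = 2, b = 2 (Table 3 note).
    y = stevens(x, 2.0, 2.0) + rng.normal(0.0, sigma, x.size)

    mdl = {}
    for name, model, gc in [("Stevens", stevens, 8.00), ("Fechner", fechner, 2.48)]:
        popt, _ = curve_fit(model, x, y, p0=(1.0, 1.0), bounds=(0.0, np.inf))
        mdl[name] = neg_log_lik(y - model(x, *popt)) + gc   # Equation 7, constants dropped

    print(mdl, "->", min(mdl, key=mdl.get), "selected")

Repeating this over many samples, and over samples generated from Fechner's model, reproduces the kind of recovery-rate comparison reported in Table 3.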

Information Integration

In a typical information integration experiment, a range of stimuli is generated from a factorial manipulation of two or more stimulus dimensions (e.g., visual and auditory) and then presented to participants for categorization as one of two or more possible response alternatives. Data are scored as the proportion of responses in one category across the various combinations of stimulus dimensions. For this comparison, we consider two models of information integration, the fuzzy logical model of perception (FLMP; Oden & Massaro, 1978) and the linear integration model (LIM; Anderson, 1981). Each assumes that the response probability (p_ij) of one category, say A, on the presentation of a stimulus of the specific i and j feature dimensions in a two-factor information integration experiment takes the following form:

FLMP: p_ij = θ_i λ_j / [θ_i λ_j + (1 − θ_i)(1 − λ_j)],
LIM: p_ij = (θ_i + λ_j)/2,   (12)

where θ_i and λ_j (i = 1, ..., q_1; j = 1, ..., q_2; 0 < θ_i, λ_j < 1) are parameters representing the corresponding feature dimensions. Again, the two models were fitted to data sets generated by each model. In addition, the sample size, n, was varied in this example to demonstrate its influence on the performance of these selection methods. The results are shown in Table 4.
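To make the data-generation scheme of this simulation concrete (details are in the Table 4 note), here is a minimal sketch, ours and assuming NumPy, that computes the 16 response probabilities of each model in the 2 × 8 design and draws one simulated sample of observed proportions:

    import numpy as np

    rng = np.random.default_rng(1)

    # Parameter values from the Table 4 note (2 x 8 factorial design).
    theta = np.array([0.15, 0.85])
    lam = np.array([0.05, 0.10, 0.26, 0.42, 0.58, 0.74, 0.90, 0.95])
    t, l = np.meshgrid(theta, lam, indexing="ij")    # 2 x 8 grids

    p_flmp = t * l / (t * l + (1 - t) * (1 - l))     # FLMP, Equation 12
    p_lim = (t + l) / 2.0                            # LIM, Equation 12

    n = 20   # trials per cell; 60 and 150 were also used
    # Observed proportions: binomial counts divided by n, cell by cell.
    sample_flmp = rng.binomial(n, p_flmp) / n
    sample_lim = rng.binomial(n, p_lim) / n
    print(sample_flmp.round(2))
    print(sample_lim.round(2))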

⁹ In computing geometric complexity measures, the following parameter ranges were assumed: 0 < a < ∞, 0 < b < 3 for Stevens's model, and 0 < a, b < ∞ for Fechner's model.

Table 3
Comparison of Four Selection Methods on Their Ability to Generalize Accurately Using Two Psychophysical Models

Selection method/            Data from
model fitted           Stevens's        Fechner's
AIC
  Stevens's          12.42 (100%)     11.92 (47%)
  Fechner's          52.22 (0%)       11.86 (53%)
ICOMP
  Stevens's           4.92 (100%)      5.69 (100%)
  Fechner's          25.40 (0%)       10.64 (0%)
CV
  Stevens's           9.16 (94%)      10.70 (49%)
  Fechner's          25.30 (6%)       11.01 (51%)
MDL
  Stevens's          10.71 (100%)     10.46 (0%)
  Fechner's          25.40 (0%)        5.21 (100%)

Note. For each method and model, the mean criterion value and the percentage of samples (in parentheses) in which the particular model was selected under the given method are shown. A thousand samples were generated from each model using the same 6 points for X, which ranged from 1 to 6 in one-step increments. The random error was normally distributed with a mean of zero and a standard deviation of 0.3. The parameter values used to generate the simulated data were a = 2 and b = 2 for Stevens's model and a = 2 and b = 5 for Fechner's model. AIC = Akaike information criterion; ICOMP = information-theoretic measure of complexity; CV = cross-validation; MDL = minimum description length.

When the data were generated by FLMP, regardless of the selection method and sample size, FLMP was recovered 100% of the time, except for MDL when sample size was 20. In this case, MDL performed slightly worse than the other selection methods. When the data were generated by LIM, all prior selection methods fared much more poorly, except MDL, which recovered the correct model (LIM) perfectly across all sample sizes. When AIC was used, FLMP was selected over LIM half of the time with a sample size of 20 (51% vs. 49%).

Table 4
Generalizability Comparisons of Four Selection Methods Over Three Sample Sizes Using Two Information Integration Models

Selection method/            Data from
model fitted           FLMP             LIM

Sample size: n = 20
AIC
  FLMP               59.86 (100%)     74.96 (51%)
  LIM                89.82 (0%)       75.24 (49%)
ICOMP
  FLMP               25.07 (100%)     30.27 (25%)
  LIM                40.32 (0%)       29.63 (75%)
CV
  FLMP               30.43 (100%)     37.78 (29%)
  LIM                42.88 (0%)       37.04 (71%)
MDL
  FLMP               34.56 (89%)      42.11 (0%)
  LIM                40.80 (11%)      33.51 (100%)

Sample size: n = 60
AIC
  FLMP               77.94 (100%)     93.88 (25%)
  LIM               159.72 (0%)       92.32 (75%)
ICOMP
  FLMP               33.55 (100%)     39.45 (14%)
  LIM                73.82 (0%)       37.95 (86%)
CV
  FLMP               39.11 (99%)      47.28 (28%)
  LIM                76.58 (1%)       45.80 (72%)
MDL
  FLMP               43.70 (100%)     51.71 (0%)
  LIM                75.75 (0%)       42.06 (100%)

Sample size: n = 150
AIC
  FLMP               92.98 (100%)    112.22 (17%)
  LIM               293.74 (0%)      107.84 (83%)
ICOMP
  FLMP               40.98 (100%)     48.56 (10%)
  LIM               140.36 (0%)       45.68 (90%)
CV
  FLMP               47.33 (100%)     55.88 (15%)
  LIM               143.56 (0%)       53.32 (85%)
MDL
  FLMP               51.21 (100%)     60.83 (0%)
  LIM               142.76 (0%)       49.82 (100%)

Note. For each method and model, the mean criterion value and the percentage of samples (in parentheses) in which the particular model was selected under the given method are shown. Simulated data in a 2 × 8 factorial design (q_1 = 2; q_2 = 8) were created from predetermined values of the 10 parameters, θ = (0.15, 0.85), λ = (0.05, 0.10, 0.26, 0.42, 0.58, 0.74, 0.90, 0.95). From these, 16 binomial response probabilities (p_ij) were computed using each model equation. For each probability, a series of n (i.e., n = 20, 60, or 150) independent binary outcomes (0 or 1) were generated according to the binomial probability distribution. The number of ones in the series was summed and divided by n to obtain an observed proportion. This way, each sample consisted of 16 observed proportions. For each sample size, 1,000 samples were generated from each of the two models. In parameter estimation as well as complexity calculation, θ_1 was fixed to 0.15 so the models became identifiable. FLMP = fuzzy logical model of perception; LIM = linear integration model; AIC = Akaike information criterion; ICOMP = information-theoretic measure of complexity; CV = cross-validation; MDL = minimum description length.

Table 5
Generalizability Comparisons of Three Selection Methods Over Three Sample Sizes Using Two Categorization Models

Selection method/            Data from
model fitted           GCM              PRT

Sample size: n = 20
AIC
  GCM                74.34 (98%)      80.36 (15%)
  PRT                85.24 (2%)       75.66 (85%)
CV
  GCM                37.70 (86%)      40.33 (23%)
  PRT                41.46 (14%)      37.62 (77%)
MDL
  GCM                38.96 (96%)      41.97 (7%)
  PRT                43.82 (4%)       39.03 (93%)

Sample size: n = 60
AIC
  GCM                94.26 (98%)     106.90 (4%)
  PRT               122.10 (2%)       93.44 (96%)
CV
  GCM                45.93 (99%)      52.37 (9%)
  PRT                59.70 (1%)       46.58 (91%)
MDL
  GCM                48.93 (98%)      55.25 (4%)
  PRT                62.25 (2%)       47.92 (96%)

Sample size: n = 150
AIC
  GCM               114.06 (99%)     137.36 (1%)
  PRT               178.80 (1%)      106.56 (99%)
CV
  GCM                53.30 (99%)      68.22 (4%)
  PRT                90.09 (1%)       53.16 (96%)
MDL
  GCM                58.83 (99%)      70.48 (1%)
  PRT                90.60 (1%)       54.48 (99%)

Note. The mean criterion value and the percentage of samples (in parentheses) in which the particular model was selected under the given method are shown. Simulated data were created from predetermined values of the seven parameters, c = 2.5, w = (0.2, 0.2, 0.2, 0.2, 0.1, 0.1). From these, 27 trinomial response probabilities (p_iJ, i = 1, ..., 9, J = 1, 2, 3) were computed using each model equation. For each probability, a series of n (i.e., n = 20, 60, or 150) independent ternary outcomes were generated according to the trinomial probability distribution. The number of outcomes of each type in the series was summed and divided by n to obtain an observed proportion. This way, each sample consisted of 27 observed proportions. For each sample size, 1,000 samples were generated from each of the two models. GCM = generalized context model; PRT = prototype model; AIC = Akaike information criterion; CV = cross-validation; MDL = minimum description length.

Such errors were reduced considerably by the time the sample size reached 150 (17% vs. 83%). With ICOMP and CV, this erroneous selection bias in favor of FLMP is less severe, but even with the largest sample size, all prior selection methods except MDL failed to recover the correct model at least 10% of the time.

That FLMP was selected over LIM when methods such as AIC and CV were used, even when the data were generated by LIM, suggests that FLMP is more complex than LIM. This observation was confirmed when the geometric complexity of each model was calculated. The difference in geometric complexity between FLMP and LIM was 8.74, meaning that for every distribution for which LIM can account, FLMP can describe about e^8.74 ≈ 6,248 distributions.¹⁰ As the simulation results show, the complexity term in MDL does an excellent job of correcting for this difference and minimizing overgeneralization of the more complex model.

Categorization

Two models of categorization were considered in the present demonstration. They were the generalized context model (GCM; Nosofsky, 1986) and the prototype model (PRT; Reed, 1972). Each model assumes that categorization responses follow a multinomial probability distribution with p_iJ (probability of category C_J response given stimulus X_i), which is given by:

GCM: p_iJ = Σ_{j∈C_J} s_ij / Σ_K Σ_{k∈C_K} s_ik,
  where s_ij = exp[−c (Σ_{m=1}^{M} w_m |x_im − x_jm|^r)^{1/r}],

PRT: p_iJ = s_iJ / Σ_K s_iK,
  where s_iJ = exp[−c (Σ_{m=1}^{M} w_m |x_im − x_Jm|^r)^{1/r}].   (13)

In the equation, s_ij is a similarity measure between multidimensional stimuli X_i and X_j, s_iJ is a similarity measure between stimulus X_i and the prototypic stimulus X_J of category C_J, M is the number of stimulus dimensions, c is a sensitivity parameter, w_m is an attention-weight parameter, and r is the Minkowski metric parameter. The two models were fitted to data sets generated by each model using the six-dimensional scaling solution from Experiment 1 of Shin and Nosofsky (1992) under the Euclidean distance metric of r = 2. As in the last example, three sample sizes were again used.
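The two response rules can be written compactly in code. The sketch below is ours and assumes NumPy; because the six-dimensional scaling solution of Shin and Nosofsky (1992) is not reproduced here, it uses made-up stimulus coordinates purely to illustrate the computation, with c and w set to the values in the Table 5 note:

    import numpy as np

    def similarity(X, ref, c, w, r=2.0):
        # s = exp(-c * (sum_m w_m |x_im - x_jm|^r)^(1/r)), Equation 13.
        d = ((np.abs(X[:, None, :] - ref[None, :, :]) ** r) * w).sum(-1) ** (1.0 / r)
        return np.exp(-c * d)

    rng = np.random.default_rng(2)
    X = rng.normal(size=(9, 6))            # 9 stimuli, 6 dimensions (illustrative only)
    labels = np.repeat([0, 1, 2], 3)       # category assignment of each exemplar
    c, w = 2.5, np.array([0.2, 0.2, 0.2, 0.2, 0.1, 0.1])   # Table 5 note

    # GCM: summed similarity to the exemplars of each category, normalized.
    s = similarity(X, X, c, w)
    num = np.stack([s[:, labels == J].sum(axis=1) for J in range(3)], axis=1)
    p_gcm = num / num.sum(axis=1, keepdims=True)

    # PRT: similarity to each category prototype (here, the exemplar mean), normalized.
    protos = np.stack([X[labels == J].mean(axis=0) for J in range(3)])
    s_p = similarity(X, protos, c, w)
    p_prt = s_p / s_p.sum(axis=1, keepdims=True)

    print(p_gcm.round(2))
    print(p_prt.round(2))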

The results are shown in Table 5. With AIC, virtually no bias in model recovery rate was observed for n = 60 and n = 150. Only a small bias toward choosing GCM was found using data generated from PRT when n = 20. CV shows a similar pattern of model recovery, with errors being highest with the smallest sample size and there being a slightly larger bias in favor of GCM.¹¹ When MDL was used to choose between the two models, there was no improvement over the other selection methods when the sample sizes were 60 and 150. Even when sample size was 20, the improvement in model recovery was quite modest relative to AIC. Note that this outcome contrasts with that of the preceding examples, in which MDL was generally superior to the other selection methods when sample size was smallest.

On the face of it, these findings might suggest that MDL is not much better than the other selection methods in measuring generalizability. After all, what else could cause this result? The only circumstance in which such an outcome is predicted using MDL is when the functional forms of the two models are similar, thus minimizing the differential contribution of functional form in the complexity term (recall that the two models have the same number of parameters). Calculation of the geometric complexity of each model confirmed this prediction. GCM is indeed only slightly more complex than PRT, the difference being equal to 0.60, so GCM can describe about two distributions (e^0.60 ≈ 1.8) for every distribution PRT can describe.¹²

The results of these three simulations demonstrate the accuracy of MDL in choosing computational models of cognition. MDL's sensitivity to functional form was clearly demonstrated in its superior generalizability (i.e., good model recovery rate) across all three examples, especially when it counts most: when sample size was small and the complexities of the models differed by a nontrivial amount. But no selection method will always perform as well as one would like. This was found to be true for MDL in the information integration simulation when sample size equaled 20: MDL performed slightly worse than the other three selection methods. One point to be made about this outcome is that regions of data space (Figure 3) can always be found in which any selection method will perform less than optimally. Because these regions are not known in advance, it makes the most sense to use the most robust method available. MDL is the clear choice.

The simulations also demonstrate the value of an independent measure of complexity when comparing models. The measure not only identifies which model is more complex, but knowing this information can provide additional insight into the model selection process. For example, the results of an MDL analysis might lead to Model A being chosen over Model B, even though the results of a complexity analysis showed Model A to be the more complex of the two. This outcome would suggest that the additional complexity of A is necessary to capture the underlying regularity in the data. In other words, the mental process may not be as simple as one might have originally thought.

When comparing models, it might be most efficient to compare their relative complexities first before applying a selection method. If the complexities differ significantly, then MDL is the better choice of a selection method. If they are about equally complex, then the choice of a selection method will matter less. Short of doing this, MDL is the safest method to use.

¹⁰ In computing geometric complexity measures, the following parameter ranges were assumed: 0.001 ≤ θ_i, λ_j ≤ 0.999 for both FLMP and LIM.
¹¹ Because ICOMP performed similarly to AIC and CV in the first two simulations, it was not included in the third test of the selection methods.
¹² In computing geometric complexity measures, the following parameter ranges were assumed: 0 < c < 5, 0 < w_i < 1 (i = 1, ..., 6), Σw_i = 1 for both GCM and PRT.

In sum, these examples with simulated data in which the true model was known a priori demonstrate the usefulness of MDL and geometric complexity for selecting among computational models of cognition. An obvious next step is to extend their application to human data as well as to other types of models in the discipline. Such endeavors are currently underway.

General Discussion

The purpose of this article is to introduce the psychological community to MDL as a method of selecting among mathematical models of cognition and to demonstrate its advantages over existing methods. We began by discussing shortcomings of the most widely used selection criterion, GOF. The simulation data in Table 1 demonstrate that GOF alone is an insufficient criterion with which to compare models. Instead, generalizability should be the goal of model selection, for it overcomes the problem inherent in GOF of teasing apart random variation in the data sample from variation due to the cognitive process.

Fit and complexity were identified as two properties of a model that any selection method must be sensitive to. As schematized in Figure 2, optimization of these two opposing properties defines the model selection problem. What is needed, then, is a selection method that acts as a fulcrum on which fit and complexity can be balanced. The development of a theoretically justifiable measure of complexity, which includes the effects of functional form as well as the number of parameters, has been a nontrivial problem. Although selection methods such as AIC were important advances in model selection because they are sensitive to one aspect of complexity (the number of parameters in a model), proper use of such techniques is limited to situations in which the effect of functional form on model fit is minimal or can be ignored. Such selection methods are of limited usefulness for comparing models of cognition, most of which differ in functional form.

At the heart of the difficulty of developing a suitable selection method was discovering a meaningful metric that could be used to compare models with radically different functional forms. For example, how does one compare the functional forms of the logarithmic and exponential models in Table 1? The literature has been relatively silent on this issue (but see Cutting et al., 1992; Townsend, 1975). As we show above, differential geometry not only provides a solution, but the solution is intuitive. Complexity is conceptualized as counting explanations (i.e., distinguishable probability distributions that the model can generate) that lie close to the true model that generated the data. The relative complexities of the models, which are defined in Equation 8, can be estimated by comparing these distributions with the total number of distinguishable distributions the models can generate.

There are many attractive features of this complexity measure. To begin with, it is independent of any selection method and can therefore serve as an additional tool with which to evaluate and compare models. Complexity is no longer hidden in the equation of a particular selection method, such as BMS. In addition, geometric complexity equals the complexity term of the MDL criterion (Equation 7). This link between the two measures reveals how model selection works in MDL: It selects the model that maximizes the "normalized" maximum likelihood (Equation 9). The elucidation of MDL by means of differential geometry demonstrates that MDL is a suitable model selection method, and the outcomes of the three application examples support this contention. In the following sections, we discuss some of the limitations of these two tools and the factors that affect their use.

Factors That Affect Geometric Complexity

The number of parameters and functional form are not the only factors that influence geometric complexity. There are at least four others on which the complexity measure depends. Each is described briefly along with a discussion of its implications for model selection.

Extension of the parameter space. As shown in Equation 8, the definition of geometric complexity involves the integration of a non-negative quantity, the square root of the determinant of the Fisher information matrix, over the region of the parameter space on which the model is defined. Model complexity is directly related to the extent of the parameter space, with a larger space resulting in a more complex model. An implication of this is that two models that are identical in all respects except the range of their parameters will have different geometric complexity measures. This makes sense from the standpoint of differential geometry. A model with a wider parameter space will contain more distinguishable probability distributions than a model with a smaller parameter space. The two are different models as far as model selection is concerned. Accordingly, the parameter range must be clearly specified when defining a model.

Sample size. Just as with the extension of the parameter space, sample size is directly related to complexity, as can be seen in Equation 8. Complexity increases with larger sample sizes. The measure's sensitivity to this variable is related to the fact that variance in a probability distribution decreases as sample size increases. With a small sample, two probability distributions on a model's manifold might not be distinguishable because variance is large. They will thus be counted as a single distribution when calculating the geometric complexity of the model. As sample size increases, the variance will decrease and the two distributions will become more discriminable. At some point, the two distributions will be distinguishable and counted separately in the complexity measure. In other words, the number of distinguishable distributions increases as sample size grows, increasing complexity. The first term of the geometric complexity measure in Equation 8 reflects this effect.

Shape of the probability distribution. As described earlier, a model is a parametric family of probability distributions, which are specified by its likelihood function f(y|θ). Accordingly, a model's geometric complexity will depend upon the particular shape of the probability distribution assumed. To see this, note that the Fisher information matrix, I(θ), which defines geometric complexity, is determined by the likelihood function f(y|θ), and that the likelihood function takes different forms depending on the shape of the probability distribution. For example, two psychophysical models that assume the same power function for the deterministic component g(θ, x) but different distributions of the error term (e.g., normal vs. uniform) will in general have different complexity values. It would be interesting to investigate the relative contributions of the two components (structural and distributional) to the overall value of the complexity measure, although the contribution of the latter is predicted to be small.

Experimental design. To the extent that the Fisher information matrix is sensitive to the experimental design represented by the independent variable x in g(θ, x), the geometric complexity measure can depend upon the specific choices of x values used in an experiment. For example, for a linear model of the form y_i = θx_i + N(0, σ²) (i = 1, ..., n; 0 < θ < ∞), different complexities can be obtained for two designs such as x = {1, 2, 3, 4} and x = {3, 4, 5, 6}, even for the same sample size (n = 4) and the same error distribution (see Appendix B for an example).
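A quick sketch (ours, assuming NumPy) makes the design effect visible for this linear model. Straightforward differentiation, following the per-observation convention of Appendix B, gives I(θ) = Σ_i x_i²/(nσ²), which is constant in θ; under a fixed, bounded range for θ, the design therefore shifts the complexity term of Equation 8 by ln √I:

    import numpy as np

    sigma = 1.0

    def fisher_info(design):
        # For y_i = theta * x_i + N(0, sigma^2): I(theta) = sum_i x_i^2 / (n * sigma^2).
        x = np.asarray(design, dtype=float)
        return (x ** 2).sum() / (x.size * sigma ** 2)

    for design in ([1, 2, 3, 4], [3, 4, 5, 6]):
        info = fisher_info(design)
        # sqrt(det I) is the integrand of Equation 8; it differs across designs
        # even though model, sample size, and error distribution are the same.
        print(design, "I =", info, " ln sqrt(I) =", 0.5 * np.log(info))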

To summarize, the geometric complexity of a model is determined by a number of factors. Some represent inherent characteristics of the model, such as the number of parameters, functional form, and extension of parameter space. Others represent non-model, experimental factors, such as the error distribution and the experimental design. As a general rule of thumb, any factor that will affect the number of distinguishable distributions could alter the geometric complexity of the model.

Computing Geometric Complexity

The definition of geometric complexity includes an integral of the square root of the determinant of the Fisher information matrix. Therefore, computing geometric complexity involves two stages: derivation of the Fisher information matrix and evaluation of the integral. The Fisher information matrix is obtained by calculating the second derivatives of the log likelihood function of the model of interest, either by hand derivation or using technical computing software such as Mathematica (Wolfram Research, Inc., Champaign, IL) or Maple (Waterloo Maple, Inc., Ontario, Canada).
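The same derivation can also be scripted in open-source software. The following sketch is ours and uses the Python SymPy library rather than the packages named above; it computes one entry of the Fisher information matrix for Fechner's model and reproduces the hand result in Appendix B:

    import sympy as sp

    y, x, a, b, sigma = sp.symbols("y x a b sigma", positive=True)

    # Log likelihood of one observation under Fechner's model with normal error.
    mu = a * sp.log(x + b)
    logf = -(y - mu) ** 2 / (2 * sigma ** 2) - sp.log(sigma) \
        - sp.Rational(1, 2) * sp.log(2 * sp.pi)

    # The (a, a) entry of I is the minus expectation of the second derivative;
    # this particular entry does not depend on y, so no substitution is needed.
    I_aa = sp.simplify(-sp.diff(logf, a, 2))
    print(I_aa)    # log(b + x)**2/sigma**2, as in Appendix B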

Once the Fisher information matrix is obtained, the next step is to integrate its determinant over the extension of the parameter space defined by the model. It is not in general possible to obtain a closed-form solution of the integral, so the solution must be sought numerically instead. This step can be challenging, although recent advancements in Monte Carlo techniques, as well as the availability of statistical packages that implement these techniques, have made the high-dimensional integration problem technically feasible. In Appendix C, we provide a tutorial of Monte Carlo methods. For a more technically rigorous treatment of the topic, consult Gelman, Carlin, Stern, and Rubin (1995) and Gilks, Richardson, and Spiegelhalter (1996).

The use of MDL requires that the determinant of the Fisher information matrix be nonsingular, meaning that its value must remain finite for the full range of the parameters. If the determinant becomes infinite for certain values of the parameters, the range of parameters must be restricted to ensure the integral of the determinant is finite.¹³ This is, in effect, equivalent to redefining the model itself, as discussed earlier. For example, for FLMP, the determinant of the Fisher information matrix becomes singular at the two endpoints of each parameter, that is, θ_i, λ_j = 0 and 1. To avoid this problem, we restricted the range of the parameters to 0.001 ≤ θ_i, λ_j ≤ 0.999 for all i's and j's. As noted above, different parameter ranges will yield different complexity values.

A challenge in computing geometric complexity arises for algorithmic models, such as random-walk models (e.g., Ratcliff, 1978). The likelihood function that predicts a model's performance for any given stimulus condition is not defined a priori. Rather, a prediction is obtained only by simulating the model for each given stimulus condition. Further, to get a reliable estimate of the probability of a particular prediction, the procedure must be repeated over a large number of trials. In short, obtaining the likelihood function (or the Fisher information matrix), which is a prerequisite for computing complexity, would be computationally unfeasible, if not impossible. An alternative is to try some sort of an algorithm-based estimate of geometric complexity that in essence implements MDL in principle but does not require the derivation of the Fisher information matrix or its integration (e.g., Hochreiter & Schmidhuber, 1997).

Future Work and Other Issues

Testing qualitative models of cognition. Application of MDL and geometric complexity requires that each of the models being compared be a quantitative model that can be expressed as a parametric family of probability distributions. Because of the large number of qualitative (i.e., verbal) models in cognitive psychology, it would be ideal to extend these two selection tools to this domain as well. In a qualitative model, predictions are made verbally or graphically at the level of ordinal scales without necessarily making use of mathematical equations or the specification of the error structure. For example, models of word recognition state that lexical decision response times will be faster to high-frequency than low-frequency words, but these models make few statements about the magnitude of the time difference between frequency conditions, how frequency is related to response latency (e.g., linearly or logarithmically), or the shape of the response time distribution. The axiomatic theory of decision making (e.g., Fishburn, 1982) is another example of qualitative modeling. The theory is formulated in rigorous mathematical language and makes precise predictions about choice behavior given a set of hypothetical gambles, but it lacks an error theory. Without an appropriate error theory, it is not possible to express the axiomatic theory as a parametric family of probability distributions.

Preliminary work by Grunwald (1999) suggests that extension of the two tools to such cases is possible. Most importantly, there may be no need to develop a statistical form of the model to apply MDL. This line of research may hold great promise.

The scope of the selection methods. The focus of this article has been on how to evaluate a quantitative model's account of behavioral data. The tools presented here for doing so (MDL and the complexity measure) can give the impression that model selection can be highly automated and purely objective. Nothing could be further from the truth. The outcomes of these tests contribute only a few pieces of evidence to the model selection process and should by no means be the sole selection criteria when comparing models. Not only are they silent on crucial issues such as plausibility, explanatory adequacy, and falsifiability (see Bamber & van Santen, 1985), but on other issues pertinent to the particular testing situation, such as the quality and significance of the data being modeled. MDL and geometric complexity should be used in conjunction with other criteria in making informed decisions about which model to choose.

¹³ For other regularity conditions, see Rissanen (1996, pp. 41–42).

Conclusion

The goal of model selection as outlined early in the article is to satisfy two opposing goals: choose the model that provides a sufficiently good fit to the data in the least complex manner, thus ensuring good generalizability. MDL does just this. Geometric complexity provides deeper insight into the models under consideration by helping us understand why one model is chosen over another. One model might provide a better fit but only at the cost of additional complexity. This excess complexity might not yet be justified given what is known about the regularities underlying the cognitive process. On the other hand, a model's additional complexity might be justified by its vastly superior fit to the data relative to its competitor. Viewed in this light, the purpose of this article was to make the case that it is no longer enough to select a model on the basis of its superior fit to a sample of data. An additional property of the model must be justified as well, namely its complexity. Doing so will help ensure success in selecting among mathematical models of cognition.

References

Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In B. N. Petrov & F. Csaki (Eds.), Second international symposium on information theory (pp. 267–281). Budapest, Hungary: Akademiai Kiado.
Amari, S. I. (1983). A foundation of information geometry. Electronics and Communications in Japan, 66A, 1–10.
Amari, S. I. (1985). Differential geometrical methods in statistics. New York: Springer-Verlag.
Anderson, N. H. (1981). Foundations of information integration theory. New York: Academic Press.
Balasubramanian, V. (1997). Statistical inference, Occam's razor, and statistical mechanics on the space of probability distributions. Neural Computation, 9, 349–368.
Bamber, D., & van Santen, J. P. H. (1985). How many parameters can a model have and still be testable? Journal of Mathematical Psychology, 29, 443–473.
Barron, A. R., & Cover, T. M. (1991). Minimum complexity density estimation. IEEE Transactions on Information Theory, 37, 1034–1054.
Bates, D. M., & Watts, D. G. (1988). Nonlinear regression analysis and its applications. New York: Wiley.
Bishop, Y. M. M., Fienberg, S. E., & Holland, P. W. (1975). Discrete multivariate analysis. Cambridge, MA: MIT Press.
Bozdogan, H. (1990). On the information-based measure of covariance complexity and its application to the evaluation of multivariate linear models. Communications in Statistics: Theory and Methods, 19, 221–278.
Cutting, J. E., Bruno, N., Brady, N. P., & Moore, C. (1992). Selectivity, scope, and simplicity of models: A lesson from fitting judgments of perceived depth. Journal of Experimental Psychology: General, 121, 364–381.
Dunn, J. C. (2000). Model complexity: The fit to random data reconsidered. Psychological Research, 63, 174–182.
Fishburn, P. C. (1982). The foundations of expected utility. Dordrecht, The Netherlands: Reidel.
Friedman, D., Massaro, D., Kitzis, S. N., & Cohen, M. M. (1995). A comparison of learning models. Journal of Mathematical Psychology, 39, 164–178.
Gelman, A., Carlin, J. B., Stern, H. S., & Rubin, D. B. (1995). Bayesian data analysis. New York: Chapman & Hall.
Gilks, W. R., Richardson, S., & Spiegelhalter, D. J. (1996). Markov chain Monte Carlo in practice. New York: Chapman & Hall.
Golden, R. M. (2000). Statistical tests for comparing possibly misspecified and nonnested models. Journal of Mathematical Psychology, 44, 153–170.
Grunwald, P. (1999, July 29–August 1). Determining the complexity of arbitrary model classes. Paper presented at the 32nd annual meeting of the Society for Mathematical Psychology, Santa Cruz, CA.
Grunwald, P. (2000). Model selection based on minimum description length. Journal of Mathematical Psychology, 44, 133–170.
Hochreiter, S., & Schmidhuber, J. (1997). Flat minima. Neural Computation, 9, 1–42.
Jacobs, A. M., & Grainger, J. (1994). Models of visual word recognition—sampling the state of the art. Journal of Experimental Psychology: Human Perception and Performance, 29, 1311–1334.
Kass, R. E., & Raftery, A. E. (1995). Bayes factors. Journal of the American Statistical Association, 90, 773–795.
Kelso, J. A. S. (1995). Dynamic patterns: The self-organization of brain and behavior. Cambridge, MA: MIT Press.
Li, M., & Vitanyi, P. (1997). An introduction to Kolmogorov complexity and its applications. New York: Springer-Verlag.
Li, S.-C., Lewandowski, S., & DeBrunner, V. E. (1996). Using parameter sensitivity and interdependence to predict model scope and falsifiability. Journal of Experimental Psychology: General, 125, 360–369.
Linhart, H., & Zucchini, W. (1986). Model selection. New York: Wiley.
Myung, I. J. (2000). The importance of complexity in model selection. Journal of Mathematical Psychology, 44, 190–204.
Myung, I. J., Balasubramanian, V., & Pitt, M. A. (2000). Counting probability distributions: Differential geometry and model selection. Proceedings of the National Academy of Sciences USA, 97, 11170–11175.
Myung, I. J., Forster, M., & Browne, M. W. (Eds.). (2000). Model selection [Special issue]. Journal of Mathematical Psychology, 44.
Myung, I. J., Kim, C., & Pitt, M. A. (2000). Toward an explanation of the power-law artifact: Insights from response surface analysis. Memory & Cognition, 28, 832–840.
Myung, I. J., & Pitt, M. A. (1997). Applying Occam's razor in modeling cognition: A Bayesian approach. Psychonomic Bulletin & Review, 4, 79–95.
Myung, I. J., & Pitt, M. A. (1998). Issues in selecting mathematical models of cognition. In J. Grainger & A. M. Jacobs (Eds.), Localist connectionist approaches to human cognition (pp. 327–355). Mahwah, NJ: Erlbaum.
Myung, I. J., & Pitt, M. A. (2001). Mathematical modeling. In J. Wixted (Ed.), Stevens' handbook of experimental psychology: Vol. 4. Methodology (pp. 429–459). New York: Wiley.
Nosofsky, R. M. (1986). Attention, similarity and the identification–categorization relationship. Journal of Experimental Psychology: General, 115, 39–57.
Oden, G. C., & Massaro, D. W. (1978). Integration of featural information in speech perception. Psychological Review, 85, 172–191.
Rao, C. R. (1945). Information and accuracy attainable in the estimation of statistical parameters. Bulletin of the Calcutta Mathematical Society, 37, 81–91.
Ratcliff, R. (1978). A theory of memory retrieval. Psychological Review, 85, 59–108.
Reed, S. K. (1972). Pattern recognition and categorization. Cognitive Psychology, 3, 382–407.
Rissanen, J. (1983). A universal prior for integers and estimation by minimum description length. Annals of Statistics, 11, 416–431.
Rissanen, J. (1996). Fisher information and stochastic complexity. IEEE Transactions on Information Theory, 42, 40–47.
Rissanen, J. (1999). Hypothesis selection and testing by the MDL principle. The Computer Journal, 42, 260–269.
Roberts, F. S. (1979). Measurement theory with applications to decision making, utility, and the social sciences. Reading, MA: Addison-Wesley.
Roberts, S., & Pashler, H. (2000). How persuasive is a good fit? A comment on theory testing. Psychological Review, 107, 358–367.
Schervish, M. J. (1995). The theory of statistics. New York: Springer-Verlag.
Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6, 461–464.
Shin, H. J., & Nosofsky, R. M. (1992). Similarity-scaling studies of dot-pattern classification and recognition. Journal of Experimental Psychology: General, 121, 278–304.
Stevens, S. S. (1960). The psychophysics of sensory function. American Scientist, 48, 226–253.
Stone, M. (1974). Cross-validatory choice and assessment of statistical predictions (with discussion). Journal of the Royal Statistical Society, Series B, 36, 111–147.
Townsend, J. T. (1975). The mind–body problem revisited. In C. Cheng (Ed.), Philosophical aspects of the mind–body problem (pp. 200–218). Honolulu, HI: Honolulu University Press.
Vitanyi, P., & Li, M. (2000). Minimum description length, Bayesianism and Kolmogorov complexity. IEEE Transactions on Information Theory, 46, 446–464.
Yamanishi, K. (1998). A decision-theoretic extension of stochastic complexity and its applications to learning. IEEE Transactions on Information Theory, 44, 1424–1439.
Zhang, S. (1999). Applications of geometric complexity and the minimum description length principle in mathematical modeling of cognition. Unpublished doctoral dissertation, Department of Psychology, Ohio State University.

Appendix A

Relationship Between Response Surface Analysis (RSA) and Differential Geometry

This appendix derives the equations that show the relation between the RSA length measure L_M in response-surface analysis and the volume measure V_M in the differential geometric approach for a one-parameter model.

Suppose that a random variable v_q (q = 1, ..., N) is independently and binomially distributed, Bin[n, g(θ, x_q)], with probability g(θ, x_q) and sample size n, where 0 < g(θ, x_q) < 1. Define a data variable y_q = v_q/n, which is the proportion of ones over n binary trials (see Footnote 2 for the distinction between N and n).

The response surface of the one-parameter model for data size N is in fact a curve embedded in the N-dimensional data space and formed by the expected response vector E(y) = [E(y_1), ..., E(y_N)], which is equal to [g(θ, x_1), ..., g(θ, x_N)] for any n. The total length of the model's response curve can be calculated using the standard calculus technique. From the Pythagorean theorem, the infinitesimal length ds of the response curve is given by

ds = √[(dE(y_1))² + ··· + (dE(y_N))²] = dθ √[Σ_{q=1}^{N} (dg(θ, x_q)/dθ)²]   (dE(y_q) = dg(θ, x_q)).

Then, the desired total length L_M is obtained by integrating the above ds over the parameter space defined by the model.

To derive the differential geometric volume measure, we first need to find the Fisher information matrix I(θ) of the model. The Fisher information matrix for a multi-parameter model is defined as

I_ij(θ) = −E[∂² ln f(y|θ)/∂θ_i ∂θ_j],

where y is the data vector of sample size n = 1, and the expectation is taken over the data. Because our model has one parameter, I(θ) becomes a scalar quantity instead of a matrix. Note that for n = 1, the data variable y_q is equal to the binomial random variable v_q (q = 1, ..., N), where v_q ∈ {0, 1} is binomially distributed with probability g(θ, x_q). The log likelihood of the data vector y = (y_1, ..., y_N) is then given by

ln f(y|θ) = Σ_{q=1}^{N} [ln(1!/(y_q!(1 − y_q)!)) + y_q ln g(θ, x_q) + (1 − y_q) ln(1 − g(θ, x_q))].

From the log likelihood, the first and second derivatives are obtained as follows:

d ln f(y|θ)/dθ = Σ_{q=1}^{N} [y_q/g(θ, x_q) − (1 − y_q)/(1 − g(θ, x_q))] (dg(θ, x_q)/dθ)

and

d² ln f(y|θ)/dθ² = Σ_{q=1}^{N} {[y_q/g(θ, x_q) − (1 − y_q)/(1 − g(θ, x_q))] (d²g(θ, x_q)/dθ²)
  − [(dg(θ, x_q)/dθ)²/(g(θ, x_q)²(1 − g(θ, x_q))²)] [y_q² − 2y_q g(θ, x_q) + g(θ, x_q)²]}.

By taking the minus expectation of the second derivative and substituting the relations E(y_q²) = g(θ, x_q) and E(y_q) = g(θ, x_q) for the binomial distribution of sample size 1 (the term involving d²g/dθ² then has zero expectation), we get the following form of the Fisher information matrix:

I(θ) = −E[d² ln f(y|θ)/dθ²] = Σ_{q=1}^{N} (1/{g(θ, x_q)[1 − g(θ, x_q)]}) (dg(θ, x_q)/dθ)².

Finally, the geometric volume measure V_M is obtained by integrating the square root of the determinant of the above quantity, √det[I(θ)], over the parameter space defined by the model.
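As a numerical check on this derivation (ours, assuming NumPy; the logistic form of g is an arbitrary illustrative choice), the closed-form I(θ) can be compared against a finite-difference estimate of −E[d² ln f(y|θ)/dθ²] computed from the expected log likelihood:

    import numpy as np

    xs = np.array([0.5, 1.0, 1.5, 2.0])      # N = 4 design points

    def g(theta):                             # an illustrative response function
        return 1.0 / (1.0 + np.exp(-theta * xs))

    def fisher_closed(theta, h=1e-6):
        # I(theta) = sum_q (dg/dtheta)^2 / (g * (1 - g)), as derived above.
        dg = (g(theta + h) - g(theta - h)) / (2 * h)
        p = g(theta)
        return np.sum(dg ** 2 / (p * (1 - p)))

    def fisher_numeric(theta, h=1e-4):
        # -E[d^2 ln f / dtheta^2] via the expected log likelihood
        # E[ln f(y|t)] = sum_q [g(theta) ln g(t) + (1 - g(theta)) ln(1 - g(t))].
        def ell(t):
            p_true, p = g(theta), g(t)
            return np.sum(p_true * np.log(p) + (1 - p_true) * np.log(1 - p))
        return -(ell(theta + h) - 2 * ell(theta) + ell(theta - h)) / h ** 2

    theta = 0.8
    print(fisher_closed(theta), fisher_numeric(theta))   # the two should agree closely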

Appendix B

Calculation of Minimum Description Length (MDL) and Geometric Complexity

for the Two Psychophysics Models

This appendix shows the steps involved in calculating geometric complexity for Fechner's and Stevens's models of psychophysics.

For Fechner's model, the probability distribution of data y_i given x_i and parameters (a, b) is given by

f(y_i|a, b) = [1/(√(2π)σ)] exp{−[y_i − a ln(x_i + b)]²/(2σ²)},

with its log likelihood

L = ln f(y_i|θ) = −[y_i − a ln(x_i + b)]²/(2σ²) − ln σ − (1/2) ln 2π.

First, we find the first and second derivatives of the log likelihood as follows:

∂L/∂a = (1/σ²)[y_i − a ln(x_i + b)] ln(x_i + b),
∂²L/∂a² = −(1/σ²)[ln(x_i + b)]²,
∂L/∂b = (a/σ²)[y_i − a ln(x_i + b)]/(x_i + b),
∂²L/∂b² = −(a/σ²)[y_i − a ln(x_i + b) + a]/(x_i + b)², and
∂²L/∂a∂b = (1/σ²)[y_i − 2a ln(x_i + b)]/(x_i + b).

Next, expectations of the minus second derivatives over y_i are sought using E(y_i) = a ln(x_i + b):

−E[∂²L/∂a²] = (1/σ²)[ln(x_i + b)]²,
−E[∂²L/∂b²] = (a²/σ²)[1/(x_i + b)²], and
−E[∂²L/∂a∂b] = (a/σ²)[ln(x_i + b)/(x_i + b)].

Therefore, we obtain the Fisher information matrix given x_i as

I_1(a, b; x_i) = (1/σ²) [ [ln(x_i + b)]²            a ln(x_i + b)/(x_i + b) ]
                        [ a ln(x_i + b)/(x_i + b)   a²/(x_i + b)²           ],

where the subscript 1 stands for sample size 1. The desired Fisher information matrix I in Equation 7 for the entire data y = (y_1, ..., y_n) is then obtained as

I(θ = (a, b)) = [I_ij] = [(1/n) E(−∂² ln f(y|θ)/∂θ_i∂θ_j)], i, j = 1, 2
              = (1/n) Σ_{i=1}^{n} I_1(a, b; x_i)
              = [1/(nσ²)] [ Σ_i [ln(x_i + b)]²           Σ_i a ln(x_i + b)/(x_i + b) ]
                          [ Σ_i a ln(x_i + b)/(x_i + b)  Σ_i a²/(x_i + b)²           ],

assuming independent observations, that is, ln f(y|θ) = Σ_{i=1}^{n} ln f(y_i|θ).

From the above result, the geometric complexity (GC) of Fechner's model is obtained as

GC_F = (k/2) ln(n/2π) + ln ∫ dθ √det[I(θ)]
     = ln[∫∫ da db (a/σ²)√H(b)] − ln n + (k/2) ln(n/2π)
     = ln ∫ a da + ln ∫ √H(b) db − 2 ln σ − ln 2π

for k = 2, with

H(b) = {Σ_{i=1}^{n} [ln(x_i + b)]²}{Σ_{i=1}^{n} 1/(x_i + b)²} − {Σ_{i=1}^{n} ln(x_i + b)/(x_i + b)}²,

where x = {1, 2, ..., 6} is assumed in the present simulation.

Similarly, for Stevens's model, the Fisher information matrix for the data y = (y_1, ..., y_n) can be obtained as follows:

I(a, b) = (1/n) Σ_{i=1}^{n} I_1(a, b; x_i)
        = [1/(nσ²)] [ Σ_i x_i^{2b}            a Σ_i x_i^{2b} ln x_i     ]
                    [ a Σ_i x_i^{2b} ln x_i   a² Σ_i x_i^{2b}(ln x_i)²  ].

From this, the geometric complexity of Stevens's model is given by

GC_S = ln[∫∫ da db (a/σ²)√W(b)] − ln n + (k/2) ln(n/2π)
     = ln ∫ a da + ln ∫ √W(b) db − 2 ln σ − ln 2π

for k = 2, with

W(b) = {Σ_{i=1}^{n} x_i^{2b}}{Σ_{i=1}^{n} x_i^{2b}(ln x_i)²} − {Σ_{i=1}^{n} x_i^{2b} ln x_i}²,

where x = {1, 2, ..., 6}.

We note that in choosing between two models using MDL, only the difference in geometric complexity between the two models matters, not their absolute values. The difference for the two psychophysics models is as follows:

ΔGC = GC_S − GC_F = ln ∫ √W(b) db − ln ∫ √H(b) db.

Therefore, we only need to evaluate two one-dimensional integrals.

For Stevens's model, the integral ln ∫ √W(b) db over the entire range of the exponent parameter b (i.e., 0 < b < ∞) turns out to be infinity. Consequently, the range of the parameter must be restricted to make the integral finite. The range of 0 < b < 3 was chosen based on typical values of the parameter observed in experiments (e.g., Stevens, 1960). The integral was then evaluated numerically by the simple Monte Carlo method (see Appendix C), in which 100,000 random samples were drawn from the uniform distribution on [0, 3]. The result was 8.00. For Fechner's model, the integral ln ∫ √H(b) db is finite over the entire range of the parameter (0 < b < ∞). The integral was evaluated numerically by the simple Monte Carlo method, and its value was 2.48. Combining these two results, the difference in geometric complexity between Stevens's and Fechner's models was calculated as ΔGC = GC_S − GC_F = 8.00 − 2.48 = 5.52.

From this point, calculation of MDL is straightforward. Plug the complexity value into Equation 7 and combine it with the calculation of the log likelihood measure; the common constant term C = ln ∫ a da − 2 ln σ − ln 2π can be ignored:

MDL_S = −ln f(y|θ̂_S) + 8.00,
MDL_F = −ln f(y|θ̂_F) + 2.48.
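The two one-dimensional integrals can be reproduced with the simple Monte Carlo method of Appendix C. The sketch below is ours and assumes NumPy; because ∫ √H(b) db extends over (0, ∞), the code truncates it at an upper bound where √H(b) has decayed to a negligible size, a choice we make here rather than a detail reported in the text. The two estimates should land near the 8.00 and 2.48 reported above, with a difference near 5.52:

    import numpy as np

    rng = np.random.default_rng(3)
    x = np.arange(1.0, 7.0)                  # x = {1, 2, ..., 6}
    ln_x = np.log(x)

    def W(b):                                # Stevens's model (see above)
        xb = x ** (2 * b)
        return xb.sum() * (xb * ln_x ** 2).sum() - (xb * ln_x).sum() ** 2

    def H(b):                                # Fechner's model (see above)
        u, v = np.log(x + b), 1.0 / (x + b)
        return (u ** 2).sum() * (v ** 2).sum() - (u * v).sum() ** 2

    def ln_integral_sqrt(fn, lo, hi, n=100_000):
        # Simple Monte Carlo: ln of the integral of sqrt(fn) over [lo, hi].
        b = rng.uniform(lo, hi, n)
        vals = np.sqrt(np.array([fn(bi) for bi in b]))
        return np.log((hi - lo) * vals.mean())

    gc_part_S = ln_integral_sqrt(W, 0.0, 3.0)      # 0 < b < 3, as in the text
    gc_part_F = ln_integral_sqrt(H, 0.0, 200.0)    # truncated stand-in for (0, inf)
    print(gc_part_S, gc_part_F, "difference:", gc_part_S - gc_part_F)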

Appendix C

Numerical Integration Methods

The definition of the geometric complexity measure includes an integral of the square root of the determinant of the Fisher information matrix. As it is, in general, not possible to obtain an analytic solution of the integral, integration by sampling (i.e., Monte Carlo methods) is the only option. This appendix provides a tutorial of Monte Carlo integration methods.

Simple Monte Carlo Method

Let f(x) denote the function to be integrated (e.g., the square root of the determinant of the Fisher information matrix in calculating geometric complexity). Suppose we wish to evaluate the following one-dimensional integral:

I = ∫_a^b f(x) dx,

where a < b. The value of this integral is equal to (b − a)f̄, where f̄ is the mean value of f(x) over the interval [a, b]. The mean can be estimated based on a set of n samples {x_1, x_2, ..., x_n} randomly selected from the uniform density over [a, b]. From the estimated mean, the desired integral can then be approximated as

I_n = (b − a)f̄_n,

where f̄_n = (1/n) Σ_{i=1}^{n} f(x_i). This method is called the simple Monte Carlo method. Generalization of the method to multidimensional integration problems is straightforward.

The rationale for the simple Monte Carlo method is the strong law of large numbers in probability theory, and therefore the accuracy of the simple Monte Carlo method improves as sample size increases. Specifically, the standard error of I_n is given by

s_n = σ/√n,

where σ is the population standard deviation. Although in theory s_n approaches zero as n goes to infinity, in practice the quantity may not be quite close to zero even for an extremely large but finite n (e.g., a million), because σ may be quite large, depending on the form of f(x). Other techniques must then be developed to increase the accuracy of the numerical approximation.
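In code, the estimator is one line. A minimal sketch (ours, assuming NumPy), checked here on an integral with a known answer:

    import numpy as np

    rng = np.random.default_rng(4)

    def simple_mc(f, a, b, n=100_000):
        # I_n = (b - a) * mean of f over n uniform draws from [a, b].
        x = rng.uniform(a, b, n)
        return (b - a) * f(x).mean()

    # The integral of x^2 over [0, 3] is exactly 9.
    print(simple_mc(lambda x: x ** 2, 0.0, 3.0))   # close to 9, with error ~ sigma/sqrt(n)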

Importance Sampling Monte Carlo Method

Importance sampling improves on the simple Monte Carlo method by directly controlling σ. Let us rewrite the original integration problem as

I = ∫_a^b f(x) dx = ∫_a^b [f(x)/g(x)] g(x) dx

in terms of some probability density g(x) from which we know how to sample. Based on a set of n independent samples {x_1, x_2, ..., x_n} drawn from g(x), the desired integral I can be estimated as

I_n = (1/n) Σ_{i=1}^{n} f(x_i)/g(x_i).   (C1)

For this estimator, σ can be shown to be effectively close to zero if g(x) is chosen such that the importance ratio, f(x_i)/g(x_i), is roughly close to a constant for all x_i's. Under this condition, a fairly precise estimate of the integral I can be obtained. On the other hand, importance sampling may be poor if the importance ratio varies greatly across x values. When a uniform density is chosen for g(x), the importance sampling Monte Carlo reduces to the simple Monte Carlo.
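A sketch of the estimator in Equation C1 (ours, assuming NumPy). Here the proposal g is an exponential density, chosen so that the importance ratio f/g varies slowly over the sampled region:

    import numpy as np

    rng = np.random.default_rng(5)

    def importance_mc(f, sample_g, pdf_g, n=100_000):
        # Equation C1: I_n = (1/n) * sum_i f(x_i) / g(x_i), with x_i drawn from g.
        x = sample_g(n)
        return (f(x) / pdf_g(x)).mean()

    # Target: integral of sqrt(x) * exp(-x) over (0, inf) = Gamma(3/2) = sqrt(pi)/2.
    f = lambda x: np.sqrt(x) * np.exp(-x)
    est = importance_mc(f,
                        lambda n: rng.exponential(1.0, n),   # samples from g
                        lambda x: np.exp(-x))                # g(x) = exp(-x), x > 0
    print(est, "exact:", np.sqrt(np.pi) / 2)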

Markov Chain Monte Carlo

The simple and importance sampling Monte Carlo methods prescribe that samples {x_1, x_2, ..., x_n} be drawn independently. This may not always be feasible, for g(x) can be quite nonstandard. In such cases, a dependent sample could be used to estimate the integral by applying Markov chain Monte Carlo methods.

The basic idea of this method is to generate samples {x_1, x_2, ..., x_n}, possibly dependent, from a Markov chain process whose stationary distribution is g(x). Because the samples are not necessarily independent, the strong law of large numbers is not applicable. Instead, the asymptotic consistency of the series in Equation C1 is guaranteed by the ergodic property of Markov chains, which ensures I_n → I as n goes to infinity. Among the many algorithms that are currently in use to generate samples from a Markov chain, Gibbs sampling is the most popular method because of its simplicity. This algorithm, as a special case of the Metropolis-Hastings algorithm, proceeds as follows:

Step 1: Let t = 0. Draw a starting point x(t) = [x_1(t), ..., x_m(t)] from any starting distribution, say g(x), whose full conditional distributions are known.

Step 2: For i = 1 to m, sample x_i(t + 1) from the one-dimensional conditional distribution g[x_i(t)|x_1(t + 1), ..., x_{i−1}(t + 1), x_{i+1}(t), ..., x_m(t)].

Step 3: Let t = t + 1. Go to Step 2.
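For a distribution whose full conditionals are easy to sample, the three steps take only a few lines. The sketch below (ours, assuming NumPy) runs Gibbs sampling for a bivariate standard normal g with correlation ρ, whose conditionals are themselves normal:

    import numpy as np

    rng = np.random.default_rng(6)

    rho, n = 0.8, 50_000
    sd = np.sqrt(1.0 - rho ** 2)        # conditional standard deviation
    x1 = x2 = 0.0                       # Step 1: an arbitrary starting point
    samples = np.empty((n, 2))
    for t in range(n):                  # Steps 2 and 3, repeated
        x1 = rng.normal(rho * x2, sd)   # draw x1 from g(x1 | x2)
        x2 = rng.normal(rho * x1, sd)   # draw x2 from g(x2 | x1)
        samples[t] = (x1, x2)

    # The dependent draws still recover expectations under g (ergodicity);
    # after a short burn-in the sample correlation should be close to rho.
    print(np.corrcoef(samples[1000:].T)[0, 1])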

For further detail on this and other Monte Carlo methods, see Gilks et al. (1996) and Gelman et al. (1995), on which the present tutorial is based.

Received January 5, 2000
Revision received June 13, 2001
Accepted June 13, 2001

从实践的角度探讨在日语教学中多媒体课件的应用

从实践的角度探讨在日语教学中多媒体课件的应用 在今天中国的许多大学,为适应现代化,信息化的要求,建立了设备完善的适应多媒体教学的教室。许多学科的研究者及现场教员也积极致力于多媒体软件的开发和利用。在大学日语专业的教学工作中,教科书、磁带、粉笔为主流的传统教学方式差不多悄然向先进的教学手段而进展。 一、多媒体课件和精品课程的进展现状 然而,目前在专业日语教学中能够利用的教学软件并不多见。比如在中国大学日语的专业、第二外語用教科书常见的有《新编日语》(上海外语教育出版社)、《中日交流标准日本語》(初级、中级)(人民教育出版社)、《新编基础日语(初級、高級)》(上海译文出版社)、《大学日本语》(四川大学出版社)、《初级日语》《中级日语》(北京大学出版社)、《新世纪大学日语》(外语教学与研究出版社)、《综合日语》(北京大学出版社)、《新编日语教程》(华东理工大学出版社)《新编初级(中级)日本语》(吉林教育出版社)、《新大学日本语》(大连理工大学出版社)、《新大学日语》(高等教育出版社)、《现代日本语》(上海外语教育出版社)、《基础日语》(复旦大学出版社)等等。配套教材以录音磁带、教学参考、习题集为主。只有《中日交流標準日本語(初級上)》、《初級日语》、《新编日语教程》等少数教科书配备了多媒体DVD视听教材。 然而这些试听教材,有的内容为日语普及读物,并不适合专业外语课堂教学。比如《新版中日交流标准日本语(初级上)》,有的尽管DVD视听教材中有丰富的动画画面和语音练习。然而,课堂操作则花费时刻长,不利于教师重点指导,更加适合学生的课余练习。比如北京大学的《初级日语》等。在这种情形下,许多大学的日语专业致力于教材的自主开发。 其中,有些大学的还推出精品课程,取得了专门大成绩。比如天津外国语学院的《新编日语》多媒体精品课程为2007年被评为“国家级精品课”。目前已被南开大学外国语学院、成都理工大学日语系等全国40余所大学推广使用。

新视野大学英语全部课文原文

Unit1 Americans believe no one stands still. If you are not moving ahead, you are falling behind. This attitude results in a nation of people committed to researching, experimenting and exploring. Time is one of the two elements that Americans save carefully, the other being labor. "We are slaves to nothing but the clock,” it has been said. Time is treated as if it were something almost real. We budget it, save it, waste it, steal it, kill it, cut it, account for it; we also charge for it. It is a precious resource. Many people have a rather acute sense of the shortness of each lifetime. Once the sands have run out of a person’s hourglass, they cannot be replaced. We want every minute to count. A foreigner’s first impression of the U.S. is li kely to be that everyone is in a rush -- often under pressure. City people always appear to be hurrying to get where they are going, restlessly seeking attention in a store, or elbowing others as they try to complete their shopping. Racing through daytime meals is part of the pace

新视野大学英语第三版第二册课文语法讲解 Unit4

新视野三版读写B2U4Text A College sweethearts 1I smile at my two lovely daughters and they seem so much more mature than we,their parents,when we were college sweethearts.Linda,who's21,had a boyfriend in her freshman year she thought she would marry,but they're not together anymore.Melissa,who's19,hasn't had a steady boyfriend yet.My daughters wonder when they will meet"The One",their great love.They think their father and I had a classic fairy-tale romance heading for marriage from the outset.Perhaps,they're right but it didn't seem so at the time.In a way, love just happens when you least expect it.Who would have thought that Butch and I would end up getting married to each other?He became my boyfriend because of my shallow agenda:I wanted a cute boyfriend! 2We met through my college roommate at the university cafeteria.That fateful night,I was merely curious,but for him I think it was love at first sight."You have beautiful eyes",he said as he gazed at my face.He kept staring at me all night long.I really wasn't that interested for two reasons.First,he looked like he was a really wild boy,maybe even dangerous.Second,although he was very cute,he seemed a little weird. 3Riding on his bicycle,he'd ride past my dorm as if"by accident"and pretend to be surprised to see me.I liked the attention but was cautious about his wild,dynamic personality.He had a charming way with words which would charm any girl.Fear came over me when I started to fall in love.His exciting"bad boy image"was just too tempting to resist.What was it that attracted me?I always had an excellent reputation.My concentration was solely on my studies to get superior grades.But for what?College is supposed to be a time of great learning and also some fun.I had nearly achieved a great education,and graduation was just one semester away.But I hadn't had any fun;my life was stale with no component of fun!I needed a boyfriend.Not just any boyfriend.He had to be cute.My goal that semester became: Be ambitious and grab the cutest boyfriend I can find. 4I worried what he'd think of me.True,we lived in a time when a dramatic shift in sexual attitudes was taking place,but I was a traditional girl who wasn't ready for the new ways that seemed common on campus.Butch looked superb!I was not immune to his personality,but I was scared.The night when he announced to the world that I was his girlfriend,I went along

新视野大学英语读写教程第一册课文翻译及课后答案

Unit 1 1学习外语是我一生中最艰苦也是最有意义的经历之一。虽然时常遭遇挫折,但却非常有价值。 2我学外语的经历始于初中的第一堂英语课。老师很慈祥耐心,时常表扬学生。由于这种积极的教学方法,我踊跃回答各种问题,从不怕答错。两年中,我的成绩一直名列前茅。 3到了高中后,我渴望继续学习英语。然而,高中时的经历与以前大不相同。以前,老师对所有的学生都很耐心,而新老师则总是惩罚答错的学生。每当有谁回答错了,她就会用长教鞭指着我们,上下挥舞大喊:“错!错!错!”没有多久,我便不再渴望回答问题了。我不仅失去了回答问题的乐趣,而且根本就不想再用英语说半个字。 4好在这种情况没持续多久。到了大学,我了解到所有学生必须上英语课。与高中老师不。大学英语老师非常耐心和蔼,而且从来不带教鞭!不过情况却远不尽如人意。由于班大,每堂课能轮到我回答的问题寥寥无几。上了几周课后,我还发现许多同学的英语说得比我要好得多。我开始产生一种畏惧感。虽然原因与高中时不同,但我却又一次不敢开口了。看来我的英语水平要永远停步不前了。 5直到几年后我有机会参加远程英语课程,情况才有所改善。这种课程的媒介是一台电脑、一条电话线和一个调制解调器。我很快配齐了必要的设备并跟一个朋友学会了电脑操作技术,于是我每周用5到7天在网上的虚拟课堂里学习英语。 6网上学习并不比普通的课堂学习容易。它需要花许多的时间,需要学习者专心自律,以跟上课程进度。我尽力达到课程的最低要求,并按时完成作业。 7我随时随地都在学习。不管去哪里,我都随身携带一本袖珍字典和笔记本,笔记本上记着我遇到的生词。我学习中出过许多错,有时是令人尴尬的错误。有时我会因挫折而哭泣,有时甚至想放弃。但我从未因别的同学英语说得比我快而感到畏惧,因为在电脑屏幕上作出回答之前,我可以根据自己的需要花时间去琢磨自己的想法。突然有一天我发现自己什么都懂了,更重要的是,我说起英语来灵活自如。尽管我还是常常出错,还有很多东西要学,但我已尝到了刻苦学习的甜头。 8学习外语对我来说是非常艰辛的经历,但它又无比珍贵。它不仅使我懂得了艰苦努力的意义,而且让我了解了不同的文化,让我以一种全新的思维去看待事物。学习一门外语最令人兴奋的收获是我能与更多的人交流。与人交谈是我最喜欢的一项活动,新的语言使我能与陌生人交往,参与他们的谈话,并建立新的难以忘怀的友谊。由于我已能说英语,别人讲英语时我不再茫然不解了。我能够参与其中,并结交朋友。我能与人交流,并能够弥合我所说的语言和所处的文化与他们的语言和文化之间的鸿沟。 III. 1. rewarding 2. communicate 3. access 4. embarrassing 5. positive 6. commitment 7. virtual 8. benefits 9. minimum 10. opportunities IV. 1. up 2. into 3. from 4. with 5. to 6. up 7. of 8. in 9. for 10.with V. 1.G 2.B 3.E 4.I 5.H 6.K 7.M 8.O 9.F 10.C Sentence Structure VI. 1. Universities in the east are better equipped, while those in the west are relatively poor. 2. Allan Clark kept talking the price up, while Wilkinson kept knocking it down. 3. The husband spent all his money drinking, while his wife saved all hers for the family. 4. Some guests spoke pleasantly and behaved politely, while others wee insulting and impolite. 5. Outwardly Sara was friendly towards all those concerned, while inwardly she was angry. VII. 1. Not only did Mr. Smith learn the Chinese language, but he also bridged the gap between his culture and ours. 2. Not only did we learn the technology through the online course, but we also learned to communicate with friends in English. 3. Not only did we lose all our money, but we also came close to losing our lives.

新大学日语简明教程课文翻译

新大学日语简明教程课文翻译 第21课 一、我的留学生活 我从去年12月开始学习日语。已经3个月了。每天大约学30个新单词。每天学15个左右的新汉字,但总记不住。假名已经基本记住了。 简单的会话还可以,但较难的还说不了。还不能用日语发表自己的意见。既不能很好地回答老师的提问,也看不懂日语的文章。短小、简单的信写得了,但长的信写不了。 来日本不久就迎来了新年。新年时,日本的少女们穿着美丽的和服,看上去就像新娘。非常冷的时候,还是有女孩子穿着裙子和袜子走在大街上。 我在日本的第一个新年过得很愉快,因此很开心。 现在学习忙,没什么时间玩,但周末常常运动,或骑车去公园玩。有时也邀朋友一起去。虽然我有国际驾照,但没钱,买不起车。没办法,需要的时候就向朋友借车。有几个朋友愿意借车给我。 二、一个房间变成三个 从前一直认为睡在褥子上的是日本人,美国人都睡床铺,可是听说近来纽约等大都市的年轻人不睡床铺,而是睡在褥子上,是不是突然讨厌起床铺了? 日本人自古以来就睡在褥子上,那自有它的原因。人们都说日本人的房子小,从前,很少有人在自己的房间,一家人住在一个小房间里是常有的是,今天仍然有人过着这样的生活。 在仅有的一个房间哩,如果要摆下全家人的床铺,就不能在那里吃饭了。这一点,褥子很方便。早晨,不需要褥子的时候,可以收起来。在没有了褥子的房间放上桌子,当作饭厅吃早饭。来客人的话,就在那里喝茶;孩子放学回到家里,那房间就成了书房。而后,傍晚又成为饭厅。然后收起桌子,铺上褥子,又成为了全家人睡觉的地方。 如果是床铺的话,除了睡觉的房间,还需要吃饭的房间和书房等,但如果使用褥子,一个房间就可以有各种用途。 据说从前,在纽约等大都市的大学学习的学生也租得起很大的房间。但现在房租太贵,租不起了。只能住更便宜、更小的房间。因此,似乎开始使用睡觉时作床,白天折小能成为椅子的、方便的褥子。
