Comparison of Parametric and Nonparametric Models
Senior Three English: Statistical Analysis Multiple-Choice Questions (60 items, with answers)

1. The average score of a class is calculated by adding all the scores and dividing by the number of students. This is an example of _____.
A. mean  B. median  C. mode  D. range
Answer: A. This question tests basic statistical concepts. Option A, mean, is obtained by adding all the data values and dividing by the number of values, which matches the description in the question. Option B, median, is the value that sits in the middle position once the data are sorted from smallest to largest. Option C, mode, is the value that occurs most frequently in the data. Option D, range, is the difference between the largest and smallest values in the data.

2. In a set of data, if there is a value that occurs most frequently, it is called the _____.
A. mean  B. median  C. mode  D. range
Answer: C. Option A, mean, is the average. Option B, median, is the middle value. Option C, mode, is the most frequently occurring value, which fits the question. Option D, range, is the difference between the extremes.

3. The middle value in a sorted list of data is called the _____.
A. mean  B. median  C. mode  D. range
Answer: B. Option A, mean, is the average. Option B, median, is the middle value after sorting, which matches the description in the question. Option C, mode, is the most frequent value. Option D, range, is the difference between the extremes.

4. The difference between the highest and lowest values in a set of data is known as the _____.
A. mean  B. median  C. mode  D. range
Answer: D.
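These four questions all turn on the same definitions, which are easy to check numerically. The following minimal Python sketch computes each of the four statistics; the score list is made up purely for illustration:

```python
from statistics import mean, median, mode

scores = [72, 85, 85, 90, 78]  # hypothetical class scores

print(mean(scores))               # mean: sum of all scores / number of scores
print(median(scores))             # median: middle value of the sorted list
print(mode(scores))               # mode: most frequently occurring value
print(max(scores) - min(scores))  # range: highest value minus lowest value
```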
Parametric and Nonparametric Volatility Measurement*

Torben G. Andersen (a), Tim Bollerslev (b), and Francis X. Diebold (c)

July 2002

* This paper is prepared for Yacine Aït-Sahalia and Lars Peter Hansen (eds.), Handbook of Financial Econometrics, Amsterdam: North Holland. We are grateful to the National Science Foundation for research support, and to Nour Meddahi, Neil Shephard and Sean Campbell for useful discussions and detailed comments on earlier drafts.

(a) Department of Finance, Kellogg School of Management, Northwestern University, Evanston, IL 60208, and NBER; phone: 847-467-1285, e-mail: t-andersen@
(b) Departments of Economics and Finance, Duke University, Durham, NC 27708, and NBER; phone: 919-660-1846, e-mail: boller@
(c) Departments of Economics, Finance and Statistics, University of Pennsylvania, Philadelphia, PA 19104, and NBER; phone: 215-898-1507, e-mail: fdiebold@

Andersen, T.G., Bollerslev, T., and Diebold, F.X. (2005), "Parametric and Nonparametric Volatility Measurement," in L.P. Hansen and Y. Aït-Sahalia (eds.), Handbook of Financial Econometrics. Amsterdam: North-Holland, forthcoming.

Table of Contents
Abstract
1. Introduction
2. Volatility Definitions
  2.1. Continuous-Time No-Arbitrage Pricing
  2.2. Notional, Expected, and Instantaneous Volatility
  2.3. Volatility Models and Measurements
3. Parametric Methods
  3.1. Discrete-Time Models
    3.1.1. ARCH Models
    3.1.2. Stochastic Volatility Models
  3.2. Continuous-Time Models
    3.2.1. Continuous Sample Path Diffusions
    3.2.2. Jump Diffusions and Lévy Driven Processes
4. Nonparametric Methods
  4.1. ARCH Filters and Smoothers
  4.2. Realized Volatility
5. Conclusion
References

ABSTRACT

Volatility has been one of the most active areas of research in empirical finance and time series econometrics during the past decade. This chapter provides a unified continuous-time, frictionless, no-arbitrage framework for systematically categorizing the various volatility concepts, measurement procedures, and modeling procedures. We define three different volatility concepts: (i) the notional volatility corresponding to the ex-post sample-path return variability over a fixed time interval, (ii) the ex-ante expected volatility over a fixed time interval, and (iii) the instantaneous volatility corresponding to the strength of the volatility process at a point in time. The parametric procedures rely on explicit functional form assumptions regarding the expected and/or instantaneous volatility. In the discrete-time ARCH class of models, the expectations are formulated in terms of directly observable variables, while the discrete- and continuous-time stochastic volatility models involve latent state variable(s). The nonparametric procedures are generally free from such functional form assumptions and hence afford estimates of notional volatility that are flexible yet consistent (as the sampling frequency of the underlying returns increases). The nonparametric procedures include ARCH filters and smoothers designed to measure the volatility over infinitesimally short horizons, as well as the recently-popularized realized volatility measures for (non-trivial) fixed-length time intervals.

1. INTRODUCTION

Since Engle's (1982) seminal paper on ARCH models, the econometrics literature has focused considerable attention on time-varying volatility and the development of new tools for volatility measurement, modeling and forecasting.
These advances have in large part been motivated by the empirical observation that financial asset return volatility is time-varying in a persistent fashion, across assets, asset classes, time periods, and countries.[1] Asset return volatility, moreover, is central to finance, whether in asset pricing, portfolio allocation, or risk management, and standard financial econometric methods and models take on a very different, conditional, flavor when volatility is properly recognized to be time-varying.

[1] See, for example, Bollerslev, Chou and Kroner (1992).

The combination of powerful methodological advances and tremendous relevance in empirical finance produced explosive growth in the financial econometrics of volatility dynamics, with the econometrics and finance literatures cross-fertilizing each other furiously. Initial developments were tightly parametric, but the recent literature has moved in less parametric, and even fully nonparametric, directions. Here we review and provide a unified framework for interpreting both the parametric and nonparametric approaches.

In section 2, we define three different volatility concepts: (i) the notional volatility corresponding to the ex-post sample-path return variability over a fixed time interval, (ii) the ex-ante expected volatility over a fixed time interval, and (iii) the instantaneous volatility corresponding to the strength of the volatility process at a point in time.

In section 3, we survey parametric approaches to volatility modeling, which are based on explicit functional form assumptions regarding the expected and/or instantaneous volatility. In the discrete-time ARCH class of models, the expectations are formulated in terms of directly observable variables, while the discrete- and continuous-time stochastic volatility models both involve latent state variable(s).

In section 4, we survey nonparametric approaches to volatility modeling, which are generally free from such functional form assumptions and hence afford estimates of notional volatility that are flexible yet consistent (as the sampling frequency of the underlying returns increases). The nonparametric approaches include ARCH filters and smoothers designed to measure the volatility over infinitesimally short horizons, as well as the recently-popularized realized volatility measures for (non-trivial) fixed-length time intervals.

We conclude in section 5 by highlighting promising directions for future research.

2. VOLATILITY DEFINITIONS

Here we introduce a unified framework for defining and classifying different notions of return volatility in a continuous-time no-arbitrage setting. We begin by outlining the minimal set of regularity conditions invoked on the price processes and establish the notation used for the decomposition of returns into an expected, or mean, return and an innovation component. The resulting characterization of the price process is central to the development of our different volatility measures, and we rely on the concepts and notation introduced in this section throughout the chapter.

2.1. Continuous-Time No-Arbitrage Price Processes

Measurement of return volatility requires determination of the component of a given price increment that represents a return innovation as opposed to an expected price movement. In a discrete-time setting this identification may only be achieved through a direct specification of the conditional mean return, for example through an asset pricing model, as economic principles impose few binding constraints on the price process.
However, within a frictionless continuous-time framework the no-arbitrage requirement quite generally guarantees that the return innovation is an order of magnitude larger than the mean return. This result is not only critical to the characterization of arbitrage-free continuous-time price processes, but it also has important implications for the approach one may employ for measurement and modeling of volatility over short intradaily return horizons.

We take as given a univariate risky logarithmic price process p(t) defined on a complete probability space $(\Omega, \mathcal{F}, P)$. The price process evolves in continuous time over the interval [0,T], where T is a (finite) integer. The associated natural filtration is denoted $(\mathcal{F}_t)_{t \in [0,T]} \subseteq \mathcal{F}$, where the information set $\mathcal{F}_t$ contains the full history (up to time t) of the realized values of the asset price and other relevant (possibly latent) state variables, and is otherwise assumed to satisfy the usual conditions. It is sometimes useful to consider the information set generated by the asset price history alone. We refer to this coarser filtration, consisting of the initial conditions and the history of the asset prices only, by $(F_t)_{t \in [0,T]} \subseteq F \equiv F_T$, so that by definition, $F_t \subseteq \mathcal{F}_t$. Finally, we assume there is an asset guaranteeing an instantaneously risk-free rate of interest, although we shall not refer to this rate explicitly. Many more risky assets may, of course, be available but we explicitly retain a univariate focus for notational simplicity. The extension to the multivariate setting is conceptually straightforward as discussed in specific instances below.

The continuously compounded return over the time interval [t-h,t] is then

  $r(t,h) = p(t) - p(t-h), \quad 0 \le h \le t \le T.$   (2.1)

We also adopt the following shorthand notation for the cumulative return up to time t, i.e., the return over the [0,t] time interval:

  $r(t) \equiv r(t,t) = p(t) - p(0), \quad 0 \le t \le T.$   (2.2)

These definitions imply a simple relation between the period-by-period and the cumulative returns that we use repeatedly in the sequel:

  $r(t,h) = r(t) - r(t-h), \quad 0 \le h \le t \le T.$   (2.3)

A maintained assumption throughout is that - almost surely (P) (henceforth denoted (a.s.)) - the asset price process remains strictly positive and finite, so that p(t) and r(t) are well defined over [0,T] (a.s.). It follows that r(t) has only countably many jump points over [0,T], and we adopt the convention of equating functions that have identical left and right limits everywhere. Defining $r(t-) \equiv \lim_{\tau \to t,\, \tau < t} r(\tau)$ and $r(t+) \equiv \lim_{\tau \to t,\, \tau > t} r(\tau)$ uniquely determines the right-continuous, left-limit (càdlàg) version of the process, for which r(t) = r(t+) (a.s.), and the left-continuous, right-limit (càglàd) version, for which r(t) = r(t-) (a.s.), for all t in [0,T]. In the following, we assume without loss of generality that we are working with the càdlàg version of the various return processes and components.

The jumps in the cumulative price and return process are then

  $\Delta r(t) \equiv r(t) - r(t-), \quad 0 \le t \le T.$   (2.4)

Obviously, at continuity points for r(t), we have $\Delta r(t) = 0$. Notice also that, because there are at most a countably infinite number of jumps, a jump occurrence is unusual in the sense that we generically have

  $P(\Delta r(t) \ne 0) = 0,$   (2.5)

for an arbitrarily chosen t in [0,T]. This does not imply that jumps necessarily are rare. In fact, equation (2.5) is consistent with there being a (countably) infinite number of jumps over any discrete interval - a phenomenon referred to as an explosion. Jump processes that do not explode are termed regular.
For regular processes, the anticipated jump frequency is conveniently characterized by the instantaneous jump intensity, i.e., the probability of a jump over the next instant of time, expressed in units that reflect the expected (and finite) number of jumps per unit time interval.

In the following, we invoke the standard assumptions of no arbitrage opportunities and a finite expected return. Within our frictionless setting, it is well known that these conditions imply that the log-price process must constitute a (special) semi-martingale (e.g., Back, 1991). This, in turn, affords the following unique canonical return decomposition (e.g., Protter, 1992).

PROPOSITION 1 - Return Decomposition
Any arbitrage-free logarithmic price process subject to the regularity conditions outlined above may be uniquely represented as

  $r(t) \equiv p(t) - p(0) = \mu(t) + M(t) = \mu(t) + M^c(t) + M^J(t),$   (2.6)

where $\mu(t)$ is a predictable and finite variation process, M(t) is a local martingale which may be further decomposed into $M^c(t)$, a continuous sample path, infinite variation local martingale component, and $M^J(t)$, a compensated jump martingale. All components may be assumed to have initial conditions normalized such that $\mu(0) \equiv M(0) \equiv M^c(0) \equiv M^J(0) \equiv 0$, which implies that $r(t) \equiv p(t)$.

Proposition 1 provides a unique decomposition of the instantaneous return into an expected return component and a (martingale) innovation. Over discrete intervals, the relation becomes slightly more complex. Letting the expected returns over [t-h,t] be denoted by m(t,h), equation (2.6) implies

  $m(t,h) \equiv E[r(t,h) \mid \mathcal{F}_{t-h}] = E[\mu(t,h) \mid \mathcal{F}_{t-h}], \quad 0 < h \le t \le T,$   (2.7)

where

  $\mu(t,h) \equiv \mu(t) - \mu(t-h), \quad 0 < h \le t \le T,$   (2.8)

and the return innovation takes the form

  $r(t,h) - m(t,h) = (\mu(t,h) - m(t,h)) + M(t,h), \quad 0 < h \le t \le T.$   (2.9)

The first term on the right hand side of (2.9) signifies that the expected return process, even though it is (locally) predictable, may evolve stochastically over the [t-h,t] interval.[2] If $\mu(t,h)$ is predetermined (measurable with respect to $\mathcal{F}_{t-h}$), and thus known at time t-h, then the discrete-time return innovation reduces to $M(t,h) \equiv M(t) - M(t-h)$. However, any shift in the expected return process during the interval will generally render the initial term on the right hand side of (2.9) non-zero and thus contribute to the return innovation over [t-h,t].

Although the discrete-time return innovation incorporates two distinct terms, the martingale component, M(t,h), is generally the dominant contributor to the return variation over short intervals, i.e., for h small. In order to discuss the intuition behind this result, which we formalize in the following section, it is convenient to decompose the expected return process into a purely continuous, predictable finite variation part, $\mu^c(t)$, and a purely predictable jump part, $\mu^J(t)$.

[2] In other words, even though the conditional mean is locally predictable, all return components in the special semi-martingale decomposition are generally stochastic: not only volatility, but also the jump intensity, the jump size distribution and the conditional mean process may evolve randomly over a finite interval.

Because the continuous component, $\mu^c(t)$, is of finite variation, it is locally an order of magnitude smaller than the corresponding contribution from the continuous component of the innovation term, $M^c(t)$.
The reason is - loosely speaking - that an asset earning, say, a positive expected return over the risk-free rate must have innovations that are an order of magnitude larger than the expected return over infinitesimal intervals. Otherwise, a sustained long position (infinitely many periods over any interval) in the risky asset will tend to be perfectly diversified due to a Law of Large Numbers, as the martingale part is uncorrelated. Thus, the risk-return relation becomes unbalanced. Only if the innovations are large, preventing the Law of Large Numbers from becoming operative, will this not constitute a violation of the no-arbitrage condition (e.g., Maheswaran and Sims, 1993). The presence of a non-trivial $M^J(t)$ component may similarly serve to eliminate arbitrage and retain a balanced risk-return trade-off relationship.

Analogous considerations apply to the jump component of the expected return process, $\mu^J(t)$, if this factor is present. There cannot be a predictable jump in the mean - i.e., a perfectly anticipated jump in terms of both time and size - unless it is accompanied by large jump innovation risk as well, so that $\Pr(\Delta M(t) \ne 0) > 0$. Again - intuitively - if there were a known, say, positive jump, then this would induce arbitrage (by going long the asset) unless there is offsetting (jump) innovation risk.[3] Most of the continuous-time asset pricing literature ignores predictable jumps, even if they are logically consistent with the framework. One reason may be that their existence is fragile in the following sense. A fully anticipated jump must be associated with the release of new (price relevant) information at a given point in time. However, if there is any uncertainty about the timing of the announcement, so that it is only known to occur within a given minute, or even a few seconds, then the timing of the jump is more aptly modelled by a continuous hazard function, where the jump probability at each point in time is zero, and the predictable jump event is thus eliminated. In addition, even if there were predictable jumps associated with scheduled news releases, the size of the predictable component of the jump is likely much smaller than the size of the associated jump innovation, so that the descriptive power lost by ignoring the possibility of predictable jumps is minimal. Thus, rather than modify the standard setup to allow for the presence of predictable (but empirically negligible) jumps, we follow the tradition in the literature and assume away such jumps.

Although we will not discuss specific model classes at length until later sections, it may be useful to briefly consider two simple examples to illustrate the somewhat abstract definitions given in the current section.

EXAMPLE 1: Stochastic Volatility Jump Diffusion with Non-Zero Mean Jumps

[3] This point is perhaps most readily understood by analogy to a discrete-time setting. When there is a predictable jump at time t, the instant from t- to t is effectively equivalent to a trading period, say from t-1 to t, within a discrete-time model.
In that setting, no asset can earn a positive (or negative) excess return relative to the risk-free rate over (t-1,t] without bearing genuine risk, as this would otherwise imply a trivial arbitrage opportunity. A predictable price jump without an associated positive probability of a jump innovation works in an entirely analogous fashion.

Consider the following continuous-time jump diffusion expressed in stochastic differential equation (sde) form,

  $dp(t) = (\mu + \beta \sigma^2(t))\,dt + \sigma(t)\,dW(t) + \kappa(t)\,dq(t), \quad 0 \le t \le T,$

where $\sigma(t)$ is a strictly positive continuous sample path process (a.s.), W(t) denotes a standard Brownian motion, q(t) is a pure jump process with dq(t) = 1 corresponding to a jump at time t and dq(t) = 0 otherwise, while $\kappa(t)$ refers to the size of the corresponding jumps. We assume the jump size distribution has a constant mean of $\mu_\kappa$ and variance of $\sigma_\kappa^2$. Finally, the jump intensity is assumed constant (and finite) at a rate $\lambda$ per unit time. In the notation of Proposition 1, we then have the return components,

  $\mu(t) = \mu^c(t) = \mu\,t + \beta \int_0^t \sigma^2(s)\,ds + \lambda\,\mu_\kappa\,t,$
  $M^c(t) = \int_0^t \sigma(s)\,dW(s),$
  $M^J(t) = \sum_{0 \le s \le t} \kappa(s)\,\Delta q(s) - \lambda\,\mu_\kappa\,t.$

Notice that the last term of the mean representation captures the expected contribution coming from the jumps, while the corresponding term is subtracted from the jump innovation process to provide a unique (compensated) jump martingale representation for $M^J$.

EXAMPLE 2: Discrete-Time Stochastic Volatility (ARCH) Model

Consider the discrete-time (jump) process for p(t) defined over the unit time interval,

  $p(t) = p(t-1) + \mu + \beta \sigma^2(t) + \sigma(t)\,z(t), \quad t = 1, 2, \ldots, T,$

where (implicitly)

  $p(t+\tau) \equiv p(t), \quad t = 0, 1, \ldots, T-1, \quad 0 \le \tau < 1,$

and z(t) denotes a martingale difference sequence with unit variance, while $\sigma(t)$ is a (possibly latent) positive (a.s.) stochastic process that is measurable with respect to $\mathcal{F}_{t-1}$. Of course, in this situation the continuous-time no-arbitrage arguments based on infinitely many long-short positions over fixed length time intervals are no longer operative. Nonetheless, we shall later argue that the orders of magnitude of the corresponding terms in the decomposition in Proposition 1 remain suggestive and useful from an empirical perspective. In fact, we may still think of the price process evolving in continuous time, but with the market being open and prices observed only at discrete points in time. In this specific model we have, for t = 1, 2, ..., T,

  $\mu(t) = \mu\,t + \beta \sum_{s=1,\ldots,t} \sigma^2(s),$
  $M(t) \equiv M^J(t) = \sum_{s=1,\ldots,t} \sigma(s)\,z(s).$

A final comment is in order. We purposely express the price changes and associated returns in Proposition 1 over a discrete time interval. The concept of an instantaneous return employed in the formulation of continuous-time models, that are given in sde form (as in Example 1 above), is pure short-hand notation that is formally defined only through the corresponding integral representation, such as equation (2.6). Although this is a technical point, it has an important empirical analogy: real-time price data are not available at every instant and, due to pertinent microstructure features, prices are invariably constrained to lie on a discrete grid, both in the price and time dimension. Hence, there is no real-world counterpart to the notion of a continuous sample path martingale with infinite variation over arbitrarily small time intervals (say, less than a second). It is only feasible to measure return (and volatility) realizations over discrete time intervals. Moreover, sensible measures can typically only be constructed over much longer horizons than given by the minimal interval length for which consecutive trade prices or quotes are recorded. We return to this point later. For now, we simply note that our main conceptualization of volatility in the next section conforms directly with the focus on realizations measured over non-trivial discrete time intervals rather than vanishing, or instantaneous, interval lengths.

2.2. Notional, Expected, and Instantaneous Volatility

This section introduces three distinct volatility concepts that serve to formalize the process of measuring and modeling volatility within our frictionless, arbitrage-free setting.

By definition, volatility seeks to capture the strength of the (unexpected) return variation over a given period of time. However, two distinct features importantly differentiate the construction of all (reasonable) volatility measures. First, given a set of actual return observations, how is the realized volatility computed? Here, the emphasis is explicitly on ex-post measurement of the volatility. Second, decision making often requires forecasts of future return volatility. The focus is then on ex-ante expected volatility. The latter concept naturally calls for a model that may be used to map the current information set into a volatility forecast. In contrast, the (ex-post) realized volatility may be computed (or approximated) without reference to any specific model, thus rendering the task of volatility measurement essentially a nonparametric procedure.

It is natural first to concentrate on the behavior of the martingale component in the return decomposition (2.6). However, a prerequisite for observing the M(t) process is that we have access to a continuous record of price data. Such data are simply not available and, even for extremely liquid markets, microstructure effects (discrete price grids, bid-ask bounce effects, etc.) prevent us from ever getting really close to a true continuous sample path realization. Consequently, we focus on measures that represent the (average) volatility over a discrete time interval, rather than the instantaneous (point-in-time) volatility.[4] This, in turn, suggests a natural and general notion of volatility based on the quadratic variation process for the local martingale component in the unique semi-martingale return decomposition.

Specifically, let X(t) denote any (special) semi-martingale. The unique quadratic variation process, $[X,X]_t$, $t \in [0,T]$, associated with X(t) is then formally defined by

  $[X,X]_t \equiv X(t)^2 - 2 \int_0^t X(s-)\,dX(s), \quad 0 < t \le T,$   (2.10)

where the stochastic integral of the adapted càglàd process, X(s-), with respect to the càdlàg semi-martingale, X(s), is well-defined (e.g., Protter, 1992). It follows directly that the quadratic variation, [X,X], is an increasing stochastic process. Also, jumps in the sample path of the quadratic variation process necessarily occur concurrent with the jumps in the underlying semi-martingale process, $\Delta[X,X] = (\Delta X)^2$.

Importantly, if M is a locally square integrable martingale, then the associated $(M^2 - [M,M])$ process is a local martingale,

  $E[\,M(t,h)^2 - ([M,M]_t - [M,M]_{t-h}) \mid \mathcal{F}_{t-h}\,] = 0, \quad 0 < h \le t \le T.$   (2.11)

This relation, along with the following well-known result, provides the key to the interpretation of the quadratic variation process as one of our volatility measures.

PROPOSITION 2 - Theory of Quadratic Variation
Let a sequence of possibly random partitions of [0,T], $(\tau_m)$, be given such that $(\tau_m) \equiv \{\tau_{m,j}\}_{j \ge 0}$, m = 1, 2, ...,
where $\tau_{m,0} \le \tau_{m,1} \le \tau_{m,2} \le \ldots$ satisfy, with probability one, for $m \to \infty$,

  $\tau_{m,0} \to 0; \quad \sup_{j \ge 1} \tau_{m,j} \to T; \quad \sup_{j \ge 0}\,(\tau_{m,j+1} - \tau_{m,j}) \to 0.$

Then, for $t \in [0,T]$,

  $\lim_{m \to \infty} \Big\{ \sum_{j \ge 1} \big(X(t \wedge \tau_{m,j}) - X(t \wedge \tau_{m,j-1})\big)^2 \Big\} \to [X,X]_t,$

where $t \wedge \tau \equiv \min(t,\tau)$, and the convergence is uniform in probability.

Intuitively, the proposition says that the quadratic variation process represents the (cumulative) realized sample path variability of X(t) over the [0,t] time interval. This observation, together with the martingale property of the quadratic variation process in (2.11), immediately points to the following theoretical notion of ex-post return variability.

[4] Of course, by choosing the interval very small, one may in principle approximate the notion of point-in-time volatility, as discussed further below.

DEFINITION 1 - Notional Volatility
The notional volatility over [t-h,t], $0 < h \le t \le T$, is

  $\upsilon^2(t,h) \equiv [M,M]_t - [M,M]_{t-h} = [M^c,M^c]_t - [M^c,M^c]_{t-h} + \sum_{t-h < s \le t} \Delta M^2(s).$   (2.12)

This same volatility concept has recently been highlighted in a series of papers by Andersen, Bollerslev, Diebold and Labys (2001a,b) and Barndorff-Nielsen and Shephard (2002a,b,c). The latter authors term the corresponding concept Actual Volatility.

Under the maintained assumption of no predictable jumps in the return process, and noting that the quadratic variation of any finite variation process, such as $\mu^c(t)$, is zero, we also have

  $\upsilon^2(t,h) \equiv [r,r]_t - [r,r]_{t-h} = [M^c,M^c]_t - [M^c,M^c]_{t-h} + \sum_{t-h < s \le t} \Delta r^2(s).$   (2.13)

Consequently, the notional volatility equals (the increment to) the quadratic variation for the return series. Equation (2.13) and Proposition 2 also suggest that (ex-post) it is possible to approximate the notional volatility arbitrarily well through the accumulation of ever more finely sampled high-frequency squared returns, and that this approach remains consistent independent of the expected return process. We shall return to a much more detailed analysis of this idea in our discussion of nonparametric ex-post volatility measures in section 4 below.

Similarly, from (2.13) and Proposition 2 it is evident that the notional volatility, $\upsilon^2(t,h)$, directly captures the sample path variability of the log-price process over the [t-h,t] time interval. In particular, the notional volatility explicitly incorporates the effect of (realized) jumps in the price process: jumps contribute to the realized return variability and forecasts of volatility must account for the potential occurrence of such jumps. It also follows, from the properties of the quadratic variation process, that

  $E[\upsilon^2(t,h) \mid \mathcal{F}_{t-h}] = E[M(t,h)^2 \mid \mathcal{F}_{t-h}] = E[M^2(t) \mid \mathcal{F}_{t-h}] - M^2(t-h), \quad 0 < h \le t \le T.$   (2.14)

Hence, the expected notional volatility represents the expected future (cumulative) squared return innovation. As argued in section 2.1, this component is typically the dominant determinant of the expected return volatility.

For illustration, consider again the two examples introduced in section 2.1 above. Alternative, more complicated specifications and issues related to longer horizon returns are considered in section 3.

EXAMPLE 1: Stochastic Volatility Jump Diffusion with Non-Zero Mean Jumps (Revisited)

The log-price process evolves according to

  $dp(t) = (\mu + \beta \sigma^2(t))\,dt + \sigma(t)\,dW(t) + \kappa(t)\,dq(t), \quad 0 \le t \le T.$

The notional volatility is then

  $\upsilon^2(t,h) = \int_0^h \sigma^2(t-h+s)\,ds + \sum_{t-h < s \le t} \kappa^2(s).$

The expected notional volatility involves taking the conditional expectation of this expression. Without an explicit model for the volatility process, this cannot be given in closed form.
However, for small h the (expected) notional volatility is typically very close to the value attained if volatility is constant. In particular, to a first-order approximation,

  $E[\upsilon^2(t,h) \mid \mathcal{F}_{t-h}] \approx \sigma^2(t-h) \cdot h + \lambda \cdot h \cdot (\mu_\kappa^2 + \sigma_\kappa^2) \approx [\,\sigma^2(t-h) + \lambda(\mu_\kappa^2 + \sigma_\kappa^2)\,] \cdot h,$

while

  $m(t,h) \approx [\,\mu + \beta \cdot \sigma^2(t-h) + \lambda \cdot \mu_\kappa\,] \cdot h.$

Thus, the expected notional volatility is of order h, the expected return is of order h (and the variation of the mean return of order $h^2$), whereas the martingale (return innovation) is of order $h^{1/2}$, and hence an order of magnitude larger for small h.

EXAMPLE 2: Discrete-Time Stochastic Volatility (ARCH) Model (Revisited)

The discrete-time (jump) process for p(t) defined over the unit time interval is

  $p(t) = p(t-1) + \mu + \beta \sigma^2(t) + \sigma(t)\,z(t), \quad t = 1, 2, \ldots, T.$

The one-period notional volatility measure is then

  $\upsilon^2(t,1) = \sigma^2(t)\,z^2(t),$

while, of course, the expected notional volatility is

  $E[\upsilon^2(t,1) \mid \mathcal{F}_{t-1}] = \sigma^2(t),$

and the expected return is

  $m(t,1) = \mu + \beta \sigma^2(t).$
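Equation (2.13) and Proposition 2 suggest a direct nonparametric estimator of the notional volatility: cumulate squared high-frequency returns over the interval. The following is a minimal Python sketch of that idea; the simulated five-minute log-price path (and its scale) is hypothetical, chosen only to make the example self-contained.

```python
import numpy as np

def realized_variance(log_prices):
    """Sum of squared high-frequency log-price increments over [t-h, t].

    By Proposition 2 and equation (2.13), this sum converges (uniformly in
    probability) to the notional volatility [r,r]_t - [r,r]_{t-h} as the
    sampling frequency of the underlying returns increases."""
    returns = np.diff(log_prices)    # r(t_j, Delta) = p(t_j) - p(t_{j-1})
    return float(np.sum(returns ** 2))

# Hypothetical example: 79 five-minute log prices spanning one trading day.
rng = np.random.default_rng(0)
log_prices = np.cumsum(0.001 * rng.standard_normal(79))
rv = realized_variance(log_prices)   # estimate of the notional volatility
print(rv, np.sqrt(rv))               # variance and volatility, per day
```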
Classified Index: H314
U.D.C.: 802.0

Graduation Thesis for the M.A. Degree

A Comparative Study of Humor Strategies in Chinese and English Sitcoms -- A Case Study of Ipartment and Friends

Candidate: XU Jingyu
Supervisor: Prof. WANG Jinghui
Academic Degree Applied for: Master of Arts
Specialty: Foreign Linguistics and Applied Linguistics
Affiliation: School of Foreign Languages
Date of Oral Examination: July 1, 2012
Degree Conferring Institution: Harbin Institute of Technology

Abstract

Humor strategies in sitcoms have become a popular research topic in linguistics in recent years.
Western humor research has focused mainly on the three major traditional theories of humor, the Semantic Script Theory of Humor, and the General Theory of Verbal Humor.
Chinese scholars, for their part, have mostly carried out textual analyses of verbal humor in sitcoms, crosstalk, and talk shows from the perspectives of pragmatics, cognitive linguistics, and rhetoric.
Grade 8 English: Argumentation Methods Multiple-Choice Questions (40 items)

1. In the essay, the author mentions a story about a famous scientist to support his idea. This is an example of _____.
A. analogy  B. example  C. comparison  D. metaphor
Answer: B. This question tests the ability to distinguish argumentation methods. Option A, analogy, means reasoning by analogy; option B, example, means supporting a point with examples; option C, comparison, means comparing; option D, metaphor, is a figure of speech. Mentioning a story about a famous scientist to support a point is argument by example.

2. The writer uses the experience of his own life to prove his point. This kind of method is called _____.
A. personal story  B. example giving  C. case study  D. reference
Answer: B. Option A, personal story, is too narrow in scope; option B, example giving, means supporting a point with examples; option C, case study, means analyzing a particular case; option D, reference, means citing a source. Using one's own life experience to prove a point is argument by example.

3. The author cites several historical events to strengthen his argument. What is this method?
A. citing facts  B. giving examples  C. making comparisons  D. using analogies
Answer: B. Option A, citing facts, means quoting facts, but the historical events here serve as examples, so this is argument by example; option B, giving examples, is exactly that; option C, making comparisons, means comparing; option D, using analogies, means reasoning by analogy.
I. Chapter 1: Overview of Software Engineering

1. Software deteriorates rather than wears out because:
A: Software suffers from exposure to hostile environments
B: Defects are more likely to arise after software has been used often
C: Multiple change requests introduce errors in component interactions
D: Software spare parts become harder to order

2. Today the increased power of the personal computer has brought about an abandonment of the practice of team development of software.
A: True
B: False

3. Which question no longer concerns the modern software engineer?
A: Why does computer hardware cost so much?
B: Why does software take a long time to finish?
C: Why does it cost so much to develop a piece of software?
D: Why can't software errors be removed from products prior to delivery?

4. In general, software only succeeds if its behavior is consistent with the objectives of its designers.
ISLR Chapter 2: Statistical Learning

1.1 What Is Statistical Learning?

More generally, suppose that we observe a quantitative response Y and p different predictors, X1, X2, ..., Xp. We assume that there is some relationship between Y and X = (X1, X2, ..., Xp), which can be written in the very general form

  $Y = f(X) + \epsilon.$

Statistical learning refers to a set of approaches for estimating f.

1.1.1 Why Estimate f?

Prediction and inference.

1. Prediction

  $\hat{Y} = \hat{f}(X),$

where $\hat{f}$ represents our estimate for f, and $\hat{Y}$ represents the resulting prediction for Y. $\hat{f}$ is often treated as a black box. The accuracy of $\hat{Y}$ as a prediction for Y depends on two quantities, which we will call the reducible error and the irreducible error.

Reducible error: in general, $\hat{f}$ will not be a perfect estimate for f, and this inaccuracy will introduce some error. This error is reducible because we can potentially improve the accuracy of $\hat{f}$ by using the most appropriate statistical learning technique to estimate f.

Irreducible error: Y is also a function of $\epsilon$, which, by definition, cannot be predicted using X. Therefore, variability associated with $\epsilon$ also affects the accuracy of our prediction.

2. Inference

To understand the relationship between X and Y, or more specifically, to understand how Y changes as a function of X1, ..., Xp. Now $\hat{f}$ cannot be treated as a black box, because we need to know its exact form.

1.1.2 How Do We Estimate f?

1. Parametric methods

Parametric methods involve a two-step, model-based approach.
(1) First, we make an assumption about the functional form, or shape, of f. For example, one very simple assumption is that f is linear in X:

  $f(X) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \ldots + \beta_p X_p.$

(2) After a model has been selected, we need a procedure that uses the training data to fit or train the model.

Advantage: simplifies the problem.
Disadvantage: the model we choose will usually not match the true unknown form of f. If the chosen model is too far from the true f, then our estimate will be poor.

2. Non-parametric methods

Non-parametric methods do not make explicit assumptions about the functional form of f.

Advantage: by avoiding the assumption of a particular functional form for f, they have the potential to accurately fit a wider range of possible shapes for f.
Disadvantage: since they do not reduce the problem of estimating f to a small number of parameters, a very large number of observations (far more than is typically needed for a parametric approach) is required in order to obtain an accurate estimate for f.

1.1.3 The Trade-Off Between Prediction Accuracy and Model Interpretability

We have established that when inference is the goal, there are clear advantages to using simple and relatively inflexible statistical learning methods. Surprisingly, we will often obtain more accurate predictions using a less flexible method.

1.1.4 Supervised Versus Unsupervised Learning

1. Supervised learning: for each observation of the predictor measurement(s) $x_i$, i = 1, ..., n, there is an associated response measurement $y_i$. Examples include linear regression, logistic regression, GAMs, boosting, and support vector machines.

2. Unsupervised learning: the somewhat more challenging situation in which for every observation i = 1, ..., n, we observe a vector of measurements $x_i$ but no associated response $y_i$.
Examples include cluster analysis.

1.2 Assessing Model Accuracy

1.2.1 Measuring the Quality of Fit

Mean squared error (MSE):

  $\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \big(y_i - \hat{f}(x_i)\big)^2,$

where $\hat{f}(x_i)$ is the prediction that $\hat{f}$ gives for the i-th observation.

Training MSE: the MSE computed using the training data that was used to fit the model.
Test MSE: the MSE computed on previously unseen test observations; the smaller the test MSE, the better.

1.2.2 The Bias-Variance Trade-Off

Variance refers to the amount by which $\hat{f}$ would change if we estimated it using a different training data set.
Bias refers to the error that is introduced by approximating a real-life problem, which may be extremely complicated, by a much simpler model.

1.2.3 The Classification Setting

Training error rate:

  $\frac{1}{n} \sum_{i=1}^{n} I(y_i \ne \hat{y}_i).$

Test error rate:

  $\mathrm{Ave}\big(I(y_0 \ne \hat{y}_0)\big).$

1. The Bayes classifier

In a two-class problem where there are only two possible response values, say class 1 or class 2, the Bayes classifier corresponds to predicting class one if $\Pr(Y = 1 \mid X = x_0) > 0.5$, and class two otherwise. The overall Bayes error rate is given by

  $1 - E\big(\max_j \Pr(Y = j \mid X)\big).$

2. K-Nearest Neighbors

Given a positive integer K and a test observation $x_0$, the KNN classifier first identifies the K points in the training data that are closest to $x_0$, represented by $\mathcal{N}_0$. It then estimates the conditional probability for class j as the fraction of points in $\mathcal{N}_0$ whose response values equal j:

  $\Pr(Y = j \mid X = x_0) = \frac{1}{K} \sum_{i \in \mathcal{N}_0} I(y_i = j).$
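Below is a minimal Python sketch of the KNN classifier just described; the training points, the test point, and the choice of Euclidean distance are hypothetical, for illustration only.

```python
import numpy as np

def knn_predict(X_train, y_train, x0, k):
    """Estimate Pr(Y = j | X = x0) as the fraction of the k nearest
    training points (here, by Euclidean distance) whose response equals j,
    then predict the class with the largest estimated probability."""
    dists = np.linalg.norm(X_train - x0, axis=1)  # distance to each training point
    nearest = y_train[np.argsort(dists)[:k]]      # labels of the k closest points
    classes, counts = np.unique(nearest, return_counts=True)
    return classes[np.argmax(counts)]             # majority vote

# Hypothetical two-class example.
X = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]])
y = np.array([0, 0, 1, 1])
print(knn_predict(X, y, np.array([0.8, 0.9]), k=3))  # predicts class 1
```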
A COMPARISON OF LINEAR AND HYPERTEXT FORMATS IN INFORMATION RETRIEVAL

Cliff McKnight, Andrew Dillon and John Richardson
HUSAT Research Centre, Department of Human Sciences, University of Loughborough.

This item is not the definitive copy. Please use the following citation when referencing this material: McKnight, C., Dillon, A. and Richardson, J. (1990) A comparison of linear and hypertext formats in information retrieval. In: R. McAleese and C. Green, HYPERTEXT: State of the Art, Oxford: Intellect, 10-19.

Abstract

An exploratory study is described in which the same text was presented to subjects in one of four formats, of which two were hypertext (TIES and Hypercard) and two were linear (Word Processor and paper). Subjects were required to use the text to answer 12 questions. Measurement was made of their time and accuracy, and their movement through the document was recorded, in addition to a variety of subjective data being collected. Although there was no significant difference between conditions for task completion time, subjects performed more accurately with linear formats. The implications of these findings and the other data collected are discussed.

Introduction

It has been suggested that the introduction of hypertexts could lead to improved access and (human) processing of information across a broad range of situations (Kreitzberg and Shneiderman, 1988). However, the term 'hypertext' has been used as though it were a unitary concept when, in fact, major differences exist between the various implementations which are currently available, and some (Apple's Hypercard, for example) are powerful enough to allow the construction of a range of different applications. In addition, these views tend to disregard the fact that written texts have evolved over several hundred years to support a range of task requirements in a variety of formats. This technical evolution has been accompanied by a comparable evolution in the skills that readers have in terms of rapidly scanning, searching and manipulating paper texts (McKnight et al., 1990).

The recent introduction of cheap, commercial hypertext systems has been made possible by the widespread availability of powerful microcomputers. However, the recency of this development means that there is little evidence about the effectiveness of hypertexts and few guidelines for successful implementations. Although a small number of studies have been carried out, their findings have typically been contradictory (cf. Weldon et al., 1985, and Gordon et al., 1988). This outcome is predictable if allowances are not made for the range of text types (e.g., on-line help, technical documentation, tutorial packages) and reading purposes (e.g., familiarisation, searching for specific items, reading for understanding). Some text types would appear to receive no advantage from electronic implementation, let alone hypertext treatment at the intra-document level (poetry or fiction, for example, where the task is straightforward reading rather than study or analysis of the text per se). Thus there appears to be justification for suggesting that some hypertext packages may be appropriate for some document types and not others. A discussion of text types can be found in Dillon and McKnight (1989).

Marchionini and Shneiderman (1988) differentiate between the procedural and often iterative types of information retrieval undertaken by experts on behalf of end users and the more informal methods employed by end users themselves.
They suggest that hypertext systems may be well suited to end users because they encourage "informal, personalized, content-oriented information seeking strategies" (p. 71).

The current exploratory study was designed to evaluate a number of document formats using a task that would tend to elicit 'content-oriented information seeking strategies'. The study was also undertaken to evaluate a methodology and indicate specific questions to be investigated in future experiments.

Method

Subjects

16 subjects participated in the study, 9 male and 7 female, age range 21-36. All were members of HUSAT staff and all had experience of using a variety of computer systems and applications.

Materials

The text used was "Introduction to Wines" by Elizabeth A. Buie and W. Edgar Hassell, a basic guide to the history, production and appreciation of wine. This hypertext was widely distributed by Ben Shneiderman as a demonstration of the TIES (The Interactive Encyclopedia System) package prior to its marketing as HyperTIES. This text was adapted for use in other formats by the present authors. In the TIES version, each topic is held as a separate file, resulting in 40 individual small files. For the Hypercard version, a topic card was created for each corresponding TIES file. For the word processor version, the text was arranged in an order which seemed sensible to the authors, starting with the TIES 'Introduction' text and grouping the various topics under more general headings. A pilot test confirmed that the final version was generally consistent in structure with the present sample's likely ordering. The Hypercard and word processor versions were displayed on a monochrome Macintosh II screen and the TIES version was displayed on an IBM PC colour screen. The paper version was a print-out of the word processor text.

Task

Subjects were required to use the text to answer a set of 12 questions. These were specially developed by the authors to ensure that a range of information retrieval strategies were employed to answer them and that the questions did not unduly favour any one medium (e.g., one with a search facility).

Design

A four-condition, independent subjects design was employed with presentation format (Hypercard, TIES, Paper and Word Processor) as the independent variable. The dependent variables were speed, accuracy, access strategy and subjects' estimate of document size.

Procedure

Subjects were tested individually. One experimenter described the nature of the investigation and introduced the subject to the text and system. Any questions the subject had were answered before a three minute familiarisation period commenced, during which the subject was encouraged to browse through the text. After three minutes the subjects were asked several questions pertaining to estimated document size and range of contents viewed. They were then given the question set and asked to attempt all questions in the presented order. Subjects were encouraged to verbalise their thoughts and a small tie-pin microphone was used to record their comments. Movement through the text was captured by video camera.

Results

Estimates of document size

After familiarisation with the text, subjects were asked to estimate the size of the document in pages or screens. The linear formats contained 13 pages, the Hypercard version contained 53 cards, and the TIES version contained 78 screens. Therefore raw scores were converted to percentages.
The responses are presented in Table 1 (where a correct response is 100).

  Subject    TIES     Paper   HyperCard   Word Processor
  1         641.03    76.92     150.94         92.31
  2          58.97    92.31      56.60         76.92
  3          51.28    76.92     465.17        100.00
  4         153.84   153.85      75.47         93.21
  Mean      226.28   100.00     187.05         90.61
  SD        280.41    36.63     189.84          9.75

Table 1: Subjects' estimates of document size.

Subjects in the linear format conditions estimated the size of the document reasonably accurately. However, subjects who read the hypertexts were less accurate, several of them over-estimating the size by a very high margin. While a one-way ANOVA revealed no significant effect (F[3,12] = 0.61, NS), these data are interesting and suggest that subjective assessment of text size as a function of format is an issue worthy of further investigation. Such an assessment may well influence an estimation of the level of detail involved in the content as well as choice of appropriate access strategy.

Speed

Time taken to complete the twelve tasks was recorded for each subject. Total time per subject and mean time per condition are presented in Table 2 (all data are in seconds).

  Subject   TIES   Paper   HyperCard   Word Processor
  1         1753     795        1161             1480
  2         1159    1147         655              827
  3         2139    2231        1013             1014
  4         1073    1115        1610             1836
  Mean      1531    1322        1110             1289

Table 2: Task completion times (in seconds).

Clearly, while there is variation at the subject level, there is little apparent difference between conditions. A one-way ANOVA confirmed this (F[3,12] = 0.47, p > 0.7) and even a t-test between the fastest and slowest conditions, Hypercard and TIES, failed to show a significant effect for speed (t = 1.31, d.f. = 6, p > 0.2).

Accuracy

The term 'accuracy' in this instance refers to how many items a subject answers correctly. This was assessed by the experimenters, who awarded one point for an unambiguously correct answer, a half-point for a partly correct answer and no points for a wrong answer or abandoned question. The accuracy scores per subject and mean accuracy scores per condition are shown in Table 3.

  Subject   TIES   Paper   HyperCard   Word Processor
  1          6.0    11.0         8.5             11.5
  2          9.5    12.0         7.5             12.0
  3          8.0    10.5        10.0             10.5
  4          6.0    11.0         7.5              9.0
  Mean      7.38   11.12        8.38            10.75
  SD        1.70    0.63        1.18             1.32

Table 3: Accuracy scores.

As can be seen from these data, subjects performed better in both linear-format conditions than in the hypertext conditions. A one-way ANOVA revealed a significant effect for format (F[3,12] = 8.24, p < 0.005) and a series of post-hoc tests revealed significant differences between Paper and TIES (t = 4.13, d.f. = 6, p < 0.01), Word Processor and TIES (t = 3.13, d.f. = 6, p < 0.05) and between Paper and Hypercard (t = 4.11, d.f. = 6, p < 0.01). Even using a more rigorous basis for rejection than the 5 per cent level, i.e., the 10/k(k-1) level, where k is the number of groups, suggested by Ferguson (1959), which results in a critical rejection level of p < 0.0083 in this instance, the Paper/TIES and Paper/Hypercard differences are still significant.

The number of questions abandoned by subjects was also identified. Although there was no significant difference between conditions (F[3,12] = 1.85, NS), subjects using the linear formats abandoned fewer than those using the hypertext formats (total abandon rates: Paper = 1; Word Processor = 2; Hypercard = 4 and TIES = 9).

Navigation

Time spent viewing the Contents/Index (where applicable) was determined for each subject and converted to a percentage of total time.
These scores are presented in Table 4.

  Subject   TIES   Paper   HyperCard   Word Processor
  1        53.28    2.72       47.16             6.34
  2        25.36    1.49       19.10            13.93
  3        49.50   10.24       17.50            12.87
  4        30.84    5.36       23.40             7.54
  Mean     39.74    4.95       26.79            10.17
  SD       13.72    3.88       13.81             3.79

Table 4: Time spent viewing Contents/Index as a percentage of total time.

This table demonstrates a very large difference between both hypertext formats and the linear formats. A one-way ANOVA revealed a significant effect for condition (F[3,12] = 9.95, p < 0.005). Once more, applying the more conservative critical rejection level, post-hoc tests revealed significant differences between Paper and TIES (t = 4.90, d.f. = 6, p < 0.003), between Word Processor and TIES (t = 4.16, d.f. = 6, p < 0.006) and between Hypercard and Paper (t = 3.06, d.f. = 6, p < 0.03). Thus, interacting with a hypertext document may necessitate heavy usage of indices in order to navigate effectively through the information space.

Summary

In general, subjects performed better with the linear format texts than with the hypertexts. The linear formats led to significantly more accurate performance and to significantly less time spent in the index and contents. Not surprisingly, estimating document size seems easier with linear formats.

Discussion

While there was no significant effect for the estimation of document size data, a number of observations can be made. The accurate estimates for the Paper and Word Processor conditions may well have resulted from the fact that the Contents pages indicated the total number of pages and that the page number was displayed for each page, and hence browsing through the document would have given repeated cues to the document size. Finally, in the Paper condition the subjects would have received tactile feedback as they manipulated the document. Subjects in the hypertext conditions had none of this information to help them form an impression of the document's size. While many of the cards were discrete (i.e., there were few continuation cards) and were individually listed in the indices, this information did not prevent some subjects from making large over-estimates. A poor estimate of document size could lead to incorrect assumptions concerning the range of coverage and level of a document and the adoption of an inappropriate reading strategy. Future studies might usefully explore the relationship between manipulation strategy and subjective impression of size for larger documents.

A number of factors are likely to have influenced the subjects' task completion times, and as a result the lack of an overall speed effect is to be expected. These factors include variation in the subjects' familiarity with the topic area (wines); variation in the subjects' reading speeds; the presence or absence of a string search facility in the electronic versions; variation in the subjects' familiarity with the different software packages; and their determination to continue searching until an answer is found. However, there does not appear to be a speed/accuracy trade-off.

The strong effect found for the navigation measure appears to be consistent with the significant difference in accuracy scores between the four conditions. Subjects in the hypertext conditions spent considerably more time viewing the index and contents lists than did subjects in the linear conditions but were less successful in finding the correct answers to the questions.
The hypertext systems elicited a style of text manipulation which consisted of selecting likely topics from the index, jumping to them and then returning to the index if the answer was not found. Relatively little time was spent browsing between linked nodes in the database. This is a surprising finding, since hypertext systems in general are assumed to be designed with this style of manipulation in mind. It may be argued that a comprehension or summarisation task would have resulted in this style of manipulation but, in contrast to the above, subjects in the linear conditions tended to refer once to the Contents/Index and then scan the text for the answer rather than make repeated use of the Contents or Index.

Further evidence of the superiority of scanning the text in the linear conditions, as opposed to frequent recourse to the Contents/Index in the hypertext conditions, is suggested by considering the relationship between use of the string search facilities and the numbers of questions abandoned before an answer was found. Three of the questions were designed so that a string search strategy would be optimal, and two of the electronic text conditions (one linear, one hypertext) supported string searching. The lack of a string search facility in the TIES condition was associated with a very high proportion of abandoned questions (58%), whilst these three questions were answered with 100% accuracy by subjects in the paper condition. In the other two conditions in which string searching was supported, it was used to different degrees of effectiveness. In the Hypercard condition the subjects employed string searching with 92% of the relevant questions, and this resulted in 66% of the questions being answered correctly (and 17% being abandoned). In the Word Processor condition string searching was employed on 50% of the relevant questions and 92% were answered correctly (0% abandoned). Thus, although string searching was available to the subjects in the Word Processor condition, it was used with less frequency than in the Hypercard condition. However, the subjects in the Word Processor condition answered substantially more of the questions correctly, presumably using strategies based on visual scanning.

Conclusion

Although some caution should be exercised in interpreting the results of this study, it is clear that for some texts and some tasks hypertext is not the universal panacea which some have claimed it to be. Furthermore, the various implementations of hypertext will support performance to a greater or lesser extent in different tasks. Future work should attempt to establish clearly the situations in which each general style of hypertext confers a positive advantage so that the potential of the medium can be realised.

Acknowledgement

This work was funded by the British Library Research and Development Department and was carried out as part of Project Quartet.

References

Dillon A. and McKnight C. (1989) Towards the classification of text types: a repertory grid approach. International Journal of Man-Machine Studies, in press.

Ferguson G.A. (1959) Statistical Analysis in Psychology and Education. McGraw-Hill, New York.

Gordon S., Gustavel J., Moore J. and Hankey J. (1988) The effects of hypertext on reader knowledge representation. Proceedings of the Human Factors Society - 32nd Annual Meeting.

Kreitzberg C. and Shneiderman B. (1988) Restructuring knowledge for an electronic encyclopedia.
Proceedings of the International Ergonomics Association's 10th Congress.

Marchionini G. and Shneiderman B. (1988) Finding facts vs. browsing knowledge in hypertext systems. Computer, January, 70-80.

McKnight C., Dillon A. and Richardson J. (1990) From Here to Hypertext. Cambridge University Press, Cambridge, in press.

Weldon L.J., Mills C.B., Koved L. and Shneiderman B. (1985) The structure of information in online and paper technical manuals. Proceedings of the Human Factors Society - 29th Annual Meeting.
A Comparison of String Distance Metrics for Name-Matching Tasks

William W. Cohen, Pradeep Ravikumar, Stephen E. Fienberg
Center for Automated Learning and Discovery / Center for Computer and Communications Security / Department of Statistics
Carnegie Mellon University, Pittsburgh PA 15213
wcohen@ pradeepr@ fienberg@

Abstract

Using an open-source, Java toolkit of name-matching methods, we experimentally compare string distance metrics on the task of matching entity names. We investigate a number of different metrics proposed by different communities, including edit-distance metrics, fast heuristic string comparators, token-based distance metrics, and hybrid methods. Overall, the best-performing method is a hybrid scheme combining a TFIDF weighting scheme, which is widely used in information retrieval, with the Jaro-Winkler string-distance scheme, which was developed in the probabilistic record linkage community.

Introduction

The task of matching entity names has been explored by a number of communities, including statistics, databases, and artificial intelligence. Each community has formulated the problem differently, and different techniques have been proposed.

In statistics, a long line of research has been conducted in probabilistic record linkage, largely based on the seminal paper by Fellegi and Sunter (1969). This paper formulates entity matching as a classification problem, where the basic goal is to classify entity pairs as matching or non-matching. Fellegi and Sunter propose using largely unsupervised methods for this task, based on a feature-based representation of pairs which is manually designed and to some extent problem-specific. These proposals have been, by and large, adopted by subsequent researchers, although often with elaborations of the underlying statistical model (Jaro 1989; 1995; Winkler 1999; Larsen 1999; Belin & Rubin 1997). These methods have been used to match individuals and/or families between samples and censuses, e.g., in evaluation of the coverage of the U.S. decennial census; or between administrative records and survey data bases, e.g., in the creation of an anonymized research data base combining tax information from the Internal Revenue Service and data from the Current Population Survey.

In the database community, some work on record matching has been based on knowledge-intensive approaches (Hernandez & Stolfo 1995; Galhardas et al. 2000; Raman & Hellerstein 2001). However, the use of string-edit distances as a general-purpose record matching scheme was proposed by Monge and Elkan (Monge & Elkan 1997; 1996), and in previous work, we proposed use of the TFIDF distance metric for the same purpose (Cohen 2000).

In the AI community, supervised learning has been used for learning the parameters of string-edit distance metrics (Ristad & Yianilos 1998; Bilenko & Mooney 2002) and combining the results of different distance functions (Tejada, Knoblock, & Minton 2001; Cohen & Richman 2002; Bilenko & Mooney 2002). More recently, probabilistic object identification methods have been adapted to matching tasks (Pasula et al. 2002). In these communities there has been more emphasis on developing "autonomous" matching techniques which require little or no configuration for a new task, and less emphasis on developing "toolkits" of methods that can be applied to new tasks by experts.
Recently, we have begun implementing an open-source, Java toolkit of name-matching methods which includes a variety of different techniques. In this paper we use this toolkit to conduct a comparison of several string distances on the tasks of matching and clustering lists of entity names.

This experimental use of string distance metrics, while similar to previous experiments in the database and AI communities, is a substantial departure from their usual use in statistics. In statistics, databases tend to have more structure and specification, by design. Thus the statistical literature on probabilistic record linkage represents pairs of entities not by pairs of strings, but by vectors of "match features" such as names and categories for variables in survey databases. By developing appropriate match features, and appropriate statistical models of matching and non-matching pairs, this approach can achieve better matching performance (at least potentially).

The use of string distances considered here is most useful for matching problems with little prior knowledge, or ill-structured data. Better string distance metrics might also be useful in the generation of "match features" in more structured database situations.

Methods

Edit-distance like functions
Distance functions map a pair of strings s and t to a real number r, where a smaller value of r indicates greater similarity between s and t. Similarity functions are analogous, except that larger values indicate greater similarity; at some risk of confusion to the reader, we will use these terms interchangeably, depending on which interpretation is most natural.

One important class of distance functions are edit distances, in which distance is the cost of the best sequence of edit operations that convert s to t. Typical edit operations are character insertion, deletion, and substitution, and each operation must be assigned a cost.

We will consider two edit-distance functions. The simple Levenstein distance assigns a unit cost to all edit operations. As an example of a more complex well-tuned distance function, we also consider the Monge-Elkan distance function (Monge & Elkan 1996), which is an affine variant of the Smith-Waterman distance function (Durban et al. 1998) with particular cost parameters, and scaled to the interval [0,1]. (Affine edit-distance functions assign a relatively lower cost to a sequence of insertions or deletions.)

A broadly similar metric, which is not based on an edit-distance model, is the Jaro metric (Jaro 1995; 1989; Winkler 1999). In the record-linkage literature, good results have been obtained using variants of this method, which is based on the number and order of the common characters between two strings. Given strings $s = a_1 \ldots a_K$ and $t = b_1 \ldots b_L$, define a character $a_i$ in s to be common with t if there is a $b_j = a_i$ in t such that $i - H \le j \le i + H$, where $H = \min(|s|,|t|)/2$. Let $s' = a'_1 \ldots a'_{K'}$ be the characters in s which are common with t (in the same order they appear in s) and let $t' = b'_1 \ldots b'_{L'}$ be analogous; now define a transposition for $s', t'$ to be a position i such that $a'_i \ne b'_i$. Let $T_{s',t'}$ be half the number of transpositions for $s'$ and $t'$. The Jaro similarity metric for s and t is

$\mathrm{Jaro}(s,t) = \frac{1}{3}\left(\frac{|s'|}{|s|} + \frac{|t'|}{|t|} + \frac{|s'| - T_{s',t'}}{|s'|}\right)$

A variant of this due to Winkler (1999) also uses the length P of the longest common prefix of s and t. Letting $P' = \min(P, 4)$, we define

$\mathrm{JaroWinkler}(s,t) = \mathrm{Jaro}(s,t) + \frac{P'}{10}\cdot\left(1 - \mathrm{Jaro}(s,t)\right)$

The Jaro and Jaro-Winkler metrics seem to be intended primarily for short strings (e.g., personal first or last names).
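To make the definitions concrete, here is a short Python sketch of the Jaro and Jaro-Winkler computations above. This is an illustrative reconstruction, not the Java toolkit's code; it adopts the usual convention that each character of t can be matched at most once, and caps the rewarded prefix length at 4.

    def jaro(s, t):
        """Jaro similarity: based on the common characters of s and t
        (within a window H) and the number of transpositions among them."""
        if not s or not t:
            return 0.0
        H = min(len(s), len(t)) // 2
        used = [False] * len(t)              # each character of t matches at most once
        s_common, t_idx = [], []
        for i, a in enumerate(s):
            for j in range(max(0, i - H), min(len(t), i + H + 1)):
                if not used[j] and t[j] == a:
                    used[j] = True
                    s_common.append(a)
                    t_idx.append(j)
                    break
        if not s_common:
            return 0.0
        t_common = [t[j] for j in sorted(t_idx)]
        # T is half the number of positions where the common sequences disagree
        T = sum(a != b for a, b in zip(s_common, t_common)) / 2.0
        c = len(s_common)
        return (c / len(s) + c / len(t) + (c - T) / c) / 3.0

    def jaro_winkler(s, t):
        """Winkler's variant: reward a shared prefix of up to 4 characters."""
        j = jaro(s, t)
        P = 0
        while P < min(4, len(s), len(t)) and s[P] == t[P]:
            P += 1
        return j + (P / 10.0) * (1.0 - j)

On the classic record-linkage example, jaro_winkler("MARTHA", "MARHTA") evaluates to about 0.961.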
Token-based distance functions
Two strings s and t can also be considered as multisets (or bags) of words (or tokens). We also considered several token-based distance metrics. The Jaccard similarity between the word sets S and T is simply $|S \cap T| / |S \cup T|$. TFIDF or cosine similarity, which is widely used in the information retrieval community, can be defined as

$\mathrm{TFIDF}(S,T) = \sum_{w \in S \cap T} V(w,S) \cdot V(w,T)$

where $\mathrm{TF}_{w,S}$ is the frequency of word w in S, N is the size of the "corpus", $\mathrm{IDF}_w$ is the inverse of the fraction of names in the corpus that contain w,

$V'(w,S) = \log(\mathrm{TF}_{w,S} + 1) \cdot \log(\mathrm{IDF}_w)$

and $V(w,S) = V'(w,S)\big/\sqrt{\sum_{w'} V'(w',S)^2}$. Our implementation measures all document frequency statistics from the complete corpus of names to be matched.

Following Dagan et al. (1999), a token set S can be viewed as samples from an unknown distribution $P_S$ of tokens, and a distance between S and T can be computed based on these distributions. Inspired by Dagan et al. we considered the Jensen-Shannon distance between $P_S$ and $P_T$. Letting $\mathrm{KL}(P\|Q)$ be the Kullback-Leibler divergence and letting $Q(w) = \frac{1}{2}(P_S(w) + P_T(w))$, this is simply

$\mathrm{JensenShannon}(S,T) = \frac{1}{2}\left(\mathrm{KL}(P_S\|Q) + \mathrm{KL}(P_T\|Q)\right)$

Distributions $P_S$ were estimated using maximum likelihood, a Dirichlet prior, and a Jelinek-Mercer mixture model (Lafferty & Zhai 2001). The latter two cases require parameters; for Dirichlet, we used $\alpha = 1$, and for Jelinek-Mercer, we used the mixture coefficient $\lambda = 0.5$.

From the record-linkage literature, a method proposed by Fellegi and Sunter (1969) can be easily extended to a token distance. As notation, let A and B be two sets of records to match, let $C = A \cap B$, let $D = A \cup B$, and for $X = A, B, C, D$ let $P_X(w)$ be the empirical probability of a name containing the word w in the set X. Also let $e_X$ be the empirical probability of an error in a name in set X; $e_{X,0}$ be the probability of a missing name in set X; $e_T$ be the probability of two correct but differing names in A and B; and let $e = e_A + e_B + e_T + e_{A,0} + e_{B,0}$.

Fellegi and Sunter propose ranking pairs s, t by the odds ratio $\log\left(\Pr(M|s,t)/\Pr(U|s,t)\right)$, where M is the class of matched pairs and U is the class of unmatched pairs. Letting AGREE(s,t,w) denote the event "s and t both agree in containing word w", Fellegi and Sunter note that under certain assumptions

$\Pr(M|\mathrm{AGREE}(s,t,w)) \approx P_C(w)(1-e)$
$\Pr(U|\mathrm{AGREE}(s,t,w)) \approx P_A(w) \cdot P_B(w)(1-e)$

If we make two additional assumptions suggested by Fellegi and Sunter, and assume that (a) matches on a word w are independent, and (b) that $P_A(w) \approx P_B(w) \approx P_C(w) \approx P_D(w)$, we find that the incremental score for the odds ratio associated with AGREE(s,t,w) is simply $-\log P_D(w)$. In information retrieval terms, this is a simple un-normalized IDF weight.

Unfortunately, updating the log-odds score of a pair on discovering a disagreeing token w is not as simple. Estimates are provided by Fellegi and Sunter for $\Pr(M|\neg\mathrm{AGREE}(s,t,w))$ and $\Pr(U|\neg\mathrm{AGREE}(s,t,w))$, but the error parameters $e_A, e_B, \ldots$ do not cancel out; instead one is left with a constant penalty term, independent of w. Departing slightly from this (and following the intuition that disagreement on frequent terms is less important) we introduce a variable penalty term of $k \log P_D(w)$, where k is a parameter to be set by the user. In the experiments we used k = 0.5, and call this method SFS distance (for Simplified Fellegi-Sunter).
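The TFIDF similarity above is equally brief in code. The sketch below (with our own function names) measures document-frequency statistics from the complete corpus of names, as the text describes; names are assumed to be pre-tokenized, e.g. on whitespace.

    import math
    from collections import Counter

    def tfidf_weights(corpus):
        """Map each token multiset in `corpus` to its L2-normalized TFIDF vector,
        with IDF_w the inverse of the fraction of names containing w."""
        n = len(corpus)
        df = Counter()
        for toks in corpus:
            df.update(set(toks))
        vectors = []
        for toks in corpus:
            tf = Counter(toks)
            v = {w: math.log(tf[w] + 1) * math.log(n / df[w]) for w in tf}
            norm = math.sqrt(sum(x * x for x in v.values())) or 1.0
            vectors.append({w: x / norm for w, x in v.items()})
        return vectors

    def tfidf_sim(v_s, v_t):
        """TFIDF(S,T): sum of V(w,S)*V(w,T) over the tokens shared by S and T."""
        return sum(x * v_t.get(w, 0.0) for w, x in v_s.items())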
Hybrid distance functions
Monge and Elkan propose the following recursive matching scheme for comparing two long strings s and t. First, s and t are broken into substrings $s = a_1 \ldots a_K$ and $t = b_1 \ldots b_L$. Then, similarity is defined as

$\mathrm{sim}(s,t) = \frac{1}{K} \sum_{i=1}^{K} \max_{j=1}^{L} \mathrm{sim}'(a_i, b_j)$

where sim′ is some secondary distance function. We considered an implementation of this scheme in which the substrings are tokens; following Monge and Elkan, we call this a level two distance function. We experimented with level two similarity functions which used Monge-Elkan, Jaro, and Jaro-Winkler as their base function sim′.

We also considered a "soft" version of TFIDF, in which similar tokens are considered as well as tokens in $S \cap T$. Again let sim′ be a secondary similarity function. Let CLOSE(θ,S,T) be the set of words $w \in S$ such that there is some $v \in T$ such that $\mathrm{sim}'(w,v) > \theta$, and for $w \in \mathrm{CLOSE}(\theta,S,T)$, let $D(w,T) = \max_{v \in T} \mathrm{sim}'(w,v)$. We define

$\mathrm{SoftTFIDF}(S,T) = \sum_{w \in \mathrm{CLOSE}(\theta,S,T)} V(w,S) \cdot V(w,T) \cdot D(w,T)$

In the experiments, we used Jaro-Winkler as a secondary distance and θ = 0.9.

"Blocking" or pruning methods
In matching or clustering large lists of names, it is not computationally practical to measure distances between all pairs of strings. In statistical record linkage, it is common to group records by some variable which is known a priori to be usually the same for matching pairs. In census work, this grouping variable often names a small geographic region, and perhaps for this reason the technique is usually called "blocking".

Since this paper focuses on the behavior of string matching tools when little prior knowledge is available, we will use here knowledge-free approaches to reducing the set of string pairs to compare. In a matching task, there are two sets A and B. We consider as candidates all pairs of strings $(s,t) \in A \times B$ such that s and t share some substring v which appears in at most a fraction f of all names. In a clustering task, there is one set C, and we consider all candidates $(s,t) \in C \times C$ such that $s \ne t$, and again s and t share some not-too-frequent substring v. For purely token-based methods, the substring v must be a token, and otherwise, it must be a 4-gram. Using inverted indices this set of pairs can be enumerated quickly. For the moderate-size test sets considered here, we used f = 1.

Table 1: Datasets used in experiments. Column 2 indicates the source of the data, and column 3 indicates if it is a matching (M) or clustering (C) problem. Original sources are 1. (Cohen 2000), 2. (Tejada, Knoblock, & Minton 2001), 3. (Monge & Elkan 1996), 4. William Winkler (personal communication), 5. (McCallum, Nigam, & Ungar 2000).

    Name         Src   M/C   #Strings   #Tokens
    animal       1     M     5,709      30,006
    bird1        1     M     377        1,977
    bird2        1     M     982        4,905
    bird3        1     M     38         188
    bird4        1     M     719        4,618
    business     1     M     2,139      10,526
    game         1     M     911        5,060
    park         1     M     654        3,425
    fodorZagrat  2     M     863        10,846
    ucdFolks     3     M     90         454
    census       4     M     841        5,765
    UVA          3     C     116        6,845
    coraATDV     5     C     1,916      47,512

On the matching datasets in Table 1, the token blocker finds between 93.3% and 100.0% of the correct pairs, with an average of 98.9%. The 4-gram blocker also finds between 93.3% and 100.0% of the correct pairs, with an average of 99.0%.

Experiments

Data and methodology
The data used to evaluate these methods is shown in Table 1. Most have been described elsewhere in the literature. The "coraATDV" dataset includes the fields author, title, date, and venue in a single string. The "census" dataset is a synthetic, census-like dataset, from which only textual fields were used (last name, first name, middle initial, house number, and street).

To evaluate a method on a dataset, we ranked by distance all candidate pairs from the appropriate grouping algorithm.
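The knowledge-free grouping step is easy to reproduce with an inverted index. The following minimal sketch handles a clustering task; for a matching task one would index both sets and keep only cross-set pairs. It is an illustrative reconstruction under the text's conventions (tokens or 4-grams, frequency threshold f), not the toolkit's actual code.

    from collections import defaultdict
    from itertools import combinations

    def candidate_pairs(names, f=1.0, use_tokens=True):
        """Pair up names that share a substring (a token, or a 4-gram)
        occurring in at most a fraction f of all names."""
        index = defaultdict(set)                 # substring -> ids of names containing it
        for i, name in enumerate(names):
            subs = set(name.split()) if use_tokens else \
                   {name[k:k + 4] for k in range(max(0, len(name) - 3))}
            for v in subs:
                index[v].add(i)
        limit = f * len(names)
        pairs = set()
        for ids in index.values():
            if len(ids) <= limit:                # prune substrings that are too frequent
                pairs.update(combinations(sorted(ids), 2))
        return pairs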
We computed the non-interpolated average precision of this ranking, the maximum F1 score of the ranking, and also interpolated precision at the eleven recall levels 0.0, 0.1, ..., 0.9, 1.0. The non-interpolated average precision of a ranking containing N pairs for a task with m correct matches is

$\frac{1}{m}\sum_{i=1}^{N} \frac{c(i)\,\delta(i)}{i}$

where c(i) is the number of correct pairs ranked in the first i positions, and δ(i) = 1 if the pair at rank i is correct and 0 otherwise. Interpolated precision at recall r is $\max_i c(i)/i$, where the max is taken over all ranks i such that $c(i)/m \ge r$. Maximum F1 is $\max_{i>0} F1(i)$, where F1(i) is the harmonic mean of recall at rank i (i.e., c(i)/m) and precision at rank i (i.e., c(i)/i).
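Both ranking measures can be computed in one pass over the sorted candidate list; a small sketch (argument names are ours):

    def ranking_scores(is_correct, m):
        """Non-interpolated average precision and maximum F1 of a ranking.
        `is_correct` flags the candidate pairs in rank order (best first);
        `m` is the total number of correct matches in the task."""
        avg_prec = max_f1 = 0.0
        c = 0
        for i, correct in enumerate(is_correct, start=1):
            if correct:
                c += 1
                avg_prec += c / i            # contributes c(i)*delta(i)/i
            recall, precision = c / m, c / i
            if recall + precision > 0:
                max_f1 = max(max_f1, 2 * recall * precision / (recall + precision))
        return avg_prec / m, max_f1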
[Figure 1: Relative performance of token-based measures. Left, max F1 of methods on matching problems, with points above the line y = x indicating better performance of TFIDF. Middle, same for non-interpolated average precision. Right, precision-recall curves averaged over all matching problems. Smoothed versions of Jensen-Shannon (not shown) are comparable in performance to the unsmoothed version.]

[Figure 2: Relative performance of edit-distance measures. Left and middle, points above (below) the line y = x indicate better (worse) performance for Monge-Elkan, the system performing best on average.]

Results for matching
We will first consider the matching datasets. As shown in Figure 1, TFIDF seems generally the best among the token-based distance metrics. It does slightly better on average than the others, and is seldom much worse than any other method with respect to interpolated average precision or maximum F1.

As shown in Figure 2, the situation is less clear for the edit-distance based methods. The Monge-Elkan method does best on average, but the Jaro and Jaro-Winkler methods are close in average performance, and do noticeably better than Monge-Elkan on several problems. The Jaro variants are also substantially more efficient (at least in our implementation), taking about 1/10 the time of the Monge-Elkan method. (The token-based methods are faster still, averaging less than 1/10 the time of the Jaro variants.)

As shown in Figure 3, SoftTFIDF is generally the best among the hybrid methods we considered. In general, the time for the hybrid methods is comparable to using the underlying string edit distance. (For instance, the average matching time for SoftTFIDF is about 13 seconds on these problems, versus about 20 seconds for the Jaro-Winkler method, and 1.2 seconds for pure token-based TFIDF.)

Finally, Figure 4 compares the three best performing edit-distance like methods, the two best token-based methods, and the two best hybrid methods, using a similar methodology. Generally speaking, SoftTFIDF is the best overall distance measure for these datasets.

[Figure 3: Relative performance of hybrid distance measures on matching problems, relative to the SoftTFIDF metric, which performs best on average.]

[Figure 4: Relative performance of best distance measures of each type on matching problems, relative to the SoftTFIDF metric.]

Results for clustering
It should be noted that the test suite of matching problems is dominated by problems from one source: eight of the eleven test cases are associated with the WHIRL project, and a different distribution of problems might well lead to quite different results. For instance, while the token-based methods perform well on average, they perform poorly on the census-like dataset, which contains many misspellings.

As an additional experiment, we evaluated the four best-performing distance metrics (SoftTFIDF, TFIDF, SFS, and Level 2 Jaro-Winkler) on the two clustering problems, which are taken from sources other than the WHIRL project. Table 2 shows MaxF1 and non-interpolated average precision for each method on each problem. SoftTFIDF again slightly outperforms the other methods on both of these tasks.

Table 2: Results for selected distance metrics on two clustering problems.

                 UVA                CoraATDV
    Method       MaxF1   AvgPrec    MaxF1   AvgPrec
    SoftTFIDF    0.89    0.91       0.85    0.914
    TFIDF        0.79    0.84       0.84    0.907
    SFS          0.71    0.75       0.82    0.864
    Level2 J-W   0.73    0.69       0.76    0.804

Learning to combine distance metrics
Another type of hybrid distance function can be obtained by combining other distance metrics. Following previous researchers (Tejada, Knoblock, & Minton 2001; Cohen & Richman 2002; Bilenko & Mooney 2002) we used a learning scheme to combine several of the distance functions above. Specifically, we represented pairs as feature vectors, using as features the numeric scores of Monge-Elkan, Jaro-Winkler, TFIDF, SFS, and SoftTFIDF. We then trained a binary SVM classifier (using SVMlight (Joachims 2002)) using these features, and used its confidence in the "match" class as a score.

Figure 5 shows the results of a three-fold cross-validation on nine of the matching problems. The learned combination generally slightly outperforms the individual metrics, including SoftTFIDF, particularly at extreme recall levels. Note, however, that the learned metric uses much more information: in particular, in most cases it has been trained on several thousand labeled candidate pairs, while the other metrics considered here require no training data.

[Figure 5: Relative performance of best distance measures of each type on matching problems, relative to a learned combination of the same measures.]
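A sketch of this kind of learned combination is shown below; we substitute scikit-learn's SVC for the SVMlight classifier used in the experiments, and the five base metrics are assumed to be similarity functions already defined (as in the earlier sketches).

    import numpy as np
    from sklearn.svm import SVC

    def train_combiner(train_pairs, labels, metrics):
        """Represent each candidate pair by its vector of base-metric scores,
        fit a binary SVM, and score new pairs by the SVM's decision value."""
        feats = lambda s, t: [metric(s, t) for metric in metrics]
        X = np.array([feats(s, t) for s, t in train_pairs])
        clf = SVC(kernel="linear").fit(X, labels)
        return lambda s, t: clf.decision_function([feats(s, t)])[0]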
Concluding remarks
Recently, we have begun implementing an open-source, Java toolkit of name-matching methods. This toolkit includes a variety of different techniques, as well as the infrastructure to combine techniques readily, and evaluate them systematically on test problems. Using this toolkit, we conducted a comparison of several string distances on the tasks of matching and clustering lists of entity names. Many of these were techniques previously proposed in the literature, and some are novel hybrids of previous methods.

We compared the accuracy of these methods for use in an automatic matching scheme, in which pairs of names are proposed by a simple grouping method, and then ranked according to distance. Used in this way, we saw that the TFIDF ranking performed best among several token-based distance metrics, and that a tuned affine-gap edit-distance metric proposed by Monge and Elkan performed best among several string edit-distance metrics. A surprisingly good distance metric is a fast heuristic scheme, proposed by Jaro (Jaro 1995; 1989) and later extended by Winkler (Winkler 1999). This works almost as well as the Monge-Elkan scheme, but is an order of magnitude faster.

One simple way of combining the TFIDF method and the Jaro-Winkler method is to replace the exact token matches used in TFIDF with approximate token matches based on the Jaro-Winkler scheme. This combination performs slightly better than either Jaro-Winkler or TFIDF on average, and occasionally performs much better. It is also close in performance to a learned combination of several of the best metrics considered in this paper.

Acknowledgements
The preparation of this paper was supported in part by National Science Foundation Grant No. EIA-0131884 to the National Institute of Statistical Sciences and by a contract from the Army Research Office to the Center for Computer and Communications Security with Carnegie Mellon University.

References
Belin, T. R., and Rubin, D. B. 1997. A method for calibrating false-match rates in record linkage. In Record Linkage - 1997: Proceedings of an International Workshop and Exposition, 81-94. U.S. Office of Management and Budget (Washington).
Bilenko, M., and Mooney, R. 2002. Learning to combine trained distance metrics for duplicate detection in databases. Technical Report AI 02-296, Artificial Intelligence Lab, University of Texas at Austin. Available from /users/ml/papers/marlin-tr-02.pdf.
Cohen, W. W., and Richman, J. 2002. Learning to match and cluster large high-dimensional data sets for data integration. In Proceedings of The Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2002).
Cohen, W. W. 2000. Data integration using similarity joins and a word-based information representation language. ACM Transactions on Information Systems 18(3):288-321.
Dagan, I.; Lee, L.; and Pereira, F. 1999. Similarity-based models of word cooccurrence probabilities. Machine Learning 34(1-3).
Durban, R.; Eddy, S. R.; Krogh, A.; and Mitchison, G. 1998. Biological sequence analysis - Probabilistic models of proteins and nucleic acids. Cambridge: Cambridge University Press.
Fellegi, I. P., and Sunter, A. B. 1969. A theory for record linkage. Journal of the American Statistical Association 64:1183-1210.
Galhardas, H.; Florescu, D.; Shasha, D.; and Simon, E. 2000. An extensible framework for data cleaning. In ICDE, 312.
Hernandez, M., and Stolfo, S. 1995. The merge/purge problem for large databases. In Proceedings of the 1995 ACM SIGMOD.
Jaro, M. A. 1989. Advances in record-linkage methodology as applied to matching the 1985 census of Tampa, Florida. Journal of the American Statistical Association 84:414-420.
Jaro, M. A. 1995. Probabilistic linkage of large public health data files (disc: P687-689). Statistics in Medicine 14:491-498.
Joachims, T. 2002. Learning to Classify Text Using Support Vector Machines. Kluwer.
Lafferty, J., and Zhai, C. 2001. A study of smoothing methods for language models applied to ad hoc information retrieval. In 2001 ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR).
Larsen, M. 1999. Multiple imputation analysis of records linked using mixture models. In Statistical Society of Canada Proceedings of the Survey Methods Section, 65-71. Statistical Society of Canada (McGill University, Montreal).
McCallum, A.; Nigam, K.; and Ungar, L. H. 2000. Efficient clustering of high-dimensional data sets with application to reference matching. In Knowledge Discovery and Data Mining, 169-178.
Monge, A., and Elkan, C. 1996. The field-matching problem: algorithm and applications. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining.
Monge, A., and Elkan, C. 1997. An efficient domain-independent algorithm for detecting approximately duplicate database records. In The proceedings of the SIGMOD 1997 workshop on data mining and knowledge discovery.
Pasula, H.; Marthi, B.; Milch, B.; Russell, S.; and Shpitser, I. 2002. Identity uncertainty and citation matching. In Advances in Neural Processing Systems 15. Vancouver, British Columbia: MIT Press.
Raman, V., and Hellerstein, J. 2001. Potter's wheel: An interactive data cleaning system. In The VLDB Journal, 381-390.
Ristad, E. S., and Yianilos, P. N. 1998. Learning string edit distance. IEEE Transactions on Pattern Analysis and Machine Intelligence 20(5):522-532.
Tejada, S.; Knoblock, C. A.; and Minton, S. 2001. Learning object identification rules for information integration. Information Systems 26(8):607-633.
Winkler, W. E. 1999. The state of record linkage and current research problems. Statistics of Income Division, Internal Revenue Service Publication R99/04. Available from /srd/www/byname.html.
Package 'mixcat'                                            October 13, 2022

Title: Mixed Effects Cumulative Link and Logistic Regression Models
Version: 1.0-4
Date: 2019-12-20
Author: Georgios Papageorgiou [aut, cre], John Hinde [aut]
Maintainer: Georgios Papageorgiou <******************>
Depends: R (>= 2.8.1), statmod
LazyLoad: yes
Description: Mixed effects cumulative and baseline logit link models for the analysis of ordinal or nominal responses, with non-parametric distribution for the random effects.
License: GPL (>= 2)
NeedsCompilation: yes
Repository: CRAN
Date/Publication: 2019-12-20 12:20:02 UTC

R topics documented:
mixcat-package
npmlt
schizo
summary.npmreg
Index

mixcat-package    Mixed effects cumulative link and logistic regression models

Description
Mixed effects models for the analysis of binary or multinomial (ordinal or nominal) data with non-parametric distribution for the random effects. The main function is npmlt and it fits cumulative and baseline logit models.

Details
Package: mixcat
Type: Package
Version: 1.0-4
Date: 2019-12-20
License: GPL (>= 2)
LazyLoad: no

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. For details on the GNU General Public License see /copyleft/gpl.html or write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.

Acknowledgments
Papageorgiou's work was supported by the Science Foundation Ireland Research Frontiers grant 07/RFP/MATF448.

Author(s)
Georgios Papageorgiou and John Hinde (2011)
Maintainer: Georgios Papageorgiou <******************>

References
Papageorgiou, G. and Hinde, J. (2012). Multivariate generalized linear mixed models with semi-nonparametric and smooth nonparametric random effects densities. Statistics and Computing 22, 79-92.

npmlt    Mixed effects cumulative link and logistic regression models

Description
Fits cumulative logit and baseline logit link mixed effects regression models with non-parametric distribution for the random effects.

Usage
npmlt(formula, formula.npo = ~1, random = ~1, id, k = 1, eps = 0.0001,
      start.int = NULL, start.reg = NULL, start.mp = NULL,
      start.m = NULL, link = "clogit", EB = FALSE, maxit = 500,
      na.rm = TRUE, tol = 0.0001)

Arguments
formula: a formula defining the response and the fixed, proportional odds, effects part of the model, e.g. y ~ x.
formula.npo: a formula defining non proportional odds variables of the model. A response is not needed as it has been provided in formula. Intercepts need not be provided as they are always non proportional. Variables in formula.npo must be a subset of the variables that appear in the right hand side of formula, e.g. ~x.
random: a formula defining the random part of the model. For instance, random = ~1 defines a random intercept model, while random = ~1 + x defines a model with random intercept and random slope for the variable x. If argument k = 1, the resulting model is a fixed effects model (see below). Variables in random must be a subset of the variables that appear in the right hand side of formula.
id: a factor that defines the primary sampling units, e.g. groups, clusters, classes, or individuals in longitudinal studies. These sampling units have their own random coefficient, as defined in random. If argument id is missing it is taken to be id = seq(N), where N is the total number of observations, suitable for overdispersed independent multinomial data.
k: the number of mass points and masses for the non-parametric (discrete) random effects distribution. If k = 1 the function fits a fixed effects model, regardless of the random specification, as with k = 1 the random effects distribution is degenerate at zero.
eps: positive convergence tolerance epsilon. Convergence is declared when the maximum of the absolute value of the score vector is less than epsilon.
start.int: a vector of length (number of categories minus one) with the starting values for the fixed intercept(s).
start.reg: a vector with the starting values for the regression coefficients. One starting value for the proportional odds effects and (number of categories minus one) starting values for the non proportional effects, in the same order as they appear in formula.
start.mp: starting values for the mass points of the random effects distribution in the form: (k starting values for the intercepts, k starting values for the first random slope, ...).
start.m: starting values for the masses of the random effects distribution: a vector of length k with non-negative elements that sum to 1.
link: for a cumulative logit model set link = "clogit" (default). For a baseline logit model, set link = "blogit". Baseline category is the last category.
EB: if EB = TRUE the empirical Bayes estimates of the random effects are calculated and stored in the component eBayes. Further, fitted values of the linear predictor (stored in the component fitted) and fitted probabilities (stored in object prob) are obtained at the empirical Bayes estimates of the random effects. Otherwise, if EB = FALSE (default), empirical Bayes estimates are not calculated, and fitted values of the linear predictors and probabilities are calculated at the zero value of the random effects.
maxit: integer giving the maximal number of iterations of the fitting algorithm until convergence. By default this number is set to 500.
na.rm: a logical value indicating whether NA values should be stripped before the computation proceeds.
tol: positive tolerance level used for calculating generalised inverses (g-inverses). Consider matrix $A = PDP^{T}$, where $D = \mathrm{Diag}\{eigen_i\}$ is diagonal with entries the eigenvalues of A. Its g-inverse is calculated as $A^{-} = PD^{-}P^{T}$, where $D^{-}$ is diagonal with entries $1/eigen_i$ if $eigen_i > tol$, and 0 otherwise.
Details
Maximizing a likelihood over an unspecified random effects distribution results in a discrete mass point estimate of this distribution (Laird, 1978; Lindsay, 1983). Thus, the terms 'non-parametric' (NP) and 'discrete' random effects distribution are used here interchangeably. Function npmlt allows the user to choose the number k of mass points/masses of the discrete distribution, a choice that should be based on the log-likelihood. Note that the mean of the NP distribution is constrained to be zero and thus for k = 1 the fitted model is equivalent to a fixed effects model. For k > 1 and a random slope in the model, the mass points are bivariate with a component that corresponds to the intercept and another that corresponds to the slope. General treatments of non-parametric modeling can be found in Aitkin, M. (1999) and Aitkin et al. (2009). For more details on multinomial data see Hartzel et al. (2001).

The response variable y can be binary or multinomial. A binary response should take values 1 and 2, and the function npmlt will model the probability of 1. For an ordinal response, taking values 1, ..., q, a cumulative logit model can be fit. Ignoring the random effects, such a model, with formula y ~ x, takes the form

$\log \frac{P(Y \le r)}{1 - P(Y \le r)} = \beta_r + \gamma x,$

where $\beta_r$, r = 1, ..., q-1, are the cut-points and $\gamma$ is the slope. Further, if argument formula.npo is specified as ~x, the model becomes

$\log \frac{P(Y \le r)}{1 - P(Y \le r)} = \beta_r + \gamma_r x.$

Similarly, for a nominal response with q categories, a baseline logit model can be fit. The fixed effects part of the model, y ~ x, takes the form

$\log \frac{P(Y = r)}{P(Y = q)} = \beta_r + \gamma x,$

where r = 1, ..., q-1. Again, formula.npo can be specified as ~x, in which case slope $\gamma$ will be replaced by category specific slopes, $\gamma_r$.

The user is provided with the option of specifying starting values for some or all the model parameters. This option allows for starting the algorithm at different starting points, in order to ensure that it has converged to the point of maximum likelihood. Further, if the fitting algorithm fails, the user can start by fitting a less complex model and use the estimates of this model as starting values for the more complex one.

With reference to the tol argument, the fitting algorithm calculates g-inverses of two matrices: 1. the information matrix of the model, and 2. the covariance matrix of multinomial proportions. The covariance matrix of a multinomial proportion p of length q is calculated as $\mathrm{Diag}\{p^*\} - p^* p^{*T}$, where $p^*$ is of length q-1. A g-inverse for this matrix is needed because elements of $p^*$ can become zero or one.

Value
The function npmlt returns an object of class 'npmreg', a list containing at least the following components:
call: the matched call.
formula: the formula supplied.
formula.npo: the formula for the non proportional odds supplied.
random: the random effects formula supplied.
coefficients: a named vector of regression coefficients.
mass.points: a vector or a table that contains the mass point estimates.
masses: the masses (probabilities) corresponding to the mass points.
vcvremat: the estimated variance-covariance matrix of the random effects.
var.cor.mat: the estimated variance-covariance matrix of the random effects, with the upper triangular covariances replaced by the corresponding correlations.
m2LogL: minus twice the maximized log-likelihood of the chosen model.
SE.coefficients: a named vector of standard errors of the estimated regression coefficients.
SE.mass.points: a vector or a table that contains the standard errors of the estimated mass points.
SE.masses: the standard errors of the estimated masses.
VRESE: the standard errors of the estimates of the variances of random effects.
CVmat: the inverse of the observed information matrix of the model.
eBayes: if EB = TRUE it contains the empirical Bayes estimates of the random effects. Otherwise it contains vector(s) of zeros.
fitted: the fitted values of the linear predictors computed at the empirical Bayes estimates of the random effects, if EB = TRUE. Otherwise, if EB = FALSE (default), these fitted values are computed at the zero value of the random effects.
prob: the estimated probabilities of observing a response at one of the categories. These probabilities are computed at the empirical Bayes estimates of the random effects, if EB = TRUE. If EB = FALSE (default) these estimated probabilities are computed at the zero value of the random effects.
nrp: number of random slopes specified.
iter: the number of iterations of the fitting algorithm.
maxit: the maximal allowed number of iterations of the fitting algorithm until convergence.
flagcvm: last iteration at which eigenvalue(s) of the covariance matrix of the multinomial variable were less than the tol argument.
flaginfo: last iteration at which eigenvalue(s) of the model information matrix were less than the tol argument.

Author(s)
Georgios Papageorgiou <******************>

References
Aitkin, M. (1999). A general maximum likelihood analysis of variance components in generalized linear models. Biometrics 55, 117-128.
Aitkin, M., Francis, B., Hinde, J., and Darnell, R. (2009). Statistical Modelling in R. Oxford Statistical Science Series, Oxford, UK.
Hedeker, D. and Gibbons, R. (2006). Longitudinal Data Analysis. Wiley, Palo Alto, CA.
Hartzel, J., Agresti, A., and Caffo, B. (2001). Multinomial logit random effects models. Statistical Modelling, 1(2), 81-102.
Laird, N. (1978). Nonparametric maximum likelihood estimation of a mixing distribution. Journal of the American Statistical Association, 73, 805-811.
Lindsay, B. G. (1983). The geometry of mixture likelihoods, Part II: The exponential family. The Annals of Statistics, 11, 783-792.

See Also
summary.npmreg

Examples
data(schizo)
attach(schizo)
npmlt(y ~ trt * sqrt(wk), formula.npo = ~trt, random = ~1 + trt, id = id, k = 2, EB = FALSE)

schizo    National Institute of Mental Health schizophrenia study

Description
Schizophrenia data from a randomized controlled trial with patients assigned to either drug or placebo group. "Severity of Illness" was measured, at weeks 0, 1, ..., 6, on a four category ordered scale: 1. normal or borderline mentally ill, 2. mildly or moderately ill, 3. markedly ill, and 4. severely or among the most extremely ill. Most of the observations were made on weeks 0, 1, 3, and 6.

Usage
data(schizo)

Format
A data frame with 1603 observations on 437 subjects. Four numerical vectors contain information on:
id: patient ID.
y: ordinal response on a 4 category scale.
trt: treatment indicator: 1 for drug, 0 for placebo.
wk: week.

Source
/~hedeker/ml.html

References
Hedeker, D. and Gibbons, R. (2006). Longitudinal Data Analysis. Wiley, Palo Alto, CA.
summary.npmreg    Summarizing mixed multinomial regression model fits

Description
summary and print methods for objects of type npmreg.

Usage
## S3 method for class 'npmreg'
summary(object, digits = max(3, getOption("digits") - 3), ...)
## S3 method for class 'npmreg'
print(x, digits = max(3, getOption("digits") - 3), ...)

Arguments
object: an object of class npmreg.
x: an object of class npmreg.
digits: the minimum number of significant digits to be printed in values.
...: further arguments, which will mostly be ignored.

Details
The function npmlt returns an object of class "npmreg". The function summary (i.e., summary.npmreg) can be used to obtain or print a summary of the results, and the function print (i.e., print.npmreg) to print the results.

Value
Summary or print.

Author(s)
Georgios Papageorgiou <******************>

See Also
npmlt

Examples
data(schizo)
attach(schizo)
fit1 <- npmlt(y ~ trt * sqrt(wk), formula.npo = ~trt, random = ~1, id = id, k = 2)
print(fit1)
summary(fit1)

Index
datasets: schizo
models: mixcat-package, npmlt, summary.npmreg
regression: mixcat-package, npmlt, summary.npmreg
mixcat (mixcat-package); mixcat-package; npmlt; print.npmreg (summary.npmreg); schizo; summary.npmreg
The linguistics component covers the design features of language, the functions of language, the major branches of linguistics (macrolinguistic and microlinguistic branches), phonetics, phonology, morphology, syntax, semantics, pragmatics, discourse (text) analysis, comparison and contrast of languages, language, culture and society, stylistics, theories of second language acquisition, linguistics and foreign language teaching, and the main schools of modern linguistics, that is, the foundations of linguistics and linguistic theory.
Schemata and inference drawing
Tree diagram, syntactic structure

2003
Register
An agglutinative language
Immediate constituent analysis
Cohesion
Grimm's law
Grice's co-operative principle
Creole
Free morpheme
Deep structure
Competence
C7 Language and culture, relationship; Sapir-Whorf hypothesis (strong & weak) (2014 c-500)
C7 Language and society, Lakoff/women register (2004 same question)
C1 Saussure's langue & parole, Chomsky's competence & performance: comparison (2005 d-; 2007 same topic; 2008 langue and parole, distinctions; 2009 parole and langue, distinction; 2011 distinction of competence and performance, relationship between them)
C5 seven types of meaning (G. Leech) (2008 c-200)

2004
An isolating language
A minimal pair
Great Vowel Shift
Embedding
A diachronic view of language (2016 why is diachronic and synchronic an important distinction in linguistics? 10')
C1 Functions of language (2008)
C5 semantic triangle, symbol/concept/thought/referent
C7 language and society, the influence of power & solidarity
C7 language and society, taboo/euphemistic words/political correctness

2005
Duality
Langue and parole
C2 p45 Distinctive features (2008 c-100)
Paradigmatic relations (also called associative relations by Saussure)
Speech act theory
Selection restrictions
C8 p181 characteristics of implicature (2007 d-; 2009 the theory of it; 2014 c-300; 2016 term)
C12 the three important points of the Prague School (p16 Spark)
C8 the theory of the illocutionary act (2009 c-100; 2011 c-100)
C5 the referential theory of meaning (2008 same)
C5 Componential analysis, advantages and disadvantages (2007 advantages; 2009 c-100; 2013 weaknesses)
C1 communicative competence
C7 Language and society, Lakoff/you are what you say

2006
Displacement
Competence and performance
Metalingual function of language
Etic and emic
Endocentric and exocentric constructions
C4 IC analysis, advantages and disadvantages
C12 systemic-functional grammar (2007/2008/2011)
C5 logical semantics
C11 input hypothesis and language learning

2007
Foregrounding (2010 c-200)
Conversational implicature
Allophone
Metalinguistic function (Metalingual function?)
Creativity as a property of language
C2 distinctive feature, definition
C5 Componential analysis, advantages (2005)
C4 endocentric and exocentric constructions, definition (2013 endocentric c-200)
C12 systemic-functional grammar (2006)
C8 relevance theory and Grice's cooperative principle
C1 Saussure's langue & parole, Chomsky's competence & performance: comparison, similarities and differences (2003)

2008
Arbitrariness
Anthropological linguistics (2016)
Bound morpheme
Paradigmatic relation (2005)
Halliday's material process in systemic functional grammar
C1 Functions of language (2004)
C1 langue and parole, distinctions
C5 the referential theory of meaning (2008)
C2 p45 Distinctive features (2005-d)
C5 seven types of meaning (G. Leech) (2003)
C7 Language and society, relationship
C12 American structuralism (2010 c-100): a branch of synchronic linguistics that emerged in the United States at the beginning of the twentieth century. (2016 What is the structural approach to the analysis of language? Please explain with examples)
2009
Arbitrariness at the syntactic level (2008 arbitrariness tested on its own; 2011; 2012 the design features; 2016)
Macrolinguistics
Foregrounding and grammatical form (2007)
Gradable antonymy
Minimal pairs
C8 the theory of the illocutionary act (2005)
Felicity conditions
C5 Componential analysis (2005 advantages and disadvantages; 2007 advantages)
C12 the London School in linguistics (2012 c-100)
C8 p181 the theory of implicature (2007 d-; 2005 characteristics): Laurence Horn
Linguistic relativity
C1 Langue and parole, distinction (2003 Saussure's langue & parole, Chomsky's competence & performance: comparison; 2005 d-; 2007 same as 2003; 2008 langue and parole, distinctions)

2010
C2 p24 Acoustic phonetics
C2 p37 Co-articulation
C2 p41 Free variation (in phonology)
C5 Affective meaning (as is defined by G. Leech)
C3 p54 Derivational affix
Phoneme theory (2016 what is the phoneme theory? 10')
C5 the integrated theory of meaning (2011 advantages and disadvantages)
Context of situation (2016 general context effect and specific context effect)
C12 American structuralism (2008 c-200)
C7 sociolinguistics and foreign language teaching, implication
C9 foregrounding and literature study (2007)

2011
C1 Arbitrariness at the syntactic level (2009/2008)
C1 Emotive function of language
C1 Computational linguistics
C2 Minimal pairs (in phonology)
C11 Interlanguage p254 (2014 its functions): the type of language constructed by second or foreign language learners who are still in the process of learning a language.
C1 distinction of competence and performance, relationship between them (2003 Saussure's langue & parole, Chomsky's competence & performance: comparison; 2005 d-; 2007 same topic; 2008 langue and parole, distinctions; 2009 parole and langue, distinction)
C11 requirements of a good test (2016 aptitude test, term)
C8 a theory of the illocutionary act (2009 c-100; 2005 c-100)
C12 Halliday functions and structure / c-150 system relationship / the specialty of systemic-functional linguistics (2006/2007/2008)
C5 the integrated theory of meaning, advantages and disadvantages (2010)

2012
C1 The design features of language: arbitrariness, duality, creativity (productivity), displacement
C1 Textual function of language p10
C12 Functional sentence perspective: FSP, Prague School, theme, rheme
C8 Illocutionary force: the extra meaning of an utterance or written text beyond the literal words, p175
C8 The relevance theory: Sperber and Wilson (2013; 2015 c-350)
C1 Micro-linguistics and macro-linguistics
C1 the reason why competence and performance is an important distinction (2003 Saussure's langue & parole, Chomsky's competence & performance: comparison; 2005 d-; 2007 same topic; 2008 langue and parole, distinctions; 2009 parole and langue, distinction; 2011 distinction of competence and performance, relationship between them)
C12 the London School (2009 same)
C12 functionalism in linguistics

2013 (Lectures on Grammar, Zhu Dexi)
C1 The pooh-pooh theory of the origin of language (2016)
C3 Bound morphemes
C1 Informative function of language
C6 Cognitive linguistics
C8 Sperber and Wilson's relevance theory (2012)
C2 Suprasegmental features and their functions
C1 Jakobson's view of the functions of language (2015 c-300)
C4 endocentric construction c-200 (2007 endocentric and exocentric constructions, definition)
C5 Componential analysis, weaknesses (2005 advantages and disadvantages; 2007 advantages; 2009 c-100)
C12 c-1500 functional sentence perspective
Firth's theory of language

2014
C1 Duality (2005)
C1/C2 Phonology [minimal pairs, free variation]
C1 Computational linguistics (2011)
C2 A minimal pair (2004/2005/2009/2011)
C4 Horizontal relations (or syntactic relations) [Arbitrariness at the syntactic level] c-300
C8 p181 characteristics of implicature (2005; 2007 d-; 2009 the theory of it)
C11 functions of inter-language (2001 d-)
C4 tree diagram, examples c-500
C9 Sapir-Whorf hypothesis (strong & weak) (2003)
C8 post-Gricean developments in pragmatics

2015
C5 Presupposition
C3 Free and bound morphemes (2013/2008/2003)
C12 The innateness hypothesis
C1 Competence and performance (2006/2003)
C9 Text style
C1 Jakobson's view of the functions of language (2013 c-300)
C9 definition, features and effects of free direct thought c-350
C8 The relevance theory: Sperber and Wilson (2012/2013)
Physica A 387 (2008) 2547-2560

Parametric and nonparametric Granger causality testing: Linkages between international stock markets

Jan G. De Gooijer a,*, Selliah Sivarajasingham b
a Department of Quantitative Economics and Tinbergen Institute, University of Amsterdam, Roetersstraat 11, 1018 WB Amsterdam, The Netherlands
b Department of Economics, University of Peradeniya, Peradeniya, Sri Lanka

Received 12 July 2007; received in revised form 2 November 2007
Available online 11 January 2008

Abstract
This study investigates long-term linear and nonlinear causal linkages among eleven stock markets, six industrialized markets and five emerging markets of South-East Asia. We cover the period 1987-2006, taking into account the onset of the Asian financial crisis of 1997. We first apply a test for the presence of general nonlinearity in vector time series. Substantial differences exist between the pre- and post-crisis period in terms of the total number of significant nonlinear relationships. We then examine both periods, using a new nonparametric test for Granger noncausality and the conventional parametric Granger noncausality test. One major finding is that the Asian stock markets have become more internationally integrated after the Asian financial crisis. An exception is the Sri Lankan market with almost no significant long-term linear and nonlinear causal linkages with other markets. To ensure that any causality is strictly nonlinear in nature, we also examine the nonlinear causal relationships of VAR filtered residuals and VAR filtered squared residuals for the post-crisis sample. We find quite a few remaining significant bi- and uni-directional causal nonlinear relationships in these series. Finally, after filtering the VAR-residuals with GARCH-BEKK models, we show that the nonparametric test statistics are substantially smaller in both magnitude and statistical significance than those before filtering. This indicates that nonlinear causality can, to a large extent, be explained by simple volatility effects.

Keywords: GARCH-BEKK; Granger causality; Hypothesis testing; Nonparametric; Stock market linkages

1. Introduction
Since the late 1980s many national stock exchange markets in industrial countries have become aware of the increased competitiveness among these markets. This, in conjunction with a less restrictive climate toward capital movements, has brought about the view among economists that the major financial markets of the world are systematically interrelated. This interrelationship may indicate a growing similarity in reactions toward external developments in macroeconomic policies and in the world financial environment. In addition, it may also reflect a temporary, or perhaps more lasting, causal relationship between various individual stock exchanges.

* Corresponding author. E-mail addresses: j.g.degooijer@uva.nl (J.G. De Gooijer), ssivaraj@pdn.ac.lk (S. Sivarajasingham).

Causal linkages among stock markets have important implications for security pricing, hedging and trading strategies, and financial market regulations. Also the presence of long-term linear and nonlinear relationships may be used to achieve financial gains from international portfolio diversification and to reduce systematic local risks.
Consequently, there exists a large body of literature examining the presence of causal linkages between developed (less risky) markets. They typically find that the US market leads other developed markets (e.g. Ref. [31]). However, there is substantially less literature on stock market linkages between developed markets and emerging markets; see Section 2 for a selective overview. Moreover, quite a few studies relied on the restrictive assumption of a causal linear relationship between stock markets through the use of Granger's [21] parametric causality test. But, as noted by Hsieh [26,27] and many others, financial time series exhibit significant nonlinear features. Indeed, Hiemstra and Jones [25] argue that a nonlinear and nonparametric Granger causality test (hereafter HJ test), based on the work of Baek and Brock [4], is more effective in uncovering certain nonlinear causal relationships in daily stock prices.

The HJ causality test seems to be the one most used in economics and finance. Examples include stock-price-volume relationships [25,43], futures and cash markets [15], stock price dividend relationships [29], fundamentals and exchange rates [35], and equity volatility returns [7]. However, Diks and Panchenko [12,13] demonstrate that the HJ test can severely over-reject if the null hypothesis of noncausality is not true. In addition, with instantaneous dependence, the HJ test has serious size distortion problems. As an alternative, the authors of Ref. [13] (hereafter DP) develop a new test statistic which does not suffer from these limitations. Their empirical results suggest that some of the rejections of the Granger noncausality hypothesis, using the HJ causality test, may be spurious.

The objective of the current paper is two-fold. The first one is to explore the existence of linear and nonlinear causal relationships among eleven stock markets. Six of these (Germany, Hong Kong, Japan, Singapore, UK, and US) belong to the group of the world's major stock markets, while five markets (India, Malaysia, South Korea, Sri Lanka, and Taiwan) are emerging stock markets in South-East Asia. Clearly South-East Asia as a region has undergone rapid market liberalization in the past decade, resulting in increased investment flows. A possible consequence of this financial openness is an increase in the causal linkages between these emerging markets and the world's major financial markets. In particular, the time period after the 1997 Asian financial crisis may have changed the direction and strength of the causal relationships among the markets under study. A second objective is to explore the ability of the DP test to detect nonlinear causal relationships.

The paper has five remaining sections. Section 2 presents a brief overview of the relevant literature. Also we point out some limitations of the reviewed studies. In Section 3 we present some selected stock market indicators jointly with a discussion of the eleven stock market indices. Section 4, entitled "Testing methodology", introduces (i) a multivariate test of nonlinearity; and (ii) the nonparametric DP causality test. The empirical findings are reported in Section 5. The final section closes the paper by discussing some of the main implications of the results and providing directions for future research.

2. Literature review
There is a wealth of literature on stock market interdependence and integration. However, depending on the data, methodology, and theoretical models used, there is no clear resolution of the issue yet. Some previous work has found that international stock markets are integrated; see, e.g., Refs. [2,23]. Others have found that stock markets are not interlinked; see, e.g., Refs. [41,44].
Most of the studies on stock market interdependence in emerging markets have been done on geographical groups of markets, such as markets in Central and Eastern Europe [20,22], Latin America [10,11,9], and in Asian countries. Since stock markets in South-East Asia form a substantial part of the set of markets considered here, we summarize some of the most recent findings.

Masih and Masih [33,34] found cointegration in the pre-financial crisis period of October 1987 among the stock markets of Thailand, Malaysia, the US, the UK, Japan, Hong Kong and Singapore. But there were no long-run relationships between these markets for the period after the global stock market crash of 1987. By contrast, Phylaktis and Ravazzolo [39] found no linkages and dynamic interactions amongst a group of Pacific-Basin stock markets (Hong Kong, South Korea, Malaysia, Singapore, Taiwan and Thailand) and the industrialized countries of Japan and the US for the period 1980-1998. Further, Arshanapalli et al. [3] noted an increase in stock market interdependence after the 1987 crisis for the emerging markets of Malaysia, the Philippines, Thailand, and the developed markets of Hong Kong, Singapore, the US and Japan for the period 1986-1992. Likewise, when testing for causality-in-variance, Caporale et al. [8] found some empirical evidence on the bi-directional causal relationships between stock prices and exchange rate volatility in the case of Indonesia and Thailand for the post-crisis period 1987-2000. Also, Najand [36], using linear state space models, detected stronger interactions among the stock markets of Japan, Hong Kong, and Singapore after the 1987 stock market crash.

Linkages among national stock markets before and during the period of the Asian financial crisis in 1997/98 were explored by Sheng and Tu [42]. In particular, adopting multivariate cointegration and error-correction tests, these authors focused on 11 major stock markets in the Asian-Pacific region (Australia, China, Hong Kong, Indonesia, Japan, Malaysia, Philippines, Singapore, South Korea, Taiwan and Thailand) and the US. Using daily closing prices, they found empirical evidence that cointegration relationships among the national stock indices have increased during, but not before, the period of the financial crisis. More recently, Weber [46] revealed various causality-in-variance effects between the volatilities in the national financial markets in the Asian-Pacific region (Australia, Hong Kong, Indonesia, India, Japan, South Korea, New Zealand, Philippines, Singapore, Taiwan and Thailand) for the post-crisis period 1999-2006. Also, allowing for structural breaks such as the Asian financial crisis, Narayan et al. [37] found that stock prices in Bangladesh, India and Sri Lanka Granger-cause stock prices in Pakistan for the period 1995-2001.

All the above studies rely on the restrictive assumption of linearity either through the use of linear causality tests or via linear time series methodology. Moreover, some of these studies fail to notice that parametric linear Granger causality tests have low power against nonlinear alternatives; see Ref. [4]. Recognition of the nonlinear property of stock prices, and subsequently exploring for possible long-run nonlinear relations among national stock markets, came after publication of the study by Hiemstra and Jones [25]. For instance, using the HJ causality test, Hunter [28] focuses on the emerging markets of Argentina, Chile, and Mexico. Similarly, Ozdemir and Cakan [38] examine the dynamic relationships between the stock market indices of the US, Japan, France, and the UK.
This latter study reports that there is a strong bi-directional nonlinear causal relationship between the US and the other countries, which has also been documented in the literature using linear causality tests. But, as explained in Section 1, the HJ test may lead to false inference. Clearly the causality and nonlinearity tests to be introduced in Section 4 provide a useful way to extend and update much of the above empirical knowledge on causal (non)linear relationships.

3. Data
The data consist of eleven time series of daily closing (5 days) stock market price indices, measured in domestic currencies. The data cover two periods: P1, from November 2, 1987 to June 30, 1997, denoting the pre-Asian financial crisis period (2521 observations), and P2, the post-Asian financial crisis period, with data from June 1, 1998 to December 1, 2006 (2220 observations). Recall that the onset of the Asian financial crisis started with a 15%-20% devaluation of Thailand's Baht which took place on July 2, 1997. This was subsequently followed by devaluations of the Philippine Peso, the Malaysian Ringgit, the Indonesian Rupiah, and the Singaporean Dollar. In addition, the currencies of South Korea and Taiwan suffered. Further, in October 1997 the Hong Kong stock market collapsed with a 40% loss. In January 1998, the currencies of most South-East Asian countries regained parts of the earlier losses. The data are taken from 'DataStream'.

Table 1 presents some basic information about the eleven stock markets. Among the five emerging markets, the Taiwan (TAW) stock market is the largest in terms of market capitalization, followed by Malaysia (MAL) and India (IND). By contrast Sri Lanka (SL) is a relatively small market. Also, in terms of listed companies, the Sri Lankan market is the smallest. As can be seen from the listed trading values, the Asian financial crisis is clearly visible with a drop in the 1998 figures.

The six developed stock markets have been deregulated and liberalized for a very significant period of time.
For the emerging markets, Bekaert et al. [6] dates India's integration into the world equity market as 1992. Malaysia, Taiwan and South Korea, however, are still deregulating and liberalizing their markets, a process which began in the late 1980s. Sri Lanka's stock market (Colombo Stock Exchange (CSE)), on the other hand, underwent a rapid increase in foreign investment following liberalization in 1989. According to Ariff and Khalid [1] and Elyasiani et al. [16] the CSE was one of the best performing markets in the 1989-1994 period, with a 15-fold increase in annual turnover and an eight-fold increase in market capitalization.

The trading hours of the eleven stock exchanges are not perfectly synchronized, though there are several overlapping hours in each trading day for the developed markets. But, within the group of emerging markets, the trading activity is to a large extent concurrent. Nevertheless, the differences in closing times could cause sequential price responses to common information that could be mistaken for causal linkages. Intra-day market data may be used to disentangle these sequential responses from causal transmissions within a particular day. Regrettably these data are not available for the emerging markets under study.

Table 1: Some selected stock market indicators. Source: Standard & Poor's [45].

Number of listed companies
    Country  1997    1998    1999    2000    2001    2002     2003    2004
    GER      700     741     933     1022    749     715      684     660
    HNG      671     693     717     779     657     968      1027    1086
    JAP(a)   2387    2416    2470    2561    2471    3058(c)  3116    3220
    SNG      303     321     355     418     386     434      456     489
    UK       2157    2087    1945    1904    1923    2405     2311    2486
    US       8851    8450    7651    7524    6355    5685     5295    5231
    IND      5843    5860    5863    5937    5795    5650     5644    4730
    SL       239     233     239     239     238     238      244     245
    MAL      708     736     757     795     809     865      897     962
    SK       1135    1079    1178    1308    1409    1518     1563    1573
    TAW      404     437     462     531     584     638      689     697

Market capitalization as a % of GDP
    Country  1997    1998    1999    2000    2001    2002    2003    2004
    GER      39.1    51.0    68.1    68.1    57.8    34.8    44.2    43.6
    HNG      241.7   211.2   385.1   383.4   310.8   289.5   460.7   528.5
    JAP(a)   51.4    63.3    101.2   66.3    54.1    53.5    70.9    79.6
    SNG      112.5   114.9   240.1   164.8   138.3   115.4   158.0   160.6
    UK       91.4    166.7   201.2   180.3   151.3   119.2   136.8   132.6
    US       137.0   154.3   180.7   154.0   137.9   106.4   130.2   139.4
    IND      30.5    24.5    41.5    32.4    23.1    25.7    46.5    56.1
    SL       13.9    10.9    10.1    6.6     8.5     10.2    14.9    18.2
    MAL      93.4    136.0   184.0   130.4   136.4   130.0   162.0   160.6
    SK       9.7     37.8    97.4    37.5    45.7    45.6    54.2    63.1
    TAW      101.6   99.8    130.6   79.8    104.1   90.4    132.4   144.5

Trading value ($US m)
    Country  1997        1998        1999        2000        2001        2002        2003        2004
    GER      535,745     761,888     814,740     1,069,120   1,419,579   1,233,056   1,147,209   1,406,055
    HNG      489,365     205,918     247,428     377,866     196,361     210,662     331,615     438,966
    JAP(b)   1,251,750   948,522     1,849,228   2,693,856   1,826,230   1,573,279   2,272,989   3,430,420
    SNG      63,954      50,735      97,985      91,485      63,385      56,129      87,864      81,314
    UK       829,131     1,167,382   1,377,859   1,835,278   1,861,131   1,909,716   2,211,533   3,707,191
    US       10,216,074  13,148,480  18,574,100  31,862,485  29,040,739  25,371,270  15,547,431  19,354,899
    IND      158,301     148,239     278,828     509,812     249,298     197,118     284,802     39,085
    SL       309         281         209         144         153         318         769         582
    MAL      153,292     29,889      48,512      58,500      20,772      27,623      50,135      59,878
    SK       172,019     145,572     825,828     1,067,999   703,960     792,165     682,706     638,891
    TAW      1,291,524   890,491     910,016     983,491     544,808     631,931     592,012     718,619

(a) From 2002, figures include data from both the Tokyo Stock Exchange and JASDAQ.
(b) Figures include data from the Tokyo Stock Exchange and Osaka Stock Exchange. From 2002, JASDAQ figures are included.
(c) Indicates a break in the series.

In the first part of the study, we analyze daily returns $R_t = \ln P_t - \ln P_{t-1}$, where $P_t$ is the closing price of an index on day t. In the second part, we focus on the squared and unsquared residuals from linear VAR models fitted to $\{R_t\}$.
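The return construction and the unit-root checks reported next are straightforward to reproduce; the following is a sketch using pandas and statsmodels (the price series is hypothetical, and the regression codes 'n', 'c', 'ct' correspond to the three ADF variants mentioned in the text; older statsmodels releases spell the first one 'nc'):

    import numpy as np
    import pandas as pd
    from statsmodels.tsa.stattools import adfuller

    def log_returns(prices: pd.Series) -> pd.Series:
        """Daily returns R_t = ln P_t - ln P_{t-1}."""
        return np.log(prices).diff().dropna()

    def adf_variants(r: pd.Series):
        """ADF tests with no trend/no intercept, intercept, and trend plus
        intercept; lag lengths selected by minimizing AIC."""
        results = {}
        for label, reg in [("none", "n"), ("intercept", "c"), ("trend+intercept", "ct")]:
            stat, pval, *_ = adfuller(r, regression=reg, autolag="AIC")
            results[label] = (stat, pval)
        return results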
The squared residuals may be considered as a useful proxy of volatility. Application of three augmented Dickey–Fuller tests (no trend and no intercept; intercept; and trend plus intercept) indicated that the series {R_t} are integrated of order zero, i.e. stationary, with p-values less than 0.01. The appropriate lag lengths for the tests were selected by minimizing AIC. The Jarque–Bera test for normality indicated that the returns are not normally distributed.

Initial exploratory analysis of the sample cross-correlation matrix at lag 0 (contemporaneous correlation) indicated that almost all series are positively correlated, and significantly different from zero at the 1% level, for the series {R_t}, and for both time periods. Very low (insignificant) cross-correlations at lag 0 were obtained for the pairs SL–JAP, SL–TAW, SL–UK, and SL–US. Significant sample cross-correlations at lag 1 were noted for the UK and US stock markets, which have uni-directional links with almost all other stock markets. However, it is well known that correlations cannot fully capture the long-term dynamic linkages between the markets in a reliable way. Hence, these results should be interpreted with caution. Consequently, they are not included in the paper. Indeed, what is needed is a long-term causality analysis between the markets.

Empirical experience indicates that it is rather hard to absorb simultaneously large tables with numbers. To overcome this difficulty, and to save space, we use the following simplifying notation: "∗∗" means that the corresponding p-value of a particular test (causality and nonlinearity) is smaller than 1%; "∗" means that the corresponding p-value of a test is in the range 1%–5%; and "–" denotes that the corresponding p-value of a test is larger than 5%.¹ Bi- and uni-directional causalities will be denoted by the functional representations ↔ and →, respectively.

4. Testing methodology

4.1. A multivariate test of nonlinearity

Let Y_t = (Y_{1,t}, ..., Y_{k,t}) (t = 1, ..., n) denote a stationary k-variate time series of length n. A multivariate nonlinear model can be expressed as

Y_{i,t} = μ_i + f_i(ε_{1,t−1}, ε_{2,t−1}, ...) + ε_{i,t},

where the {ε_{i,t}} are serially uncorrelated, but may be cross-correlated at lag zero, and identically distributed random variables, and the f_i(·) are measurable real-valued functions. In general, f_i(·) (i = 1, ..., k) can be represented by a discrete-time Volterra series of the form

f_i(ε_{1,t−1}, ε_{2,t−1}, ...) = Σ_{s=1}^{k} Σ_{l=1}^{∞} b_{i,sl} ε_{s,t−l} + Σ_{s,u=1}^{k} Σ_{l,m=1}^{∞} b_{i,sulm} ε_{s,t−l} ε_{u,t−m} + Σ_{s,u,v=1}^{k} Σ_{l,m,r=1}^{∞} b_{i,suvlmr} ε_{s,t−l} ε_{u,t−m} ε_{v,t−r} + ···   (1)

This is a generalization of the Volterra representation of a nonlinear stationary univariate time series. A k-variate linear process results if all the coefficients of the second- and higher-order terms in (1) equal zero. In practice, the upper and lower limits in the summations are replaced by finite numbers. For convenience, assume that each component of Y_t has mean zero. Then the idea for the test is that if a vector time series process is nonlinear, this structure will be reflected in the residuals of a fitted linear pth-order VAR model.
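To fix ideas, here is a minimal Python sketch of this residual-based idea on simulated data (illustrative only, not code from the paper; the simulated process and all settings are assumptions). It anticipates the formal steps listed immediately below: fit a VAR, purge the squared fitted values of their linear dependence on the regressors, and test whether those purged squares explain the VAR residuals via a Wilks' lambda based F-type statistic.

```python
import numpy as np
from scipy import stats

def multivariate_keenan_test(Y, p):
    """Sketch of the multivariate Keenan-type nonlinearity test.
    Y : (n, k) array, zero-mean k-variate series;  p : VAR order.
    Returns the F-type statistic and its approximate p-value."""
    n, k = Y.shape
    T = n - p
    # Lagged regressors Z_t = (Y_{t-1}, ..., Y_{t-p}) plus a constant.
    Z = np.hstack([Y[p - l : n - l] for l in range(1, p + 1)])
    Z1 = np.column_stack([np.ones(T), Z])
    Yt = Y[p:]

    # Step 1: VAR(p) regression; residuals e_t and fitted values.
    B, *_ = np.linalg.lstsq(Z1, Yt, rcond=None)
    fitted = Z1 @ B
    e = Yt - fitted

    # Step 2: squares of fitted values, purged of their dependence on Z_t.
    X = fitted ** 2
    C, *_ = np.linalg.lstsq(Z1, X, rcond=None)
    u = X - Z1 @ C

    # Step 3: regress e_t on u_t; Wilks' lambda from the two SSE matrices.
    D, *_ = np.linalg.lstsq(u, e, rcond=None)
    v = e - u @ D
    lam = np.linalg.det(v.T @ v) / np.linalg.det(e.T @ e)

    # Step 4: F-type statistic, approximately F(k, (n-p) - p*k - k).
    df2 = T - p * k - k
    F = (df2 / k) * (1 - np.sqrt(lam)) / np.sqrt(lam)
    return F, stats.f.sf(F, k, df2)

# Hypothetical bivariate example: nonlinearity enters series 1 only
# through a squared innovation of series 2.
rng = np.random.default_rng(1)
eps = rng.normal(size=(500, 2))
Y = np.empty_like(eps)
Y[0] = eps[0]
for t in range(1, 500):
    Y[t, 0] = 0.4 * Y[t - 1, 0] + 0.3 * eps[t - 1, 1] ** 2 + eps[t, 0] - 0.3
    Y[t, 1] = 0.5 * Y[t - 1, 1] + eps[t, 1]
print(multivariate_keenan_test(Y - Y.mean(axis=0), p=1))
```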
The test procedure consists of the following steps.

1. Fit a VAR(p) model to Y_t by regressing Y_t on the pk × 1 vector Z_t = (Y_{1,t−1}, ..., Y_{k,t−1}, ..., Y_{1,t−p}, ..., Y_{k,t−p}). Compute the k × 1 vector of fitted values Ŷ_t (t = p+1, ..., n), the k × 1 vector of residuals e_t = Y_t − Ŷ_t, and the corresponding k × k matrix SSR_1 of sum of squared and cross-product terms for the regression.

2. Compute a k × 1 vector of squares of fitted values, say X_t, from the k-variate AR(p) regressions in step (1). Remove the linear dependence of X_t on Z_t by a second k-variate AR(p) regression of X_t on Z_t. Obtain the k × 1 vector of fitted values X̂_t and the k × 1 vector of residuals u_t = X_t − X̂_t.

3. Regress the vector of residuals e_t from step (1) on the vector of residuals u_t from step (2). Compute the corresponding k × k sum of squared regressions matrix, SSR_2, and sum of squared errors matrix, SSE_2. Let SSR_{2|1} = SSR_2 − SSR_1; i.e., SSR_{2|1} is the extra sum of squares due to the addition of the second-order term to the model.

4. Compute the F-statistic F̂:

F̂ = ((n − p − pk − k)/k) × (1 − Λ^{1/2}) / Λ^{1/2},   (2)

where Λ = |SSE_2| / |SSR_{2|1} + SSE_2| is Wilks' lambda statistic. Under the assumption that Y_t follows a zero-mean Gaussian VAR(p) process, and if the sample size n is large, F̂ follows approximately an F_{ν1,ν2} distribution with degrees of freedom ν1 = k and ν2 = (n − p) − pk − k.² Simulation results of Harvill and Ray [24] indicate that, in general, the multivariate version of Keenan's [30] test is more powerful than the univariate tests for at least one of the component series, in particular when the nonlinearity in one series of the vector process is due solely to terms from the other series.

¹ The actual p-values of all tests used in the paper are available upon request.

4.2. The nonparametric DP causality test

The general setting for a causality test is as follows. Assume {X_t, Y_t; t ≥ 1} are two scalar-valued strictly stationary time series. Then {X_t} strictly Granger-causes {Y_t} if past and current values of X_t contain additional information on future values of Y_t that is not contained in the past and current Y_t-values alone. More formally, let F_{X,t} and F_{Y,t} denote the information sets consisting of past observations of X_t and Y_t up to and including time t, and let '∼' denote equivalence in distribution. Then {X_t} is a Granger cause of {Y_t} if, for some k ≥ 1,

(Y_{t+1}, ..., Y_{t+k}) | (F_{X,t}, F_{Y,t}) ≁ (Y_{t+1}, ..., Y_{t+k}) | F_{Y,t}.   (3)

This definition is general and does not involve model assumptions. In practice one often assumes k = 1; i.e., testing for Granger noncausality comes down to comparing the one-step-ahead conditional distribution of {Y_t} with and without past and current observed values of {X_t}, which will also be the case considered here. Note that the testing framework introduced above concerns conditional distributions given an infinite number of past observations. In practice, however, tests are usually confined to finite orders in {X_t} and {Y_t}. To this end, define the delay vectors X_t^{ℓX} = (X_{t−ℓX+1}, ..., X_t) and Y_t^{ℓY} = (Y_{t−ℓY+1}, ..., Y_t), with ℓX, ℓY ≥ 1. If past observations of X_t^{ℓX} contain no information about future values, it follows from (3) that the null hypothesis of interest is given by

H_0: Y_{t+1} | (X_t^{ℓX}; Y_t^{ℓY}) ∼ Y_{t+1} | Y_t^{ℓY}.   (4)

For a strictly stationary bivariate time series, (4) comes down to a statement about the invariant distribution of the (ℓX + ℓY + 1)-dimensional vector (X_t^{ℓX}, Y_t^{ℓY}, Z_t), where Z_t = Y_{t+1}. To simplify notation we drop the time index t. Further, it is assumed that ℓX = ℓY = 1. Hence, under the null, the conditional distribution
of Z given (X, Y) = (x, y) is the same as that of Z given Y = y. Then (4) can be restated in terms of ratios of joint distributions. Specifically, the joint probability density function f_{X,Y,Z}(x, y, z) and its marginals must satisfy the relationship

f_{X,Y,Z}(x, y, z) / f_Y(y) = (f_{X,Y}(x, y) / f_Y(y)) × (f_{Y,Z}(y, z) / f_Y(y)).

Thus X and Z are independent conditionally on Y = y, for each fixed value of y. DP [13] show that this reformulated H_0 implies

q ≡ E[f_{X,Y,Z}(X, Y, Z) f_Y(Y) − f_{X,Y}(X, Y) f_{Y,Z}(Y, Z)] = 0.

Let f̂_W(W_i) denote a local density estimator of a d_W-variate random vector W at W_i defined by

f̂_W(W_i) = (2ε_n)^{−d_W} (n − 1)^{−1} Σ_{j, j≠i} I_{ij}^W,

where I_{ij}^W = I(||W_i − W_j|| < ε_n), with I(·) the indicator function and ε_n the bandwidth, depending on the sample size n. Given this estimator, the test statistic of interest is given by

T_n(ε_n) = ((n − 1) / (n(n − 2))) Σ_i f̂_Y²(Y_i) ( f̂_{X,Z|Y}(X_i, Z_i | Y_i) − f̂_{X|Y}(X_i | Y_i) f̂_{Z|Y}(Z_i | Y_i) ).   (5)

Suppose that ε_n = C n^{−β} (C > 0, 1/4 < β < 1/3). Then DP [13] prove that (5) satisfies

√n (T_n(ε_n) − q) / S_n →_D N(0, 1),   (6)

where →_D denotes convergence in distribution, and S_n is an estimator of the asymptotic variance of T_n(·), as discussed in detail by DP [13, Appendix A].³

² FORTRAN code for computing F̂ can be obtained from the first author.
³ C code for computing the T_n(·) test statistic can be downloaded from: http://www1.fee.uva.nl/cendef/upload/15/GCTtest.zip.

Table 2: Multivariate nonlinearity test results (above diagonal) and orders of fitted VAR(p) models (below diagonal): periods P1 (pre-Asian financial crisis) and P2 (post-Asian financial crisis). (The individual P1/P2 cell entries are not recoverable from this extraction of the source.)

5. Empirical results

5.1. Multivariate nonlinearity test

Before investigating specific linear and nonlinear causal relationships between pairs of bivariate time series, it is desirable to test for general vector nonlinear structure. Using linear VAR(p) models with p = 0, 1, ..., 10, the AIC model selection criterion was used to select the "best" model fitted to pairs of time series. The optimal orders p are listed in Table 2 below the main diagonal. Applying the multivariate Keenan test of Section 4, a summary of the results of the nonlinearity test statistic is presented above the main diagonal.

Some observations are in order. The Asian financial crisis has an obvious impact on the orders of the selected VAR(p) models. The total number of pairs of series having the same VAR orders in both periods is 8 (14.5%), while in 47 cases (85.5%) there has been a change between the orders p selected for period P1 and period P2. Within the set of developed (emerging) markets these percentages are 6% (10%) and 94% (90%), respectively. Nonlinearity, at the 1% level, is detected for 35 (64%) pairs of indices in period P1, and for 23 (42%) pairs in period P2. No evidence of general nonlinear structure in the data, at the 1% level, is indicated for 12 (22%) pairs in period P1, and for 29 (53%) pairs in period P2. Clearly, this is an overall view of the data. But these percentages suggest that there is weak evidence that the number of paired nonlinearities has changed between the two periods. At a more detailed level, however, there appear to be
more differences between the two time periods. In particular, we see that for 17 pairs (31%) the simplifying notation for the significance of the multivariate nonlinearity test changed from "∗∗" to "–", and for only 3 (5%) pairs it changed from "–" to "∗∗".

Among the whole set of nonlinearity test results, those presented for Malaysia are interestingly different. Indeed, there is evidence that this market has significant (∗∗) nonlinear relationships with almost all other markets. Moreover, these relationships remain fairly persistent across both time periods. Broadly, the significant nonlinearity test results reported in Table 2 motivate the search for causal and persistent nonlinear linkages between pairs of stock indices by a nonparametric test. Nevertheless, these results should be interpreted with caution, since high kurtosis in the returns may affect the null distribution of the test.

5.2. Parametric and nonparametric causality tests: No pre-filtering

For each of the eleven series {R_t}, pairwise causality testing was carried out using the Wald variant of Granger's test. Simulation results provided by Geweke et al. [19] show that this type of Granger causality test has a number of advantages over eight alternative tests of causality. In each case, the optimal lag orders reported in Table 2 are used for the unrestricted VAR model. It is well known that the results of the Granger causality test are sensitive to the choice of the lag length, even when a sophisticated search routine for the lag length has been implemented in the process of finding the best VAR specification. Therefore it is safe to restrict the discussion below to causality results obtained at
Linguistics is the scientific study of language.
General linguistics is the study of language as a whole.

The major branches of linguistics:

Phonetics: the study of the sounds used in linguistic communication.
Phonology: the study of how sounds are put together and used to convey meaning in communication.
Morphology: the study of the way in which morphemes are arranged to form words.
Syntax: the study of the rules that govern the combination of words to form permissible sentences.
Semantics: the study of meaning in abstraction.
Pragmatics: the study of meaning in the context of use.

Interdisciplinary branches:

Sociolinguistics is the study of the relationship between language and society.
Psycholinguistics is the study of the relationship between language and the mind.
Applied linguistics is the study of the teaching of foreign and second languages.

Some important distinctions in linguistics:

1. Descriptive or prescriptive
A linguistic study is descriptive if it describes and analyses facts observed; it is prescriptive if it tries to lay down rules for "correct" behavior.
Liu Xiangli, School of Finance, Central University of Finance and Economics

Education
- 1996: B.Sc., Department of Mathematics (Mathematics), Shanxi University
- 1999: M.Sc., Department of Applied Mathematics (Applied Mathematics), Beijing Jiaotong University
- 2008: Ph.D. in Management (Management Science and Engineering), School of Management, Graduate University of the Chinese Academy of Sciences

Work experience
- April 1999 to June 2009: School of Science, Beijing Information Science and Technology University
- June 2009 to present: Department of Financial Engineering, School of Finance, Central University of Finance and Economics

Research areas
Financial engineering, econometric analysis, market microstructure

Courses taught
Operations research; financial engineering; financial derivatives

Publications
1. An empirical study on information spillovers between Chinese copper futures market and spot market. Physica A: Statistical Mechanics and its Applications, 2008, 387(4): 899-914. (SCI, SSCI)
2. The comparison of China's futures market VaR estimation using parametric approach, semiparametric approach and nonparametric approach. The Fourth International Conference on Risk Management and the Fifth International Conference on Financial System Engineering, Tianjin, China, October 2007.
3. Novel soliton-like and multi-solitary wave solutions of (3+1)-dimensional Burgers equation. Applied Mathematics and Computation, 2008, 204(1): 461-467. (SCI) (second author)
4. A study of information spillover effects between futures and spot markets. Journal of Management Sciences in China, 2008, 3: 125-139.
5. A comparison of parametric, semiparametric and nonparametric methods for computing the VaR of China's copper futures market. Management Review, 2008, 7: 3-8.
6. An analysis of intraday effects in the Chinese futures market. Systems Engineering - Theory & Practice, 2008, 8: 63-80. (EI)
7. A study of volatility clustering in the Chinese futures market based on ACD models. (accepted by Journal of Management Sciences in China)
8. An empirical analysis of yen/dollar exchange rate determination based on EXMODEL. Systems Engineering - Theory & Practice, 2009. (EI)
9. An empirical analysis of the reasons for differences between two GDP measurements. Economic Research Journal, 2007, 7: 51-63. (third author)
10. On singular stochastic control problems with stopping time. Mathematics in Practice and Theory, 2007, 16: 96-102.
11. Optimal decisions for a class of stochastic optimization problems. Statistics and Decision, 2007, 5: 33-34.
12. An extension of an optimal control model for proportional reinsurance with a dividend process. Journal of Shanxi University, 2006, 3: 249-252.
13. On the non-existence of optimal controls in stochastic control. Journal of Taiyuan University of Technology, 2006, 6: 706-709.
14. Optimal control strategies for a class of problems with stopping time. Mathematics in Practice and Theory, 2006, 6: 193-199.

Books and textbooks
1. Microstructure of the Chinese Futures Market, Science Press, 2009
2. Complex Variable Functions and Integral Transforms, China Machine Press, 2009
3. Matrix Theory and Methods, Publishing House of Electronics Industry, 2006
Fitting Piecewise Linear Continuous Functions

Alejandro Toriello*
Daniel J. Epstein Department of Industrial and Systems Engineering
University of Southern California
3715 McClintock Avenue, GER 240
Los Angeles, California 90089
toriello at usc dot edu

Juan Pablo Vielma
Department of Industrial Engineering
1048 Benedum Hall
University of Pittsburgh
3700 O'Hara Street
Pittsburgh, Pennsylvania, 15261
jvielma at pitt dot edu

August 25, 2011

Abstract

We consider the problem of fitting a continuous piecewise linear function to a finite set of data points, modeled as a mathematical program with convex objective. We review some fitting problems that can be modeled as convex programs, and then introduce mixed-binary generalizations that allow variability in the regions defining the best-fit function's domain. We also study the additional constraints required to impose convexity on the best-fit function.

Keywords: integer programming, quadratic programming, data fitting/regression, piecewise linear function

* Corresponding author. Office: +1 (213) 740-4893. Fax: +1 (213) 740-1120.

1 Introduction

The problem of fitting a function of some prescribed form to a finite set of data points is fundamental and has been studied for hundreds of years. In one form or another, fitting has applications in areas as varied as statistics, econometrics, forecasting, computer graphics and electrical engineering.

Within optimization, fitting problems are cast as convex norm minimization models whose properties are well known (Boyd and Vandenberghe (2004); see also Bot and Lorenz (2011); Williams (2007)). For example, the classical least-squares linear fitting problem is an unconstrained quadratic program with a closed-form solution obtained by setting the gradient of the objective equal to zero. Similarly, more complex piecewise linear and piecewise polynomial fitting models can be formulated as constrained convex programs.

Continuous piecewise linear functions and their discontinuous extensions are also extensively studied within discrete optimization and mixed-integer programming (MIP), e.g. de Farias et al. (2008); Vielma et al. (2008, 2010); Wilson (1998). However, most related work in this field concentrates on the modeling of a given piecewise linear function and of the subsequent incorporation of the function into a MIP or more general mathematical program. Nonetheless, recent work in various areas of discrete optimization motivates the issue of efficiently fitting a continuous piecewise linear function to a set of points.

In approximate dynamic programming (ADP) (Bertsekas and Tsitsiklis, 1996; Novoa and Storer, 2009; Papadaki and Powell, 2002; Powell, 2007), solutions to a dynamic or periodic model are generated by approximating the value function of a state variable, and then optimizing single-period subproblems with respect to this approximation. When the state space is continuous, an approximate value function can be constructed by sequentially sampling from the state space, observing the samples' values, fitting a function to the observations, and then repeating the process with the updated value function. In Toriello et al. (2010), for example, a separable, piecewise linear concave value function is constructed using an algorithm of this type.

In mixed-integer nonlinear programming (MINLP), recent algorithmic and software developments combine branch-and-bound frameworks common in MIP with nonlinear and global optimization methodology, e.g. Abhishek et al. (2010); Belotti et al. (2009); Bonami et al. (2008); Geißler et al. (2011); Leyffer et al. (2008). The algorithms construct and refine polyhedral approximations of
nonlinear, possibly non-convex constraints. Efficient fitting models could be used inside this framework to expedite the construction of the approximations.

Our main contribution is the introduction of mixed-binary models that solve continuous piecewise linear fitting problems under various conditions. We also devote attention to the additional constraints required to make the best-fit function convex. The continuous models we present encompass simpler cases where the function domain's partition is predetermined, and are extensions of known models (Boyd and Vandenberghe, 2004). Conversely, the mixed-binary models we introduce allow for more general fitting problems where the domain partition may be partially or wholly determined along with the best-fit function. We believe the use of integer programming for this type of problem is new, and the only similar work we are aware of is Bertsimas and Shioda (2007), where the authors apply MIP modeling techniques to classification problems.

The paper is organized as follows. Section 2 introduces our general fitting problem, reviews related work and covers continuous models. Section 3 introduces mixed-binary models for more complex fitting problems, where the additional difficulty comes from adding variability to the regions that define the best-fit function. Section 4 has computational examples that highlight the benefit of considering variable regions. Section 5 gives conclusions and indicates some directions for further research.

We follow standard MIP notation and terminology as much as possible; see, e.g. Nemhauser and Wolsey (1999). Our disjunctive programming notation follows Balas (1998).

2 Problem Definition

Suppose we are given a finite set of data points (x_i, y_i) ∈ R^n × R, i = 1, ..., m ∈ N. We are interested in the problem

min ||w − y||_q   (2.1a)
s.t. w_i = f(x_i), ∀i = 1, ..., m   (2.1b)
     f ∈ F,   (2.1c)

where F defines a set of continuous piecewise linear functions over a common domain that contains all points x_i, and ||·||_q is the q-norm in R^m. In other words, we would like the function f* ∈ F that best fits the data set according to the measure ||·||_q.

For our purposes, a piecewise linear function is a continuous function f with domain ∪_{P∈P} P, where P is finite, each P ∈ P is a full-dimensional polytope, the interiors of any two P, Q ∈ P are disjoint, and f is affine when restricted to any P ∈ P. Although not strictly necessary, we also make the common sense assumption that for distinct P, Q ∈ P, P ∩ Q is a face (possibly ∅) of both P and Q. Finally, when fitting convex functions we assume that ∪_{P∈P} P is convex.

Although fitting models have been studied for general values of q (Gonin and Money, 1989), we focus on the cases q ∈ {1, 2}. For q < ∞, (2.1) is equivalent to

min Σ_{i=1}^m |f(x_i) − y_i|^q   (2.2a)
s.t. f ∈ F.   (2.2b)

In the sequel, we treat (2.2) as our generic model. However, the extension to the ∞-norm case is achieved by simply replacing the summation with a maximum over all absolute differences.

From a fitting perspective, the models we study are either parametric or non-parametric. Parametric models construct a function by estimating the parameters (i.e. slopes and intercepts) that define it. Non-parametric models define function values at a predetermined set of points. The function's value for a general point is then calculated as a linear interpolation or extrapolation of the function values for a subset of the predetermined set.

In MIP modeling terms, parametric fitting problems result in functions easily modeled in a multiple choice model, while non-parametric fitting problems yield functions that fit in a convex
combination or disaggregated convex combination model (cf. Vielma et al. (2010)). In either case, however, if the function is convex or concave, it may be possible to model it in a purely linear model.

2.1 Literature Review

Fitting problems of various kinds arise in many different areas, and even when restricting to the piecewise linear case, we cannot hope to provide an exhaustive list of references. We instead give examples of papers that study (2.1) and related problems from different points of view. A thorough treatment of convex optimization models used in fitting, approximation and interpolation is Chapter 6 of Boyd and Vandenberghe (2004). A recent text on statistical techniques used for problems related to (2.1) is Ruppert et al. (2003).

The one-dimensional case (n = 1) of (2.1) has been extensively studied, e.g. in econometrics and forecasting. For instance, Strikholm (2006) attempts to determine the number of breaks in a one-dimensional piecewise linear function using statistical inference methods. The computer graphics and visualization community has concentrated on the two-dimensional case. Pottmann et al. (2000) address the issue of choosing an optimal partition of the domain of a quadratic function so the resulting implied piecewise linear interpolation's error is below an accepted tolerance. For general n, an important case of (2.1) is when the union of polyhedra P is not predetermined and must be chosen as part of the fitting problem. Some variants of this problem are studied as extensions of classification and clustering problems, e.g. Bertsimas and Shioda (2007); Lau et al. (1999); Pardalos and Kundakcioglu (2009). Ferrari-Trecate et al. (2001) use neural network methodology for a problem of this kind, where the best-fit piecewise linear function is allowed to be discontinuous. Holmes and Mallick (1999) enforce continuity and allow |P| to vary by assuming a probability distribution exists for all unknowns (including |P|) and updating the distribution based on observations; this approach is known as Bayesian regression. Magnani and Boyd (2009) introduce the convex piecewise linear fitting problem with undetermined P that we study in Section 3.2, and use a Gauss-Newton heuristic related to the k-means clustering algorithm. Bertsimas and Shioda (2007) use integer programming models for classification and discontinuous piecewise linear fitting. This paper is the most closely related to our work.

2.2 Known Continuous Models

We first consider the case when F is the set of piecewise linear functions defined over a predetermined union of polytopes P. Let V(P) be the set of vertices of P ∈ P and let V(P) = ∪_{P∈P} V(P). Also, for each v ∈ V(P) let P_v be the set of polytopes that contain v, and for each i let P_i ∈ P be a polytope that contains x_i. See Figure 1 for a two-dimensional example.

Since we explicitly consider each polytope P ∈ P, practical fitting problems must restrict themselves to a fairly low dimension n, especially because |P| tends to depend exponentially on n.

In a parametric model, any f ∈ F can be given by

f(x) = c_P x + d_P, for x ∈ P,   (2.3)

where (c_P, d_P) ∈ R^{n+1}, ∀P ∈ P, c_P x denotes the inner product between c_P, x ∈ R^n, and the requirement that f is continuous implies

c_P x + d_P = c_Q x + d_Q, ∀x ∈ P ∩ Q.   (2.4)

Figure 1: Domain example with n = 2. In this example, V(P_2) = {v_1, v_3, v_4}, P_{v_2} = {P_1, P_3} and P_i = P_4.

Then (2.2) becomes

min Σ_{i=1}^m |c_{P_i} x_i + d_{P_i} − y_i|^q   (2.5a)
s.t. c_P v + d_P = c_Q v + d_Q, ∀P, Q ∈ P_v, ∀v ∈ V(P)   (2.5b)
     c_P ∈ R^n, d_P ∈ R, ∀P ∈ P.   (2.5c)

Note that constraints (2.5b) enforce the continuity
equation (2.4) by requiring that f(x) be continuous at every vertex v ∈ V(P).

To restrict F to convex functions, we make use of the following result.

Proposition 2.1. (Carnicer and Floater, 1996, Proposition 2.4) Any f ∈ F is convex if and only if its restriction to any two polytopes from P that share a facet is convex.

Let W(P) be the set of facets of P that are also facets of another polytope in P, and let W(P) = ∪_{P∈P} W(P). For any facet ω ∈ W(P), let P_ω, Q_ω ∈ P be the unique pair of polytopes that satisfies P_ω ∩ Q_ω = ω. Choose r_ω ∈ ω and δ_ω ∈ R^n \ {0} to satisfy r_ω + δ_ω ∈ P_ω \ ω and r_ω − δ_ω ∈ Q_ω \ ω. To restrict problem (2.5) to convex functions, we add the constraints

(1/2) ( c_{P_ω}(r_ω + δ_ω) + d_{P_ω} + c_{Q_ω}(r_ω − δ_ω) + d_{Q_ω} ) ≥ c_{P_ω} r_ω + d_{P_ω}, ∀ω ∈ W(P).   (2.6)

Observe that r_ω is the midpoint of r_ω + δ_ω and r_ω − δ_ω, so constraint (2.6) simply requires the midpoint convexity of f along the segment [r_ω − δ_ω, r_ω + δ_ω], implying convexity because the defining functions are affine.

Figure 2: Non-parametric example in two dimensions with V+(P) = {v_0, v_1, v_2}. Even though P has five vertices, only three are necessary to express any point as an affine combination. For this example, λ_x^{P,v_1} = λ_x^{P,v_2} = 1 and λ_x^{P,v_0} = −1.

For the non-parametric case, let V+(P) for each P ∈ P be a set of n + 1 affinely independent vertices of P. For any x ∈ P, let λ_x^{P,v}, v ∈ V+(P), be the set of barycentric coordinates (Rockafellar, 1970) of x with respect to V+(P); that is, the unique set of affine multipliers that expresses x as an affine combination of the elements of V+(P). (Figure 2 shows an example for n = 2.) Then any f ∈ F can be expressed as

f(x) = Σ_{v∈V+(P)} λ_x^{P,v} f_v, for x ∈ P,   (2.7)

where f_v = f(v), ∀v ∈ V(P). For each x_i, define λ_i^{P_i,v}, v ∈ V+(P_i), analogously to λ_x^{P,v}. The model then becomes

min Σ_{i=1}^m | Σ_{v∈V+(P_i)} λ_i^{P_i,v} f_v − y_i |^q   (2.8a)
s.t. f_v = Σ_{u∈V+(P)} λ_v^{P,u} f_u, ∀P ∈ P_v, ∀v ∈ V(P)   (2.8b)
     f_v ∈ R, ∀v ∈ V(P).   (2.8c)

Constraints (2.8b) are the non-parametric analogues of constraints (2.5b) from the previous model. They are necessary because function values within each P are expressed as affine combinations of the function values from V+(P), and vertex function values must match from one polytope to each adjacent one. However, if P is a triangulation or its higher-dimensional generalization, then V+(P) = V(P), ∀P ∈ P. In this case, constraints (2.8b) are implicitly satisfied, which means they can be removed and (2.8) goes back to a classical unconstrained linear fitting problem.

To enforce convexity of the best-fit function, we again use Proposition 2.1. For every ω ∈ W(P), let λ_{ω+}^{P_ω,v}, λ_ω^{P_ω,v}, ∀v ∈ V+(P_ω), and λ_{ω−}^{Q_ω,v}, ∀v ∈ V+(Q_ω), respectively define the sets of affine multipliers for r_ω + δ_ω, r_ω and r_ω − δ_ω. The convexity enforcing constraints are

(1/2) ( Σ_{v∈V+(P_ω)} λ_{ω+}^{P_ω,v} f_v + Σ_{v∈V+(Q_ω)} λ_{ω−}^{Q_ω,v} f_v ) ≥ Σ_{v∈V+(P_ω)} λ_ω^{P_ω,v} f_v, ∀ω ∈ W(P).   (2.9)

As in the parametric case, these constraints simply enforce midpoint convexity along the segment between the points r_ω ± δ_ω.

Note that even though both the parametric and non-parametric models have equally many decision variables and constraints, the non-parametric model requires substantially more pre-processing, because the affine multipliers λ must be calculated for every vertex v ∈ V(P) and every x_i. For both models, the number of decision variables is Θ(n|P|), and the number of constraints is Θ(|V(P)| max_v |P_v|).

3 Mixed-Binary Models for Fitting over Variable Regions

3.1 Adding Variability to the Regions in Two Dimensions

We next generalize Section 2.2's non-parametric model to include some variability in P. We concentrate on the two-dimensional case, which is already of
significant interest, although the model may in theory be extended to any dimension. A similar but more general problem was studied by Pottmann et al. (2000) to approximate two-dimensional quadratic functions.

Following our previously introduced notation, let (x_i, y_i) ∈ R² × R, ∀i = 1, ..., m, and assume in addition that y_i ∈ [0, U], ∀i. This assumption can be made without loss of generality by adding an appropriate positive constant to every y_i. Let 0 < b_j^1 < ··· < b_j^p, for p ∈ N, j = 1, 2, with b_j^p > x_j^i, ∀i, j, be a partition of the function domain into p² rectangles. (The case when the number of breakpoints varies by dimension is a simple extension.) Let R = [b_1^{k_1−1}, b_1^{k_1}] × [b_2^{k_2−1}, b_2^{k_2}] be any rectangle in the grid, and let v_0, v_1, v_2 and v_3 be its four vertices (see Figure 3). R may be triangulated in one of two ways: if we choose to divide R along segment v_1v_2, we obtain triangles v_0v_1v_2 and v_1v_2v_3, whereas if we choose segment v_0v_3, we obtain triangles v_0v_1v_3 and v_0v_2v_3. Instead of fixing the triangulation beforehand (as we would do in Section 2.2), our next model adds the choice of two possible triangulations for each rectangle. The partition P is then one of the 2^{p²} possible triangulations of the domain.

Figure 3: Two possible triangulations of rectangle R.

Let x ∈ R and assume x lies both in v_0v_1v_2 and in v_0v_1v_3. It may therefore be expressed as a convex combination of the vertices of either triangle. Let λ_x^{0,v}, v ∈ {v_0, ..., v_3}, be the convex multipliers for x with respect to v_0v_1v_2 (with λ_x^{0,v_3} = 0), and define λ_x^{1,v} analogously with respect to v_0v_1v_3. Letting f_v = f(v), we have the natural disjunction

f(x) = λ_x^{0,v_0} f_{v_0} + λ_x^{0,v_1} f_{v_1} + λ_x^{0,v_2} f_{v_2}  ∨  f(x) = λ_x^{1,v_0} f_{v_0} + λ_x^{1,v_1} f_{v_1} + λ_x^{1,v_3} f_{v_3},

expressed more generically as

f(x) = Σ_{v∈{v_0,...,v_3}} λ_x^{0,v} f_v  ∨  f(x) = Σ_{v∈{v_0,...,v_3}} λ_x^{1,v} f_v.   (3.1)

Similarly, for every x_i ∈ R, let λ_i^{0,v}, ∀v ∈ {v_0, ..., v_3}, be the convex multipliers of x_i when we triangulate R along v_1v_2, and define λ_i^{1,v} similarly for the triangulation along v_0v_3. Letting f_i = f(x_i), we have the aggregate disjunction

f_i = Σ_{v∈{v_0,...,v_3}} λ_i^{0,v} f_v  ∨  f_i = Σ_{v∈{v_0,...,v_3}} λ_i^{1,v} f_v, ∀x_i ∈ R,   (3.2)

where the f_i and f_v's are variables and the λ's are constants. Unfortunately, a disjunction of this type between two affine subspaces cannot be modeled with standard MIP disjunction modeling techniques unless we can bound the variables (Balas, 1998; Jeroslow and Lowe, 1984). Therefore, we make the additional assumption that f_v ∈ [0, U] for every vertex v = (b_1^{k_1}, b_2^{k_2}) in the grid. In principle, this assumption can be made without loss of generality, although care must be taken in choosing the bounds (that is, choosing the positive constant to add to the y_i's and then choosing the subsequent upper bound U > 0). Note that even if y_i ≥ 0, ∀i, it is quite possible that the optimal value of some f_v could be negative. Similarly, it is possible for the optimal value of some f_v to be greater than max_i {y_i}.

Let R be the set of rectangles defined by the grid breakpoints b. For each R ∈ R, let V(R) be the four vertices that define it, and let V(R) = ∪_{R∈R} V(R). For each v ∈ V(R), let R_v ⊆ R be the set of rectangles that contain v. Finally, for each i let R_i be a rectangle that contains x_i. Then (2.2) is given by

min Σ_{i=1}^m |f_i − y_i|^q   (3.3a)
s.t. f_i = Σ_{v∈V(R_i)} ( λ_i^{0,v} f_v^{0,R_i} + λ_i^{1,v} f_v^{1,R_i} ), ∀i = 1, ..., m   (3.3b)
     f_v^{0,R} ≤ U(1 − z_R), ∀v ∈ V(R), ∀R ∈ R   (3.3c)
     f_v^{1,R} ≤ U z_R, ∀v ∈ V(R), ∀R ∈ R   (3.3d)
     f_v^{0,R} + f_v^{1,R} = f_v^{0,S} + f_v^{1,S}, ∀R, S ∈ R_v, ∀v ∈ V(R)   (3.3e)
     z_R ∈ {0, 1}, ∀R ∈ R   (3.3f)
     f_v^{0,R}, f_v^{1,R} ∈ [0, U], ∀R ∈ R_v, ∀v ∈ V(R)   (3.3g)
     f_i ∈ [0, U], ∀i = 1, ..., m.   (3.3h)

In this model, the binary variable z_R
represents the triangulation choice for rectangle R, and the continuous variable f_v^{k,R}, k = 0, 1, represents the value f(v) under triangulation k in rectangle R. The variable upper bound constraints (3.3c) and (3.3d) allow exactly one value of f_v^{k,R} per rectangle to be positive, and the constraints (3.3e) make sure values of f(v) match from one rectangle to each intersecting one. Constraints (3.3b), along with variable upper bounds (3.3c) and (3.3d), establish disjunction (3.2).

Figure 4: Enforcing convexity across shared edges. The midpoint convexity constraint for points A, B and C concerns the edge e.

To enforce convexity in this model, we use a generalization of constraints (2.9) and exploit the grid's added structure. We must consider two classes of constraints. The first concerns the edges of the grid R. Let E(R) be the set of edges shared by two rectangles from R, and for each e ∈ E(R) let R_e, S_e ∈ R be the unique pair of rectangles that satisfy R_e ∩ S_e = e. Choose r_e ∈ e and δ_e ∈ R² \ {0} so that r_e + δ_e ∈ R_e lies in the intersection of the two triangles in R_e that contain e, and r_e − δ_e ∈ S_e satisfies the analogous condition in S_e. As an example, in Figure 4 we have B = r_e, A = r_e + δ_e, and C = r_e − δ_e. Extending the previous section's notation, let λ_e^{k,R_e,v}, ∀v ∈ V(R_e), k = 0, 1, be the set of convex multipliers for r_e in R_e under triangulation k, and define λ_{e+}^{k,R_e,v} and λ_{e−}^{k,S_e,v} analogously for r_e ± δ_e. The constraints are then

(1/2) ( Σ_{v∈V(R_e)} ( λ_{e+}^{0,R_e,v} f_v^{0,R_e} + λ_{e+}^{1,R_e,v} f_v^{1,R_e} ) + Σ_{v∈V(S_e)} ( λ_{e−}^{0,S_e,v} f_v^{0,S_e} + λ_{e−}^{1,S_e,v} f_v^{1,S_e} ) ) ≥ Σ_{v∈V(R_e)} ( λ_e^{0,R_e,v} f_v^{0,R_e} + λ_e^{1,R_e,v} f_v^{1,R_e} ), ∀e ∈ E(R).   (3.4)

The second type of convexity enforcing constraint concerns each individual rectangle R ∈ R, and in this case convexity can be enforced directly on the f_v variables, without multipliers. Referring again to Figure 3, a triangulation along v_1v_2 means that convexity is enforced if f_{v_0} + f_{v_3} ≥ f_{v_1} + f_{v_2}, since the function value of the rectangle's center would be the average of f_{v_1} and f_{v_2}. Similarly, the opposite triangulation yields the constraint f_{v_1} + f_{v_2} ≥ f_{v_0} + f_{v_3} (see Figure 5). Generalizing to (3.3), we get

f_{v_0^R}^{0,R} + f_{v_3^R}^{0,R} ≥ f_{v_1^R}^{0,R} + f_{v_2^R}^{0,R}
f_{v_0^R}^{1,R} + f_{v_3^R}^{1,R} ≤ f_{v_1^R}^{1,R} + f_{v_2^R}^{1,R},  ∀R ∈ R,   (3.5)

where we use the notation v_0^R, v_1^R, v_2^R, v_3^R to identify the four vertices of rectangle R according to Figure 3.

Figure 5: Enforcing convexity under two different triangulations. In the left-hand side triangulation, the sum of the top-right and bottom-left function values must be at least the sum of the top-left and bottom-right function values. The converse holds in the right-hand side triangulation.

Proposition 3.1. Constraints (3.4) and (3.5) restrict the feasible region of (3.3) to convex functions.

Proof. Constraints (3.4) enforce midpoint convexity under any of the four triangulation possibilities of R_e and S_e. Note that λ_e^{0,R_e,v} = λ_e^{1,R_e,v}, ∀v ∈ V(R_e), because r_e only has positive convex multipliers for e's two endpoints, and these multipliers are equal under either triangulation. For (3.5), only one of the two constraints applies to every R, but the variable upper bounds (3.3c) and (3.3d) force the inactive triangulation's variables to zero, thus satisfying that constraint trivially.

Because we have fixed n = 2, the model (3.3) has Θ(|R|) = Θ(p²) continuous variables, as many binary variables, and Θ(m + |R|) = Θ(m + p²) constraints. However, if we were to generalize the model for any n, the constants that multiply all of these quantities would grow exponentially with n.
3.1.1 A Swapping Heuristic

Algorithm 1 presents a swapping local search heuristic to find a solution to (3.3). The heuristic is based on a simple swapping idea used, for example, in calculating Delaunay triangulations or convex interpolants (cf. Carnicer and Floater (1996)). In the heuristic, a swap of R ∈ R simply means replacing the existing triangulation of R by the opposite one.

Algorithm 1 Swapping heuristic for (3.3)
  randomly choose a triangulation of R
  solve (2.8) over this fixed triangulation
  while swapping some R ∈ R and re-solving improves the objective in (2.8) do
    update the triangulation by swapping R
  end while
  return the last triangulation and the corresponding solution to (2.8)

The triangulation swap changes the convex multipliers that interpolate each data point x_i ∈ R (i.e. the λ_i values in (3.3)), and thus also changes the objective (3.3a). Since the swaps always improve the objective value, the heuristic is guaranteed to terminate in finitely many steps. Although we do not consider it here, the heuristic can also be generalized to a k-opt local search by simultaneously considering k rectangles for swapping.

3.2 Convex Fitting over Variable Regions

In this section we examine the parametric model that occurs when F is the set of convex piecewise linear functions defined as the maximum of p ∈ N affine functions; this problem was first studied by Magnani and Boyd (2009). Each function f ∈ F has the form

f(x) = max_{k=1,...,p} { c^k x + d^k },   (3.6)

where (c^k, d^k) ∈ R^{n+1}, ∀k = 1, ..., p. (Figure 6 has an example with n = 2.) The partition P is implicitly defined by f, as each member polyhedron is given by

P^k = { x ∈ R^n : c^k x + d^k ≥ c^ℓ x + d^ℓ, ∀ℓ ≠ k }, ∀k = 1, ..., p.   (3.7)

Note that not all polyhedra P^k thus defined are bounded, but we can always add bounds if necessary.

Figure 6: Bivariate convex piecewise linear function f(x) = max{1, x_1, x_2}.

The fitting problem (2.2) over functions (3.6) yields the non-convex optimization model

min Σ_{i=1}^m | max_{k=1,...,p} { c^k x_i + d^k } − y_i |^q
s.t. c ∈ R^{n×p}, d ∈ R^p.   (3.8)

Let x ∈ R_+^n; for given c^k ∈ R^n, d^k ∈ R, we cannot model the non-convex relation (3.6) directly in a MIP. However, we can model f(x) disjunctively as

f(x) ≥ c^k x + d^k, ∀k = 1, ..., p   (3.9)
∨_{k=1}^p { f(x) ≤ c^k x + d^k }.   (3.10)

As in the previous section, the disjunction (3.10) cannot be modeled as stated with MIP disjunctive techniques, because the polyhedra have different recession cones (Balas, 1998; Jeroslow and Lowe, 1984). However, if we let M > 0 be an appropriately large number, we can use a big-M approach (see also Bertsimas and Shioda (2007)). Let z^k ∈ {0, 1}, ∀k, be a binary variable that indicates which affine function is maximal at x. Then (3.10) can be expressed as

f(x) ≤ c^k x + d^k + M(1 − z^k), ∀k = 1, ..., p   (3.11a)
Σ_{k=1}^p z^k = 1.   (3.11b)

Let f_i = f(x_i), and define z_i^k ∈ {0, 1} analogously to z^k with respect to x_i. Then (2.2) becomes

min Σ_{i=1}^m |f_i − y_i|^q   (3.12a)
s.t. f_i ≥ c^k x_i + d^k, ∀k = 1, ..., p, ∀i = 1, ..., m   (3.12b)
     f_i ≤ c^k x_i + d^k + M(1 − z_i^k), ∀k = 1, ..., p, ∀i = 1, ..., m   (3.12c)
     Σ_{k=1}^p z_i^k = 1, ∀i = 1, ..., m   (3.12d)
     z ∈ {0, 1}^{m×p}   (3.12e)
     c ∈ R^{n×p}, d ∈ R^p   (3.12f)
     f_i ∈ R, ∀i = 1, ..., m.   (3.12g)

Any permutation of the k-indices of a feasible solution (c, d) to (3.8) would yield another feasible solution of equal objective, because the maximum operator is invariant under a permutation. This leads to substantial symmetry in the formulation (3.12) and potential increases in computation time. The following result addresses this issue.

Proposition 3.2. There is an optimal solution of (3.8) that satisfies the constraints

c_1^1 ≤ ··· ≤ c_1^p.   (3.13)

Proof. Let (c̃, d̃) be any optimal solution of (3.8). Let π : {1, ..., p} → {1, ..., p} be a permutation satisfying c̃_1^{π(1)} ≤ ··· ≤ c̃_1^{π(p)}. Define
c_j^{*k} = c̃_j^{π(k)}, ∀j = 1, ..., n, ∀k = 1, ..., p
d^{*k} = d̃^{π(k)}, ∀k = 1, ..., p.

In words, (c*, d*) permutes the k-indices of (c̃, d̃) to order the resulting solution by the first coordinate of the c variables. Since the maximum operator is invariant under permutations, (c*, d*) is also optimal for (3.8).

By adding constraints (3.13) to (3.12), we impose an ordering on the feasible region and remove solution symmetry.

The LP relaxations of mixed-integer models with big-M constraints are notoriously weak. The root cause of the problem is the definition of the maximum operator and the use of constraints (3.11). However, as we have pointed out, the big-M technique is in some way inescapable if we wish to use MIP methodology. We attempt to mitigate the weakness of the LP relaxation with the next set of constraints, for which we first establish a technical result.

Lemma 3.3. For each k = 1, ..., p, define the set

S^k = { f ∈ R, g ∈ R_+^p : f = g^k; f ≥ g^ℓ, ∀ℓ ≠ k },

and define also the set

S = { f ∈ R, g ∈ R_+^p : f ≤ Σ_{k=1}^p g^k; f ≥ g^k, ∀k = 1, ..., p }.

Then S = conv( ∪_k S^k ).

Proof. Both S and S^k, ∀k, are pointed polyhedral cones contained in the positive orthant. Let e^k ∈ R^p be the k-th unit vector. For each S^k, a complete set of extreme rays is

(f, g) = ( 1, e^k + Σ_{ℓ∈T} e^ℓ ), ∀T ⊆ {1, ..., p} \ {k}.

This can be verified by noting that at any extreme ray, exactly one of the pair of constraints f ≥ g^ℓ and g^ℓ ≥ 0 can be binding for each ℓ ≠ k. (Both are binding only at the origin.) Similarly, a complete set of extreme rays of S is

(f, g) = ( 1, Σ_{k∈T} e^k ), ∀T ⊆ {1, ..., p}, T ≠ ∅.

As in the previous case, at any extreme ray exactly one of f ≥ g^k and g^k ≥ 0 can be binding for each k. The constraint f ≤ Σ_k g^k ensures that at least one of the former is always binding. The union over all k of the sets of extreme rays of S^k gives the set of rays for S, which proves the result.

The lemma gives a polyhedral description of the convex hull of a union of polyhedra with differing recession cones (see also Theorem 1 in Queyranne and Wang (1992)). Using g^k = c^k x_i + d^k, the next result applies our lemma to (3.12).
The heuristic is fast and oftenfinds a high-quality solution if repeated from several different random initial partitions;the interested reader may consult the same article for details.Algorithm2Clustering Gauss-Newton heuristic for(3.8)randomly partition{x}i=1into p non-empty sets with pairwise non-intersecting convex hullsrepeatfor k=1,...,p dosolve linearfitting problem for points in set k to obtain best-fit affine function(c k,d k)end forset f(x)←max k{c k x+d k}partition{x i}m i=1into p sets according to(3.7),breaking ties arbitrarily until partition is unchangedreturn fHowever,as the authors point out,the heuristic may cycle indefinitely even with a tiny data set.In addition,if the heuristic terminates or is cut17。
1. Population: A population is the entire collection of homogeneous study subjects determined by the research objective. More precisely, it is the set of values of some variable for all observation units of the same nature.

2. Sample: The subset of individuals actually observed or surveyed in medical research is called a sample.

3. Parameter: An index used to describe the characteristics of a population is called a parameter.

4. Statistic: Correspondingly, an index computed from a sample to describe the sample's characteristics is called a statistic.

5. Sampling error: Sampling error is the absolute difference between a sample index and the corresponding population index that arises because chance factors in random sampling limit how representative the structure of the sampled units is of the structure of the population. Examples include the absolute difference between the sample mean and the population mean, and between a sample proportion and the population proportion.

6. Probability: Probability is a measure of the likelihood that a random event occurs, taking values between 0 and 1.

7. Small-probability event: An event whose probability of occurring is very small (conventionally P ≤ 0.05), regarded as practically unlikely to occur in a single trial.

8. Quantitative data: Quantitative data are research data expressed in numerical form.

9. Qualitative data: Qualitative data are research data expressed in non-numerical forms such as text, graphics, audio recordings, and video recordings. Qualitative data have two sources: field sources and documentary sources.
10. Normal distribution: A probability distribution in which the frequency (or relative frequency) of a variable is greatest in the middle and decreases gradually and symmetrically toward both ends, producing a bell shape.

11. Normal curve: A smooth curve whose peak lies in the center (at the mean), which declines gradually and symmetrically on both sides and never meets the horizontal axis. This curve is called a frequency (or relative frequency) curve and approximates the mathematical normal distribution.

12. Medical reference value range: The medical reference value range is the range within which the various physiological and biochemical indices (body form, function, metabolic products, etc.) of the great majority of normal people fluctuate. Here 'the great majority' may be 90%, 95%, 99%, and so on; 95% is used most often. A 'normal person' is not a completely healthy person, but a member of a homogeneous population from which diseases and related factors affecting the index under study have been excluded.

For an index that follows a normal distribution, the reference range can be determined from the distribution of area under the normal curve; for an index that does not follow a normal distribution, the variable can first be transformed so that it is normally distributed, or the percentile method can be used directly to establish the medical reference range. When establishing the medical reference range of an index, professional knowledge should determine whether a two-sided or a one-sided range is computed. If both excessively large and excessively small values of an index are abnormal, the corresponding reference range has both an upper and a lower limit, i.e. it is two-sided; if only an excessively large value is abnormal, the reference range has only an upper limit; if only an excessively small value is abnormal, it has only a lower limit, i.e. it is one-sided.
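As a worked sketch of these two approaches (illustrative Python with simulated measurements; the data and sample size are assumptions, not from the original notes), the following computes a two-sided 95% reference range both by the normal-distribution method (mean ± 1.96s) and by the percentile method:

```python
import numpy as np

rng = np.random.default_rng(0)
# Simulated measurements for 360 'normal' subjects (hypothetical data).
values = rng.normal(loc=5.0, scale=0.6, size=360)

# Normal-distribution method: two-sided 95% range is mean +/- 1.96 s.
mean, s = values.mean(), values.std(ddof=1)
normal_range = (mean - 1.96 * s, mean + 1.96 * s)

# Percentile method: use the 2.5th and 97.5th percentiles instead,
# appropriate when the index is not normally distributed.
percentile_range = (np.percentile(values, 2.5), np.percentile(values, 97.5))

print(normal_range, percentile_range)
```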
13. Confidence interval: Estimation of a population rate includes point estimation and interval estimation.
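To illustrate interval estimation of a rate, here is a small sketch using the normal-approximation 95% confidence interval p̂ ± 1.96·sqrt(p̂(1−p̂)/n) (the counts are hypothetical, introduced only for illustration):

```python
import math

# Hypothetical survey: 48 positives out of n = 300 subjects.
x, n = 48, 300
p_hat = x / n                       # point estimate of the population rate

# Normal-approximation 95% confidence interval for the rate.
se = math.sqrt(p_hat * (1 - p_hat) / n)
lower, upper = p_hat - 1.96 * se, p_hat + 1.96 * se
print(f"point estimate {p_hat:.3f}, 95% CI ({lower:.3f}, {upper:.3f})")
```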
Transportation Research Part C 10 (2002) 303-321

Comparison of parametric and nonparametric models for traffic flow forecasting

Brian L. Smith a,*, Billy M. Williams b, R. Keith Oswald c

a Department of Civil Engineering, Thornton Hall, University of Virginia, Charlottesville, VA 22903-2442, USA
b School of Civil and Environmental Engineering, Georgia Institute of Technology, Atlanta, GA 30332-0355, USA
c Department of Systems Engineering, Olsson Hall, University of Virginia, Charlottesville, VA 22903-2442, USA

Received 12 July 1999; accepted 5 July 2001

Abstract

Single point short-term traffic flow forecasting will play a key role in supporting demand forecasts needed by operational network models. Seasonal autoregressive integrated moving average (ARIMA), a classic parametric modeling approach to time series, and nonparametric regression models have been proposed as well suited for application to single point short-term traffic flow forecasting. Past research has shown seasonal ARIMA models to deliver results that are statistically superior to basic implementations of nonparametric regression. However, the advantages associated with a data-driven nonparametric forecasting approach motivate further investigation of refined nonparametric forecasting methods. Following this motivation, this research effort seeks to examine the theoretical foundation of nonparametric regression and to answer the question of whether nonparametric regression based on heuristically improved forecast generation methods approaches the single interval traffic flow prediction performance of seasonal ARIMA models. © 2002 Elsevier Science Ltd. All rights reserved.

Keywords: Traffic forecasting; Nonparametric regression; ARIMA models; Motorway flows; Short-term traffic prediction; Statistical forecasting

* Corresponding author. Tel.: +1-434-243-8585; fax: +1-434-982-2951. E-mail address: briansmith@ (B.L. Smith).
0968-090X/02/$ - see front matter © 2002 Elsevier Science Ltd. All rights reserved. PII: S0968-090X(02)00009-8

1. Introduction

As intelligent transportation systems (ITS) are implemented widely throughout the world, managers of transportation systems have access to large amounts of "real-time" status data. While this data is key, it has been recognized by researchers and practitioners alike that the full benefits of ITS cannot be realized without an ability to anticipate traffic conditions in the short term (less than one hour into the future). The development of traffic condition forecasting models will provide this ability, and support proactive transportation management and comprehensive traveller information services. What distinguishes traffic condition forecasting from the more traditional forecasts of transportation planning is the length of the prediction horizon. While planning models use socioeconomic data and trends to forecast over a period of years, traffic condition forecasting models predict conditions within the hour using data from roadway sensors.

Without a predictive capability, ITS will provide services in a reactive manner. In other words, there will be a lag between the collection of data and the implementation of traffic control strategies. Therefore, the system will be controlled based on old information. In order to control the system in a proactive manner, ITS must have a predictive capability.
"The ability to make and continuously update predictions of traffic flows and link times for several minutes into the future using real-time data is a major requirement for providing dynamic traffic control" (Cheslow et al., 1992). Furthermore, traffic condition forecasting is important in traveller information systems. This is illustrated in the statement, "the rationale behind using predictive information (for route guidance) is that travellers' decisions are affected by future traffic conditions expected to be in effect when they reach downstream sections of the network" (Kaysi et al., 1993). In fact, current traveller information activities are being hampered by the lack of a capability to predict future traffic conditions.

1.1. Transportation condition forecasting

Predicting future traffic conditions is a general concept. In fact, there are a wide variety of needs for condition forecasts depending on particular applications. For example, forecasts of traffic flow, speed, travel time, and queue length are required for specific transportation management applications. Furthermore, it is certainly true that future traffic conditions will be dependent on the transportation management strategies employed, and that traffic flow can be tracked, given a sufficient number and placement of sensors, as it progresses through a network. For these reasons, an ideal approach to traffic condition forecasting is a network-based model that simulates the system as it responds to transportation management strategies. In other words, a model that takes advantage of spatial information and traffic flow dynamics.

However, every network must have boundary points, locations that serve as entry "nodes" to the network that do not have the advantage of upstream detectors to use in predicting their state. In fact, any dynamic traffic network models, such as commonly used simulation models (for example, CORSIM), are "driven" by the traffic flow rate (usually in vehicles per hour) at these boundary points. In essence, the boundary points define the short-term demand for travel through the network. For this reason, an important subset of traffic condition forecasting is the ability to predict future traffic flow at a location (a boundary point), given only data describing past conditions at this point. The research described in this paper addresses this specific challenge of single point traffic flow prediction.

Based on this definition of the problem, one will note that single point traffic flow prediction can be described as a time series problem. Given a series of past flow readings measured at regular intervals, a model is desired to predict the flow at the next interval. Thus, we consider the demand for travel (measured by flow) at a point to be a dynamic system, which by definition consists of two parts: a state, or the essential information describing the system, and a dynamic, which is the rule that governs how the state changes over time (Mulhern and Caprara, 1994). A dynamic system evolves towards an attractor, which is the set of values to which the system eventually settles (Thearling, 1995). Dynamic systems are often complex due to chaotic attractors, which have a bounded path that is fully determined, yet never recurs (Mulhern and Caprara, 1994). The presence of chaotic attractors tends to cloud the distinction between deterministic and stochastic behavior, and makes the selection of an ideal modeling approach difficult.
A considerable amount of research has addressed dynamic system modeling in areas such as fluid turbulence (Takens, 1981), disease cases (Sugihara and May, 1990), and marketing (Mulhern and Caprara, 1994). In addition, this area has seen a large level of interest in the transportation research community of late.

When evaluating dynamic systems models for traffic flow forecasting, it is certainly important to consider the accuracy of the model. However, it is equally important to consider the environment in which the model will be developed, used, and maintained. Intelligent transportation systems deployed in metropolitan regions include thousands of traffic detectors. For example, in the Hampton Roads region of Virginia, a moderate-sized metropolitan area, the state department of transportation is in the process of installing over 1200 sensor stations on the freeways alone. In addition, the department, like most others, faces severe personnel shortages. As a result, models that require significant expertise and time to calibrate and maintain, on a site-by-site basis, are simply infeasible for use in this environment. Therefore, one must explicitly consider the requirements for dynamic system model calibration and maintenance.

1.2. Modeling approaches

Predicting traffic flow at a point decomposes into a time-series modeling problem in which one attempts to predict the value of a variable based on a series of past samples of the variable at regular intervals. Time series models have been studied in many disciplines, including transportation. In particular, transportation researchers have developed time series traffic prediction models using techniques such as autoregressive integrated moving average (ARIMA), nonparametric regression, neural networks, and heuristics. A complete review of transportation applications in this area can be found in Smith (1995) and Williams (1999).

Recent research by Williams (1999) has applied traditional parametric Box-Jenkins time series models to the dynamic system of single point traffic flow forecasting. This work has addressed most of the parametric model concerns for traffic condition data by establishing a theoretical foundation for using seasonal ARIMA forecast models (see Section 3.3). ARIMA forecasting is founded in stochastic system theory. ARIMA processes are nondeterministic with linear state transitions and can be periodic, with the periodicity accounted for through seasonal differencing.

There are, however, some significant concerns with the ability to fit and maintain seasonal ARIMA models on a "production" basis in ITS systems, given the level of expertise and amount of time required to do so. For example, the researchers recently developed a number of seasonal ARIMA models for the Hampton Roads system described above. It was found that the necessary outlier detection and parameter estimation for the seasonal ARIMA models was quite time-consuming. Using only three months of ten-minute time series data, it took more than six days to detect and model the outliers and estimate the parameters for a seasonal ARIMA model at a single sensor location. Estimating parameters sequentially for each location in a moderately sized system with 200 detector locations would take more than 3 years. Clearly, this is an extreme example resulting from a non-time-critical research environment.
In fact, the researchers are currently exploring methods to automate and expedite the seasonal ARIMA "fitting" process. However, it does illustrate a significant practical concern regarding the wide-scale use of seasonal ARIMA models in traffic condition forecasting.

The alternative modeling approach that we will focus upon in this research is nonparametric regression. Nonparametric regression was selected as a high-potential alternative to seasonal ARIMA modeling due to the fact that the authors have demonstrated the advantages of nonparametric regression over other approaches, such as neural networks, in previous research efforts (Smith, 1995). Nonparametric regression forecasting is founded on chaotic system theory. By definition, chaotic systems are defined by state transitions that are deterministic and non-linear. Furthermore, the state transitions of a chaotic system are ergodic, not periodic.

Given that ARIMA models are founded on stochastic system theory, it may seem inappropriate on the surface to simultaneously consider both nonparametric regression and ARIMA as candidates for modeling and forecasting of the same data type. However, although an argument that traffic condition data is characteristically stochastic may be more easily asserted, the presence of "chaos like" behavior cannot be completely dismissed, especially during congestion when traffic flow is unstable and a stronger causative link may be operating in the time dimension. For example, Disbro and Frame presented evidence that traffic flow exhibits chaotic behavior in a 1989 paper (Disbro and Frame, 1989). Furthermore, the characteristic weekly pattern of traffic condition data supports the assertion that historical neighbor states should yield effective forecasts, in the same way that past chaotic orbits passing through nearby points provide "good" short-term forecasts. Coupling this expectation of reasonably accurate forecasts with the attractive implementation characteristics of a data-driven approach provides ample motivation to investigate nonparametric regression performance.

2. Nearest neighbor nonparametric regression

Nonparametric regression relies on data describing the relationship between dependent and independent variables. The basic approach of nonparametric regression is heavily influenced by its roots in pattern recognition (Karlsson and Yakowitz, 1987). In essence, the approach locates the state of the system (defined by the independent variables) in a "neighborhood" of past, similar states. Once this neighborhood has been established, the past cases in the neighborhood are used to estimate the value of the dependent variable.

Clearly, this approach assumes that the bulk of the knowledge about the relationship lies in the data rather than with the person developing the model (Eubank, 1988). Of course, the quality of the database, particularly in storing past cases that represent the breadth of all possible future conditions, dictates the effectiveness of a nonparametric regression model. To put it simply, if similar cases are not present in the database, an informed forecast cannot be generated.

2.1. Implementation challenges

There are four fundamental challenges when applying nonparametric regression: definition of an appropriate state space, definition of a distance metric to determine nearness of historical
2.1.1. State space

For data that are time series in nature, a state vector defines each record with measurements at times t, t-1, ..., t-d, where d is an appropriate number of lags. For example, a state vector x(t) of flow rate measurements collected every 10 min with d = 3 can be written as

x(t) = [V(t), V(t-1), V(t-2), V(t-3)],   (1)

where V(t) is the flow rate during the current time interval, V(t-1) is the flow rate during the previous 10-min interval, and so on. Note that an infinite number of possible state vectors exist. Furthermore, they are not restricted to incorporating only lagged values but may also be supplemented with aggregate measures such as historical averages.

2.1.2. Distance metric

A common approach to measuring "nearness" in nonparametric regression is to use Euclidean distance in the independent variable space. Such an approach "weights" the value of each independent variable equally. In other words, it considers the information content of each to be of equal value. In many cases, knowledge of the problem domain may make such an assumption unreasonable. In that case, a weighted distance metric, in which the "dimensions" of variables with higher information content are weighted more heavily, may be appropriate. While this makes intuitive sense, such an approach is clearly heuristic in nature and requires careful consideration by the modeler.

2.1.3. Forecast generation

The most straightforward approach to generating the forecast of the dependent variable is to compute a simple average of the dependent variable values of the neighbors that fall within the nonparametric regression neighborhood. The weakness of such an approach is that it ignores all information provided by the distance metric. In other words, it is logical to assume that past cases "nearer" the current case have higher information content and should play a larger role in generating the forecast.

To address this concern, a number of weighting schemes have been proposed for use within nonparametric regression. In general, these weights are inversely proportional to the distance between the neighbor and the current condition. An alternative to the use of weights is to fit a linear or nonlinear model to the cases in the neighborhood and then use the model to forecast the value of the dependent variable. For example, Mulhern and Caprara (1994) used a regression model in the selected neighborhood to generate marketing forecasts with promising results. However, we chose not to investigate this approach because it introduces a parametric model into a nonparametric technique.

Finally, it should be clear that there are an infinite number of approaches to forecast generation. As we will discuss later in the paper, we were able to improve the effectiveness of our models by weighting our forecasts directly with historical data.
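As a sketch of the distance-weighting idea (the inverse-distance form is our assumption of a typical scheme, not the paper's fitted method; the small constant guards against division by zero when a neighbor matches the current state exactly), the simple average in the earlier fragment could be replaced by:

```python
import numpy as np

def weighted_forecast(successors, dist, eps=1e-6):
    """Inverse-distance-weighted forecast from the k selected neighbors.

    successors : dependent-variable values that followed each neighbor
    dist       : Euclidean distance of each neighbor from the current state
    Nearer neighbors receive larger weights and so contribute more.
    """
    w = 1.0 / (dist + eps)
    return float(np.sum(w * successors) / np.sum(w))
```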
2.1.4. Management of the potential neighbor database

As stated earlier, the effectiveness of nonparametric regression is directly dependent on the quality of the database of potential neighbors. Clearly, it is desirable to possess a large database of cases that span the likely conditions the system is expected to face. However, while as large a database as possible is desirable for increasing the accuracy of the model, the size of the database has significant implications for the timeliness of model execution.

When considering how nonparametric models work, one sees that the majority of the effort at run time is expended in searching the database for neighbors. As the database grows, this search effort grows accordingly. For real-time applications, such as traffic flow forecasting, this is a significant issue. Steps must be taken to keep the database size manageable while ensuring that the database has the depth and breadth necessary to support forecasting.

Again, the number of approaches to accomplishing this is infinite. One approach would be to cluster the database and search only the cases in the cluster in which the current state falls. Another approach would be to periodically delete records from the database by searching for multiple records that are nearly identical; such an approach would require a distance metric such as those discussed earlier. While examining this issue fully is beyond the scope of this paper, it is important to realize that management of the database is a critically important issue, particularly in real-time applications of nonparametric regression.

2.2. Nonparametric regression theoretical foundation

It is important to recognize that nonparametric regression has a strong theoretical basis, and the statistical properties of the nearest neighbor approach are attractive. Using such an approach results in an asymptotically optimal forecaster (Karlsson and Yakowitz, 1987). This means that for a state space of m values, the nearest neighbor model will asymptotically perform at least as well as any mth-order parametric model. This property indicates that if one has access to a large, high-quality database, the nearest neighbor approach should yield results comparable to those of parametric models.

While the state definition for a nonparametric model is flexible and can be specified in an infinite number of ways, a common approach for time series problems is to define the state as the series of system values (in our case, traffic flow measurements) measured during the past d intervals. In dynamic systems, an attractor is defined as a state to which a system eventually settles as t → ∞ (Takens, 1981). In other words, the system dynamic is driven by the attractor. Takens found that for an attractor of dimension D, it is necessary to set the number of lags (d) in the state to 2D + 1. Therefore, given that D is not known for single point traffic flow prediction, there is theoretical justification for considering multiple values of d when defining the state.

Furthermore, there is justification for investigating the inclusion of values beyond lagged time series measurements in the state definition. For example, since nearest neighbor models "geometrically attempt to reconstruct whatever attractor is operating in a time series" (Mulhern and Caprara, 1994), including historical averages in the state vector further clarifies the position of each observation along the cyclical flow-time curve, which may improve forecast accuracy by finding neighbors that are more similar to the current conditions.

Thus, based on the dynamic systems literature, there is theoretical justification for considering multiple system state definitions when applying nonparametric regression. This highlights a promising opportunity to improve the accuracy of nonparametric regression forecasting.
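As a sketch of one such alternative state definition (a hypothetical helper of our own; hist_avg is assumed to hold the mean flow for each interval's time-of-day/day-of-week slot), lagged measurements can simply be augmented with the historical average, so that neighbors are matched both on recent dynamics and on position in the weekly cycle:

```python
import numpy as np

def augmented_state(series, hist_avg, t, d=2):
    """State vector of d+1 lagged flows plus a historical average.

    series   : observed flow rates indexed by time interval
    hist_avg : historical average flow for each interval's
               time-of-day/day-of-week slot (same indexing)
    t        : index of the current interval
    d        : number of lags; Takens' 2D + 1 result suggests
               screening several values of d when D is unknown
    """
    lags = [series[t - j] for j in range(d + 1)]  # V(t), V(t-1), ..., V(t-d)
    return np.array(lags + [hist_avg[t]])
```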
3. Experimental approach

3.1. Research question

Earlier work by Smith (1995) found that a simple implementation of the nearest neighbor forecasting approach provides reasonably accurate traffic condition forecasts. Subsequent research by Williams (1999) demonstrated that seasonal ARIMA models outperformed the straight-average k-nearest neighbor method. However, the advantages associated with a data-driven nonparametric forecasting approach motivate further investigation of refined nonparametric regression forecasting methods. Following this motivation, the research effort presented in this article sought to answer the question of whether nonparametric regression based on heuristically improved forecast generation methods can approach the single-interval traffic flow prediction performance of seasonal ARIMA models.

3.2. The data

The data were collected by the United Kingdom Highways Agency using the motorway incident detection and automatic signaling (MIDAS) system. The pilot project for the MIDAS system included extensive instrumentation of the southwest quadrant of the London Orbital Motorway, M25. The instrumentation included lane-specific loop detectors in each of the four travel lanes at 500-m intervals. Traffic condition data from these detectors are collected at a one-minute interval, 24 h a day.

The data were collected from 4 September to 30 November, 1996, and consist of 15-min traffic flow rates from two loop detector locations, detectors 4757A and 4762A, which lie between the M3 and M23 interchanges. The 15-min data were formed by averaging the raw 1-min data over 15-min intervals and aggregating across all travel lanes. The data are for traffic flow in the clockwise direction on the outer loop of the M25. Fig. 1 shows the general location of the data sites.

[Fig. 1. M25 detector locations.]

For this research, the data were split into two independent samples. The data collected between 4 September and 18 October provided the basis for model development. The models were then evaluated using the data collected between 19 October and 30 November.

Table 1 presents descriptive statistics for the data sets. The data contain very few missing observations: only approximately 1% of the observations are missing in each development data set, and there are no missing observations in the evaluation data. The PM peak flow rates fall in the range of 7000-8000 vehicles per hour (vph).

Table 1
Descriptive statistics, M25 data

Data series                    Series length    Missing    Percent       Series mean    Mean absolute
                               (observations)   values     missing (%)   (veh/h)        one-step change
Detector 4757A Sep/Oct 1996    4320             44         1.02          3106           309
Detector 4757A Oct/Nov 1996    4128             0          0.0           2916           282
Detector 4762A Sep/Oct 1996    4320             51         1.18          3100           315
Detector 4762A Oct/Nov 1996    4128             0          0.0           2915           287
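As an illustration of the aggregation step described above (a sketch only; it assumes the raw MIDAS records are available as a pandas DataFrame with a datetime index and one column of 1-min flow rates, in vehicles per hour, per travel lane; the column layout is our hypothetical assumption):

```python
import pandas as pd

def to_15min_flow(raw: pd.DataFrame) -> pd.Series:
    """Form 15-min flow rates from 1-min, per-lane flow rates.

    Sums across travel lanes to obtain a station total for each
    minute, then averages the 1-min totals within each 15-min
    interval, preserving units of vehicles per hour.
    """
    station_total = raw.sum(axis=1)
    return station_total.resample("15min").mean()
```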
3.3. Benchmark models

As stated in Section 3.1, this research focuses on improving the performance of nonparametric regression. To assess the quality of the forecasts, two benchmark models were used to "frame" the performance of the nonparametric regression models. First, a naïve model was used as a worst case: a proposed model must certainly outperform a naïve model. Secondly, a seasonal ARIMA model was used in the context of a best case. In other words, the goal of modifying the nonparametric regression models was to approach the performance of parametric seasonal ARIMA models. The benchmark models utilized in the work are described more fully below.

3.3.1. Naïve forecast

Naïve forecasts were calculated in a way that takes into account both the current conditions and the historical pattern. Average flow rates were calculated for each time-of-day and day-of-the-week using the development data for each site. For the test data, naïve forecasts were then calculated from these historical average flow rates by the equation

V̂(t+1) = [V(t) / V_hist(t)] · V_hist(t+1),   (2)

where V̂(t+1) is the forecast for the next 15-min time interval, V(t) is the traffic flow rate at the current time interval, and V_hist(t) is the historical average of the traffic flow rate for the time-of-day and day-of-the-week associated with time interval t.

This forecast approach follows the intuition that the ratio of current conditions to historical conditions is a good indicator of how the traffic conditions in the next interval will deviate from the historical conditions. A proposed forecast method that could not be shown to consistently produce more accurate forecasts than this naïve method would be of little value.
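A minimal implementation of Eq. (2) follows (our sketch; hist is assumed to hold the development-data average flow for the time-of-day/day-of-week slot of each interval, aligned index-for-index with the observed series):

```python
def naive_forecast(V, hist, t):
    """Eq. (2): scale the historical average for the next interval
    by the ratio of the current observation to its own historical
    average, so the forecast inherits today's deviation from normal."""
    return V[t] / hist[t] * hist[t + 1]
```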
3.3.2. Seasonal ARIMA parametric model

Recent work by Williams (1999) put forth a strong theoretical foundation for modeling traffic condition data as seasonal ARIMA processes and demonstrated the forecast accuracy of fitted seasonal ARIMA models. The theoretical foundation rests on two fundamental time series concepts, namely the definition of time series stationarity and the theorem known as the Wold decomposition.

3.3.2.1. Seasonal ARIMA definition. The formal definition of a seasonal ARIMA process follows:

A time series {X_t} is a seasonal ARIMA (p, d, q)(P, D, Q)_S process with period S if d and D are nonnegative integers and if the differenced series Y_t = (1 - B)^d (1 - B^S)^D X_t is a stationary autoregressive moving average process defined by the expression

φ(B)Φ(B^S) Y_t = θ(B)Θ(B^S) e_t,   (3)

where B is the backshift operator defined by B^a W_t = W_{t-a},

φ(z) = 1 - φ_1 z - ... - φ_p z^p,   Φ(z) = 1 - Φ_1 z - ... - Φ_P z^P,
θ(z) = 1 - θ_1 z - ... - θ_q z^q,   Θ(z) = 1 - Θ_1 z - ... - Θ_Q z^Q,

and e_t is identically and normally distributed with mean zero and variance σ², with cov(e_t, e_{t-k}) = 0 for all k ≠ 0, i.e., {e_t} ~ WN(0, σ²).

In the definition above, the parameters p and q represent the autoregressive and moving average order, respectively, and the parameters P and Q represent the autoregressive and moving average order at the model's seasonal period length, S. The parameters d and D represent the order of ordinary and seasonal differencing, respectively. The reader is referred to a comprehensive time series text, such as Brockwell and Davis (1996) or Fuller (1996), for additional background on ARIMA time series models.

3.3.2.2. Theoretical justification. Formal ARIMA model definitions apply to time series that are weakly stationary or that can be made weakly stationary through differencing. By definition, a time series {X_t} is weakly stationary if
(i) the unconditional expected value of X_t is the same for all t, and
(ii) the covariance between any two observations in the series depends only on the lag between the observations (and is independent of t).

Given the characteristic cyclical pattern of traffic condition data, it is obvious that neither the raw data series nor a data series transformed by a first ordinary difference is stationary. Traffic flow, for example, rises to and falls from characteristic peaks. Although the daily patterns for the weekdays are quite similar to each other, the weekday patterns do vary from day to day.
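For concreteness, a seasonal ARIMA model of this form can be fitted with standard tools. The sketch below uses statsmodels' SARIMAX on a synthetic flow series; the orders and the daily seasonal period S = 96 (for 15-min data; a weekly season would be S = 672) are illustrative assumptions, not the specification fitted in the studies cited above.

```python
import numpy as np
from statsmodels.tsa.statespace.sarimax import SARIMAX

S = 96  # 15-min intervals per day; use S = 672 for a weekly season
rng = np.random.default_rng(0)
t = np.arange(S * 14)
# synthetic flow series with a daily cycle, standing in for detector data
y = 3000 + 1500 * np.sin(2 * np.pi * t / S) + rng.normal(0, 100, t.size)

# ARIMA(1,0,1)(0,1,1)_S: one seasonal difference removes the daily cycle
model = SARIMAX(y, order=(1, 0, 1), seasonal_order=(0, 1, 1, S))
res = model.fit(disp=False)
print(res.forecast(steps=1))  # one-step-ahead (15-min) flow forecast
```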