OPERATIONS RESEARCH
Articles in Advance, pp. 1–13
ISSN 0030-364X (print), ISSN 1526-5463 (online)
10.1287/opre.2015.1422
© 2015 INFORMS

The Data-Driven Newsvendor Problem: New Bounds and Insights

Retsef Levi, Georgia Perakis
Sloan School of Management, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139 {retsef@, georgiap@}

Joline Uichanco
Ross School of Business, University of Michigan, Ann Arbor, Michigan 48109, jolineu@

Abstract: Consider the newsvendor model, but under the assumption that the underlying demand distribution is not known as part of the input. Instead, the only information available is a random, independent sample drawn from the demand distribution. This paper analyzes the sample average approximation (SAA) approach for the data-driven newsvendor problem. We obtain a new analytical bound on the probability that the relative regret of the SAA solution exceeds a threshold. This bound is significantly tighter than existing bounds, and it matches the empirical accuracy of the SAA solution observed in extensive computational experiments. This bound reveals that the demand distribution's weighted mean spread affects the accuracy of the SAA heuristic.

Keywords: data-driven stochastic program; newsvendor problem; sample average approximation; weighted mean spread.
Subject classifications: inventory/production: uncertainty.
Area of review: Operations and Supply Chains.
History: Received August 2010; revisions received October 2011, June 2013, August 2014, June 2015; accepted August 2015. Published online in Articles in Advance.

1. Introduction

In the classical single-period newsvendor problem, a retailer plans to sell a perishable product that has a stochastic demand with a known distribution (Zipkin 2000). She needs to commit to an order quantity before observing the actual demand. The retailer
incurs an underage cost for each unit of unsatisfied demand and an overage cost for each unsold unit of product at the end of the period. Unsold units are discarded at the end of the period. The goal of the retailer is to choose an order quantity that minimizes the expected cost function C. If the demand distribution is known, then the optimal order quantity q* = arg min_{q ≥ 0} C(q) is a well-specified quantile of the distribution (sometimes called the critical quantile).

In reality, managers need to make inventory decisions without having complete knowledge of the demand distribution. Often, the only information available is a set of demand data. Nonparametric data-driven heuristics assume the demand data is a random sample drawn from the unknown demand distribution. Consider a random sample of size N drawn from the unknown distribution. One popular nonparametric data-driven heuristic approach often used in practice is sample average approximation (SAA) (Homem-De-Mello 2000, Kleywegt et al. 2001). The SAA solution, which we denote by Q̂_N, minimizes the cost averaged over the empirical distribution induced by the sample of size N, instead of the true expected cost C, which cannot be evaluated. The SAA solution is equal to the sample critical quantile, which can be obtained by sorting the N samples in increasing value. Note that Q̂_N is stochastic since its value depends on the random sample.

In many practical settings, there are either not enough demand data available or data are generated from an expensive procedure. Under these settings, it is useful to understand the underlying accuracy of a data-driven solution for a given sample size. The following are only a few examples of such settings: (i) A retailer is planning to introduce a new product and may want to gather demand information for production and inventory planning purposes. One method is through pilot experiments where the product is introduced in a few pilot stores. Setting up a pilot at a store is expensive; however, the retailer can
gather useful information to forecast demand. How many pilot stores and pilot weeks are sufficient for the retailer to gather enough demand information to have a production plan it can be confident with? (ii) When a fast fashion retail company, such as ZARA, introduces a new collection for the season, it waits to receive some initial set of sales samples. It then uses these samples to refine the estimation of demand to be used for quick production. The longer the demand-gathering period, the more accurate the demand forecasts. But shorter demand-gathering periods enable the company to plan smaller production batches, resulting in shorter lead time and quicker abilities to react to demand. (iii) A retailer is planning to introduce a new product. To gather information about the demand, the retailer designed a simulation model that is computationally complex but can accurately

Downloaded from informs.org by [61.150.43.34] on 14 December 2015, at 17:57. For personal use only, all rights reserved.

simulate demand based on multiple factors. Because of the computational complexity of the simulation model, the retailer may choose to generate limited demand data points for planning initial production.

In this paper, we develop a probabilistic understanding of the accuracy of the popular data-driven SAA heuristic using a novel analysis. We assess the accuracy of the SAA heuristic using its regret, which is the difference between its cost C(Q̂_N) and the optimal cost C(q*). The regret divided by C(q*) is called the relative regret. The contributions and insights of the paper are outlined next.

1.1. Contributions and Insights

1.1.1. An Informative Probabilistic Bound. We derive a new analytical bound on the probability that the SAA solution has at most ε relative regret. This bound depends only on the sample size, the threshold
ε, the underage and overage cost parameters, as well as a newly introduced property of the demand distribution called weighted mean spread (WMS). To the best of our knowledge, the WMS is an entirely new concept, first introduced in this paper. The absolute mean spread (AMS), Δ(q*), is the difference between the conditional expectation of demand above q* and the conditional expectation below q* (Definition 1). The WMS is the AMS weighted by the density function value, i.e., Δ(q*)f(q*). Our analysis shows that the WMS is the key property of a distribution that determines the accuracy of the SAA method. Specifically, the probability that the SAA solution has a relative regret greater than ε decays exponentially, with an exponential decay constant proportional to Δ(q*)f(q*). Thus, the SAA solution is more likely to have a smaller relative regret if the sample is drawn from a distribution with a large value for Δ(q*)f(q*).

1.1.2. A Tighter Bound That Predicts SAA's Empirical Behavior. The probabilistic bound we derive for SAA accuracy based on our analysis is significantly tighter than those established in previous works (Kleywegt et al. 2001, Levi et al. 2007). In empirical experiments, the theoretical guarantee of our bound is closer to the actual empirical performance of SAA. Moreover, our bound accurately reflects the empirical relationship between regret–sample size and regret–WMS predicted by regression models built from applying SAA on different sample sizes and demand distributions.

1.1.3. A Probabilistic Bound for Log-Concave Distributions. The new notion of WMS is used to develop a general optimization-based methodology to derive a probabilistic bound for the accuracy of the SAA method over any nonparametric family of distributions. This is done through a specified optimization problem that minimizes the WMS, Δ(q*)f(q*), over the family of distributions. We are able to solve this problem in closed form for the important family of log-concave distributions, providing a provably
tight lower bound on Δ(q*)f(q*) over all log-concave distributions. As a consequence, we obtain a uniform probabilistic bound for the accuracy of the SAA solution under any log-concave demand distribution. This bound is independent of distribution-specific parameters and only depends on the sample size, the relative regret threshold, and the underage and overage cost parameters. Note that many of the common distributions, some often assumed in inventory and operations management, are log-concave, e.g., normal, uniform, exponential, logistic, chi-square, chi, beta, and gamma distributions. The methodology we developed could potentially be used to derive probabilistic bounds for SAA accuracy under other distribution families. We believe this is a promising direction for future research.

1.1.4. A Minimax Regret Policy for Uncensored Demand Under Continuous Distributions. Based on our probabilistic bound on SAA's relative regret, we are able to derive an upper bound on the SAA's cumulative regret over N periods that is O(ln N). A recent paper by Besbes and Muharremoglu (2013) shows that, in a repeated newsvendor setting where in each period an additional uncensored demand data point from a continuous distribution is revealed, there is no nonanticipating policy that achieves a worst case cumulative regret smaller than O(ln N). Based on our analysis, the policy of ordering the SAA solution in each period achieves a cumulative regret of O(ln N).

1.1.5. Comparing SAA with Other Data-Driven Approaches. Finally, we conduct a computational study comparing the accuracy of the SAA method against another naive (but commonly used in industry) approach that first fits the sample to a specified distribution and then solves the newsvendor problem with respect to that distribution. The comparison is made based on the average single-period relative regret. In most cases, even when the critical quantile is high, the accuracy of the SAA method is on par with or dominates those of the distribution-fitting approach. Moreover, when
the sample is drawn from a nonstandard distribution (e.g., mixed normals), the distribution-fitting method results in huge errors compared to the SAA method.

1.2. Literature Review

A large body of literature on models and heuristics for inventory problems can be applied when limited demand information is known. One may use either a parametric approach or a nonparametric approach. A parametric approach assumes that the true distribution belongs to a parametric family of distributions, but the specific values of the parameters are unknown. In contrast, a nonparametric approach requires no assumptions regarding the parametric form of the demand distribution. The following are some examples of parametric approaches. Scarf (1959) proposed a Bayesian procedure that updates the belief regarding the uncertainty of the parameter based on observations that are collected over time. Liyanage and Shanthikumar (2005) introduced operational statistics, which, unlike the Bayesian approach, does not assume any prior knowledge on the parameter values. Instead it performs optimization and estimation simultaneously. In another work, Akcay et al. (2011) propose fitting the sample to a distribution in the Johnson Translation System, which is a parametric family that includes many common distributions. Besides SAA, the following are other examples of nonparametric approaches proposed in previous work. Concave adaptive value estimation (CAVE) (Godfrey and Powell 2001) successively approximates the objective cost function with a sequence of piecewise linear functions. Bookbinder and Lordahl (1989) propose estimating the critical quantile of the demand distribution using the bootstrap method. The infinitesimal
perturbation approach (IPA) is a sampling-based stochastic gradient estimation technique that has been used to solve stochastic supply chain models (Glasserman and Ho 1991). Huh and Rusmevichientong (2009) propose an online algorithm for the newsvendor problem with censored demand data (i.e., data are on sales instead of demand) based on stochastic gradient descent. Another nonparametric method for censored demand is proposed by Huh et al. (2008), based on the well-known Kaplan-Meier estimator.

Robust optimization addresses distribution uncertainty by providing solutions that are robust against different distribution scenarios. It does this by allowing the distribution to belong to a specified family of distributions. Then one can use a max-min approach, attempting to maximize the worst-case expected profit over the set of allowed distributions. Scarf (1958) and Gallego and Moon (1993) derived the max-min order quantity for the newsvendor model with respect to a family of distributions with the same mean and variance. Another robust approach attempts to minimize the worst case regret over the distribution family. Some recent works using a minimax regret criterion include Ball and Queyranne (2009), Eren and Maglaras (2006), Perakis and Roels (2008), and Levi et al. (2011).

In general, the SAA method is used to solve two types of stochastic optimization problems. The first type of problems are those that are computationally difficult even though the underlying distribution is known (e.g., two-stage discrete problems where the expectation is difficult to evaluate because of complicated utility functions and multivariate continuous distributions). Sampling is used to approximate the complicated (but known) objective function. The resulting sample approximation leads to a deterministic equivalent problem (e.g., an integer program) that is finite, though possibly with a large dimension resulting from the sample size. Some analytical results about probabilistic bounds for SAA accuracy have been
derived for two-stage stochastic integer programs (Kleywegt et al. 2001, Swamy and Shmoys 2005, Shapiro 2008). It was shown by Kleywegt et al. (2001) that the optimal solution of the SAA problem converges to the true optimal value with probability 1. They also derive a probabilistic bound on the SAA accuracy that depends on the variability of the objective function and the size of the feasible region; however, they observe it to be too conservative for practical estimates.

The second type of problems SAA is used to solve are problems whose objective functions are easy to evaluate if the distribution is known (like for the newsvendor problem); however, the complication is that the distribution is unknown. Samples are used to estimate the unknown objective function. The problem we are dealing with in this paper falls under this second category of problems. The accuracy of the SAA solution for the newsvendor problem is analyzed by Levi et al. (2007), who derive a bound on the probability that its relative regret exceeds a threshold. This bound is independent of the underlying demand distribution and only depends on the sample size, the error threshold, and the overage and underage cost parameters. Since it applies to any demand distribution, it is uninformative and highly conservative. It is uninformative since it does not reveal the types of distributions for which the SAA procedure is likely to be accurate. It is conservative because, as we demonstrate in computational experiments later in this paper, the probabilistic bound in Levi et al. (2007) does not match the empirical accuracy of the SAA seen for many common distributions. We show in §5 that the analysis of Levi et al. (2007) greatly underestimates the accuracy of the SAA method. Similar to the probabilistic bounds derived for the first type of problems using SAA, our bound is distribution specific. However, our bound is significantly tighter than previously established bounds. Since the demand distribution is unknown, we use an
optimization-based framework to derive probabilistic bounds for SAA accuracy that do not depend on any distribution parameters.

There are other metrics that have been used in the literature to evaluate whether a data-driven heuristic converges to the true optimal solution. In a repeated setting where the decision maker can adapt the policy based on new data, a popular metric is the cumulative regret over N periods (Flaxman et al. 2005, Huh and Rusmevichientong 2009, Besbes and Muharremoglu 2013). A recent paper by Besbes and Muharremoglu (2013) shows that, if the demand distribution is continuous, no nonanticipating policy for uncensored demand can achieve a worst case cumulative regret smaller than O(ln N). However, since the adaptive policy by Huh and Rusmevichientong (2009), which uses the censored demand data to estimate a cost-decreasing direction, achieves a cumulative regret that is O(ln N), both the censored and uncensored settings have a minimax cumulative regret that is O(ln N). A corollary of our analysis in this paper is that the commonly used SAA heuristic similarly achieves a cumulative regret of O(ln N) for the uncensored demand setting.

Our results are also related to the quantile estimation literature. This is because the SAA solution for the newsvendor problem is a particular sample quantile of the empirical
distribution formed by the demand sample. The confidence interval for the quantile estimator is well known (Asmussen and Glynn 2007). However, unlike in quantile estimation, the accuracy of the SAA solution does not depend on the absolute difference between the true quantile and the quantile estimator. Rather, it depends on the cost difference between the true quantile and the estimator. In our work, we find a relationship between the two types of accuracies.

1.3. Outline

This paper is structured as follows. In §2, we describe the data-driven single-period newsvendor problem and discuss the probabilistic bound on SAA's relative regret by Levi et al. (2007). Section 3 introduces a novel WMS-based analysis of the SAA relative regret. In §4, we introduce an optimization-driven bound on WMS and apply it to log-concave distributions. In §5, we discuss the implications of the WMS-based bound: it closely matches the empirical behavior of SAA, and it implies that SAA achieves a cumulative regret of O(ln N) in a repeated setting. Finally, in §6, we perform computational experiments that compare the performance of the SAA approach to another popular data-driven method. Unless given, the proofs are provided in the electronic companion (available as supplemental material at /10.1287/opre.2015.1422) to this paper.

2. The Data-Driven Newsvendor Problem

In the newsvendor model, a retailer has to satisfy a stochastic demand D for a single product over a single sales period. Prior to observing the demand, the retailer needs to decide how many units q of the product to stock. Only then is demand realized and fulfilled to the maximum extent possible from the inventory on hand. At the end of the period, cost is incurred; specifically, a per unit underage cost b > 0 for each unit of unmet demand, and a per unit overage cost h > 0 for each
unsold product unit. The goal of the newsvendor is to minimize the total expected cost. That is,

min_{q ≥ 0} C(q), where C(q) := E[b(D − q)^+ + h(q − D)^+]

and x^+ := max(0, x). The expectation is taken with respect to the stochastic demand D, which has a cumulative distribution function (cdf) F. Much is known about the newsvendor objective function and its optimal solution (see Zipkin 2000). In particular, C is convex in q with a right-sided derivative ∂^+C(q) = −b + (b + h)F(q) and a left-sided derivative ∂^−C(q) = −b + (b + h)Pr(D < q). The optimal solution can be characterized through first-order conditions. In particular, if ∂^−C(q) ≤ 0 and ∂^+C(q) ≥ 0, then 0 is a subgradient, implying that q is optimal (Rockafellar 1970). These conditions are met by

q* := inf{q: F(q) ≥ b/(b + h)},

which is the b/(b + h) quantile of D, also called the critical quantile or the newsvendor quantile.

The basic assumption of the newsvendor problem is that the cdf F of the stochastic demand is known. If F is unknown, then the optimal order quantity q* cannot be evaluated. Let D_1, D_2, ..., D_N be a random, independent sample of size N drawn from the true demand distribution, and let d_1, d_2, ..., d_N be a particular realization. Instead of optimizing the unknown expected cost C(·), the SAA method optimizes the cost averaged over the drawn sample:

min_{q ≥ 0} Ĉ_N(q), where Ĉ_N(q) := (1/N) Σ_{k=1}^N [b(d_k − q)^+ + h(q − d_k)^+].  (1)

Based on the particular sample, the empirical distribution is formed by putting a weight of 1/N on each of the demand data points. Note that the function Ĉ_N is the expected cost with respect to the empirical distribution. Hence, the optimal solution to (1) is the b/(b + h) sample quantile. Formally, we denote the empirical cdf as F̂_N(q) := (1/N) Σ_{k=1}^N 1{D_k ≤ q}. Let Q̂_N denote the optimal solution to the SAA counterpart with a sample of size N. Thus, Q̂_N is the b/(b + h) quantile of the random sample:

Q̂_N := inf{q: F̂_N(q) ≥ b/(b + h)}.  (2)

Note that Q̂_N is a random variable since its value depends on the particular realization of the random sample.

2.1. Distribution-Free Analysis for SAA Accuracy

We first
describe the analysis of Levi et al. (2007) with the goal of highlighting its difference with the new WMS-based analysis presented in the next section. Levi et al. (2007) derive a bound on the probability that the relative regret of the SAA solution exceeds an error threshold. We show that, without additional assumptions, their bound can be improved.

An order quantity is called ε-optimal if its relative regret is no more than ε, where ε > 0. Let us denote the set of all ε-optimal order quantities as S_ε (see Figure 1). Levi et al. (2007) use the left and right one-sided derivatives of C, denoted by ∂^−C and ∂^+C, respectively, to define the following interval:

S^LRS := {q: ∂^−C(q) ≤ (ε/3) min(b, h) and ∂^+C(q) ≥ −(ε/3) min(b, h)};  (3)

they then show that S^LRS ⊆ S_ε (see Figure 1(a)). Using large deviations results, specifically the Hoeffding inequality (Hoeffding 1963), they derive a bound on the probability that the SAA solution Q̂_N solved with a sample size N has one-sided derivatives bounded by α. When α = (ε/3) min(b, h), this property is equivalent to Q̂_N ∈ S^LRS, implying that Q̂_N is ε-optimal since S^LRS ⊆ S_ε. This analysis gives the LRS probability bound (4).
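The SAA solution (2) needs only a sort: Q̂_N is the smallest sorted sample point whose empirical CDF reaches b/(b + h). A minimal stdlib sketch (the function name is ours, not the paper's):

```python
def saa_order_quantity(sample, b, h):
    """SAA solution (2): the b/(b+h) quantile of the empirical
    distribution, i.e., the smallest sorted sample point d_(k)
    whose empirical CDF value k/N reaches b/(b+h)."""
    d = sorted(sample)
    n = len(d)
    ratio = b / (b + h)
    for k, q in enumerate(d, start=1):  # empirical cdf at d_(k) is k/n
        if k / n >= ratio:
            return q
    return d[-1]
```

For a sample of the integers 1 through 10 with b = h, the 0.5 sample quantile is 5; raising b/(b + h) to 0.9 moves the order quantity to 9, illustrating how high service levels push the SAA solution into the sample's upper tail.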
Figure 1. (Color online) The intervals S^LRS and S^f of a newsvendor cost function C: (a) S^LRS; (b) S^f. Note: S^LRS is defined in (3) and S^f is defined in (6).

Theorem 1 (LRS Bound, Levi et al. 2007). Consider the newsvendor problem with underage cost b > 0 and overage cost h > 0. Let Q̂_N be the SAA solution (2) with sample size N. For a given ε > 0, the SAA solution Q̂_N with sample size N is ε-optimal with probability at least

1 − 2 exp(−(2/9) N ε² (min(b, h)/(b + h))²).  (4)

Using the Bernstein inequality (Bernstein 1927), we are able to prove a tighter bound than (4). Unlike the Hoeffding inequality, it uses the fact that the SAA solution seeks to estimate the specific b/(b + h) quantile. The proof is provided in the electronic companion to this paper.

Theorem 2 (Improved LRS Bound). Consider the newsvendor problem with underage cost b > 0 and overage cost h > 0. For any ε > 0, the SAA solution Q̂_N with sample size N is ε-optimal with probability of at least

1 − 2 exp(−(N ε²/(18 + 8ε)) · min(b, h)/(b + h)).  (5)

The improved LRS bound (5) has an exponential rate that is proportional to min(b, h)/(b + h) rather than to (min(b, h)/(b + h))². This is significant because in many important inventory systems, the newsvendor quantile b/(b + h) is typically close to 1, reflecting high service-level requirements. Thus, h/(b + h) is close to 0, resulting in a very small value for min(b, h)/(b + h). Hence, the improved LRS bound gives a significantly tighter bound on the probability of an ε-optimal SAA solution. As an illustration, consider the SAA method applied to a newsvendor problem in which the service level increases from 95% to 99%. To maintain the likelihood of achieving the same accuracy, the LRS bound suggests that the sample size needs to be increased by 25 times. In contrast, the improved LRS bound suggests that the accuracy is maintained by a sample size that is only
five times as large.

The analysis of Levi et al. (2007) yields a probability bound that is general, since it applies to any demand distribution. However, it is uninformative in that it does not reveal any relationship between the accuracy of the SAA heuristic and the particular demand distribution. Furthermore, the analysis is not tight since, as we will demonstrate in computational experiments in §5, it significantly underestimates the actual accuracy of the SAA solution in simulations. This is because the interval S^LRS is typically very small relative to S_ε (see Figure 1(a)).

In the next section, we develop a new and informative probabilistic bound on the relative regret of the SAA solution that identifies the important properties of the underlying demand distribution that determine the procedure's accuracy. This bound is significantly tighter than the previously known bound for the single-period newsvendor problem by Levi et al. (2007).

3. Weighted Mean Spread-Based Analysis of SAA Accuracy

Suppose the demand is a continuous random variable with a probability density function (pdf) f, which we assume to be continuous everywhere. (In §4, we will show that this assumption is automatically satisfied under some simple conditions.) Let C̃ be the second-order Taylor series approximation of C at the point q = q*, where q* is the optimal newsvendor quantile. Note that since a pdf exists, the cost function is twice differentiable. It is straightforward to verify that

C̃(q) := (bh/(b + h)) Δ(q*) + (1/2)(b + h)(q − q*)² f(q*),

where Δ(q*) is the AMS defined below.

Definition 1 (Absolute Mean Spread). Let D be a random variable. We define the AMS as Δ(q*) := E[D | D ≥ q*] − E[D | D ≤ q*].
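Definition 1 has a direct plug-in analogue on a demand sample: average the observations on each side of q and take the difference. A small illustrative sketch (the estimator and its name are our assumption, not part of the paper):

```python
def empirical_ams(sample, q):
    """Plug-in estimate of the absolute mean spread at q:
    the mean of the sample points at or above q minus the
    mean of the sample points at or below q."""
    upper = [d for d in sample if d >= q]
    lower = [d for d in sample if d <= q]
    if not upper or not lower:
        raise ValueError("q must lie inside the range of the sample")
    return sum(upper) / len(upper) - sum(lower) / len(lower)
```

For the sample [1, 2, 3, 4] at q = 2.5, the two conditional means are 3.5 and 1.5, so the estimate is 2.0; as the sample grows, the estimate approaches Δ(q) for the underlying distribution.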
Observe that Δ(q*) is simply equal to ((b + h)/(bh)) · C(q*). Consider an approximation to S_ε using a sublevel set of C̃ (see Figure 1(b)):

S^f := {q: C̃(q) ≤ (1 + ε) C̃(q*)}.  (6)

The superscript f is to emphasize that the interval is defined by the particular distribution f. The two endpoints of S^f are

q_ = q* − √(2ε bh Δ(q*)/((b + h)² f(q*))),  q̄ = q* + √(2ε bh Δ(q*)/((b + h)² f(q*))).  (7)

S^f is not necessarily a subset of S_ε. However, by imposing a simple assumption on the pdf, we can guarantee that S^f ∩ [q*, ∞) is a subset of S_ε.

Assumption 1. The cost parameters b, h are such that f(q) is decreasing for all q ≥ q*.

Observe that C̃ matches the first two derivatives of C at the point q = q*. Moreover, C″(q) = (b + h)f(q) and C̃″(q) = (b + h)f(q*). Thus, Assumption 1 implies that C̃ increases faster than C over the interval [q*, ∞). That is, C̃(q) ≥ C(q) for each q ∈ [q*, ∞), implying that S^f ∩ [q*, ∞) is a subset of S_ε. Hence, under Assumption 1, if an order quantity q falls within S^f ∩ [q*, ∞), then this implies that q ∈ S_ε or, equivalently, C(q) ≤ (1 + ε) C(q*).

For distributions that are unimodal or with support ℝ_+, Assumption 1 is satisfied when the newsvendor quantile b/(b + h) is sufficiently large. Table EC.1 in the electronic companion to this paper summarizes the range of b/(b + h) values for which Assumption 1 holds under common demand distributions. In many important inventory systems, the newsvendor quantile is typically close to 1, so Assumption 1 would hold in a broad set of cases.

Recall that Theorem 2 gives a lower bound on the probability that the SAA solution (i.e., the b/(b + h) sample quantile) lies in S^LRS, implying that it is ε-optimal. However, in our new analysis, the SAA solution is ε-optimal if it lies in the interval S^f and if it is at least as large as q*. Therefore, instead of taking the b/(b + h) sample quantile, we bias the SAA solution by a small amount. In Theorem 3, we prove a lower bound on the
probability that this biased SAA solution lies in S^f ∩ [q*, ∞), implying that it is ε-optimal. The biasing of the SAA solution is to facilitate the proof of the probabilistic guarantee. However, we will show in §5 through empirical experiments that the probabilistic guarantee derived for the biased solution is also empirically accurate for the unbiased SAA solution.

Recall that F̂_N is the empirical cdf of a random sample of size N drawn from the demand distribution D. For some δ ≥ 0, define

Q̃_N := inf{q: F̂_N(q) ≥ b/(b + h) + (1/2) δ/(b + h)}.  (8)

Note that Q̃_N is a random variable since its value depends on the particular realization of the random sample. The following theorem states that for an appropriately chosen bias factor δ, the probability that Q̃_N is ε-optimal can be bounded. The proof is found in the electronic companion.

Theorem 3 (Distribution-Dependent Bound). Consider the newsvendor problem with underage cost b > 0 and overage cost h > 0. Let Q̃_N be defined as in (8), with δ = √(2ε bh Δ(q*) f(q*)) + O(ε). Under Assumption 1, for a given ε > 0, Q̃_N is ε-optimal with probability at least 1 − 2U_ε, where

U_ε ∼ exp(−(1/4) N ε Δ(q*) f(q*))  as ε → 0.  (9)

Note that we say that g_1(x) ∼ g_2(x) as x → 0 if lim_{x→0} g_1(x)/g_2(x) = 1. Thus, in an asymptotic regime as ε → 0, the probability bound in Theorem 3 only depends on the distribution through the quantity Δ(q*) f(q*). In particular, the data-driven quantity Q̃_N is more likely to be near-optimal when the sample is drawn from a distribution with a high value for Δ(q*) f(q*). Next, we further formalize this insight through the following definition.

Definition 2 (Weighted Mean Spread). Let D be a random variable. We define the WMS at q* as Δ(q*) f(q*).

We briefly discuss the intuition behind the dependence of the bound (9) on the weighted mean spread. The AMS Δ(q*) can be thought of as a measure of dispersion around q*. Note that the slope C′(q) = −b + (b + h)F(q) is 0 at q*. How fast the slope changes depends on how fast the distribution changes around the neighborhood of q*. In other words, a distribution whose mass is concentrated
around q* (i.e., has a small AMS) has a cost function C with a steeper slope around q*. This is illustrated in Figure 2. The left plot shows the pdf of a normal and a uniform distribution. The right plot shows the relative regret (as a function of q) when b = h = 5 and the optimal order quantity is 100 units for both distributions. The uniform distribution, which has a smaller AMS, has a steeper relative regret function. The decision of ordering 110 units has a larger relative regret under the uniform distribution. Hence, distributions with a small AMS value incur a larger relative regret for deviating from the optimal order quantity. In contrast, the size of the confidence interval for the quantile estimator of q* is inversely proportional to f(q*) (Asmussen and Glynn 2007). Thus, if f(q*) is small, a larger sample size is needed for the quantity Q̂_N to be close (in absolute terms) to q*. Therefore, if a distribution has a small WMS Δ(q*) f(q*), then the SAA solution Q̂_N is likelier to incur a large relative regret.

The new WMS-based bound predicts that the probability the SAA single-period relative regret exceeds ε decays exponentially in O(Nε). This rate of decay is significantly faster than the best known bound by Levi et al. (2007),
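The WMS of the two distribution families in the Figure 2 discussion can be computed in closed form. For a uniform distribution on [a, c], Δ(q) = (c − a)/2 and f(q) = 1/(c − a), so the WMS equals 1/2 at every quantile; for a normal distribution, the conditional-mean formulas E[D | D ≥ q] = μ + σφ(z)/(1 − Φ(z)) and E[D | D ≤ q] = μ − σφ(z)/Φ(z) make Δ(q*) f(q*) reduce to φ(z)²/(Φ(z)(1 − Φ(z))), with z the standard normal b/(b + h) quantile. A stdlib-only sketch (helper names are ours; Φ is inverted by bisection purely for illustration):

```python
import math

def norm_pdf(z):
    return math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_wms(b, h):
    """WMS Delta(q*) f(q*) of any Normal(mu, sigma) at the critical
    quantile b/(b+h); mu and sigma cancel, leaving
    phi(z)^2 / (Phi(z)(1 - Phi(z))) with z the standard normal quantile."""
    ratio = b / (b + h)
    lo, hi = -10.0, 10.0          # invert Phi by bisection (stdlib only)
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if norm_cdf(mid) < ratio:
            lo = mid
        else:
            hi = mid
    z = 0.5 * (lo + hi)
    p, P = norm_pdf(z), norm_cdf(z)
    return p * p / (P * (1.0 - P))

def uniform_wms():
    """For Uniform(a, c): Delta(q) = (c - a)/2 and f = 1/(c - a),
    so the WMS is 1/2 at every quantile, regardless of a and c."""
    return 0.5
```

At the median (b = h), the normal WMS is 2/π ≈ 0.637; it shrinks toward 0 as the critical quantile moves into the tail, which is consistent with the bound's prediction that SAA needs more samples at high service levels.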
Optimal and Efficient Path Planning for Partially-Known Environments

Anthony Stentz
The Robotics Institute, Carnegie Mellon University, Pittsburgh, PA 15213

Abstract

The task of planning trajectories for a mobile robot has received considerable attention in the research literature. Most of the work assumes the robot has a complete and accurate model of its environment before it begins to move; less attention has been paid to the problem of partially known environments. This situation occurs for an exploratory robot or one that must move to a goal location without the benefit of a floorplan or terrain map. Existing approaches plan an initial path based on known information and then modify the plan locally or replan the entire path as the robot discovers obstacles with its sensors, sacrificing optimality or computational efficiency respectively. This paper introduces a new algorithm, D*, capable of planning paths in unknown, partially known, and changing environments in an efficient, optimal, and complete manner.

1.0 Introduction

The research literature has addressed extensively the motion planning problem for one or more robots moving through a field of obstacles to a goal. Most of this work assumes that the environment is completely known before the robot begins its traverse (see Latombe [4] for a good survey). The optimal algorithms in this literature search a state space (e.g., visibility graph, grid cells) using the distance transform [2] or heuristics [8] to find the lowest cost path from the robot's start state to the goal state. Cost can be defined to be distance travelled, energy expended, time exposed to danger, etc.

Unfortunately, the robot may have partial or no information about the environment before it begins its traverse but is equipped with a sensor that is capable of measuring the environment as it moves.
One approach to path planning in this scenario is to generate a “global” path using the known information and then attempt to “locally” circumvent obstacles on the route detected by the sensors [1]. If the route is completely obstructed, a new global path is planned. Lumelsky [7] initially assumes the environment to be devoid of obstacles and moves the robot directly toward the goal. If an obstacle obstructs the path, the robot moves around the perimeter until the point on the obstacle nearest the goal is found. The robot then proceeds to move directly toward the goal again. Pirzadeh [9] adopts a strategy whereby the robot wanders about the environment until it discovers the goal. The robot repeatedly moves to the adjacent location with lowest cost and increments the cost of a location each time it visits it to penalize later traverses of the same space. Korf [3] uses initial map information to estimate the cost to the goal for each state and efficiently updates it with backtracking costs as the robot moves through the environment.

While these approaches are complete, they are also suboptimal in the sense that they do not generate the lowest cost path given the sensor information as it is acquired and assuming all known, a priori information is correct. It is possible to generate optimal behavior by computing an optimal path from the known map information, moving the robot along the path until either it reaches the goal or its sensors detect a discrepancy between the map and the environment, updating the map, and then replanning a new optimal path from the robot's current location to the goal. Although this brute-force, replanning approach is optimal, it can be grossly inefficient, particularly in expansive environments where the goal is far away and little map information exists. Zelinsky [15] increases efficiency by using a quad-tree [13] to represent free and obstacle space, thus reducing the number of states to search in the planning space.
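The brute-force replanning cycle described above (plan on the current map, move, sense, update, replan) can be sketched in a few lines. This is an illustrative reconstruction, not code from any of the cited papers; for simplicity it senses and replans at every step rather than only when a discrepancy is detected, which preserves the behavior but not the efficiency:

```python
import heapq

def plan_costs(arcs, nbrs, goal):
    # Dijkstra from the goal over the *current* map: optimal cost-to-goal per state.
    dist, pq = {goal: 0.0}, [(0.0, goal)]
    while pq:
        d, x = heapq.heappop(pq)
        if d > dist.get(x, float("inf")):
            continue  # stale heap entry
        for y in nbrs[x]:
            nd = d + arcs[(y, x)]  # cost of traversing arc y -> x
            if nd < dist.get(y, float("inf")):
                dist[y] = nd
                heapq.heappush(pq, (nd, y))
    return dist

def replan_traverse(arcs, nbrs, start, goal, true_arcs, sense):
    # Brute-force replanner: sense near the robot, correct the map, replan
    # from scratch, then take one step along the current optimal path.
    x, path = start, [start]
    while x != goal:
        for a, b in sense(x):
            arcs[(a, b)] = true_arcs[(a, b)]  # update map with sensed arc costs
        dist = plan_costs(arcs, nbrs, goal)
        if x not in dist:
            return None  # no path under the current map
        x = min(nbrs[x], key=lambda y: arcs[(x, y)] + dist.get(y, float("inf")))
        path.append(x)
    return path
```

On a tiny triangle graph where the direct arc to the goal turns out to be expensive, the robot senses the true cost on arrival at the first state, replans, and detours through the intermediate state.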
For natural terrain, however, the map can encode robot traversability at each location ranging over a continuum, thus rendering quad-trees inappropriate or suboptimal.

This paper presents a new algorithm for generating optimal paths for a robot operating with a sensor and a map of the environment. The map can be complete, empty, or contain partial information about the environment. For regions of the environment that are unknown, the map may contain approximate information, stochastic models for occupancy, or even heuristic estimates. The algorithm is functionally equivalent to the brute-force, optimal replanner, but it is far more efficient.

In Proceedings IEEE International Conference on Robotics and Automation, May 1994.

The algorithm is formulated in terms of an optimal find-path problem within a directed graph, where the arcs are labelled with cost values that can range over a continuum. The robot's sensor is able to measure arc costs in the vicinity of the robot, and the known and estimated arc values comprise the map. Thus, the algorithm can be used for any planning representation, including visibility graphs [5] and grid cell structures. The paper describes the algorithm, illustrates its operation, presents informal proofs of its soundness, optimality, and completeness, and then concludes with an empirical comparison of the algorithm to the optimal replanner.

2.0 The D* Algorithm

The name of the algorithm, D*, was chosen because it resembles A* [8], except that it is dynamic in the sense that arc cost parameters can change during the problem-solving process. Provided that robot motion is properly coupled to the algorithm, D* generates optimal trajectories.
This section begins with the definitions and notation used in the algorithm, presents the D* algorithm, and closes with an illustration of its operation.

2.1 Definitions

The objective of a path planner is to move the robot from some location in the world to a goal location, such that it avoids all obstacles and minimizes a positive cost metric (e.g., length of the traverse). The problem space can be formulated as a set of states denoting robot locations connected by directional arcs, each of which has an associated cost. The robot starts at a particular state and moves across arcs (incurring the cost of traversal) to other states until it reaches the goal state, denoted by G. Every state X except G has a backpointer to a next state Y denoted by b(X) = Y. D* uses backpointers to represent paths to the goal. The cost of traversing an arc from state Y to state X is a positive number given by the arc cost function c(X, Y). If Y does not have an arc to X, then c(X, Y) is undefined. Two states X and Y are neighbors in the space if c(X, Y) or c(Y, X) is defined.

Like A*, D* maintains an OPEN list of states. The OPEN list is used to propagate information about changes to the arc cost function and to calculate path costs to states in the space. Every state X has an associated tag t(X), such that t(X) = NEW if X has never been on the OPEN list, t(X) = OPEN if X is currently on the OPEN list, and t(X) = CLOSED if X is no longer on the OPEN list. For each state X, D* maintains an estimate of the sum of the arc costs from X to G given by the path cost function h(G, X). Given the proper conditions, this estimate is equivalent to the optimal (minimal) cost from state X to G, given by the implicit function o(G, X). For each state X on the OPEN list (i.e., t(X) = OPEN), the key function, k(G, X), is defined to be equal to the minimum of h(G, X) before modification and all values assumed by h(G, X) since X was placed on the OPEN list. The key function classifies a state X on the OPEN list into one of two types: a RAISE state if k(G, X) < h(G, X), and a LOWER state if k(G, X) = h(G, X).
D* uses RAISE states on the OPEN list to propagate information about path cost increases (e.g., due to an increased arc cost) and LOWER states to propagate information about path cost reductions (e.g., due to a reduced arc cost or new path to the goal). The propagation takes place through the repeated removal of states from the OPEN list. Each time a state is removed from the list, it is expanded to pass cost changes to its neighbors. These neighbors are in turn placed on the OPEN list to continue the process.

States on the OPEN list are sorted by their key function value. The parameter k_min is defined to be min(k(X)) for all X such that t(X) = OPEN. The parameter k_min represents an important threshold in D*: path costs less than or equal to k_min are optimal, and those greater than k_min may not be optimal. The parameter k_old is defined to be equal to k_min prior to the most recent removal of a state from the OPEN list. If no states have been removed, k_old is undefined.

An ordering of states denoted by {X_1, X_N} is defined to be a sequence if b(X_{i+1}) = X_i for all i such that 1 ≤ i < N and X_i ≠ X_j for all (i, j) such that 1 ≤ i < j ≤ N. Thus, a sequence defines a path of backpointers from X_N to X_1. A sequence {X_1, X_N} is defined to be monotonic if (t(X_i) = CLOSED and h(G, X_i) < h(G, X_{i+1})) or (t(X_i) = OPEN and k(G, X_i) < h(G, X_{i+1})) for all i such that 1 ≤ i < N. D* constructs and maintains a monotonic sequence {G, X}, representing decreasing current or lower-bounded path costs, for each state X that is or was on the OPEN list.
Given a sequence of states {X_1, X_N}, state X_i is an ancestor of state X_j if 1 ≤ i < j ≤ N and a descendant of X_j if 1 ≤ j < i ≤ N. For all two-state functions involving the goal state, the following shorthand notation is used: f(X) ≡ f(G, X). Likewise, for sequences the notation {X} ≡ {G, X} is used. The notation f(°) is used to refer to a function independent of its domain.

2.2 Algorithm Description

The D* algorithm consists primarily of two functions: PROCESS-STATE and MODIFY-COST. PROCESS-STATE is used to compute optimal path costs to the goal, and MODIFY-COST is used to change the arc cost function c(°) and enter affected states on the OPEN list. Initially, t(°) is set to NEW for all states, h(G) is set to zero, and G is placed on the OPEN list. The first function, PROCESS-STATE, is repeatedly called until the robot's state, X, is removed from the OPEN list (i.e., t(X) = CLOSED) or a value of -1 is returned, at which point either the sequence {X} has been computed or does not exist respectively. The robot then proceeds to follow the backpointers in the sequence {X} until it either reaches the goal or discovers an error in the arc cost function c(°) (e.g., due to a detected obstacle). The second function, MODIFY-COST, is immediately called to correct c(°) and place affected states on the OPEN list. Let Y be the robot's state at which it discovers an error in c(°). By calling PROCESS-STATE until it returns k_min ≥ h(Y), the cost changes are propagated to state Y such that h(Y) = o(Y). At this point, a possibly new sequence {Y} has been constructed, and the robot continues to follow the backpointers in the sequence toward the goal.

The algorithms for PROCESS-STATE and MODIFY-COST are presented below.
The embedded routines are MIN-STATE, which returns the state on the OPEN list with minimum k(°) value (NULL if the list is empty); GET-KMIN, which returns k_min for the OPEN list (-1 if the list is empty); DELETE(X), which deletes state X from the OPEN list and sets t(X) = CLOSED; and INSERT(X, h_new), which computes k(X) = h_new if t(X) = NEW, k(X) = min(k(X), h_new) if t(X) = OPEN, and k(X) = min(h(X), h_new) if t(X) = CLOSED, sets h(X) = h_new and t(X) = OPEN, and places or re-positions state X on the OPEN list sorted by k(°).

In function PROCESS-STATE at lines L1 through L3, the state X with the lowest k(°) value is removed from the OPEN list. If X is a LOWER state (i.e., k(X) = h(X)), its path cost is optimal since h(X) is equal to the old k_min. At lines L8 through L13, each neighbor Y of X is examined to see if its path cost can be lowered. Additionally, neighbor states that are NEW receive an initial path cost value, and cost changes are propagated to each neighbor Y that has a backpointer to X, regardless of whether the new cost is greater than or less than the old. Since these states are descendants of X, any change to the path cost of X affects their path costs as well. The backpointer of Y is redirected (if needed) so that the monotonic sequence {Y} is constructed. All neighbors that receive a new path cost are placed on the OPEN list, so that they will propagate the cost changes to their neighbors.

If X is a RAISE state, its path cost may not be optimal. Before X propagates cost changes to its neighbors, its optimal neighbors are examined at lines L4 through L7 to see if h(X) can be reduced.
At lines L15 through L18, cost changes are propagated to NEW states and immediate descendants in the same way as for LOWER states. If X is able to lower the path cost of a state that is not an immediate descendant (lines L20 and L21), X is placed back on the OPEN list for future expansion. It is shown in the next section that this action is required to avoid creating a closed loop in the backpointers. If the path cost of X is able to be reduced by a suboptimal neighbor (lines L23 through L25), the neighbor is placed back on the OPEN list. Thus, the update is "postponed" until the neighbor has an optimal path cost.

Function: PROCESS-STATE()
L1   X = MIN-STATE()
L2   if X = NULL then return -1
L3   k_old = GET-KMIN(); DELETE(X)
L4   if k_old < h(X) then
L5     for each neighbor Y of X:
L6       if h(Y) ≤ k_old and h(X) > h(Y) + c(Y, X) then
L7         b(X) = Y; h(X) = h(Y) + c(Y, X)
L8   if k_old = h(X) then
L9     for each neighbor Y of X:
L10      if t(Y) = NEW or
L11         (b(Y) = X and h(Y) ≠ h(X) + c(X, Y)) or
L12         (b(Y) ≠ X and h(Y) > h(X) + c(X, Y)) then
L13        b(Y) = X; INSERT(Y, h(X) + c(X, Y))
L14  else
L15    for each neighbor Y of X:
L16      if t(Y) = NEW or
L17         (b(Y) = X and h(Y) ≠ h(X) + c(X, Y)) then
L18        b(Y) = X; INSERT(Y, h(X) + c(X, Y))
L19      else
L20        if b(Y) ≠ X and h(Y) > h(X) + c(X, Y) then
L21          INSERT(X, h(X))
L22        else
L23          if b(Y) ≠ X and h(X) > h(Y) + c(Y, X) and
L24             t(Y) = CLOSED and h(Y) > k_old then
L25            INSERT(Y, h(Y))
L26  return GET-KMIN()

In function MODIFY-COST, the arc cost function is updated with the changed value. Since the path cost for state Y will change, X is placed on the OPEN list. When X is expanded via PROCESS-STATE, it computes a new h(Y) = h(X) + c(X, Y) and places Y on the OPEN list. Additional state expansions propagate the cost to the descendants of Y.

Function: MODIFY-COST(X, Y, cval)
L1   c(X, Y) = cval
L2   if t(X) = CLOSED then INSERT(X, h(X))
L3   return GET-KMIN()

2.3 Illustration of Operation

The role of RAISE and LOWER states is central to the operation of the algorithm. The RAISE states (i.e., k(X) < h(X)) propagate cost increases, and the LOWER states (i.e., k(X) = h(X)) propagate cost reductions.
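Read together with the descriptions above, the two routines can be transcribed almost line for line into executable form. The sketch below is an unofficial Python reconstruction, not the paper's code: it assumes bidirectional arcs (as in the paper's grid examples) and replaces the sorted OPEN list with a lazy-deletion binary heap; the class and method names are ours.

```python
import heapq
from collections import defaultdict

NEW, OPEN, CLOSED = 0, 1, 2  # tag values t(X)

class DStar:
    def __init__(self, goal):
        self.goal = goal
        self.c = {}                   # arc costs: c[(X, Y)]
        self.nbrs = defaultdict(set)  # neighbor sets
        self.t = defaultdict(int)     # tag t(X), defaults to NEW
        self.h = {}                   # path cost estimate h(X) to the goal
        self.k = {}                   # key function k(X)
        self.b = {}                   # backpointer b(X) toward the goal
        self.heap = []                # OPEN list as lazy-deletion heap of (k, X)
        self.insert(goal, 0.0)

    def add_arc(self, x, y, cost):    # bidirectional arc, as in the grid examples
        self.c[(x, y)] = self.c[(y, x)] = cost
        self.nbrs[x].add(y); self.nbrs[y].add(x)

    def insert(self, x, h_new):       # INSERT(X, h_new)
        if self.t[x] == NEW:
            self.k[x] = h_new
        elif self.t[x] == OPEN:
            self.k[x] = min(self.k[x], h_new)
        else:                         # CLOSED
            self.k[x] = min(self.h[x], h_new)
        self.h[x], self.t[x] = h_new, OPEN
        heapq.heappush(self.heap, (self.k[x], x))

    def min_state(self):              # MIN-STATE: state with minimum k, or None
        while self.heap:
            kx, x = self.heap[0]
            if self.t[x] == OPEN and kx == self.k[x]:
                return x
            heapq.heappop(self.heap)  # discard stale entry
        return None

    def get_kmin(self):               # GET-KMIN: k_min, or -1 if OPEN is empty
        x = self.min_state()
        return self.k[x] if x is not None else -1

    def process_state(self):          # PROCESS-STATE (lines L1-L26)
        x = self.min_state()
        if x is None:
            return -1
        k_old = self.k[x]
        self.t[x] = CLOSED            # DELETE(X)
        if k_old < self.h[x]:         # RAISE state: try to reduce h(X)
            for y in self.nbrs[x]:
                if y in self.h and self.h[y] <= k_old and self.h[x] > self.h[y] + self.c[(y, x)]:
                    self.b[x], self.h[x] = y, self.h[y] + self.c[(y, x)]
        if k_old == self.h[x]:        # LOWER state: propagate cost to neighbors
            for y in self.nbrs[x]:
                if (self.t[y] == NEW or
                        (self.b.get(y) == x and self.h[y] != self.h[x] + self.c[(x, y)]) or
                        (self.b.get(y) != x and self.h[y] > self.h[x] + self.c[(x, y)])):
                    self.b[y] = x
                    self.insert(y, self.h[x] + self.c[(x, y)])
        else:                         # still a RAISE state
            for y in self.nbrs[x]:
                if (self.t[y] == NEW or
                        (self.b.get(y) == x and self.h[y] != self.h[x] + self.c[(x, y)])):
                    self.b[y] = x
                    self.insert(y, self.h[x] + self.c[(x, y)])
                elif self.b.get(y) != x and self.h[y] > self.h[x] + self.c[(x, y)]:
                    self.insert(x, self.h[x])  # X could lower a non-descendant: re-open X
                elif (self.b.get(y) != x and self.h[x] > self.h[y] + self.c[(y, x)]
                        and self.t[y] == CLOSED and self.h[y] > k_old):
                    self.insert(y, self.h[y])  # postpone: re-open suboptimal neighbor
        return self.get_kmin()

    def modify_cost(self, x, y, cval):  # MODIFY-COST(X, Y, cval); arcs here are symmetric
        self.c[(x, y)] = self.c[(y, x)] = cval
        if self.t[x] == CLOSED:
            self.insert(x, self.h[x])
        return self.get_kmin()
```

Calling process_state() until it returns -1 performs the initial optimal propagation from the goal; after modify_cost() raises an arc, the same loop drives the RAISE/LOWER propagation until path costs are again optimal.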
When the cost of traversing an arc is increased, an affected neighbor state is placed on the OPEN list, and the cost increase is propagated via RAISE states through all state sequences containing the arc. As the RAISE states come in contact with neighboring states of lower cost, these LOWER states are placed on the OPEN list, and they subsequently decrease the cost of previously raised states wherever possible. If the cost of traversing an arc is decreased, the reduction is propagated via LOWER states through all state sequences containing the arc, as well as neighboring states whose cost can also be lowered.

Figure 1: Backpointers Based on Initial Propagation

Figure 1 through Figure 3 illustrate the operation of the algorithm for a "potential well" path planning problem. The planning space consists of a 50 x 50 grid of cells. Each cell represents a state and is connected to its eight neighbors via bidirectional arcs. The arc cost values are small for the EMPTY cells and prohibitively large for the OBSTACLE cells.¹ The robot is point-sized and is equipped with a contact sensor. Figure 1 shows the results of an optimal path calculation from the goal to all states in the planning space. The two grey obstacles are stored in the map, but the black obstacle is not. The arrows depict the backpointer function; thus, an optimal path to the goal for any state can be obtained by tracing the arrows from the state to the goal. Note that the arrows deflect around the grey, known obstacles but pass through the black, unknown obstacle.

Figure 2: LOWER States Sweep into Well

The robot starts at the center of the left wall and follows the backpointers toward the goal.
When it reaches the unknown obstacle, it detects a discrepancy between the map and world, updates the map, colors the cell light grey, and enters the obstacle cell on the OPEN list. Backpointers are redirected to pull the robot up along the unknown obstacle and then back down. Figure 2 illustrates the information propagation after the robot has discovered the well is sealed. The robot's path is shown in black and the states on the OPEN list in grey. RAISE states move out of the well transmitting path cost increases. These states activate LOWER states around the "lip" of the well which sweep around the upper and lower obstacles and redirect the backpointers out of the well.

¹ The arc cost value of OBSTACLE must be chosen to be greater than the longest possible sequence of EMPTY cells so that a simple threshold can be used on the path cost to determine if the optimal path to the goal must pass through an obstacle.

Figure 3: Final Backpointer Configuration

This process is complete when the LOWER states reach the robot's cell, at which point the robot moves around the lower obstacle to the goal (Figure 3). Note that after the traverse, the backpointers are only partially updated. Backpointers within the well point outward, but those in the left half of the planning space still point into the well. All states have a path to the goal, but optimal paths are computed to a limited number of states. This effect illustrates the efficiency of D*. The backpointer updates needed to guarantee an optimal path for the robot are limited to the vicinity of the obstacle.

Figure 4 illustrates path planning in fractally generated terrain. The environment is 450 x 450 cells. Grey regions are five times more difficult to traverse than white regions, and the black regions are untraversible. The black curve shows the robot's path from the lower left corner to the upper right given a complete, a priori map of the environment.
This path is referred to as omniscient optimal. Figure 5 shows path planning in the same terrain with an optimistic map (all white). The robot is equipped with a circular field of view with a 20-cell radius. The map is updated with sensor information as the robot moves and the discrepancies are entered on the OPEN list for processing by D*. Due to the lack of a priori map information, the robot drives below the large obstruction in the center and wanders into a dead end before backtracking around the last obstacle to the goal. The resultant path is roughly twice the cost of omniscient optimal. This path is optimal, however, given the information the robot had when it acquired it.

Figure 4: Path Planning with a Complete Map

Figure 5: Path Planning with an Optimistic Map

Figure 6 illustrates the same problem using coarse map information, created by averaging the arc costs in each square region. This map information is accurate enough to steer the robot to the correct side of the central obstruction, and the resultant path is only 6% greater in cost than omniscient optimal.

Figure 6: Path Planning with a Coarse-Resolution Map

3.0 Soundness, Optimality, and Completeness

After all states X have been initialized to t(X) = NEW and G has been entered onto the OPEN list, the function PROCESS-STATE is repeatedly invoked to construct state sequences. The function MODIFY-COST is invoked to make changes to c(°) and to seed these changes on the OPEN list. D* exhibits the following properties:

Property 1: If t(X) ≠ NEW, then the sequence {X} is constructed and is monotonic.

Property 2: When the value k_min returned by PROCESS-STATE equals or exceeds h(X), then h(X) = o(X).

Property 3: If a path from X to G exists, and the search space contains a finite number of states, {X} will be constructed after a finite number of calls to PROCESS-STATE. If a path does not exist, PROCESS-STATE will return -1 with t(X) = NEW.

Property 1 is a soundness property: once a state has been visited, a finite sequence of backpointers to the goal has been constructed.
Property 2 is an optimality property. It defines the conditions under which the chain of backpointers to the goal is optimal. Property 3 is a completeness property: if a path from X to G exists, it will be constructed. If no path exists, it will be reported in a finite amount of time. All three properties hold regardless of the pattern of access for functions MODIFY-COST and PROCESS-STATE.

For brevity, the proofs for the above three properties are informal. See Stentz [14] for the detailed, formal proofs. Consider Property 1 first. Whenever PROCESS-STATE visits a NEW state, it assigns b(°) to point to an existing state sequence and sets h(°) to preserve monotonicity. Monotonic sequences are subsequently manipulated by modifying the functions t(°), h(°), k(°), and b(°). When a state X is placed on the OPEN list (i.e., t(X) = OPEN), k(X) is set to h(X) to preserve monotonicity for states with backpointers to X. Likewise, when a state X is removed from the list, the h(°) values of its neighbors are increased if needed to preserve monotonicity. The backpointer of a state X, b(X), can only be reassigned to Y if h(Y) < h(X) and if the sequence {Y} contains no RAISE states. Since {Y} contains no RAISE states, the h(°) value of every state in the sequence must be less than h(Y). Thus, X cannot be an ancestor of Y, and a closed loop in the backpointers cannot be created. Therefore, once a state X has been visited, the sequence {X} has been constructed. Subsequent modifications ensure that a sequence {X} still exists.

Consider Property 2. Each time a state is inserted on or removed from the OPEN list, D* modifies h(°) values so that k(X) ≤ h(Y) + c(Y, X) for each pair of states (X, Y) such that X is OPEN and Y is CLOSED. Thus, when X is chosen for expansion (i.e., k_min = k(X)), the CLOSED neighbors of X cannot reduce h(X) below k_min, nor can the OPEN neighbors, since their h(°) values must be greater than k_min. States placed on the OPEN list during the expansion of X must have k(°) values greater than k(X); thus, k_min increases or remains the same with each invocation of PROCESS-STATE.
If states with h(°) values less than or equal to k_old are optimal, then states with h(°) values between (inclusively) k_old and k_min are optimal, since no states on the OPEN list can reduce their path costs. Thus, states with h(°) values less than or equal to k_min are optimal. By induction, PROCESS-STATE constructs optimal sequences to all reachable states. If the arc cost c(X, Y) is modified, the function MODIFY-COST places X on the OPEN list, after which k_min is less than or equal to h(X). Since no state Y with h(Y) ≤ h(X) can be affected by the modified arc cost, the property still holds.

Consider Property 3. Each time a state X is expanded via PROCESS-STATE, it places its NEW neighbors on the OPEN list. Thus, if the sequence {X} exists, it will be constructed unless a state in the sequence, Y, is never selected for expansion. But once a state has been placed
The replanner repeatedly expands states on the list with the minimum value. Since is a lower bound on the actual cost from to the robot for all , the replanner is optimal [8].The two algorithms were compared on planning problems of varying size. Each environment was square,consisting of a start state in the center of the left wall and a goal state in center of the right wall. Each environment consisted of a mix of map obstacles (i.e., available to robot before traverse) and unknown obstacles measurable b y t h e r o b o t ’s s e n s o r. T h e s e n s o r u s e d w a s omnidirectional with a 10-cell radial field of view. Figure 7 shows an environment model with 100,000 states. The map obstacles are shown in grey and the unknown obstacles in black.Table 1 shows the results of the comparison for environments of size 1000 through 1,000,000 cells. The runtimes in CPU time for a Sun Microsystems SPARC-10processor are listed along with the speed-up factor of D*over the optimal replanner. For both algorithms, the reported runtime is the total CPU time for all replanning needed to move the robot from the start state to the goal state, after the initial path has been planned. For each environment size, the two algorithms were compared on five randomly-generated environments, and the runtimes were averaged. The speed-up factors for each environment size were computed by averaging the speed-up factors for the five trials.The runtime for each algorithm is highly dependent on the complexity of the environment, including the number,size, and placement of the obstacles, and the ratio of map to unknown obstacles. The results indicate that as the environment increases in size, the performance of D*over the optimal replanner increases rapidly. The intuitionOPEN k °()k min Y gˆX ()gˆX ()X EMPTY OPEN gˆX ()h X ()+g ˆX ()X X for this result is that D* replans locally when it detects an unknown obstacle, but the optimal replanner generates a new global trajectory. 
As the environment increases in size, the local trajectories remain constant in complexity,but the global trajectories increase in complexity.Figure 7: Typical Environment for Algorithm ComparisonTable 1: Comparison of D* to Optimal Replanner5.0Conclusions5.1SummaryThis paper presents D*, a provably optimal and effi-cient path planning algorithm for sensor-equipped robots.The algorithm can handle the full spectrum of a priori map information, ranging from complete and accurate map information to the absence of map information. D* is a very general algorithm and can be applied to problems in artificial intelligence other than robot motion planning. In its most general form, D* can handle any path cost opti-mization problem where the cost parameters change dur-ing the search for the solution. D* is most efficient when these changes are detected near the current starting point in the search space, which is the case with a robot equipped with an on-board sensor.Algorithm 1,00010,000100,0001,000,000Replanner 427 msec 14.45 sec 10.86 min 50.82 min D*261 msec 1.69 sec 10.93 sec 16.83 sec Speed-Up1.6710.1456.30229.30See Stentz [14] for an extensive description of related applications for D*, including planning with robot shape,field of view considerations, dead-reckoning error,changing environments, occupancy maps, potential fields,natural terrain, multiple goals, and multiple robots.5.2Future WorkFor unknown or partially-known terrains, recentresearch has addressed the exploration and map building problems [6][9][10][11][15] in addition to the path finding problem. 
Using a strategy of raising costs for previously visited states, D* can be extended to support exploration tasks.

Quad trees have limited use in environments with cost values ranging over a continuum, unless the environment includes large regions with constant traversability costs. Future work will incorporate the quad tree representation for these environments as well as those with binary cost values (e.g., OBSTACLE and EMPTY) in order to reduce memory requirements [15].

Work is underway to integrate D* with an off-road obstacle avoidance system [12] on an outdoor mobile robot. To date, the combined system has demonstrated the ability to find the goal after driving several hundred meters in a cluttered environment with no initial map.

Acknowledgments

This research was sponsored by ARPA, under contracts "Perception for Outdoor Navigation" (contract number DACA76-89-C-0014, monitored by the US Army TEC) and "Unmanned Ground Vehicle System" (contract number DAAE07-90-C-R059, monitored by TACOM). The author thanks Alonzo Kelly and Paul Keller for graphics software and ideas for generating fractal terrain.

References

[1] Goto, Y., Stentz, A., "Mobile Robot Navigation: The CMU System," IEEE Expert, Vol. 2, No. 4, Winter, 1987.
[2] Jarvis, R. A., "Collision-Free Trajectory Planning Using the Distance Transforms," Mechanical Engineering Trans. of the Institution of Engineers, Australia, Vol. ME10, No. 3, September, 1985.
[3] Korf, R. E., "Real-Time Heuristic Search: First Results," Proc. Sixth National Conference on Artificial Intelligence, July, 1987.
[4] Latombe, J.-C., "Robot Motion Planning," Kluwer Academic Publishers, 1991.
[5] Lozano-Perez, T., "Spatial Planning: A Configuration Space Approach," IEEE Transactions on Computers, Vol. C-32, No. 2, February, 1983.
[6] Lumelsky, V. J., Mukhopadhyay, S., Sun, K., "Dynamic Path Planning in Sensor-Based Terrain Acquisition," IEEE Transactions on Robotics and Automation, Vol. 6, No. 4, August, 1990.
[7] Lumelsky, V.
J., Stepanov, A. A., "Dynamic Path Planning for a Mobile Automaton with Limited Information on the Environment," IEEE Transactions on Automatic Control, Vol. AC-31, No. 11, November, 1986.
[8] Nilsson, N. J., "Principles of Artificial Intelligence," Tioga Publishing Company, 1980.
[9] Pirzadeh, A., Snyder, W., "A Unified Solution to Coverage and Search in Explored and Unexplored Terrains Using Indirect Control," Proc. of the IEEE International Conference on Robotics and Automation, May, 1990.
[10] Rao, N. S. V., "An Algorithmic Framework for Navigation in Unknown Terrains," IEEE Computer, June, 1989.
[11] Rao, N. S. V., Stoltzfus, N., Iyengar, S. S., "A 'Retraction' Method for Learned Navigation in Unknown Terrains for a Circular Robot," IEEE Transactions on Robotics and Automation, Vol. 7, No. 5, October, 1991.
[12] Rosenblatt, J. K., Langer, D., Hebert, M., "An Integrated System for Autonomous Off-Road Navigation," Proc. of the IEEE International Conference on Robotics and Automation, May, 1994.
[13] Samet, H., "An Overview of Quadtrees, Octrees and Related Hierarchical Data Structures," in NATO ASI Series, Vol. F40, Theoretical Foundations of Computer Graphics, Berlin: Springer-Verlag, 1988.
[14] Stentz, A., "Optimal and Efficient Path Planning for Unknown and Dynamic Environments," Carnegie Mellon Robotics Institute Technical Report CMU-RI-TR-93-20, August, 1993.
[15] Zelinsky, A., "A Mobile Robot Exploration Algorithm," IEEE Transactions on Robotics and Automation, Vol. 8, No. 6, December, 1992.
optimal brain damage (translation)

The translation of "optimal brain damage" is "最优脑损伤" (zuì yōu nǎo sǔn shāng) in Chinese.

"Optimal brain damage" refers to a technique used in neural network pruning, where unnecessary connections or parameters in a neural network are identified and eliminated to reduce the complexity and improve the efficiency of the network. The goal is to find the optimal balance between network size and performance.

Here are eight bilingual sentences:

1. 最优脑损伤是一种神经网络剪枝技术，用于减少网络复杂性并提高效率。
Optimal brain damage is a neural network pruning technique used to reduce network complexity and improve efficiency.

2. 运用最优脑损伤技术，可以精简神经网络的连接和参数。
By applying the optimal brain damage technique, the connections and parameters of a neural network can be pruned.

3. 最优脑损伤的目标是在网络大小和性能之间找到最佳平衡点。
The goal of optimal brain damage is to find the optimal balance between network size and performance.
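The pruning idea described above can be sketched concretely. The snippet below assumes the diagonal-Hessian saliency measure s_i = h_ii · w_i² / 2 from LeCun et al.'s original Optimal Brain Damage paper; the function name and interface are illustrative, not from any particular library:

```python
def obd_prune(weights, hessian_diag, fraction):
    """Zero the given fraction of weights with the smallest OBD saliency."""
    # Saliency of weight w_i under a diagonal-Hessian approximation: s_i = h_ii * w_i^2 / 2
    saliency = [hd * w * w / 2.0 for w, hd in zip(weights, hessian_diag)]
    n_prune = int(len(weights) * fraction)
    order = sorted(range(len(weights)), key=lambda i: saliency[i])
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0  # delete the low-saliency connection
    return pruned
```

With a uniform Hessian diagonal this reduces to magnitude pruning: the smallest-magnitude weights are removed first.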
Automotive Engineering, 2021 (Vol. 43) No. 4, doi: 10.19562/j.chinasae.qcgc.2021.04.014

Optimal Obstacle Avoidance Trajectory Planning Algorithm Considering Vehicle Motion Constraints*

Yang Bin, Song Xuewei, Gao Zhenhai (Jilin University, State Key Laboratory of Automobile Simulation and Control, Changchun 130022)

[Abstract] When an autonomous vehicle avoids obstacles, the motion characteristics of the vehicle must be taken into account to ensure the safety, comfort, and handling stability of the avoidance maneuver. This paper proposes an obstacle avoidance trajectory planning algorithm under vehicle motion constraints. Combining the motion states and pose information of obstacles, the algorithm locally samples desired positions for passing an obstacle in a local environment map to form a discrete terminal state point set, transforming the obstacle avoidance trajectory search problem in a complex road environment into a trajectory fitting and optimization problem between the ego vehicle and the state point set. Trajectory fitting is realized by a Bézier curve planner based on a vehicle lateral dynamics model, while the optimization process takes into account the ride smoothness and handling stability of the vehicle during trajectory following. Comparisons of the obstacle avoidance performance with the conventional State Lattice algorithm and an MPC algorithm in multiple test environments show that the proposed planning method enables the vehicle to avoid obstacles safely and reasonably in the test scenarios, with good performance in trajectory smoothness and handling stability.
Optimal Obstacle Avoidance Trajectory Planning Algorithm Considering Vehicle Motion Constraints

Yang Bin, Song Xuewei & Gao Zhenhai
Jilin University, State Key Laboratory of Automobile Simulation and Control, Changchun 130022

[Abstract] The motion characteristics of the autonomous vehicle should be taken into account to ensure safety, comfort and stability when avoiding obstacles. An obstacle avoidance trajectory planning algorithm with the constraints of vehicle motion is proposed in this paper. Firstly, combining with the motion state and position information of obstacles, the algorithm derives the derivative status region where the vehicle can safely avoid obstacles from the local environment map. Then, the terminal state points are sampled in the derivative status region to form a discrete terminal state point set. The obstacle avoidance trajectory search problem in complex road environment is transformed into the problem of trajectory fitting and optimization between the vehicle and the state point set. Trajectory fitting is realized by a Bézier curve planner based on the vehicle lateral dynamics model, while the optimization process takes driving smoothness and operation stability into consideration in the process of vehicle trajectory following. By comparing the obstacle avoidance effect in multiple scenarios with the conventional State Lattice algorithm and MPC algorithm, the results show that the proposed algorithm can make the vehicle avoid obstacles safely and reasonably in the test scenarios, and has better performance in the aspects of trajectory smoothness and vehicle handling stability.

Keywords: vehicle lateral dynamic model; discrete terminal state points set; Bézier curve; obstacle avoidance planning

* Supported by the National Natural Science Foundation of China (U1564214, 51775236) and the National Key R&D Program of China (2017YFB0102600).
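As background to the Bézier-curve planner mentioned in the abstract, a cubic Bézier segment with control points P0..P3 is evaluated as B(t) = (1−t)³P0 + 3(1−t)²t·P1 + 3(1−t)t²·P2 + t³·P3 for t in [0, 1]. A minimal evaluation routine (illustrative only, not the paper's planner):

```python
def cubic_bezier(p0, p1, p2, p3, t):
    """Point on a cubic Bézier curve at parameter t in [0, 1]."""
    s = 1.0 - t
    return tuple(
        s**3 * a + 3 * s**2 * t * b + 3 * s * t**2 * c + t**3 * d
        for a, b, c, d in zip(p0, p1, p2, p3)
    )
```

The curve interpolates its endpoints (B(0) = P0, B(1) = P3), and the control points P1 and P2 shape the heading at each end, which is what makes the form convenient for fitting lane-change-like avoidance trajectories.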
optimal selection (English abbreviation)

"Optimal Selection" is often abbreviated as "OS". It refers to the process of choosing the best option or decision among various alternatives. This concept is widely used in fields such as decision theory, operations research, and management science. Optimal Selection involves evaluating different choices based on specific criteria, such as cost-effectiveness, efficiency, risk, and performance. It aims to identify the option that maximizes benefits or achieves the desired outcome while minimizing drawbacks or costs. This process typically entails thorough analysis, mathematical modeling, and sometimes simulation to assess the potential impact of each alternative. By applying rigorous methodologies, Optimal Selection helps organizations make informed decisions and allocate resources efficiently, leading to improved performance and competitive advantage.