Markov games as a framework for multi-agent reinforcement learning
Package 'ChannelAttribution'                                May 17, 2023

Type: Package
Title: Markov Model for Online Multi-Channel Attribution
Version: 2.0.7
Date: 2023-05-17
Maintainer: Davide Altomare <**************************>
Description: Advertisers use a variety of online marketing channels to reach consumers and they want to know the degree to which each channel contributes to their marketing success. This is called the online multi-channel attribution problem. This package contains a probabilistic algorithm for the attribution problem. The model uses a k-order Markov representation to identify structural correlations in the customer journey data. The package also contains three heuristic algorithms (first-touch, last-touch and linear-touch approach) for the same problem. The algorithms are implemented in C++.
License: GPL-3 | file LICENSE
URL: https://channelattribution.io
LinkingTo: Rcpp, RcppArmadillo
Imports: Rcpp
NeedsCompilation: yes
Author: Davide Altomare [cre, aut], David Loris [aut]
Repository: CRAN
Date/Publication: 2023-05-17 19:20:02 UTC

R topics documented: ChannelAttribution-package, auto_markov_model, choose_order, Data, heuristic_models, markov_model, transition_matrix

ChannelAttribution-package    Markov Model for Online Multi-Channel Attribution

Description

Advertisers use a variety of online marketing channels to reach consumers and they want to know the degree to which each channel contributes to their marketing success. This is called the online multi-channel attribution problem. In many cases, advertisers approach this problem through simple heuristic methods that do not take into account any customer interactions and often tend to underestimate the importance of small channels in marketing contribution. This package provides a function that approaches the attribution problem in a probabilistic way. It uses a k-order Markov representation to identify structural correlations in the customer journey data. This allows advertisers to give a more reliable assessment of the marketing contribution of each channel. The approach basically follows the one presented in Eva Anderl, Ingo Becker, Florian v. Wangenheim, Jan H. Schumann (2014). Differently from them, we solved the estimation process using stochastic simulations. In this way it is also possible to take into account conversion values and their variability in the computation of channel importance. The package also contains a function that estimates three heuristic models (first-touch, last-touch and linear-touch approach) for the same problem.

Details

Package: ChannelAttribution
Type: Package
Version: 2.0.7
Date: 2023-05-17
License: GPL (>= 2)

The package contains functions for channel attribution in web marketing.

Author(s)

Davide Altomare, David Loris
Maintainer: Davide Altomare <**************************>

References

ChannelAttribution Official Website: https://channelattribution.io
Eva Anderl, Ingo Becker, Florian v. Wangenheim, Jan H. Schumann: Mapping the Customer Journey, 2014, doi:10.2139/ssrn.2343077

auto_markov_model    Automatic Markov Model

Description

Estimate a Markov model from customer journey data after automatically choosing a suitable order. It requires paths that do not lead to conversion as input.

Usage

auto_markov_model(Data, var_path, var_conv, var_null, var_value=NULL,
  max_order=10, roc_npt=100, plot=FALSE, nsim_start=1e5, max_step=NULL,
  out_more=FALSE, sep=">", ncore=1, nfold=10, seed=0, conv_par=0.05,
  rate_step_sim=1.5, verbose=TRUE, flg_adv=TRUE)

Arguments

Data: data.frame containing customer journey data.
var_path: column name containing paths.
var_conv: column name containing total conversions.
var_null: column name containing total paths that do not lead to conversion.
var_value: column name containing total conversion value.
max_order: maximum Markov model order considered.
roc_npt: number of points used for approximating ROC and AUC.
plot: if TRUE, a plot of penalized AUC with respect to order will be displayed.
nsim_start: minimum number of simulations used in computation.
max_step: maximum number of steps for a single simulated path. If NULL, it is the maximum number of steps found in Data.
out_more: if TRUE, transition probabilities between channels and removal effects will be shown.
sep: separator between the channels.
ncore: number of threads used in computation.
nfold: how many repetitions are used to verify whether convergence is reached at each iteration.
seed: random seed. Giving this parameter the same value over different runs guarantees that results will not vary.
conv_par: convergence parameter for the algorithm. The estimation process ends when the percentage of variation of the results over different repetitions is less than the convergence parameter.
rate_step_sim: the number of simulations used at each iteration is equal to the number of simulations used at the previous iteration multiplied by rate_step_sim.
verbose: if TRUE, additional information about process convergence will be shown.
flg_adv: if TRUE, the ChannelAttribution Pro banner is printed.

Value

An object of class data.frame with the estimated number of conversions and the estimated conversion value attributed to each channel.

Author(s)

Davide Altomare (<**************************>).

Examples

## Not run:
library(ChannelAttribution)
data(PathData)
auto_markov_model(Data, "path", "total_conversions", "total_null")
## End(Not run)

choose_order    Choose order for Markov model

Description

Find the minimum Markov model order that gives a good representation of customers' behaviour for the data considered. It requires paths that do not lead to conversion as input. The minimum order is found by maximizing a penalized area under the ROC curve.

Usage

choose_order(Data, var_path, var_conv, var_null, max_order=10, sep=">",
  ncore=1, roc_npt=100, plot=TRUE, flg_adv=TRUE)

Arguments

Data: data.frame containing customer journeys.
var_path: column name of Data containing paths.
var_conv: column name of Data containing total conversions.
var_null: column name of Data containing total paths that do not lead to conversion.
max_order: maximum Markov model order considered.
sep: separator between channels.
ncore: number of threads used in computation.
roc_npt: number of points used for approximating ROC and AUC.
plot: if TRUE, a plot of penalized AUC with respect to order will be displayed.
flg_adv: if TRUE, the ChannelAttribution Pro banner is printed.

Value

An object of class List with the estimated ROC, AUC and penalized AUC.

Author(s)

Davide Altomare (<**************************>).

Examples

## Not run:
library(ChannelAttribution)
data(PathData)
res=choose_order(Data, var_path="path", var_conv="total_conversions",
  var_null="total_null")
# plot auc and penalized auc
plot(res$auc$order, res$auc$auc, type="l", xlab="order", ylab="pauc", main="AUC")
lines(res$auc$order, res$auc$pauc, col="red")
legend("right", legend=c("auc", "penalized auc"), col=c("black", "red"), lty=1)
## End(Not run)

Data    Customer journeys data

Description

Example dataset.

Usage

data(PathData)

Format

Data is a data.frame with 10,000 rows and 4 columns: "path" containing customer paths, "total_conversions" containing the total number of conversions, "total_conversion_value" containing the total conversion value, and "total_null" containing the total number of paths that do not lead to conversion.

heuristic_models    Heuristic models for the online attribution problem

Description

Estimate three heuristic models (first-touch, last-touch and linear) from customer journey data.

Usage

heuristic_models(Data, var_path, var_conv, var_value=NULL, sep=">", flg_adv=TRUE)

Arguments

Data: data.frame containing paths and conversions.
var_path: column name containing paths.
var_conv: column name containing total conversions.
var_value: column name containing total conversion value.
sep: separator between the channels.
flg_adv: if TRUE, the ChannelAttribution Pro banner is printed.

Value

An object of class data.frame with the estimated number of conversions and the estimated conversion value attributed to each channel for each model.

Author(s)

Davide Altomare (<**************************>).

Examples

## Not run:
library(ChannelAttribution)
data(PathData)
heuristic_models(Data, "path", "total_conversions")
heuristic_models(Data, "path", "total_conversions", var_value="total_conversion_value")
## End(Not run)

markov_model    Markov model for the online attribution problem

Description

Estimate a k-order Markov model from customer journey data. This function iterates estimation until convergence is reached and enables multiprocessing.

Usage

markov_model(Data, var_path, var_conv, var_value=NULL, var_null=NULL,
  order=1, nsim_start=1e5, max_step=NULL, out_more=FALSE, sep=">",
  ncore=1, nfold=10, seed=0, conv_par=0.05, rate_step_sim=1.5,
  verbose=TRUE, flg_adv=TRUE)

Arguments

Data: data.frame containing customer journey data.
var_path: column name containing paths.
var_conv: column name containing total conversions.
var_value: column name containing total conversion value.
var_null: column name containing total paths that do not lead to conversion.
order: Markov model order.
nsim_start: minimum number of simulations used in computation.
max_step: maximum number of steps for a single simulated path. If NULL, it is the maximum number of steps found in Data.
out_more: if TRUE, transition probabilities between channels and removal effects will be returned.
sep: separator between the channels.
ncore: number of threads used in computation.
nfold: how many repetitions are used to verify whether convergence has been reached at each iteration.
seed: random seed. Giving this parameter the same value over different runs guarantees that results will not vary.
conv_par: convergence parameter for the algorithm. The estimation process ends when the percentage of variation of the results over different repetitions is less than the convergence parameter.
rate_step_sim: the number of simulations used at each iteration is equal to the number of simulations used at the previous iteration multiplied by rate_step_sim.
verbose: if TRUE, additional information about process convergence will be shown.
flg_adv: if TRUE, the ChannelAttribution Pro banner is printed.

Value

An object of class data.frame with the estimated number of conversions and the estimated conversion value attributed to each channel.

Author(s)

Davide Altomare (<**************************>).

Examples

## Not run:
library(ChannelAttribution)
data(PathData)

# Estimate a Markov model using total conversions
markov_model(Data, "path", "total_conversions")

# Estimate a Markov model using total conversions and revenues
markov_model(Data, "path", "total_conversions", var_value="total_conversion_value")

# Estimate a Markov model using total conversions, revenues and paths that do not lead to conversion
markov_model(Data, "path", "total_conversions", var_value="total_conversion_value",
  var_null="total_null")

# Estimate a Markov model returning the transition matrix and removal effects
markov_model(Data, "path", "total_conversions", var_value="total_conversion_value",
  var_null="total_null", out_more=TRUE)

# Estimate a Markov model using 4 threads
markov_model(Data, "path", "total_conversions", var_value="total_conversion_value",
  ncore=4)
## End(Not run)

transition_matrix    Transition matrix

Description

Estimate a k-order transition matrix from customer journey data.

Usage

transition_matrix(Data, var_path, var_conv, var_null, order=1, sep=">",
  flg_equal=TRUE, flg_adv=TRUE)

Arguments

Data: data.frame containing customer journey data.
var_path: column name containing paths.
var_conv: column name containing total conversions.
var_null: column name containing paths that do not lead to conversion.
order: Markov model order.
sep: separator between the channels.
flg_equal: if TRUE, transitions from a channel to itself will be considered.
flg_adv: if TRUE, the ChannelAttribution Pro banner is printed.

Value

An object of class List containing a data.frame with channel names and a data.frame with the estimated transition matrix.

Author(s)

Davide Altomare (<**************************>).

Examples

## Not run:
library(ChannelAttribution)
data(PathData)
transition_matrix(Data, var_path="path", var_conv="total_conversions",
  var_null="total_null", order=1, sep=">", flg_equal=TRUE)
transition_matrix(Data, var_path="path", var_conv="total_conversions",
  var_null="total_null", order=3, sep=">", flg_equal=TRUE)
## End(Not run)
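The three touch heuristics estimated above are simple enough to restate outside R. The following Python sketch is only a toy illustration of the first-touch, last-touch and linear-touch rules on PathData-shaped input; the function name and data layout are ours, and the package's actual implementation is the C++ code behind heuristic_models.

from collections import defaultdict

def heuristic_attribution(paths, sep=">"):
    """Toy first/last/linear-touch attribution.

    paths: list of (path_string, total_conversions) pairs, e.g.
           [("facebook>paid_search", 3), ("direct", 1)].
    Returns {channel: {"first": ..., "last": ..., "linear": ...}}.
    """
    scores = defaultdict(lambda: {"first": 0.0, "last": 0.0, "linear": 0.0})
    for path, conversions in paths:
        channels = [c.strip() for c in path.split(sep)]
        scores[channels[0]]["first"] += conversions   # all credit to the first touch
        scores[channels[-1]]["last"] += conversions   # all credit to the last touch
        share = conversions / len(channels)           # equal split over all touches
        for channel in channels:
            scores[channel]["linear"] += share
    return dict(scores)

print(heuristic_attribution([("facebook>paid_search", 3), ("direct", 1)]))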
Quizzes for Chapter 1

1. (Single choice) The Turing test is intended to give a satisfactory operational definition of which of the following?
A. Human thinking
B. Artificial intelligence
C. Machine intelligence
D. Machine action
Correct answer: C

2. (Multiple choice) Select the correct statements about the concept of artificial intelligence.
A. Artificial intelligence aims to create intelligent machines.
B. Artificial intelligence is the study and construction of agent programs that perform well in a given environment.
C. Artificial intelligence is defined as the study of human agents.
D. Artificial intelligence aims to develop a class of computers able to do the things normally done by people.
Correct answers: A, B, D

3. (Multiple choice) Which of the following disciplines are foundations of artificial intelligence?
A. Economics
B. Philosophy
C. Psychology
D. Mathematics
Correct answers: A, B, C, D

4. (Multiple choice) Which statements describe strong AI (general AI)?
A. It refers to a machine with the ability to apply intelligence to any problem.
B. It is an appropriately programmed computer with the right inputs and outputs, and therefore a mind with the same judgment as a human.
C. It refers to a machine aimed at only one specific problem.
D. It is defined as non-sentient computer intelligence, or AI focused on one narrow task.
Correct answers: A, B

5. (Multiple choice) Which of the following computer systems are instances of artificial intelligence?
A. Web search engines
B. Supermarket barcode scanners
C. Voice-controlled phone menus
D. Intelligent personal assistants
Correct answers: A, D

6. (Multiple choice) Which of the following are research areas of artificial intelligence?
A. Face recognition
B. Expert systems
C. Image understanding
D. Distributed computing
Correct answers: A, B, C

7. (Multiple choice) Considering some applications of AI, which of the following tasks can currently be solved by AI?
A. Playing Texas hold'em poker at a competitive level
B. Playing a decent game of table tennis
C. Buying a week's groceries on the Web
D. Buying a week's groceries at a market
Correct answers: A, B, C

8. (Fill in the blank) Rationality is the property of a system that does the right thing in a _________ environment.
Correct answer: known
Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data

John Lafferty LAFFERTY@
Andrew McCallum MCCALLUM@
Fernando Pereira FPEREIRA@
WhizBang! Labs-Research, 4616 Henry Street, Pittsburgh, PA 15213 USA
School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213 USA
Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA 19104 USA

Abstract

We present conditional random fields, a framework for building probabilistic models to segment and label sequence data. Conditional random fields offer several advantages over hidden Markov models and stochastic grammars for such tasks, including the ability to relax strong independence assumptions made in those models. Conditional random fields also avoid a fundamental limitation of maximum entropy Markov models (MEMMs) and other discriminative Markov models based on directed graphical models, which can be biased towards states with few successor states. We present iterative parameter estimation algorithms for conditional random fields and compare the performance of the resulting models to HMMs and MEMMs on synthetic and natural-language data.

1. Introduction

The need to segment and label sequences arises in many different problems in several scientific fields. Hidden Markov models (HMMs) and stochastic grammars are well understood and widely used probabilistic models for such problems. In computational biology, HMMs and stochastic grammars have been successfully used to align biological sequences, find sequences homologous to a known evolutionary family, and analyze RNA secondary structure (Durbin et al., 1998). In computational linguistics and computer science, HMMs and stochastic grammars have been applied to a wide variety of problems in text and speech processing, including topic segmentation, part-of-speech (POS) tagging, information extraction, and syntactic disambiguation (Manning & Schütze, 1999).

HMMs and stochastic grammars are generative models, assigning a joint probability to paired observation and label sequences; the parameters are typically trained to maximize the joint likelihood of training examples. To define a joint probability over observation and label sequences, a generative model needs to enumerate all possible observation sequences, typically requiring a representation in which observations are task-appropriate atomic entities, such as words or nucleotides. In particular, it is not practical to represent multiple interacting features or long-range dependencies of the observations, since the inference problem for such models is intractable.

This difficulty is one of the main motivations for looking at conditional models as an alternative. A conditional model specifies the probabilities of possible label sequences given an observation sequence. Therefore, it does not expend modeling effort on the observations, which at test time are fixed anyway. Furthermore, the conditional probability of the label sequence can depend on arbitrary, non-independent features of the observation sequence without forcing the model to account for the distribution of those dependencies. The chosen features may represent attributes at different levels of granularity of the same observations (for example, words and characters in English text), or aggregate properties of the observation sequence (for instance, text layout). The probability of a transition between labels may depend not only on the current observation, but also on past and future observations, if available. In contrast, generative models must make very strict independence assumptions on the observations, for instance conditional independence given the labels, to achieve tractability.

Maximum entropy Markov models (MEMMs) are conditional probabilistic sequence models that attain all of the above advantages (McCallum et al., 2000). In MEMMs, each source state(1) has an exponential model that takes the observation features as input, and outputs a distribution over possible next states. These exponential models are trained by an appropriate iterative scaling method in the maximum entropy framework. Previously published experimental results show MEMMs increasing recall and doubling precision relative to HMMs in a FAQ segmentation task.

(1) Output labels are associated with states; it is possible for several states to have the same label, but for simplicity in the rest of this paper we assume a one-to-one correspondence.

MEMMs and other non-generative finite-state models based on next-state classifiers, such as discriminative Markov models (Bottou, 1991), share a weakness we call here the label bias problem: the transitions leaving a given state compete only against each other, rather than against all other transitions in the model. In probabilistic terms, transition scores are the conditional probabilities of possible next states given the current state and the observation sequence. This per-state normalization of transition scores implies a "conservation of score mass" (Bottou, 1991) whereby all the mass that arrives at a state must be distributed among the possible successor states. An observation can affect which destination states get the mass, but not how much total mass to pass on. This causes a bias toward states with fewer outgoing transitions. In the extreme case, a state with a single outgoing transition effectively ignores the observation. In those cases, unlike in HMMs, Viterbi decoding cannot downgrade a branch based on observations after the branch point, and models with state-transition structures that have sparsely connected chains of states are not properly handled. The Markovian assumptions in MEMMs and similar state-conditional models insulate decisions at one state from future decisions in a way that does not match the actual dependencies between consecutive states.

This paper introduces conditional random fields (CRFs), a sequence modeling framework that has all the advantages of MEMMs but also solves the label bias problem in a principled way. The critical difference between CRFs and MEMMs is that a MEMM uses per-state exponential models for the conditional probabilities of next states given the current state, while a CRF has a single exponential model for the joint probability of the entire sequence of labels given the observation sequence. Therefore, the weights of different features at different states can be traded off against each other.

We can also think of a CRF as a finite-state model with unnormalized transition probabilities. However, unlike some other weighted finite-state approaches (LeCun et al., 1998), CRFs assign a well-defined probability distribution over possible labelings, trained by maximum likelihood or MAP estimation. Furthermore, the loss function is convex,(2) guaranteeing convergence to the global optimum. CRFs also generalize easily to analogues of stochastic context-free grammars that would be useful in such problems as RNA secondary structure prediction and natural language processing.

(2) In the case of fully observable states, as we are discussing here; if several states have the same label, the usual local maxima of Baum-Welch arise.
transitions rather than states;the symbol‘’represents the null output label.We present the model,describe two training procedures and sketch a proof of convergence.We also give experimental results on synthetic data showing that CRFs solve the clas-sical version of the label bias problem,and,more signifi-cantly,that CRFs perform better than HMMs and MEMMs when the true data distribution has higher-order dependen-cies than the model,as is often the case in practice.Finally, we confirm these results as well as the claimed advantages of conditional models by evaluating HMMs,MEMMs and CRFs with identical state structure on a part-of-speech tag-ging task.2.The Label Bias ProblemClassical probabilistic automata(Paz,1971),discrimina-tive Markov models(Bottou,1991),maximum entropy taggers(Ratnaparkhi,1996),and MEMMs,as well as non-probabilistic sequence tagging and segmentation mod-els with independently trained next-state classifiers(Pun-yakanok&Roth,2001)are all potential victims of the label bias problem.For example,Figure1represents a simplefinite-state model designed to distinguish between the two words and.Suppose that the observation sequence is. In thefirst time step,matches both transitions from the start state,so the probability mass gets distributed roughly equally among those two transitions.Next we observe. Both states1and4have only one outgoing transition.State 1has seen this observation often in training,state4has al-most never seen this observation;but like state1,state4 has no choice but to pass all its mass to its single outgoing transition,since it is not generating the observation,only conditioning on it.Thus,states with a single outgoing tran-sition effectively ignore their observations.More generally, states with low-entropy next state distributions will take lit-tle notice of observations.Returning to the example,the top path and the bottom path will be about equally likely, independently of the observation sequence.If one of the two words is slightly more common in the training set,the transitions out of the start state will slightly prefer its cor-responding transition,and that word’s state sequence will always win.This behavior is demonstrated experimentally in Section5.L´e on Bottou(1991)discussed two solutions for the label bias problem.One is to change the state-transition struc-ture of the model.In the above example we could collapse states1and4,and delay the branching until we get a dis-criminating observation.This operation is a special case of determinization(Mohri,1997),but determinization of weightedfinite-state machines is not always possible,and even when possible,it may lead to combinatorial explo-sion.The other solution mentioned is to start with a fully-connected model and let the training procedurefigure out a good structure.But that would preclude the use of prior structural knowledge that has proven so valuable in infor-mation extraction tasks(Freitag&McCallum,2000). 
Proper solutions require models that account for whole state sequences at once by letting some transitions "vote" more strongly than others depending on the corresponding observations. This implies that score mass will not be conserved, but instead individual transitions can "amplify" or "dampen" the mass they receive. In the above example, the transitions from the start state would have a very weak effect on path score, while the transitions from states 1 and 4 would have much stronger effects, amplifying or damping depending on the actual observation, and a proportionally higher contribution to the selection of the Viterbi path.(3)

In the related work section we discuss other heuristic model classes that account for state sequences globally rather than locally. To the best of our knowledge, CRFs are the only model class that does this in a purely probabilistic setting, with guaranteed global maximum likelihood convergence.

(3) Weighted determinization and minimization techniques shift transition weights while preserving overall path weight (Mohri, 2000); their connection to this discussion deserves further study.

3. Conditional Random Fields

In what follows, X is a random variable over data sequences to be labeled, and Y is a random variable over corresponding label sequences. All components Y_i of Y are assumed to range over a finite label alphabet 𝒴. For example, X might range over natural language sentences and Y range over part-of-speech taggings of those sentences, with 𝒴 the set of possible part-of-speech tags. The random variables X and Y are jointly distributed, but in a discriminative framework we construct a conditional model p(Y|X) from paired observation and label sequences, and do not explicitly model the marginal p(X).

Let G = (V, E) be a graph such that Y = (Y_v), v ∈ V, so that Y is indexed by the vertices of G. Then (X, Y) is a conditional random field in case, when conditioned on X, the random variables Y_v obey the Markov property with respect to the graph: p(Y_v | X, Y_w, w ≠ v) = p(Y_v | X, Y_w, w ~ v), where w ~ v means that w and v are neighbors in G. Thus, a CRF is a random field globally conditioned on the observation X. Throughout the paper we tacitly assume that the graph G is fixed. In the simplest and most important example for modeling sequences, G is a simple chain or line: G = (V = {1, 2, ..., m}, E = {(i, i+1)}). X may also have a natural graph structure; yet in general it is not necessary to assume that X and Y have the same graphical structure, or even that X has any graphical structure at all. However, in this paper we will be most concerned with sequences X = (X_1, X_2, ..., X_n) and Y = (Y_1, Y_2, ..., Y_n).

If the graph G = (V, E) of Y is a tree (of which a chain is the simplest example), its cliques are the edges and vertices. Therefore, by the fundamental theorem of random fields (Hammersley & Clifford, 1971), the joint distribution over the label sequence Y given X has the form

    p_θ(y|x) ∝ exp( Σ_{e∈E,k} λ_k f_k(e, y|_e, x) + Σ_{v∈V,k} μ_k g_k(v, y|_v, x) ),   (1)

where x is a data sequence, y a label sequence, and y|_S is the set of components of y associated with the vertices in subgraph S. We assume that the features f_k and g_k are given and fixed. For example, a Boolean vertex feature g_k might be true if the word X_i is upper case and the tag Y_i is "proper noun."

The parameter estimation problem is to determine the parameters θ = (λ_1, λ_2, ...; μ_1, μ_2, ...) from training data D = {(x^(i), y^(i))}, i = 1, ..., N, with empirical distribution p̃(x, y). In Section 4 we describe an iterative scaling algorithm that maximizes the log-likelihood objective function

    O(θ) = Σ_{i=1}^N log p_θ(y^(i) | x^(i)).

As a particular case, we can construct an HMM-like CRF by defining one feature for each state pair (y', y), and one feature for each state-observation pair (y, x):

    f_{y',y}(<u,v>, y|_<u,v>, x) = δ(y_u, y') δ(y_v, y),   g_{y,x}(v, y|_v, x) = δ(y_v, y) δ(x_v, x).

The corresponding parameters λ_{y',y} and μ_{y,x} play a similar role to the (logarithms of the) usual HMM parameters p(y'|y) and p(x|y). Boltzmann chain models (Saul & Jordan, 1996; MacKay, 1996) have a similar form but use a single normalization constant to yield a joint distribution, whereas CRFs use the observation-dependent normalization Z(x) for conditional distributions. Although it encompasses HMM-like models, the class of conditional random fields is much more expressive, because it allows arbitrary dependencies on the observation sequence.

Figure 2. Graphical structures of simple HMMs (left), MEMMs (center), and the chain-structured case of CRFs (right) for sequences. An open circle indicates that the variable is not generated by the model.

In addition, the features do not need to specify completely a state or observation, so one might expect that the model can be estimated from less training data. Another attractive property is the convexity of the loss function; indeed, CRFs share all of the convexity properties of general maximum entropy models.

For the remainder of the paper we assume that the dependencies of Y, conditioned on X, form a chain. To simplify some expressions, we add special start and stop states Y_0 = start and Y_{n+1} = stop. Thus, we will be using the graphical structure shown in Figure 2.

For a chain structure, the conditional probability of a label sequence can be expressed concisely in matrix form, which will be useful in describing the parameter estimation and inference algorithms in Section 4. Suppose that p_θ(Y|X) is a CRF given by (1). For each position i in the observation sequence x, we define the |𝒴| × |𝒴| matrix random variable M_i(x) = [M_i(y', y | x)] by

    M_i(y', y | x) = exp( Σ_k λ_k f_k(e_i, Y|_{e_i} = (y', y), x) + Σ_k μ_k g_k(v_i, Y|_{v_i} = y, x) ),

where e_i is the edge with labels (Y_{i-1}, Y_i) and v_i is the vertex with label Y_i. In contrast to generative models, conditional models like CRFs do not need to enumerate over all possible observation sequences, and therefore these matrices can be computed directly as needed from a given training or test observation sequence and the parameter vector θ. Then the normalization (partition function) Z_θ(x) is the (start, stop) entry of the product of these matrices:

    Z_θ(x) = ( M_1(x) M_2(x) ... M_{n+1}(x) )_{start, stop}.

Using this notation, the conditional probability of a label sequence y is written as

    p_θ(y|x) = ( Π_{i=1}^{n+1} M_i(y_{i-1}, y_i | x) ) / Z_θ(x),

where y_0 = start and y_{n+1} = stop.
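In the chain case these matrix products make the normalization cheap to compute. The following sketch is our illustration of the formulas above, with random positive entries standing in for the exponentiated feature sums and an arbitrary two-label alphabet; it checks that the probabilities of all labelings sum to one.

import numpy as np
from itertools import product as cart

rng = np.random.default_rng(0)
labels = ["start", "A", "B", "stop"]
idx = {y: i for i, y in enumerate(labels)}
n = 3  # length of the observation sequence

# M[i][y', y] stands for exp(sum of weighted features) at position i+1;
# zeroing forbidden entries keeps start/stop at the ends of the chain.
M = []
for i in range(n + 1):
    m = rng.uniform(0.5, 2.0, size=(4, 4))
    m[:, idx["start"]] = 0.0           # nothing enters start
    m[idx["stop"], :] = 0.0            # nothing leaves stop
    if i > 0:
        m[idx["start"], :] = 0.0       # start only occupies position 0
    if i < n:
        m[:, idx["stop"]] = 0.0        # stop only occupies position n+1
    else:
        m[:, [idx["A"], idx["B"]]] = 0.0
    M.append(m)

# Z(x) is the (start, stop) entry of the product M_1 ... M_{n+1}.
Z = np.linalg.multi_dot(M)[idx["start"], idx["stop"]]

def prob(y):
    """p(y|x) for y = (y_1, ..., y_n), with y_0 = start, y_{n+1} = stop."""
    seq = ["start", *y, "stop"]
    score = 1.0
    for i in range(n + 1):
        score *= M[i][idx[seq[i]], idx[seq[i + 1]]]
    return score / Z

print(sum(prob(y) for y in cart("AB", repeat=n)))  # -> 1.0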
4. Parameter Estimation for CRFs

We now describe two iterative scaling algorithms to find the parameter vector θ that maximizes the log-likelihood of the training data. Both algorithms are based on the improved iterative scaling (IIS) algorithm of Della Pietra et al. (1997); the proof technique based on auxiliary functions can be extended to show convergence of the algorithms for CRFs.

Iterative scaling algorithms update the weights as λ_k ← λ_k + δλ_k and μ_k ← μ_k + δμ_k for appropriately chosen δλ_k and δμ_k. In particular, the IIS update δλ_k for an edge feature f_k is the solution of

    Ẽ[f_k] def= Σ_{x,y} p̃(x, y) Σ_{i=1}^{n+1} f_k(e_i, y|_{e_i}, x)
            = Σ_{x,y} p̃(x) p_θ(y|x) Σ_{i=1}^{n+1} f_k(e_i, y|_{e_i}, x) exp( δλ_k T(x, y) ),

where T(x, y) def= Σ_{i,k} f_k(e_i, y|_{e_i}, x) + Σ_{i,k} g_k(v_i, y|_{v_i}, x) is the total feature count. The equations for vertex feature updates δμ_k have similar form.

However, efficiently computing the exponential sums on the right-hand sides of these equations is problematic, because T(x, y) is a global property of (x, y), and dynamic programming will sum over sequences with potentially varying T. To deal with this, the first algorithm, Algorithm S, uses a "slack feature." The second, Algorithm T, keeps track of partial totals.

For Algorithm S, we define the slack feature by

    s(x, y) def= S − Σ_i Σ_k f_k(e_i, y|_{e_i}, x) − Σ_i Σ_k g_k(v_i, y|_{v_i}, x),

where S is a constant chosen so that s(x^(i), y) ≥ 0 for all y and all observation vectors x^(i) in the training set, thus making T(x, y) = S. Feature s is "global," that is, it does not correspond to any particular edge or vertex.

For each index i we now define the forward vectors α_i(x) with base case

    α_0(y|x) = 1 if y = start, 0 otherwise,

and recurrence α_i(x) = α_{i−1}(x) M_i(x). Similarly, the backward vectors β_i(x) are defined by

    β_{n+1}(y|x) = 1 if y = stop, 0 otherwise,

and β_i(x) = M_{i+1}(x) β_{i+1}(x). With these definitions, the update equations are

    δλ_k = (1/S) log( Ẽ[f_k] / E[f_k] ),   δμ_k = (1/S) log( Ẽ[g_k] / E[g_k] ),

where the model expectations are computed from the forward and backward vectors, for example

    E[f_k] = Σ_x p̃(x) Σ_{i=1}^{n+1} Σ_{y',y} f_k(e_i, (y', y), x) α_{i−1}(y'|x) M_i(y', y|x) β_i(y|x) / Z_θ(x).

The factors involving the forward and backward vectors in the above equations have the same meaning as for standard hidden Markov models. For example, α_i(y|x) β_i(y|x) / Z_θ(x) is the marginal probability of label Y_i = y given that the observation sequence is x. This algorithm is closely related to the algorithm of Darroch and Ratcliff (1972), and MART algorithms used in image reconstruction.

The constant S in Algorithm S can be quite large, since in practice it is proportional to the length of the longest training observation sequence. As a result, the algorithm may converge slowly, taking very small steps toward the maximum in each iteration. If the length of the observations and the number of active features varies greatly, a faster-converging algorithm can be obtained by keeping track of feature totals for each observation sequence separately.

Let T(x) def= max_y T(x, y). Algorithm T accumulates feature expectations into counters indexed by T(x). More specifically, we use the forward-backward recurrences just introduced to compute the expectations a_{k,t} of feature f_k and b_{k,t} of feature g_k given that T(x) = t. Then our parameter updates are δλ_k = log β_k and δμ_k = log γ_k, where β_k and γ_k are the unique positive roots to the following polynomial equations

    Σ_{t=0}^{T_max} a_{k,t} β_k^t = Ẽ[f_k],   Σ_{t=0}^{T_max} b_{k,t} γ_k^t = Ẽ[g_k],   (2)

which can be easily computed by Newton's method.

A single iteration of Algorithm S and Algorithm T has roughly the same time and space complexity as the well-known Baum-Welch algorithm for HMMs. To prove convergence of our algorithms, we can derive an auxiliary function to bound the change in likelihood from below; this method is developed in detail by Della Pietra et al. (1997). The full proof is somewhat detailed; however, here we give an idea of how to derive the auxiliary function. To simplify notation, we assume only edge features f_k with parameters λ_k.

Given two parameter settings θ and θ' = θ + δ, we bound from below the change in the objective function with an auxiliary function A(θ', θ) as follows:

    O(θ') − O(θ) = Σ_{x,y} p̃(x, y) log( p_{θ'}(y|x) / p_θ(y|x) )
                 = Σ_k δλ_k Ẽ[f_k] − Σ_x p̃(x) log( Z_{θ'}(x) / Z_θ(x) )
                 ≥ Σ_k δλ_k Ẽ[f_k] + 1 − Σ_x p̃(x) Z_{θ'}(x) / Z_θ(x)
                 ≥ Σ_k δλ_k Ẽ[f_k] + 1 − Σ_x p̃(x) Σ_y p_θ(y|x) Σ_{i,k} ( f_k(e_i, y|_{e_i}, x) / T(x, y) ) exp( δλ_k T(x, y) )
                 def= A(θ', θ),

where the inequalities follow from the convexity of −log and exp (the first uses −log z ≥ 1 − z, the second Jensen's inequality). Differentiating A with respect to δλ_k and setting the result to zero yields equation (2).
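Because the coefficients a_{k,t} in equation (2) are expectations and hence nonnegative, the left-hand side is nondecreasing in β_k for β_k > 0, so the positive root is unique and Newton's method converges quickly. A minimal sketch with made-up coefficients (our illustration of the root-finding step, not the paper's code):

def newton_positive_root(a, target, x0=1.0, tol=1e-12, max_iter=100):
    """Solve sum_t a[t] * x**t = target for the unique positive root.

    a[t] >= 0 stands for the expectation of the feature accumulated in the
    counter for total feature count t (Algorithm T); `target` stands for
    the empirical expectation of the feature.
    """
    x = x0
    for _ in range(max_iter):
        f = sum(c * x**t for t, c in enumerate(a)) - target
        df = sum(t * c * x**(t - 1) for t, c in enumerate(a) if t > 0)
        step = f / df
        x -= step
        if abs(step) < tol:
            break
    return x

import math
beta = newton_positive_root([0.0, 0.3, 0.5, 0.2], target=2.0)
print(beta, math.log(beta))  # the update is delta lambda_k = log beta_k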
5. Experiments

We first discuss two sets of experiments with synthetic data that highlight the differences between CRFs and MEMMs. The first experiments are a direct verification of the label bias problem discussed in Section 2. In the second set of experiments, we generate synthetic data using randomly chosen hidden Markov models, each of which is a mixture of a first-order and a second-order model. Competing models are then trained and compared on test data. As the data becomes more second-order, the test error rates of the trained models increase. This experiment corresponds to the common modeling practice of approximating complex local and long-range dependencies, as occur in natural data, by small-order Markov models. Our results clearly indicate that even when the models are parameterized in exactly the same way, CRFs are more robust to inaccurate modeling assumptions than MEMMs or HMMs, and resolve the label bias problem, which affects the performance of MEMMs. To avoid confusion of different effects, the MEMMs and CRFs in these experiments do not use overlapping features of the observations. Finally, in a set of POS tagging experiments, we confirm the advantage of CRFs over MEMMs. We also show that the addition of overlapping features to CRFs and MEMMs allows them to perform much better than HMMs, as already shown for MEMMs by McCallum et al. (2000).

Figure 3. Plots of error rates for HMMs, CRFs, and MEMMs on randomly generated synthetic data sets, as described in Section 5.2. As the data becomes "more second order," the error rates of the test models increase. As shown in the left plot, the CRF typically significantly outperforms the MEMM. The center plot shows that the HMM outperforms the MEMM. In the right plot, each open square represents a data set with α < 1/2, and a solid circle indicates a data set with α ≥ 1/2; the plot shows that when the data is mostly second order (α ≥ 1/2), the discriminatively trained CRF typically outperforms the HMM. These experiments are not designed to demonstrate the advantages of the additional representational power of CRFs and MEMMs relative to HMMs.

5.1 Modeling label bias

We generate data from a simple HMM which encodes a noisy version of the finite-state network in Figure 1. Each state emits its designated symbol with probability 29/32 and any of the other symbols with probability 1/32. We train both an MEMM and a CRF with the same topologies on the data generated by the HMM. The observation features are simply the identity of the observation symbols. In a typical run using 2,000 training and 500 test samples, trained to convergence of the iterative scaling algorithm, the CRF error is 4.6% while the MEMM error is 42%, showing that the MEMM fails to discriminate between the two branches.

5.2 Modeling mixed-order sources

For these results, we use five labels, a-e (|Y| = 5), and 26 observation values, A-Z (|X| = 26); however, the results were qualitatively the same over a range of sizes for |Y| and |X|. We generate data from a mixed-order HMM with state transition probabilities given by

    p_α(y_i | y_{i−1}, y_{i−2}) = α p_2(y_i | y_{i−1}, y_{i−2}) + (1 − α) p_1(y_i | y_{i−1})

and, similarly, emission probabilities given by

    p_α(x_i | y_i, x_{i−1}) = α p_2(x_i | y_i, x_{i−1}) + (1 − α) p_1(x_i | y_i).

Thus, for α = 0 we have a standard first-order HMM. In order to limit the size of the Bayes error rate for the resulting models, the conditional probability tables are constrained to be sparse. In particular, p_2(· | y_{i−1}, y_{i−2}) can have at most two nonzero entries, for each y_{i−1}, y_{i−2}, and p_2(· | y_i, x_{i−1}) can have at most three nonzero entries for each y_i, x_{i−1}. For each randomly generated model, a sample of 1,000 sequences of length 25 is generated for training and testing.

On each randomly generated training set, a CRF is trained using Algorithm S. (Note that since the length of the sequences and number of active features is constant, Algorithms S and T are identical.) The algorithm is fairly slow to converge, typically taking approximately 500 iterations for the model to stabilize. On the 500 MHz Pentium PC used in our experiments, each iteration takes approximately 0.2 seconds. On the same data an MEMM is trained using iterative scaling, which does not require forward-backward calculations, and is thus more efficient. The MEMM training converges more quickly, stabilizing after approximately 100 iterations. For each model, the Viterbi algorithm is used to label a test set; the experimental results do not significantly change when using forward-backward decoding to minimize the per-symbol error rate.

The results of several runs are presented in Figure 3. Each plot compares two classes of models, with each point indicating the error rate for a single test set. As α increases, the error rates generally increase, as the first-order models fail to fit the second-order data. The figure compares one parameterization of the models; results for the other parameterizations considered are qualitatively the same. As shown in the first graph, the CRF generally outperforms the MEMM, often by a wide margin of 10%-20% relative error. (The points for very small error rate, where the MEMM does better than the CRF, are suspected to be the result of an insufficient number of training iterations for the CRF.)
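For concreteness, the sampling scheme described above can be sketched as follows. This is our reconstruction, with dense rather than sparse conditional tables and an arbitrary initial context, for brevity:

import numpy as np

rng = np.random.default_rng(1)
nY, nX, alpha, length = 5, 26, 0.5, 25

def dist(shape):
    """Random conditional tables; the last axis is a normalized distribution."""
    p = rng.random(shape)
    return p / p.sum(axis=-1, keepdims=True)

p1_y, p2_y = dist((nY, nY)), dist((nY, nY, nY))   # p1(y|y'), p2(y|y',y'')
p1_x, p2_x = dist((nY, nX)), dist((nY, nX, nX))   # p1(x|y),  p2(x|y,x')

def sample_sequence():
    ys, xs = [0, 0], [0]                          # dummy initial context
    for _ in range(length):
        # Mixed-order transition: alpha * second order + (1-alpha) * first order.
        py = alpha * p2_y[ys[-1], ys[-2]] + (1 - alpha) * p1_y[ys[-1]]
        y = rng.choice(nY, p=py)
        # Mixed-order emission, conditioned on the previous observation.
        px = alpha * p2_x[y, xs[-1]] + (1 - alpha) * p1_x[y]
        x = rng.choice(nX, p=px)
        ys.append(y); xs.append(x)
    return ys[2:], xs[1:]

labels, observations = sample_sequence()
print(labels[:10], observations[:10])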
Model    Error    OOV error
HMM      5.69%    45.99%
MEMM     6.37%    54.61%
CRF      5.55%    48.05%
MEMM+    4.81%    26.99%
CRF+     4.27%    23.76%

Figure 4. Per-word error rates for POS tagging on the Penn treebank, using first-order models trained on 50% of the 1.1 million word corpus. The oov rate is 5.45%. Models marked + use the spelling features described below.

5.3 POS tagging experiments

To confirm our synthetic data results, we also compared HMMs, MEMMs and CRFs on Penn treebank POS tagging, where each word in a given input sentence must be labeled with one of 45 syntactic tags.

We carried out two sets of experiments with this natural language data. First, we trained first-order HMM, MEMM, and CRF models as in the synthetic data experiments, introducing parameters μ_{y,x} for each tag-word pair and λ_{y',y} for each tag-tag pair in the training set. The results are consistent with what is observed on synthetic data: the HMM outperforms the MEMM, as a consequence of the label bias problem, while the CRF outperforms the HMM. The error rates for training runs using a 50%-50% train-test split are shown in Figure 4; the results are qualitatively similar for other splits of the data. The error rates on out-of-vocabulary (oov) words, which are not observed in the training set, are reported separately.

In the second set of experiments, we take advantage of the power of conditional models by adding a small set of orthographic features: whether a spelling begins with a number or upper case letter, whether it contains a hyphen, and whether it ends in one of the following suffixes: -ing, -ogy, -ed, -s, -ly, -ion, -tion, -ity, -ies. Here we find, as expected, that both the MEMM and the CRF benefit significantly from the use of these features, with the overall error rate reduced by around 25%, and the out-of-vocabulary error rate reduced by around 50%.

One usually starts training from the all zero parameter vector, corresponding to the uniform distribution. However, for these datasets, CRF training with that initialization is much slower than MEMM training. Fortunately, we can use the optimal MEMM parameter vector as a starting point for training the corresponding CRF. In Figure 4, MEMM+ was trained to convergence in around 100 iterations. Its parameters were then used to initialize the training of CRF+, which converged in 1,000 iterations. In contrast, training of the same CRF from the uniform distribution had not converged even after 2,000 iterations.

6. Further Aspects of CRFs

Many further aspects of CRFs are attractive for applications and deserve further study. In this section we briefly mention just two.

Conditional random fields can be trained using the exponential loss objective function used by the AdaBoost algorithm (Freund & Schapire, 1997). Typically, boosting is applied to classification problems with a small, fixed number of classes; applications of boosting to sequence labeling have treated each label as a separate classification problem (Abney et al., 1999). However, it is possible to apply the parallel update algorithm of Collins et al. (2000) to optimize the per-sequence exponential loss. This requires a forward-backward algorithm to compute efficiently certain feature expectations, along the lines of Algorithm T, except that each feature requires a separate set of forward and backward accumulators.

Another attractive aspect of CRFs is that one can implement efficient feature selection and feature induction algorithms for them. That is, rather than specifying in advance which features of (X, Y) to use, we could start from feature-generating rules and evaluate the benefit of generated features automatically on data. In particular, the feature induction algorithms presented in Della Pietra et al. (1997) can be adapted to fit the dynamic programming techniques of conditional random fields.

7. Related Work and Conclusions

As far as we know, the present work is the first to combine the benefits of conditional models with the global normalization of random field models. Other applications of exponential models in sequence modeling have either attempted to build generative models (Rosenfeld, 1997), which involve a hard normalization problem, or adopted local conditional models (Berger et al., 1996; Ratnaparkhi, 1996; McCallum et al., 2000) that may suffer from label bias.

Non-probabilistic local decision models have also been widely used in segmentation and tagging (Brill, 1995; Roth, 1998; Abney et al., 1999). Because of the computational complexity of global training, these models are only trained to minimize the error of individual label decisions assuming that neighboring labels are correctly assigned. Label bias would be expected to be a problem here too.

An alternative approach to discriminative modeling of sequence labeling is to use a permissive generative model, which can only model local dependencies, to produce a list of candidates, and then use a more global discriminative model to rerank those candidates. This approach is standard in large-vocabulary speech recognition (Schwartz & Austin, 1993), and has also been proposed for parsing (Collins, 2000). However, these methods fail when the correct output is pruned away in the first pass.
Chinese-English Translation

A configurable method for multi-style license plate recognition

Automatic license plate recognition (LPR) has been a practical technique in the past decades. Numerous applications, such as automatic toll collection, criminal pursuit and traffic law enforcement, have benefited from it. Although some novel techniques, for example RFID (radio frequency identification), WSN (wireless sensor network), etc., have been proposed for car ID identification, LPR on image data is still an indispensable technique in current intelligent transportation systems for its convenience and low cost. LPR is generally divided into three steps: license plate detection, character segmentation and character recognition. The detection step roughly classifies LP and non-LP regions, the segmentation step separates the symbols/characters from each other in one LP so that only an accurate outline of each image block of characters is left for recognition, and the recognition step finally converts grey-level image blocks into characters/symbols using predefined recognition models. Although the LPR technique has a long research history, it is still driven forward by various arising demands, the most frequent of which is the variation of LP styles, for example:

(1) Appearance variation caused by the change of image capturing conditions.
(2) Style variation from one nation to another.
(3) Style variation when the government releases a new LP format.

After comprehensive analyses of multi-style LP characteristics on real data, we summed these up into four factors, namely rotation angle, line number, character type and format. Generally speaking, any change of the above four factors can result in a change of LP style or appearance and then affect the detection, segmentation or recognition algorithms. If an LP has a large rotation angle, the segmentation and recognition algorithms for horizontal LPs may not work. If there is more than one character line in an LP, an additional line separation algorithm is needed before the segmentation process. With the variation of character types when we apply the method from one nation to another, the ability to re-define the recognition models is needed. What is more, the change of LP styles requires the method to adjust itself so that the segmented and recognized character candidates can match best with an LP format.

Several methods have been proposed for multi-national LPs or multi-format LPs in the past years, while few of them comprehensively address the style adaptation problem in terms of the abovementioned factors. Some of them only claim the ability to process multinational LPs by redefining the detection and segmentation rules or recognition models.

In this paper, we propose a configurable LPR method which is adaptable from one style to another, particularly from one nation to another, by defining the four factors as parameters. Users can constrain the scope of a parameter and at the same time the method will adjust itself so that the recognition can be faster and more accurate. Similar to existing LPR techniques, we also provide details of the detection, segmentation and recognition algorithms. The difference is that we emphasize the configurable framework for LPR and the extensibility of the proposed method for multi-style LPs instead of the performance of each algorithm.
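As a purely hypothetical illustration of this idea (the class and field names below are ours, not the authors'), the four factors read naturally as a configuration object that such a pipeline could consult:

from dataclasses import dataclass, field

@dataclass
class PlateStyle:
    """Hypothetical configuration capturing the paper's four factors."""
    max_rotation_deg: float = 15.0   # rotation angle the pipeline must tolerate
    line_number: int = 1             # number of character lines on the plate
    character_set: str = "ABCDEFGHJKLMNPQRSTUVWXYZ0123456789"  # recognizable symbols
    formats: list = field(default_factory=lambda: ["LLL-NNNN"])  # L=letter, N=digit

def needs_line_separation(style: PlateStyle) -> bool:
    # An extra line-separation step is required only for multi-line plates.
    return style.line_number > 1

single_line = PlateStyle()
two_line = PlateStyle(line_number=2, formats=["LL-NN", "NNNN"])
print(needs_line_separation(single_line), needs_line_separation(two_line))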
In the past decades, many methods have been proposed for LPR that contain detection, segmentation and recognition algorithms. In the following paragraphs, these algorithms and the LPR methods based on them are briefly reviewed.

LP detection algorithms can be mainly classified into three classes according to the features used, namely edge-based algorithms, color-based algorithms and texture-based algorithms. The most commonly used method for LP detection is certainly the combination of edge detection and mathematical morphology. In these methods, gradient (edges) is first extracted from the image and then a spatial analysis by morphology is applied to connect the edges into LP regions. Another way is counting edges on the image rows to find regions of dense edges, or describing the dense edges in LP regions by a Hough transformation. Edge analysis is the most straightforward method, with low computation complexity and good extensibility. Compared with edge-based algorithms, color-based algorithms depend more on the application conditions. Since LPs in a nation often have several predefined colors, researchers have defined color models to segment regions of interest as the LP regions. This kind of method can be affected a lot by lighting conditions. To win both high recall and low false positive rates, texture classification has been used for LP detection. In Ref., Kim et al. used an SVM to train texture classifiers to detect image blocks that contain LP pixels. In Ref., the authors used Gabor filters to extract texture features in multiple scales and orientations to describe the texture properties of LP regions. In Ref., Zhang used X and Y derivative features, grey-value variance and an Adaboost classifier to classify LP and non-LP regions in an image. In Refs., wavelet feature analysis is applied to identify LP regions. Despite the good performance of these methods, the computation complexity will limit their usability. In addition, texture-based algorithms may be affected by multi-lingual factors.

Multi-line LP segmentation algorithms can also be classified into three classes, namely algorithms based on projection, binarization and global optimization. In the projection algorithms, gradient or color projection on the vertical orientation is calculated first. The "valleys" on the projection result are regarded as the spaces between characters and used to segment characters from each other; segmented regions are further processed by vertical projection to obtain precise bounding boxes of the LP characters (see the sketch after this paragraph). Since simple segmentation methods are easily affected by the rotation of the LP, segmenting a skewed LP becomes a key issue to be solved. In the binarization algorithms, global or local methods are often used to separate the foreground from the background, and then a region connection operation is used to obtain character regions. In the most recent work, local threshold determination and slide window techniques have been developed to improve segmentation performance. In the global optimization algorithms, the goal is not to obtain a good segmentation result for independent characters but to obtain a compromise between character spatial arrangement and single character recognition results. A hidden Markov chain has been used to formulate the dynamic segmentation of characters in an LP. The advantage of this algorithm is that the global optimization improves robustness to noise; the disadvantage is that a precise format definition is necessary before the segmentation process.
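The projection idea is straightforward to sketch. The code below is an illustrative toy of ours, not taken from any of the surveyed papers: it segments a binarized plate image by treating low-sum columns of the projection profile as the gaps between characters.

import numpy as np

def segment_by_projection(binary_plate, gap_threshold=0):
    """Return (start, end) column spans of characters.

    binary_plate: 2-D array of 0/1 pixels, foreground = 1.
    Columns whose sum is <= gap_threshold are the projection "valleys"
    separating characters.
    """
    projection = binary_plate.sum(axis=0)          # column-wise projection profile
    spans, start = [], None
    for col, value in enumerate(projection):
        if value > gap_threshold and start is None:
            start = col                            # a character begins
        elif value <= gap_threshold and start is not None:
            spans.append((start, col)); start = None
    if start is not None:
        spans.append((start, len(projection)))
    return spans

plate = np.zeros((8, 12), dtype=int)
plate[2:6, 1:3] = 1; plate[2:6, 5:7] = 1; plate[2:6, 9:11] = 1
print(segment_by_projection(plate))                # -> [(1, 3), (5, 7), (9, 11)]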
Character and symbol recognition algorithms in LPR can be categorized into learning-based ones and template-matching ones. For the former, the artificial neural network (ANN) is the most used method, since it has proved able to obtain very good recognition results given a large training set. An important factor in training an ANN recognition model for LPs is to build a reasonable network structure with good features. SVM-based methods are also adopted in LPR to obtain good recognition performance with even few training samples. Recently, the cascade classifier method has also been used for LP recognition. Template matching is another widely used algorithm. Generally, researchers need to build template images by hand for the LP characters and symbols. They can assign larger weights to the important points, for example the corner points, in the template to emphasize the different characteristics of the characters. Invariance of feature points is also considered in template matching methods to improve robustness. The disadvantage is that it is difficult for users who have no professional knowledge of pattern recognition to define new templates, which restricts the application of the algorithm.

Based on the abovementioned algorithms, lots of LPR methods have been developed. However, these methods are mainly developed for specific nations or special LP formats. In Ref., the authors focus on recognizing Greek LPs by proposing new segmentation and recognition algorithms; the characters on these LPs are alphanumerics with several fixed formats. In Ref., Zhang et al. developed a learning-based method for LP detection and character recognition, mainly for LPs of Korean styles. In Ref., optical character recognition (OCR) techniques are integrated into LPR to develop a general LPR method, but the performance of OCR may drop when facing LPs of poor image quality, since it is difficult to discriminate real characters from candidates without format supervision; this method can only select the candidates with the best recognition results as LP characters, without a recovery process. Wang et al. developed a method to recognize LPs with various viewing angles; the skew factor is considered in their method. In Ref., the authors proposed an automatic LPR method which can handle changes of illumination, vehicle speed, routes and backgrounds, realized by developing new detection and segmentation algorithms that are robust to illumination changes and image blurring. The performance of the method is encouraging, but the authors do not present recognition results under multi-national or multi-style conditions. In Ref., the authors propose an LPR method for a multinational environment with character segmentation and format-independent recognition. Since no recognition information is used in character segmentation, falsely segmented characters arising from background noise may be produced. What is more, the recognition method is not learning-based, which limits its extensibility. In Ref., Mecocci et al. propose a generative recognition method. Generative models (GM) are proposed to produce many synthetic characters whose statistical variability is equivalent (for each class) to that shown by real samples. Thus a suitable statistical description of a large set of characters can be obtained by using only a limited set of images. As a result, the extension ability of character recognition is improved. This method mainly concerns the extensibility of character recognition rather than a whole LPR method.

From this review we can see that LPR for multi-style plates in multinational applications has not been fully considered.
Lots of existing LPR methods can work very well under a special application condition, while the performance drops sharply when they are extended from one condition to another, or from several styles to others.
Probability and Stochastic ProcessesProbability and stochastic processes are important concepts in the field of mathematics and have applications in various areas such as engineering, finance, and computer science. In this response, I will discuss the significance of probability and stochastic processes from multiple perspectives, highlightingtheir practical applications, theoretical foundations, and potential limitations. From a practical perspective, probability and stochastic processes play a crucial role in decision-making under uncertainty. Whether it is predicting the weather, estimating the risk of a financial investment, or designing a reliable communication system, the ability to quantify and analyze uncertainty is essential. Probability theory provides a framework for modeling and analyzing random events, enabling us to make informed decisions based on the likelihood of different outcomes. Stochastic processes, on the other hand, allow us to model systems that evolve over time in a probabilistic manner, providing valuable insights into the behavior of complex systems. In the field of engineering, probability and stochastic processes are used extensively in reliability analysis and system design. By modeling the failure rates of components and the interactions between them, engineers can evaluate the reliability of a system and identify potential weaknesses. This information is crucial for designing robust systems that can withstand uncertainties and minimize the risk of failure. Stochastic processes, such as Markov chains and queuing theory, are also used to model and analyze various engineering systems, including communication networks, manufacturing processes, and transportation systems. From a financial perspective, probability and stochastic processes are essential tools for risk management and investment analysis. Financial markets are inherently uncertain, and understanding the probabilistic nature of asset prices and returns is crucial for making informed investment decisions. By modeling the behavior of financial variables using stochastic processes, such as geometric Brownian motion or jump-diffusion processes, analysts can estimate the probabilities of different market scenarios and assess the risk associated with different investment strategies. This information is invaluable for portfolio management, option pricing, and hedging strategies. From a theoretical perspective, probability theory and stochasticprocesses provide a rigorous mathematical foundation for understanding randomness and uncertainty. Probability theory, with its axioms and theorems, allows us to reason logically about uncertain events and make precise statements about their probabilities. Stochastic processes, as mathematical models for random phenomena, provide a framework for studying the long-term behavior of systems and analyzing their statistical properties. This theoretical understanding is not only important for practical applications but also for advancing our knowledge in various scientific disciplines, including physics, biology, and social sciences. However, it is important to acknowledge the limitations of probability and stochastic processes. Firstly, these concepts are based on assumptions and simplifications that may not always hold in real-world situations. For example, many stochastic models assume that the underlying processes are stationary and independent, which may not be true in practice. 
Secondly, probability and stochastic processes can only provide probabilistic predictions and estimates, rather than deterministic outcomes. This inherent uncertainty means that even with the best models and data, there will always be a degree of unpredictability. Lastly, the accuracy of probability and stochastic models heavily relies on the availability and quality of data. In situations where data is limited or unreliable, the predictions and estimates obtained from these models may be less accurate or even misleading.

In conclusion, probability and stochastic processes are fundamental concepts with wide-ranging applications and theoretical significance. They provide a powerful framework for quantifying and analyzing uncertainty, enabling us to make informed decisions and understand the behavior of complex systems. From practical applications in engineering and finance to theoretical foundations in mathematics and science, probability and stochastic processes play a crucial role in our understanding of the world. However, it is important to recognize their limitations and the inherent uncertainties they entail. By embracing uncertainty and using probability and stochastic processes as tools for reasoning and decision-making, we can navigate the complexities of the world with greater confidence and understanding.
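As a concrete companion to the geometric Brownian motion mentioned above, here is a minimal Python sketch (an illustration added here, with made-up drift and volatility values, not figures from any particular market) that simulates GBM paths and checks the simulated mean against the known closed-form mean.

import numpy as np

def simulate_gbm(s0, mu, sigma, t_max, n_steps, n_paths, seed=0):
    # Exact log-normal discretization of dS = mu*S dt + sigma*S dW.
    rng = np.random.default_rng(seed)
    dt = t_max / n_steps
    drift = (mu - 0.5 * sigma ** 2) * dt
    shocks = sigma * np.sqrt(dt) * rng.standard_normal((n_paths, n_steps))
    log_paths = np.cumsum(drift + shocks, axis=1)
    return s0 * np.exp(np.hstack([np.zeros((n_paths, 1)), log_paths]))

# Illustrative parameters: start at 100, 5% drift, 20% volatility, one year.
paths = simulate_gbm(100.0, 0.05, 0.20, 1.0, 252, 10_000)
print(paths[:, -1].mean())  # close to the theoretical mean 100*exp(0.05)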
The Carmack algorithm (map buffer reuse, for runner-style games)----------------------------Theory below--------------------------The Carmack algorithm: a game-map rendering technique devised by John Carmack, widely used in 2D side-scrolling games and mobile games.
John Carmack: one of the founders and the technical director of id Software.
A world-renowned programmer, famous for the Carmack algorithm and for 3D game engine development, he is regarded as an icon of the games industry.
He is also an all-round technical talent who now devotes himself to civilian spacecraft development, as a principal founder and technical lead of the Armadillo Aerospace team.
Maps are an indispensable element in games, and they play an especially important role in RPG, ACT and similar genres: an attractive map and smooth scrolling greatly improve the player's experience.
There are several algorithms for redrawing a scrolling map in a game. Because of handset performance limits, development schedules and other non-technical constraints, the technique has to be chosen flexibly for each situation.
This article mainly introduces how to use the OPhone API to draw the scenes of a 2D game, that is, how to render the map.
Common algorithms for map drawing and scrolling. Seamless-image scrolling: the simplest way to draw a map. No array is needed; a single seamless background image is drawn twice on the screen, which produces the simplest scrolling effect while reusing the image to save resources.
As shown in the figure below, the area inside the red dashed line is the screen. The image is pasted twice at staggered positions using an offset, and the animation effect is produced by continuously changing the offset.
Code example (the original scrolls along the X axis, so the wrap-around should be compared against the image width rather than its height; the duplicated snippet has been merged and the offset logic fixed accordingly):

// imgBack: the background Bitmap
// posX:    current scroll offset along the X axis
canvas.drawBitmap(imgBack, -posX, 0, paint);
canvas.drawBitmap(imgBack, imgBack.getWidth() - posX, 0, paint);
posX++;                                    // advance the scroll each frame
if (posX >= imgBack.getWidth()) posX = 0;  // wrap around for a seamless loop

Advantages and limitations: this algorithm is very simple. Because the background is generated by scrolling a single image repeatedly, it places few constraints on the artists, leaves room for creativity, and the visual result is good.
Consensus and Cooperation in Networked Multi-Agent Systems

Algorithms that provide rapid agreement and teamwork between all participants allow effective task performance by self-organizing networked systems.

By Reza Olfati-Saber, Member IEEE, J. Alex Fax, and Richard M. Murray, Fellow IEEE

ABSTRACT | This paper provides a theoretical framework for analysis of consensus algorithms for multi-agent networked systems with an emphasis on the role of directed information flow, robustness to changes in network topology due to link/node failures, time-delays, and performance guarantees. An overview of basic concepts of information consensus in networks and methods of convergence and performance analysis for the algorithms are provided. Our analysis framework is based on tools from matrix theory, algebraic graph theory, and control theory. We discuss the connections between consensus problems in networked dynamic systems and diverse applications including synchronization of coupled oscillators, flocking, formation control, fast consensus in small-world networks, Markov processes and gossip-based algorithms, load balancing in networks, rendezvous in space, distributed sensor fusion in sensor networks, and belief propagation. We establish direct connections between spectral and structural properties of complex networks and the speed of information diffusion of consensus algorithms. A brief introduction is provided on networked systems with nonlocal information flow that are considerably faster than distributed systems with lattice-type nearest neighbor interactions. Simulation results are presented that demonstrate the role of small-world effects on the speed of consensus algorithms and cooperative control of multivehicle formations.

KEYWORDS | Consensus algorithms; cooperative control; flocking; graph Laplacians; information fusion; multi-agent systems; networked control systems; synchronization of coupled oscillators

I. INTRODUCTION

Consensus problems have a long history in computer science and form the foundation of the field of distributed computing [1]. Formal study of consensus problems in groups of experts originated in management science and statistics in the 1960s (see DeGroot [2] and references therein). The ideas of statistical consensus theory by DeGroot reappeared two decades later in aggregation of information with uncertainty obtained from multiple sensors (Footnote 1: This is known as sensor fusion and is an important application of modern consensus algorithms that will be discussed later.) [3] and medical experts [4]. Distributed computation over networks has a tradition in systems and control theory starting with the pioneering work of Borkar and Varaiya [5] and Tsitsiklis [6] and Tsitsiklis, Bertsekas, and Athans [7] on the asynchronous asymptotic agreement problem for distributed decision-making systems and parallel computing [8].

In networks of agents (or dynamic systems), "consensus" means to reach an agreement regarding a certain quantity of interest that depends on the state of all agents. A "consensus algorithm" (or protocol) is an interaction rule that specifies the information exchange between an agent and all of its neighbors on the network. (Footnote 2: The term "nearest neighbors" is more commonly used in physics than "neighbors" when applied to particle/spin interactions over a lattice, e.g., the Ising model.)

The theoretical framework for posing and solving consensus problems for networked dynamic systems was introduced by Olfati-Saber and Murray in [9] and [10] building on the earlier work of Fax and Murray [11], [12].
The study of the alignment problem, involving reaching an agreement without computing any objective functions, appeared in the work of Jadbabaie et al. [13]. Further theoretical extensions of this work were presented in [14] and [15] with a look toward treatment of directed information flow in networks as shown in Fig. 1(a).

(Manuscript received August 8, 2005; revised September 7, 2006. This work was supported in part by the Army Research Office (ARO) under Grant W911NF-04-1-0316. R. Olfati-Saber is with Dartmouth College, Thayer School of Engineering, Hanover, NH 03755 USA. J. A. Fax is with Northrop Grumman Corp., Woodland Hills, CA 91367 USA. R. M. Murray is with the California Institute of Technology, Control and Dynamical Systems, Pasadena, CA 91125 USA. Digital Object Identifier: 10.1109/JPROC.2006.887293)

The common motivation behind the work in [5], [6], and [10] is the rich history of consensus protocols in computer science [1], whereas Jadbabaie et al. [13] attempted to provide a formal analysis of emergence of alignment in the simplified model of flocking by Vicsek et al. [16]. The setup in [10] was originally created with the vision of designing agent-based amorphous computers [17], [18] for collaborative information processing in networks. Later, [10] was used in development of flocking algorithms with guaranteed convergence and the capability to deal with obstacles and adversarial agents [19].

Graph Laplacians and their spectral properties [20]–[23] are important graph-related matrices that play a crucial role in convergence analysis of consensus and alignment algorithms. Graph Laplacians are an important point of focus of this paper. It is worth mentioning that the second smallest eigenvalue of graph Laplacians, called algebraic connectivity, quantifies the speed of convergence of consensus algorithms. The notion of algebraic connectivity of graphs has appeared in a variety of other areas including low-density parity-check codes (LDPC) in information theory and communications [24], Ramanujan graphs [25] in number theory and quantum chaos, and combinatorial optimization problems such as the max-cut problem [21].

More recently, there has been a tremendous surge of interest, among researchers from various disciplines of engineering and science, in problems related to multiagent networked systems with close ties to consensus problems. This includes subjects such as consensus [26]–[32], collective behavior of flocks and swarms [19], [33]–[37], sensor fusion [38]–[40], random networks [41], [42], synchronization of coupled oscillators [42]–[46], algebraic connectivity (to be defined in Section II-A) of complex networks [47]–[49], asynchronous distributed algorithms [30], [50], formation control for multirobot systems [51]–[59], optimization-based cooperative control [60]–[63], dynamic graphs [64]–[67], complexity of coordinated tasks [68]–[71], and consensus-based belief propagation in Bayesian networks [72], [73]. A detailed discussion of selected applications will be presented shortly.

In this paper, we focus on the work described in five key papers, namely Jadbabaie, Lin, and Morse [13], Olfati-Saber and Murray [10], Fax and Murray [12], Moreau [14], and Ren and Beard [15], that have been
instrumental in paving the way for more recent advances in the study of self-organizing networked systems, or swarms. These networked systems are comprised of locally interacting mobile/static agents equipped with dedicated sensing, computing, and communication devices. As a result, we now have a better understanding of complex phenomena such as flocking [19], or design of novel information fusion algorithms for sensor networks that are robust to node and link failures [38], [72]–[76]. Gossip-based algorithms such as the push-sum protocol [77] are important alternatives in computer science to the Laplacian-based consensus algorithms in this paper. Markov processes establish an interesting connection between the information propagation speed in these two categories of algorithms proposed by computer scientists and control theorists [78].

The contribution of this paper is to present a cohesive overview of the key results on theory and applications of consensus problems in networked systems in a unified framework. This includes basic notions in information consensus and control theoretic methods for convergence and performance analysis of consensus protocols that heavily rely on matrix theory and spectral graph theory. A byproduct of this framework is to demonstrate that seemingly different consensus algorithms in the literature [10], [12]–[15] are closely related. Applications of consensus problems in areas of interest to researchers in computer science, physics, biology, mathematics, robotics, and control theory are discussed in this introduction.

A. Consensus in Networks

The interaction topology of a network of agents is represented using a directed graph $G = (V, E)$ with the set of nodes $V = \{1, 2, \ldots, n\}$ and edges $E \subseteq V \times V$.

Fig. 1. Two equivalent forms of consensus algorithms: (a) a network of integrator agents in which agent $i$ receives the state $x_j$ of its neighbor, agent $j$, if there is a link $(i, j)$ connecting the two nodes; and (b) the block diagram for a network of interconnected dynamic systems all with identical transfer functions $P(s) = 1/s$. The collective networked system has a diagonal transfer function and is a multiple-input multiple-output (MIMO) linear system.

The neighbors of agent $i$ are denoted by $N_i = \{j \in V : (i, j) \in E\}$. According to [10], a simple consensus algorithm to reach an agreement regarding the state of $n$ integrator agents with dynamics $\dot{x}_i = u_i$ can be expressed as an $n$th-order linear system on a graph

$$\dot{x}_i(t) = \sum_{j \in N_i} \big( x_j(t) - x_i(t) \big) + b_i(t), \quad x_i(0) = z_i \in \mathbb{R}, \quad b_i(t) = 0. \qquad (1)$$

The collective dynamics of the group of agents following protocol (1) can be written as

$$\dot{x} = -Lx \qquad (2)$$

where $L = [l_{ij}]$ is the graph Laplacian of the network and its elements are defined as follows:

$$l_{ij} = \begin{cases} -1, & j \in N_i \\ |N_i|, & j = i. \end{cases} \qquad (3)$$

Here, $|N_i|$ denotes the number of neighbors of node $i$ (or out-degree of node $i$). Fig. 1 shows two equivalent forms of the consensus algorithm in (1) and (2) for agents with a scalar state. The role of the input bias $b$ in Fig. 1(b) is defined later.
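To make protocol (1)-(3) concrete, here is a minimal Python sketch (our illustration, not code from the paper; the 4-node graph and step size are arbitrary choices) that builds the Laplacian of (3) for a small undirected graph and integrates (2) with forward-Euler steps; all states converge to the average of the initial values.

import numpy as np

# Adjacency matrix of a small connected undirected graph (illustrative choice).
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
D = np.diag(A.sum(axis=1))          # degree matrix
L = D - A                           # graph Laplacian

x = np.array([1.0, 3.0, 5.0, 7.0])  # initial states z_i
dt = 0.01                           # Euler step, small relative to 1/(2*max degree)
for _ in range(2000):
    x = x - dt * (L @ x)            # discretized form of dx/dt = -L x

print(x)                  # all entries approach 4.0 ...
print(x.sum() / len(x))   # ... the average of the initial values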
According to the definition of the graph Laplacian in (3), all row-sums of $L$ are zero because of $\sum_j l_{ij} = 0$. Therefore, $L$ always has a zero eigenvalue $\lambda_1 = 0$. This zero eigenvalue corresponds to the eigenvector $\mathbf{1} = (1, \ldots, 1)^T$ because $\mathbf{1}$ belongs to the null-space of $L$ ($L\mathbf{1} = 0$). In other words, an equilibrium of system (2) is a state in the form $x^* = (\alpha, \ldots, \alpha)^T = \alpha\mathbf{1}$ where all nodes agree. Based on analytical tools from algebraic graph theory [23], we later show that $x^*$ is a unique equilibrium of (2) (up to a constant multiplicative factor) for connected graphs.

One can show that for a connected network, the equilibrium $x^* = (\alpha, \ldots, \alpha)^T$ is globally exponentially stable. Moreover, the consensus value is $\alpha = \frac{1}{n}\sum_i z_i$, which is equal to the average of the initial values. This implies that irrespective of the initial value of the state of each agent, all agents reach an asymptotic consensus regarding the value of the function $f(z) = \frac{1}{n}\sum_i z_i$.

While the calculation of $f(z)$ is simple for small networks, its implications for very large networks are more interesting. For example, if a network has $n = 10^6$ nodes and each node can only talk to $\log_{10}(n) = 6$ neighbors, finding the average value of the initial conditions of the nodes is more complicated. The role of protocol (1) is to provide a systematic consensus mechanism in such a large network to compute the average. There are a variety of functions that can be computed in a similar fashion using synchronous or asynchronous distributed algorithms (see [10], [28], [30], [73], and [76]).

B. The f-Consensus Problem and Meaning of Cooperation

To understand the role of cooperation in performing coordinated tasks, we need to distinguish between unconstrained and constrained consensus problems. An unconstrained consensus problem is simply the alignment problem in which it suffices that the state of all agents asymptotically be the same. In contrast, in distributed computation of a function $f(z)$, the state of all agents has to asymptotically become equal to $f(z)$, meaning that the consensus problem is constrained. We refer to this constrained consensus problem as the f-consensus problem.

Solving the f-consensus problem is a cooperative task and requires willing participation of all the agents. To demonstrate this fact, suppose a single agent decides not to cooperate with the rest of the agents and keeps its state unchanged. Then, the overall task cannot be performed despite the fact that the rest of the agents reach an agreement. Furthermore, there could be scenarios in which multiple agents that form a coalition do not cooperate with the rest, and removal of this coalition of agents and their links might render the network disconnected. In a disconnected network, it is impossible for all nodes to reach an agreement (unless all nodes initially agree, which is a trivial case).

From the above discussion, cooperation can be informally interpreted as "giving consent to providing one's state and following a common protocol that serves the group objective." One might think that solving the alignment problem is not a cooperative task. The justification is that if a single agent (called a leader) leaves its value unchanged, all others will asymptotically agree with the leader according to the consensus protocol and an alignment is reached. However, if there are multiple leaders where two of whom are in disagreement, then no consensus can be asymptotically reached. Therefore, alignment is in general a cooperative task as well. Formal analysis of the behavior of systems that involve more than one type of agent is more complicated, particularly in presence of adversarial agents in noncooperative games [79], [80]. The focus of this paper is on cooperative multi-agent systems.

C. Iterative Consensus and Markov Chains

In Section II, we show how an iterative consensus algorithm that corresponds to the discrete-time version of system (1) is a Markov chain

$$\pi(k+1) = \pi(k) P \qquad (4)$$

with $P = I - \epsilon L$ and a sufficiently small $\epsilon > 0$. Here, the $i$th element of the row vector $\pi(k)$ denotes the probability of being in state $i$ at iteration $k$. It turns out that for any arbitrary graph $G$ with Laplacian $L$ and a sufficiently small $\epsilon$, the matrix $P$ satisfies the property $\sum_j p_{ij} = 1$ with $p_{ij} \geq 0, \forall i, j$. Hence, $P$ is a valid transition probability matrix for the Markov chain in (4). The reason matrix theory [81] is so widely used in analysis of consensus algorithms [10], [12]–[15], [64] is primarily due to the structure of $P$ in (4) and its connection to graphs. (Footnote 4: In honor of the pioneering contributions of Oscar Perron (1907) to the theory of nonnegative matrices, we refer to $P$ as the Perron Matrix of graph $G$; see Section II-C for details.) There are interesting connections between this Markov chain and the speed of information diffusion in gossip-based averaging algorithms [77], [78].

One of the early applications of consensus problems was dynamic load balancing [82] for parallel processors with the same structure as system (4). To this date, load balancing in networks proves to be an active area of research in computer science.
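Continuing the earlier Python sketch (again our illustration, with an assumed step size): for the same 4-node graph, the Perron matrix $P = I - \epsilon L$ with $\epsilon$ below $1/\Delta$ (here $\Delta = 2$) is row-stochastic and nonnegative, and the fixed-point iteration $x(k+1) = Px(k)$ reproduces average-consensus.

import numpy as np

A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A

eps = 0.1                      # assumed step size; needs eps <= 1/max_degree = 0.5 here
P = np.eye(4) - eps * L        # Perron matrix of the graph

print(P.sum(axis=1))           # each row sums to 1: a valid transition matrix
print((P >= 0).all())          # nonnegative entries for small enough eps

x = np.array([1.0, 3.0, 5.0, 7.0])
for _ in range(200):
    x = P @ x                  # discrete consensus iteration x(k+1) = P x(k)
print(x)                       # converges to the average, 4.0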
D. Applications

Many seemingly different problems that involve interconnection of dynamic systems in various areas of science and engineering happen to be closely related to consensus problems for multi-agent systems. In this section, we provide an account of the existing connections.

1) Synchronization of Coupled Oscillators: The problem of synchronization of coupled oscillators has attracted numerous scientists from diverse fields including physics, biology, neuroscience, and mathematics [83]–[86]. This is partly due to the emergence of synchronous oscillations in coupled neural oscillators. Let us consider the generalized Kuramoto model of coupled oscillators on a graph with dynamics

$$\dot{\theta}_i = \kappa \sum_{j \in N_i} \sin(\theta_j - \theta_i) + \omega_i \qquad (5)$$

where $\theta_i$ and $\omega_i$ are the phase and frequency of the $i$th oscillator. This model is the natural nonlinear extension of the consensus algorithm in (1), and its linearization around the aligned state $\theta_1 = \ldots = \theta_n$ is identical to system (2) plus a nonzero input bias $b_i = (\omega_i - \bar{\omega})/\kappa$ with $\bar{\omega} = \frac{1}{n}\sum_i \omega_i$ after a change of variables $x_i = (\theta_i - \bar{\omega}t)/\kappa$.

In [43], Sepulchre et al. show that if $\kappa$ is sufficiently large, then for a network with all-to-all links, synchronization to the aligned state is globally achieved for all initial states. Recently, synchronization of networked oscillators under variable time-delays was studied in [45]. We believe that the use of convergence analysis methods that utilize the spectral properties of graph Laplacians will shed light on performance and convergence analysis of self-synchrony in oscillator networks [42].
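A hedged numerical sketch of (5) in Python (our own, with an assumed all-to-all topology, coupling strength, and frequency spread, not a simulation from the paper): the Kuramoto order parameter $r = |\frac{1}{n}\sum_i e^{j\theta_i}|$ approaches 1 when the coupling is large enough.

import numpy as np

rng = np.random.default_rng(1)
n = 10
omega = rng.normal(0.0, 0.5, n)       # natural frequencies (assumed distribution)
theta = rng.uniform(-np.pi, np.pi, n)
kappa, dt = 2.0, 0.01                 # assumed coupling strength and Euler step

for _ in range(5000):
    # All-to-all version of (5): each oscillator feels sin of every phase difference.
    coupling = np.sin(theta[None, :] - theta[:, None]).sum(axis=1)
    theta = theta + dt * (kappa * coupling + omega)

r = np.abs(np.exp(1j * theta).mean())
print("order parameter r:", r)        # close to 1 indicates synchronization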
2) Flocking Theory: Flocks of mobile agents equipped with sensing and communication devices can serve as mobile sensor networks for massive distributed sensing in an environment [87]. A theoretical framework for design and analysis of flocking algorithms for mobile agents with obstacle-avoidance capabilities is developed by Olfati-Saber [19]. The role of consensus algorithms in particle-based flocking is for an agent to achieve velocity matching with respect to its neighbors. In [19], it is demonstrated that flocks are networks of dynamic systems with a dynamic topology. This topology is a proximity graph that depends on the state of all agents and is determined locally for each agent, i.e., the topology of flocks is a state-dependent graph. The notion of state-dependent graphs was introduced by Mesbahi [64] in a context that is independent of flocking.

3) Fast Consensus in Small-Worlds: In recent years, network design problems for achieving faster consensus algorithms have attracted considerable attention from a number of researchers. In Xiao and Boyd [88], design of the weights of a network is considered and solved using semi-definite convex programming. This leads to a slight increase in algebraic connectivity of a network, which is a measure of speed of convergence of consensus algorithms. An alternative approach is to keep the weights fixed and design the topology of the network to achieve a relatively high algebraic connectivity. A randomized algorithm for network design is proposed by Olfati-Saber [47] based on the random rewiring idea of Watts and Strogatz [89] that led to creation of their celebrated small-world model. The random rewiring of existing links of a network gives rise to considerably faster consensus algorithms. This is due to a multiple orders of magnitude increase in algebraic connectivity of the network in comparison to a lattice-type nearest-neighbor graph.

4) Rendezvous in Space: Another common form of consensus problems is rendezvous in space [90], [91]. This is equivalent to reaching a consensus in position by a number of agents with an interaction topology that is position induced (i.e., a proximity graph). We refer the reader to [92] and references therein for a detailed discussion. This type of rendezvous is an unconstrained consensus problem that becomes challenging under variations in the network topology. Flocking is somewhat more challenging than rendezvous in space because it requires both interagent and agent-to-obstacle collision avoidance.

5) Distributed Sensor Fusion in Sensor Networks: The most recent application of consensus problems is distributed sensor fusion in sensor networks. This is done by posing various distributed averaging problems, required to implement a Kalman filter [38], [39], approximate Kalman filter [74], or linear least-squares estimator [75], as average-consensus problems. Novel low-pass and high-pass consensus filters are also developed that dynamically calculate the average of their inputs in sensor networks [39], [93].

6) Distributed Formation Control: Multivehicle systems are an important category of networked systems due to their commercial and military applications. There are two broad approaches to distributed formation control: i) representation of formations as rigid structures [53], [94] and the use of gradient-based controls obtained from their structural potentials [52], and ii) representation of formations using the vectors of relative positions of neighboring vehicles and the use of consensus-based controllers with input bias. We discuss the latter approach here.

A theoretical framework for design and analysis of distributed controllers for multivehicle formations of type ii) was developed by Fax and Murray [12]. Moving in formation is a cooperative task and requires consent and collaboration of every agent in the formation. In [12], graph Laplacians and matrix theory were extensively used, which makes one wonder whether relative-position-based formation control is a consensus problem. The answer is yes. To see this, consider a network of self-interested agents whose individual desire is to minimize their local cost $U_i(x) = \sum_{j \in N_i} \|x_j - x_i - r_{ij}\|^2$ via a distributed algorithm ($x_i$ is the position of vehicle $i$ with dynamics $\dot{x}_i = u_i$ and $r_{ij}$ is a desired intervehicle relative-position vector).
Instead, if the agents use a gradient-descent algorithm on the collective cost $\sum_{i=1}^n U_i(x)$ using the following protocol:

$$\dot{x}_i = \sum_{j \in N_i} (x_j - x_i - r_{ij}) = \sum_{j \in N_i} (x_j - x_i) + b_i \qquad (6)$$

with input bias $b_i = \sum_{j \in N_i} r_{ji}$ [see Fig. 1(b)], the objective of every agent will be achieved. This is the same as the consensus algorithm in (1) up to the nonzero bias terms $b_i$. This nonzero bias plays no role in stability analysis of system (6). Thus, distributed formation control for integrator agents is a consensus problem. The main contribution of the work by Fax and Murray is to extend this scenario to the case where all agents are multiinput multioutput linear systems $\dot{x}_i = Ax_i + Bu_i$. Stability analysis of relative-position-based formation control for multivehicle systems is extensively covered in Section IV.

E. Outline

The outline of the paper is as follows. Basic concepts and theoretical results in information consensus are presented in Section II. Convergence and performance analysis of consensus on networks with switching topology are given in Section III. A theoretical framework for cooperative control of formations of networked multivehicle systems is provided in Section IV. Some simulation results related to consensus in complex networks including small-worlds are presented in Section V. Finally, some concluding remarks are stated in Section VI.

II. INFORMATION CONSENSUS

Consider a network of decision-making agents with dynamics $\dot{x}_i = u_i$ interested in reaching a consensus via local communication with their neighbors on a graph $G = (V, E)$. By reaching a consensus, we mean asymptotically converging to a one-dimensional agreement space characterized by the following equation:

$$x_1 = x_2 = \ldots = x_n.$$

This agreement space can be expressed as $x = \alpha\mathbf{1}$ where $\mathbf{1} = (1, \ldots, 1)^T$ and $\alpha \in \mathbb{R}$ is the collective decision of the group of agents.

Let $A = [a_{ij}]$ be the adjacency matrix of graph $G$. The set of neighbors of an agent $i$ is $N_i$, defined by

$$N_i = \{j \in V : a_{ij} \neq 0\}, \quad V = \{1, \ldots, n\}.$$

Agent $i$ communicates with agent $j$ if $j$ is a neighbor of $i$ (or $a_{ij} \neq 0$). The set of all nodes and their neighbors defines the edge set of the graph as $E = \{(i, j) \in V \times V : a_{ij} \neq 0\}$. A dynamic graph $G(t) = (V, E(t))$ is a graph in which the set of edges $E(t)$ and the adjacency matrix $A(t)$ are time-varying. Clearly, the set of neighbors $N_i(t)$ of every agent in a dynamic graph is a time-varying set as well. Dynamic graphs are useful for describing the network topology of mobile sensor networks and flocks [19].

It is shown in [10] that the linear system

$$\dot{x}_i(t) = \sum_{j \in N_i} a_{ij} \big( x_j(t) - x_i(t) \big) \qquad (7)$$

is a distributed consensus algorithm, i.e., it guarantees convergence to a collective decision via local interagent interactions. Assuming that the graph is undirected ($a_{ij} = a_{ji}$ for all $i, j$), it follows that the sum of the state of all nodes is an invariant quantity, or $\sum_i \dot{x}_i = 0$. In particular, applying this condition twice at times $t = 0$ and $t = \infty$ gives the following result:

$$\alpha = \frac{1}{n} \sum_i x_i(0).$$

In other words, if a consensus is asymptotically reached, then necessarily the collective decision is equal to the average of the initial state of all nodes. A consensus algorithm with this specific invariance property is called an average-consensus algorithm [9] and has broad applications in distributed computing on networks (e.g., sensor fusion in sensor networks).
The dynamics of system (7) can be expressed in a compact form as

$$\dot{x} = -Lx \qquad (8)$$

where $L$ is known as the graph Laplacian of $G$. The graph Laplacian is defined as

$$L = D - A \qquad (9)$$

where $D = \mathrm{diag}(d_1, \ldots, d_n)$ is the degree matrix of $G$ with elements $d_i = \sum_{j \neq i} a_{ij}$ and zero off-diagonal elements. By definition, $L$ has a right eigenvector of $\mathbf{1}$ associated with the zero eigenvalue (see Footnote 5) because of the identity $L\mathbf{1} = 0$. For the case of undirected graphs, the graph Laplacian satisfies the following sum-of-squares (SOS) property:

$$x^T L x = \frac{1}{2} \sum_{(i,j) \in E} a_{ij} (x_j - x_i)^2. \qquad (10)$$

By defining a quadratic disagreement function as

$$\varphi(x) = \frac{1}{2} x^T L x \qquad (11)$$

it becomes apparent that algorithm (7) is the same as

$$\dot{x} = -\nabla \varphi(x)$$

or the gradient-descent algorithm. This algorithm globally asymptotically converges to the agreement space provided that two conditions hold: 1) $L$ is a positive semidefinite matrix; 2) the only equilibrium of (7) is $\alpha\mathbf{1}$ for some $\alpha$. Both of these conditions hold for a connected graph and follow from the SOS property of the graph Laplacian in (10). Therefore, an average-consensus is asymptotically reached for all initial states. This fact is summarized in the following lemma.

Lemma 1: Let $G$ be a connected undirected graph. Then, the algorithm in (7) asymptotically solves an average-consensus problem for all initial states.

A. Algebraic Connectivity and Spectral Properties of Graphs

Spectral properties of the Laplacian matrix are instrumental in analysis of convergence of the class of linear consensus algorithms in (7). According to the Gershgorin theorem [81], all eigenvalues of $L$ in the complex plane are located in a closed disk centered at $\Delta + 0j$ with a radius of $\Delta = \max_i d_i$, i.e., the maximum degree of the graph. For undirected graphs, $L$ is a symmetric matrix with real eigenvalues and, therefore, the set of eigenvalues of $L$ can be ordered sequentially in an ascending order as

$$0 = \lambda_1 \leq \lambda_2 \leq \cdots \leq \lambda_n \leq 2\Delta. \qquad (12)$$

The zero eigenvalue is known as the trivial eigenvalue of $L$. For a connected graph $G$, $\lambda_2 > 0$ (i.e., the zero eigenvalue is isolated). The second smallest eigenvalue of the Laplacian, $\lambda_2$, is called the algebraic connectivity of a graph [20]. Algebraic connectivity of the network topology is a measure of performance/speed of consensus algorithms [10].

Example 1: Fig. 2 shows two examples of networks of integrator agents with different topologies. Both graphs are undirected and have 0-1 weights. Every node of the graph in Fig. 2(a) is connected to its 4 nearest neighbors on a ring. The other graph is a proximity graph of points that are distributed uniformly at random in a square. Every node is connected to all of its spatial neighbors within a closed ball of radius $r > 0$. Here are the important degree information and Laplacian eigenvalues of these graphs:

$$\text{a)}\ \lambda_1 = 0,\ \lambda_2 = 0.48,\ \lambda_n = 6.24,\ \Delta = 4; \qquad \text{b)}\ \lambda_1 = 0,\ \lambda_2 = 0.25,\ \lambda_n = 9.37,\ \Delta = 8. \qquad (13)$$

In both cases, $\lambda_i < 2\Delta$ for all $i$.
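These spectral facts are easy to verify numerically. The Python sketch below (an illustrative 8-node ring with 4 nearest neighbors per node, assumed here, not the exact graph of Fig. 2) computes the Laplacian spectrum, the algebraic connectivity lambda_2, and checks the bound lambda_n <= 2*Delta from (12).

import numpy as np

n = 8
A = np.zeros((n, n))
for i in range(n):
    for k in (1, 2):                      # link each node to its 4 nearest ring neighbors
        A[i, (i + k) % n] = A[(i + k) % n, i] = 1.0

d = A.sum(axis=1)
L = np.diag(d) - A
lam = np.sort(np.linalg.eigvalsh(L))      # real eigenvalues of the symmetric Laplacian

print("lambda_1 =", round(lam[0], 6))     # 0: the trivial eigenvalue
print("lambda_2 =", round(lam[1], 4))     # algebraic connectivity > 0: graph is connected
print("lambda_n =", round(lam[-1], 4), "<= 2*Delta =", 2 * d.max())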
B. Convergence Analysis for Directed Networks

The convergence analysis of the consensus algorithm in (7) is equivalent to proving that the agreement space characterized by $x = \alpha\mathbf{1}$, $\alpha \in \mathbb{R}$ is an asymptotically stable equilibrium of system (7). The stability properties of system (7) are completely determined by the location of the Laplacian eigenvalues of the network. The eigenvalues of the adjacency matrix are irrelevant to the stability analysis of system (7), unless the network is $k$-regular (all of its nodes have the same degree $k$). The following lemma combines a well-known rank property of graph Laplacians with the Gershgorin theorem to provide a spectral characterization of the Laplacian of a fixed directed network $G$. Before stating the lemma, we need to define the notion of strong connectivity of graphs. A graph

(Footnote 5: These properties were discussed earlier in the introduction for graphs with 0-1 weights.)
Fundamentals of Reinforcement Learning: An Introduction to the Basics

Introduction:
Reinforcement Learning (RL) is an area of machine learning that focuses on teaching algorithms to make decisions based on trial and error. It is widely used in various fields such as artificial intelligence, robotics, and game theory. RL algorithms learn to make optimal decisions by interacting with an environment and receiving feedback in the form of rewards or penalties. In this article, we will explore the fundamental concepts of reinforcement learning and understand the step-by-step process involved.

1. Understanding the RL Agent and Environment:
In RL, an agent is the learner or decision-maker, and the environment is the setting in which the agent operates. The environment can be as simple as a game or as complex as self-driving cars. The agent interacts with the environment by performing actions, and the environment responds with rewards or penalties. The goal of the agent is to learn the best actions to maximize the cumulative reward over time.

2. Markov Decision Processes (MDPs):
MDPs are mathematical models that formalize decision-making problems in RL. They consist of a set of states, actions, transition probabilities, and rewards. The agent starts in a particular state, performs an action, and transitions to a new state according to the transition probabilities. The agent receives a reward based on the state-action pair. MDPs provide a framework for modeling RL problems and allow for the application of various learning algorithms.

3. Policy and Value Functions:
A policy in RL determines the behavior of the agent. It is a mapping from states to actions and represents the strategy for decision-making. The policy can be deterministic, where it directly selects an action for each state, or stochastic, where it selects actions with certain probabilities. Value functions, on the other hand, estimate the expected cumulative reward from a particular state or state-action pair. The value functions help the agent in evaluating the potential returns from different states and actions and are crucial for making optimal decisions.

4. The Q-Learning Algorithm:
Q-Learning is one of the most popular algorithms in RL. It is a model-free method that does not require prior knowledge of the environment's dynamics. The Q-Learning algorithm estimates the action-value function, or Q-values, which represent the expected cumulative reward of taking a particular action in a given state. The algorithm iteratively updates the Q-values based on the rewards received and the new information obtained from the environment. By exploring and exploiting different actions, the agent gradually learns the optimal policy.

5. Exploration and Exploitation:
In RL, exploration refers to the act of trying out new actions to gather information about the environment. Exploitation, on the other hand, involves using the learned knowledge to maximize the cumulative reward. Balancing exploration and exploitation is a fundamental challenge in RL. Too much exploration may result in inefficient learning, while too much exploitation may lead to sub-optimal solutions. Various exploration strategies, such as epsilon-greedy and softmax, are used to strike a balance. A short sketch combining points 4 and 5 follows below.
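Below is a minimal Python sketch of tabular Q-learning with epsilon-greedy exploration (the five-state chain environment and the learning-rate, discount, and exploration values are assumptions made for this illustration, not part of the original text).

import numpy as np

# Assumed toy environment: a 5-state chain; action 0 moves left, action 1 moves
# right; reaching the rightmost state pays reward 1 and ends the episode.
N_STATES = 5

def step(state, action):
    nxt = max(state - 1, 0) if action == 0 else min(state + 1, N_STATES - 1)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward, nxt == N_STATES - 1

rng = np.random.default_rng(0)
Q = np.zeros((N_STATES, 2))
alpha, gamma, epsilon = 0.1, 0.9, 0.1  # learning rate, discount, exploration rate

for _ in range(500):                   # training episodes
    s, done = 0, False
    while not done:
        # Epsilon-greedy: explore with probability epsilon, otherwise exploit.
        a = int(rng.integers(2)) if rng.random() < epsilon else int(Q[s].argmax())
        s2, r, done = step(s, a)
        # Q-learning update driven by the temporal-difference error.
        Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
        s = s2

print(Q.argmax(axis=1))  # greedy policy: 1 (move right) in every non-terminal state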
6. Temporal Difference Learning:
Temporal Difference (TD) learning is another key concept in RL. It combines elements of both Monte Carlo methods (which learn from complete episodes) and dynamic programming (which learns from intermediate states). TD learning allows the agent to update its value function estimates after experiencing only a partial trajectory, making it more computationally efficient. TD algorithms, such as SARSA and Q-Learning, use the temporal-difference error to update the value function estimates incrementally.

7. Deep Reinforcement Learning:
Deep Reinforcement Learning (DRL) combines RL with deep neural networks to handle high-dimensional state spaces. Traditional RL algorithms may struggle with complex problems due to the curse of dimensionality. DRL uses deep neural networks as function approximators to learn directly from raw sensory inputs. Techniques like Deep Q-Networks (DQN) and Proximal Policy Optimization (PPO) have achieved remarkable results in challenging domains such as video games and robotics.

Conclusion:
Reinforcement Learning is an exciting field that enables machines to learn optimal decision-making strategies through interaction with the environment. By understanding the fundamental concepts, such as agents, environments, policies, value functions, and learning algorithms like Q-Learning, exploration-exploitation trade-offs, temporal difference learning, and deep reinforcement learning, we can build intelligent systems capable of learning from experiences and adapting to new situations. RL has the potential to revolutionize various industries, making it a domain worth exploring and studying further.
Markov's "Probability Calculus" (practical bilingual edition)

In the world of mathematics, Andrey Markov's work on "Probability Calculus" is highly regarded. His book, published in 1912, was one of the first to systematically study the theory of stochastic processes.
Markov's work was groundbreaking because it introduced the concept of a Markov chain, a mathematical system in which the future state of a process depends only on the current state and not on the sequence of events that preceded it.
This idea was revolutionary because it provided a way to model and analyze complex systems that are influenced by random events. Today, Markov chains are used in a wide variety of fields, including physics, finance, economics, and computer science.
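To see the memoryless property in action, here is a small Python sketch (the two-state weather chain and its transition probabilities are invented for illustration): the next state is sampled from a distribution that depends only on the current state, and the long-run visit frequencies approach the stationary distribution.

import numpy as np

# Assumed two-state weather chain: state 0 = sunny, state 1 = rainy.
# P[i, j] is the probability of moving from state i to state j.
P = np.array([[0.9, 0.1],
              [0.5, 0.5]])

rng = np.random.default_rng(0)
state, counts = 0, np.zeros(2)
for _ in range(100_000):
    # The next state depends only on the current state: the Markov property.
    state = rng.choice(2, p=P[state])
    counts[state] += 1

print(counts / counts.sum())  # empirical frequencies, about [0.833, 0.167]
# The exact stationary distribution solves pi = pi P: pi = (5/6, 1/6).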
ABB Network Partner

Relay Configuration Tool
Specification for RE_ 54_ Configurations
Editing the RE_ 54_ Relay Terminal Configurations
Getting started
Libraries
Logical POUs
Program Organisation Unit (POU)
Physical hardware
Configuration
Resource
Hardware version
Analogue channels
Binary inputs
Measurements
Condition monitoring
Tasks
Programs and tasks
Task interval
Global variables
Declaring variables
Variables declared as global
Compiling the project
Main Configuration Rules for RE_ 54_
General
Binary inputs
Explicit feedback
Measurement function blocks
Warnings
Execution order
F-key
APPENDIX A
APPENDIX B
APPENDIX C

About this manual
This guideline describes in general the procedures for configuring the REF 54_ feeder terminals or REM 543 machine terminals correctly with the Relay Configuration Tool. Furthermore, chapter 4 provides some practical tips as well as recommendations and restrictions for doing the configuration. Please note that the examples and dialogue pictures of the Relay Configuration Tool in this manual refer to REF 54_ feeder terminals.

1. Relay Configuration Tool
The Relay Configuration Tool, which is a standard programming system for RED 500 devices, is used for configuring the protection, control, condition monitoring, measurement and logic functions of the feeder terminal. The tool is based on the IEC 1131-3 standard, which defines the programming language for relay terminals, and includes the full range of IEC features. The PLC logics are programmed with Boolean functions, timers, counters, comparators and flip-flops. The programming language described in this manual is a function block diagram (FBD) language.

2. Specification for RE_ 54_ Configurations
Prior to starting the configuration of a relay terminal, the Specification for Relay Configuration is to be filled out. The specification can be found as appendix B for REF 54_ and appendix C for REM 543 at the end of this manual. The purpose of the specification is to provide the technical information required for the proper configuration of the relay terminals.

3. Editing the RE_ 54_ Relay Terminal Configurations

3.1. Getting started
Start up the tool by double clicking the icon. After adding a new object as an empty configuration to the CAP505 environment (refer to the CAP505 Operator's Manual, 1MRS750838-RUM), the program opens an empty project template (see figure below) with a toolbar at the top. The next step is to build the project tree structure by inserting libraries, program organisation units (POUs) and target specific items to the project tree.

The project tree editor is a window in which the whole project is represented as a tree. The project tree is illustrated with several icons. Most of the icons represent a file of the project, and different looking icons represent different types of files. The tree always contains 4 subtrees: Libraries, Data Types, Logical POUs and Physical Hardware.

Fig. 3.1-1. The project tree with its four subtrees.

The project tree is the main tool for editing the project structure. Editing the project structure means inserting POUs or worksheets to the project structure or deleting existing ones.
The editors for editing the data of the code bodies and the variable declarations can be called by double clicking the corresponding object icons. If you intend to edit an old project and want to keep the original project unchanged, note that the project has to be saved under a new name ("save as") before making any changes.

3.1.1. Libraries
Before editing any worksheets of POUs, the whole project tree structure must be built. The function block library (protection, control, measurement, condition monitoring and standard functions) needed in the relay configuration is to be inserted into the "Libraries" subtree.

Before inserting the library to the project, all worksheets are to be closed; otherwise the I/O description of function blocks will be confused. The programs, function blocks (e.g. NOC3Low, the low-set stage of non-directional three-phase overcurrent protection) and functions of the library can be reused in the new project which is edited.

The library, e.g. REFLIB01 for REF54_ (see the figure below), includes all available function blocks. Therefore, attention is to be paid to which function blocks can be used in a specific configuration. For example, if the protection library PR116005 (1MRS116005) has been ordered, the auto-reclose function block AR5Func cannot be inserted to the project since AR5Func is not included in that library.

Fig. 3.1.1.-1. Libraries for REF54_ and REM 543.

3.1.2. Logical POUs
In the project tree editor and in the library editor, the "Logical POUs" subtree represents a directory for all POUs related to the project. A maximum of 20 POUs can be inserted to the subtree. The figure below shows a "Logical POUs" subtree with 6 POUs; "Measure", "Prot_Me" and "Co_CM_A" represent FBD (Function Block Diagram) programs, and "Horizon", "CONDIS" and "SWG" are FBD function blocks.

Fig. 3.1.2.-1. "Logical POUs" subtree with 6 POUs.

3.1.3. Program Organisation Unit (POU)
Each Program Organisation Unit, a POU, consists of several worksheets: a POU description worksheet for comments, a variable worksheet for variable declarations and a code body worksheet for the relay configuration. The name of each worksheet is indicated beside the corresponding icon, and the *-symbol after the name of a worksheet indicates that the worksheet has not been compiled yet.

Fig. 3.1.3.-1. Program organisation unit with three worksheets.

The description worksheet (e.g. ProtectT), illustrated below, is for describing the POU or the configuration element. The worksheet is named by adding a 'T' to the name of the POU.

Fig. 3.1.3.-2. Description worksheet.

The variable worksheet (e.g. ProtectV) is for the variable declaration. The worksheet is named by adding a 'V' to the name of the POU.

Fig. 3.1.3.-3. Variable declaration worksheet.

A code body worksheet (e.g. Protect, Measure) is for a code body declaration in the form of an FBD, a Function Block Diagram. All configurations for relays of the RED 500 platform are made in the graphical FBD language. A code body programmed in the FBD language is composed of functions and function blocks that are connected to each other using variables or lines. An output of a function block can be connected to the output of another function block only via an OR gate (refer to section 4.1).

Fig. 3.1.3.-4. Code body declaration in FBD language.

Even though the tool permits adding several graphical worksheets under one POU, only one worksheet is recommended to be used per POU.
If more space is needed for a configuration, the worksheet size can be increased or, if more I/O points are required, the functionality can be divided into several POUs. Note that one POU has altogether 511 I/O points available for function blocks. For example, the function block NOC3High in the figure above includes 15 I/O points, which means that there are still 496 I/O points available.

3.1.4. Physical hardware
In the project tree editor, the physical hardware is represented as a subtree (see below) after the hardware of the relay terminal, i.e. Configuration, Resource and Tasks, has been defined.

Fig. 3.1.4.-1. Example of a subtree for the physical hardware.

The configuration elements available in the "Physical Hardware" subtree may differ from configuration to configuration. Each terminal of the RED 500 platform can be configured separately.

3.1.4.1. Configuration
The name and target of the configuration are first defined in the dialogue Properties/Configuration.

Fig. 3.1.4.1.-1. Defining the configuration type.

3.1.4.2. Resource
The configuration "Conf" above is for the REF 543 resource (the selected processor type), which is defined in the dialogue Properties/Resource. The resource must also be given a name.

Fig. 3.1.4.2.-1. Defining the processor type.

Hardware version
After selecting the processor type, click "Settings..." in the dialogue Properties/Resource (see figure above) to define the correct hardware version.

Note! After selecting the correct hardware version (Relay Variant; see figure below), do not click OK but wait until the next dialogue opens and select "Analog Channels" (see figure 3.1.4.2.-3.).

Fig. 3.1.4.2.-2. Defining the hardware version.

Analogue channels
In the dialogue Settings/Analog Channels, click each channel in turn to select the measuring device and signal type for the channels used, and select "Not in use" for the other channels. Furthermore, the technical data and measurements for the selected channels are to be completed correctly before the configuration is used in a real application.

Fig. 3.1.4.2.-3. Defining the analogue channels.

Technical data

Fig. 3.1.4.2.-4. Defining the rated values for the selected channel.

Measurements
For information about the special measurements required for each function block, refer to the Technical Descriptions of Functions (CD-ROM 1MRS 750889-MCD).

Phase currents and voltages
If the signal type selected for an analogue channel is to be shown in the MIMIC view via the MMIDATA_ function block (MIMIC dynamic data point), the true RMS mode must be selected in the "Special Measurements" dialogue. Moreover, in case the Inrush3 function block (3-phase transformer inrush and motor start-up current detector) is to be used, the 2nd harmonic restraint must be selected for the analogue channels (IL1, IL2, IL3) used.

Fig. 3.1.4.2.-5. Selecting the required special measurement modes for phase current measurement.

Neutral current
When e.g. the DEF2_ function block (directional earth-fault protection) is going to be used, intermittent earth-fault protection must be selected for the channel via which the current Io is measured. Unless intermittent earth-fault protection has been chosen, the following configuration error indication will appear on the display of the relay terminal (# denotes the number of the analogue channel in question):

System: SUPERV
Ch # error
Fig. 3.1.4.2.-6. Selecting the required special measurement modes for neutral current measurement.

Frequency
When, for example, any of the function blocks MEFR_ (system frequency measurement), SCVCSt_ or Freq1St_ is in use, frequency measurement must be selected for the channel via which the voltage is measured for frequency measurement (for example: Channel 10, Voltage Transformer 4, Signal type U3 / Measurements button in the "Configuration of REF543" dialogue).

Fig. 3.1.4.2.-7. Selecting the required special measurement modes for frequency measurement.

Virtual channels
In case no measuring devices are applied for measuring residual voltage (Uo) and neutral current (Io), the virtual channels 11 and 12 must be used.

Fig. 3.1.4.2.-8. Using virtual channels 11 and 12 in case no measuring devices are applied for measuring Io and Uo.

In case of the virtual channel 12, voltage measuring devices must be associated with phase voltages.

Fig. 3.1.4.2.-9. Associating voltage measuring devices with phase voltages.

After a compiled configuration is downloaded to a relay and the relay is started (storing and resetting are done), the relay will internally check whether the analogue channels are correctly configured regarding the analogue inputs of function blocks. If the connected channels have been configured incorrectly, the ERR output signal of the specific function block goes active and the event E48 is sent. Furthermore, either the event E13 or E5 is sent depending on the function block in question.

Binary inputs
The filter time is set for each binary input of the relay terminal via the resource settings dialogue "Binary Inputs". Inversion of the inputs can also be set. For further information refer to the Technical Reference Manual of REF 54_ or REM 543.

Fig. 3.1.4.2.-10. Defining the binary inputs.

Measurements
When the MEPE7 function block (power and energy measurement) is used, the measuring mode must be selected via the resource settings dialogue "Measurements".

Fig. 3.1.4.2.-11. Selecting the measuring mode for power and energy measurement.

Condition monitoring
Values for the circuit-breaker wear function blocks CMBWEAR 1 and 2 can be set via the resource settings dialogue "Condition Monitoring".

Fig. 3.1.4.2.-12. Setting the values for circuit-breaker wear.

3.1.4.3. Tasks

Programs and tasks
Programs are associated with tasks via the dialogues Properties/Task and Properties/Program. Cyclic tasks are activated within a specific time interval and the program is executed periodically. The two dialogues below illustrate the association of a program type (Prot_Me) with a task (Task1) (see also figure 3.1.4.-1. in section "Physical hardware").

Fig. 3.1.4.3.-1. Naming a cyclic task.

Fig. 3.1.4.3.-2. Associating the selected task with the desired program type.

Task interval
Generally, the operation accuracy is increased when the task speed is increased, but at the same time, the load of the microprocessors is increased as well. Although the task speed can be freely chosen with the tool, it is necessary to determine a maximum task execution interval for each function block; otherwise the operation accuracy and operate times for protection functions cannot be guaranteed. The maximum task execution interval is based on test results and has also been used in the type testing of the function blocks.
The recommended task execution interval guaranteed by the manufacturer can be found in section Technical Data in the technical description of each function block.

According to the standard, the Relay Configuration Tool includes the possibility of defining the tasks on two different levels:
1) each POU (= program organisation unit) can be tied to a separate task
2) a separate function block inside a POU can be tied to any task

However, alternative 2) is not yet supported in the RED environment, which means that if a separate function block inside a POU is given a separate task definition, it will be ignored when transferred to the relay. This means that when the function blocks are being placed in different POUs, not only the category of the function (protection, control, etc.) but also the maximum task execution interval should be considered, since all function blocks inside a POU will run at the same speed.

For further information about the microprocessor loads and task execution intervals of function blocks, refer to the manual "Technical Descriptions of Functions, Introduction" (CD-ROM: Technical Descriptions of Functions, 1MRS750889-MCD).

The task execution interval for each task is defined via the dialogue Properties/Task (click "Settings..."). For example, the task execution interval for Task1 in the figure below is defined as 10 ms, which means that the program Prot_Me is run 100 times per second.

Fig. 3.1.4.3.-3. Setting the task execution interval for a program.

3.2. Global variables
The physical contacts of RE_ 54_ are defined in the "Global Variables" worksheet. Declarations for the physical contacts are automatically defined when the correct hardware version of RE_ 54_ is selected. Declarations for the analogue channels are created after the analogue channel settings defined in the resource settings dialogue have been approved.

Fig. 3.2-1. Global Variables worksheet.

3.3. Declaring variables
At its beginning, each programmable controller POU type declaration is to contain at least one declaration part that specifies the types of the variables used in the organisation unit. The declaration part shall have the textual form of one of the keywords VAR_INPUT, VAR_OUTPUT, VAR and VAR_EXTERNAL, followed by one or more declarations separated by semicolons and terminated by the keyword END_VAR. All the comments you write must be enclosed in parentheses and asterisks:

(**********************************)
(* Variable declaration            *)
(* of REF 541                      *)
(**********************************)

Caution is required regarding comments and variable declarations. The following code example will be compiled successfully, but because of the non-closed comment the END_VAR - VAR_EXTERNAL couple will be excluded, and thus the channel numbers become local variables of the POU and they get the initial value zero.

VAR (*AUTOINSERT*)
NOC3Low_1 : NOC3Low; (* Erroneous non-closed comment *
END_VAR
VAR_EXTERNAL
U12 : SINT; (* Measuring channel 8 *)
U23 : SINT; (* Measuring channel 9 *)
U31 : SINT; (* Measuring channel 10 *)
END_VAR

Three examples of creating the textual declaration for different kinds of graphical programs are given below.

EXAMPLE 1.
• POU type: FBD program
• Function block type declaration:

VAR
INPUT1 : BOOL := FALSE;
INPUT2 : BOOL := FALSE;
INPUT3 : BOOL := FALSE;
OUTPUT : BOOL := FALSE;
END_VAR
Fig. 3.3-1. Function block image.

EXAMPLE 2.
• POU type: NOC3Low, manufacturer dependent function block
• Function block type declaration:

VAR_INPUT
IL1 : SINT := 0;        (* Analogue channel *)
IL2 : SINT := 0;        (* Analogue channel *)
IL3 : SINT := 0;        (* Analogue channel *)
BS1 : BOOL := FALSE;    (* Blocking signal *)
BS2 : BOOL := FALSE;    (* Blocking signal *)
TRIGG : BOOL := FALSE;  (* Triggering *)
GROUP : BOOL := FALSE;  (* Grp1/Grp2 select *)
DOUBLE : BOOL := FALSE; (* Doubling signal *)
BSREG : BOOL := FALSE;  (* Blocking registering *)
RESET : BOOL := FALSE;  (* Reset signal *)
END_VAR
VAR_OUTPUT
START : BOOL := FALSE;  (* Start signal *)
TRIP : BOOL := FALSE;   (* Trip signal *)
CBFP : BOOL := FALSE;   (* CBFP signal *)
ERR : BOOL := FALSE;    (* Error signal *)
END_VAR

Fig. 3.3-2. Function block image of NOC3Low.

EXAMPLE 3.
• POU type: Configurer dependent FBD function block CONDIS
• Function block type declaration:

Fig. 3.3-3. Type declaration of the configurer-made function block CONDIS.

Fig. 3.3-4. Use of the configurer-made function block CONDIS.

Fig. 3.3-5. Contents of the CONDIS function block.

In example 3 above, part of the configuration has been separated into a configurer-made function block called CONDIS. Such function blocks may not be given names already belonging to library function blocks or IEC standard function blocks. The function block CONDIS has been used like any other function block in the graphical program. The order of inputs of a function block that has been inserted to a worksheet may not be changed. It must also be remembered that a function block with an instance named by the configurer can only be inserted to the project once.

3.4. Variables declared as global
The range of validity of the declarations included in the declaration part shall be "local" to the POU in which the declaration part is contained. One exception to this rule are variables that have been declared to be "global". Such variables are only accessible to a POU via a VAR_EXTERNAL declaration. The type of a variable declared in a VAR_EXTERNAL block shall agree with the type declared in the VAR_GLOBAL block of the associated program, configuration or resource.

Fig. 3.4-1. Local and global variables.
Variables declarationV ARIABLE WORKSHEET of logical POU*********************************************************************V ARTRIPPING:BOOL:= FALSE;BLOCK:BOOL:= TRUE;TMP1:BOOL:= FALSE;END_VARV AR_EXTERNALPS1_4_HSPO1 :BOOL; (* Double pole high speed power output *)(* 4.1/10,11,12,13 *)PS1_4_HSPO2 :BOOL; (* Double pole high speed power output *)(* X4.1/15,16,17,18 *)PS1_4_HSPO3 :BOOL; (* Double pole high speed power output *)(* X4.1/6,7,8,9 *)END_VARV AR_EXTERNALTCS1_ALARM:BOOL;END_VAR********************************************************************* GLOBAL V ARIABLE WORKSHEET*********************************************************************V AR_GLOBALPS1_4_HSPO1AT %QX 1.1.2 :BOOL := FALSE;(* Double pole high speed power output X4.1/10,11,12,13 *) PS1_4_HSPO2AT %QX 1.2.2 :BOOL := FALSE;(* Double pole high speed power output X4.1/15,16,17,18 *) PS1_4_HSPO3AT %QX 1.3.2 :BOOL := FALSE;(* Double pole high speed power output X4.1/6,7,8,9 *) END_VARV AR_GLOBALTCS1_ALARM:BOOL:= FALSE;END_VAR*********************************************************************piling the projectThe “Build Project” mode in the “Make” menu is used to compile the whole project forthe first time after editing, which means compiling all POUs, global variables, resourcesetc., whereas the “Make” mode can be used to compile the worksheets that have beenedited. The changed worksheets are marked with an asterisk in the project tree editor.“Make” is the standard mode for compiling and should normally be used when you havefinished editing.In the Relay ConfigurationTool you can view the execution order of the different func-tions or function blocks in your worksheet. The execution order corresponds to theintermediate PLC code created while compiling. Note that the execution order can onlybe seen if you have already compiled the worksheet using the menu item “CompleteWorksheet” in the submenu “Make”.By editing the mwt.ini file, one can affect the execution order. The line below can beadded to section [MAKE]:SH_EXECUTION_ORDER_TOP_BOTTOMSets the order of execution of functions and function blocks in graphical editor0Sets the execution order from left to right1Sets the execution order from top to bottomThe default value is …0“. The execution order can only be changed when compiling aworksheet again.4.Main Configuration Rules for RE_ 54_4.1.General Make sure that all analogue signals are connected and all necessary inputs and outputs are wired. Note that the outputs of function blocks may not be connected together. There are also many other FBD programming rules to follow. One of the most typical rules is not to use the “wired-OR” connection. All signals that are connected to the same output signal (both output relays and horizontal communication outputs) must be connected via an OR-gate (see figure below).Fig. 4.1-1. Use of an explicit Boolean OR gate (on the right).4.2.Binary inputs Do not write binary inputs - like BIO2_7_BI2.Fig. 4.2-1. Writing global binary inputs is not allowed.Binary inputs can be read from several locations.I>I>>TRIP TRIP PS1_4_HSPO1PS1_4_HSPO1I>I>>TRIP TRIP PS1_4_HSPO1"wired-OR" structure is not allowed an explicit Boolean "OR" block is required instead ORFig. 4.2-2. Correct way of using global binary inputs.4.3.Explicit feedbackAn explicit feedback loop may not be used. In case the tool gives a warning about thefeedback loop after compiling, the loop must be broken up by connecting the input ofthe function block and the output of another function block separately to the auxiliaryvariable.Fig. 4.3-1. 
Explicit feedback loop is not allowed.

Fig. 4.3-2. Explicit feedback via the auxiliary variable FEEDBACK.

4.4. Measurement function blocks
All the measurement inputs of the function blocks must be connected. If, for example, the three-phase current measurement function block MECU3A is implemented with only two phase currents, one of the available currents must also be connected to the remaining third input of MECU3A. Unless something is connected to the third current input, the function block will attempt to use sensor channel 1 instead of the empty channel, and if no measurements have been defined for channel 1, a supervision error will result.

4.5. Warnings
In case of the indication "Warning: Instance 'xx' is never used" in connection with a compilation, remove the corresponding instances of the function block from the variables worksheet of the POU. The tool will not give a separate warning for unused variables, which is why they need to be removed manually. When a global variable is added to a sheet by copy-paste, the global radio button has to be chosen (see figure below); otherwise the variable becomes a local variable of the POU, which is due to the auto-insert feature of the tool (global variable = VAR_EXTERNAL, local variable = VAR).

Fig. 4.5-1. Copying a global variable to a worksheet of a POU.

4.6. Execution order
Check the execution order after the compilation. The connection of simple variables to each other generates code whose execution order, in relation to the calling sequence of POUs, cannot be seen by means of the Layout Execution Order function.

Fig. 4.6-1. The INTERLOCKING variable is updated (TMP1) during the task execution cycle (see the execution order 1, 2, 3).

In addition, the execution order may be illogical or even incorrect considering the functionality.

Fig. 4.6-2. The explicit feedback (TMP1) delays the updating of the INTERLOCKING variable by one task execution cycle.

If the MOVE function is used instead of a direct connection, the execution order can be used to determine whether the result is the desired one.

4.7. F-key
The freely programmable F-key is declared as VAR_GLOBAL in the global variable worksheet as follows:

    F001V021 : BOOL := 0; (* Free configuration point (F-key) *)

The F-key parameter can be added to the configuration logic as an external variable (VAR_EXTERNAL). Similar parameters that can be inserted into a relay logic are listed below:

    F001V011 : BOOL  := 0; (* (W) Resetting of operation indications *)
    F001V012 : BOOL  := 0; (* (W) Resetting of operation indications & latched output signals *)
    F001V013 : BOOL  := 0; (* (W) Resetting of operation indications, latched output signals & waveform memory *)
    F001V020 : BOOL  := 0; (* (W) Resetting of accumulated energy measurement *)
    F001V021 : BOOL  := 0; (* (R, W) Free configuration point (F-key) *)
    F002V004 : BOOL  := 0; (* (R, W) Control: Interlocking bypass mode for all control objects (Enables all) *)
    F002V005 : USINT := 0; (* (W) Control: Recent control position *)
    F002V006 : BOOL  := 0; (* (W) Control: Virtual LON input poll status *)
    F900V251 : BOOL  := 0; (* (W) Control: Execute all command for selected objects (inside module) *)
    F900V252 : BOOL  := 0; (* (W) Control: Cancel all command for selected objects (inside module) *)
    F000V251 : BOOL  := 0; (* (W) Control: Execute all command for selected objects (inside module) *)
    F000V252 : BOOL  := 0; (* (W) Control: Cancel all command for selected objects (inside module) *)
Introduction: original English text

Digital image processing and pattern recognition techniques for the detection of cancer

Cancer is the second leading cause of death for both men and women in the world, and is expected to become the leading cause of death in the next few decades. In recent years, cancer detection has become a significant area of research activity in the image processing and pattern recognition community. Medical imaging technologies have already made a great impact on our capabilities of detecting cancer early and diagnosing the disease more accurately. In order to further improve the efficiency and accuracy of diagnosis and treatment, image processing and pattern recognition techniques have been widely applied to the analysis and recognition of cancer, evaluation of the effectiveness of treatment, and prediction of the development of cancer. The aim of this special issue is to bring together researchers working on image processing and pattern recognition techniques for the detection and assessment of cancer, and to promote research in image processing and pattern recognition for oncology. A number of papers were submitted to this special issue and each was peer-reviewed by at least three experts in the field. From these submitted papers, 17 were finally selected for inclusion in this special issue. These selected papers cover a broad range of topics that are representative of the state of the art in computer-aided detection or diagnosis (CAD) of cancer. They cover several imaging modalities (such as CT, MRI, and mammography) and different types of cancer (including breast cancer, skin cancer, etc.), which we summarize below.

Skin cancer is the most prevalent among all types of cancers. Three papers in this special issue deal with skin cancer. Yuan et al. propose a skin lesion segmentation method based on region fusion and narrow-band energy graph partitioning. The method can deal with challenging situations in skin lesion images, such as topological changes, weak or false edges, and asymmetry. Tang proposes a snake-based approach using multi-direction gradient vector flow (GVF) for the segmentation of skin cancer images. A new anisotropic diffusion filter is developed as a preprocessing step. After the noise is removed, the image is segmented using a GVF snake. The proposed method is robust to noise and can correctly trace the boundary of the skin cancer even if there are other objects near the skin cancer region. Serrano et al. present a method based on Markov random fields (MRF) to detect different patterns in dermoscopic images. Different from previous approaches to automatic dermatological image classification with the ABCD rule (Asymmetry, Border irregularity, Color variegation, and Diameter greater than 6 mm or growing), this paper follows a new trend of looking for specific patterns in lesions which could lead physicians to a clinical assessment.

Breast cancer is the most frequently diagnosed cancer other than skin cancer and a leading cause of cancer deaths in women in developed countries. In recent years, CAD schemes have been developed as a potentially efficacious solution to improving radiologists' diagnostic accuracy in breast cancer screening and diagnosis. The predominant approach of CAD in breast cancer, and in medical imaging in general, is to use automated image analysis to serve as a "second reader", with the aim of improving radiologists' diagnostic performance.
Thanks to intense research and development efforts, CAD schemes have now been introduced in screening mammography, and clinical studies have shown that such schemes can result in higher sensitivity at the cost of a small increase in recall rate. In this issue, we have three papers in the area of CAD for breast cancer. Wei et al. propose an image-retrieval-based approach to CAD, in which retrieved images similar to the one being evaluated (called the query image) are used to support a CAD classifier, yielding an improved measure of malignancy. This involves searching a large database for the images that are most similar to the query image, based on features that are automatically extracted from the images. Dominguez et al. investigate the use of image features characterizing the boundary contours of mass lesions in mammograms for classification of benign vs. malignant masses. They study and evaluate the impact of these features on diagnostic accuracy with several different classifier designs when the lesion contours are extracted using two different automatic segmentation techniques. Schaefer et al. study the use of thermal imaging for breast cancer detection. In their scheme, statistical features are extracted from thermograms to quantify bilateral differences between left and right breast regions, which are used subsequently as input to a fuzzy-rule-based classification system for diagnosis.

Colon cancer is the third most common cancer in men and women, and also the third most common cause of cancer-related death in the USA. Yao et al. propose a novel technique to detect colonic polyps using CT colonography. They use ideas from geographic information systems to employ topographical height maps, which mimic the procedure used by radiologists for the detection of polyps. The technique can also be used to measure the size of polyps consistently. Hafner et al. present a technique to classify and assess colonic polyps, which are precursors of colorectal cancer. The classification is performed based on the pit pattern in zoom-endoscopy images. They propose a novel color wavelet cross co-occurrence matrix which employs the wavelet transform to extract texture features from color channels.

Lung cancer occurs most commonly between the ages of 45 and 70 years, and has one of the worst survival rates of all the types of cancer. Two papers are included in this special issue on lung cancer research. Pattichis et al. evaluate new mathematical models that are based on statistics, logic functions, and several statistical classifiers to analyze reader performance in grading chest radiographs for pneumoconiosis. The technique can potentially be applied to the detection of nodules related to early stages of lung cancer. El-Baz et al. focus on the early diagnosis of pulmonary nodules that may lead to lung cancer. Their methods monitor the development of lung nodules in successive low-dose chest CT scans. They propose a new two-step registration method to align globally and locally two detected nodules. Experiments on a relatively large data set demonstrate that the proposed registration method contributes to precise identification and diagnosis of nodule development.

It is estimated that almost a quarter of a million people in the USA are living with kidney cancer and that the number increases by 51,000 every year. Linguraru et al.
propose a computer-assisted radiology tool to assess renal tumors in contrast-enhanced CT for the management of tumor diagnosis and response to treatment. The tool accurately segments, measures, and characterizes renal tumors, and has been adopted in clinical practice. Validation against manual tools shows high correlation.

Neuroblastoma is a cancer of the sympathetic nervous system and one of the most malignant diseases affecting children. Two papers in this field are included in this special issue. Sertel et al. present techniques for classification of the degree of Schwannian stromal development as either stroma-rich or stroma-poor, which is a critical decision factor affecting the prognosis. The classification is based on texture features extracted using co-occurrence statistics and local binary patterns. Their work is useful in helping pathologists in the decision-making process. Kong et al. propose image processing and pattern recognition techniques to classify the grade of neuroblastic differentiation on whole-slide histology images. The presented technique is promising for facilitating the grading of whole-slide images of neuroblastoma biopsies with high throughput.

This special issue also includes papers which are not directly focused on the detection or diagnosis of a specific type of cancer but deal with the development of techniques applicable to cancer detection. Ta et al. propose a framework of graph-based tools for the segmentation of microscopic cellular images. Based on the framework, automatic or interactive segmentation schemes are developed for color cytological and histological images. Tosun et al. propose an object-oriented segmentation algorithm for biopsy images for the detection of cancer. The proposed algorithm uses a homogeneity measure based on the distribution of the objects to characterize tissue components. Colon biopsy images were used to verify the effectiveness of the method; the segmentation accuracy was improved as compared to its pixel-based counterpart. Narasimha et al. present a machine-learning tool for automatic texton-based joint classification and segmentation of mitochondria in MNT-1 cells imaged using an ion-abrasion scanning electron microscope. The proposed approach requires minimal user intervention and can achieve high classification accuracy. El Naqa et al. investigate intensity-volume histogram metrics as well as shape and texture features extracted from PET images to predict a patient's response to treatment. Preliminary results suggest that the proposed approach could potentially provide better tools and discriminant power for functional imaging in clinical prognosis.

We hope that the collection of the selected papers in this special issue will serve as a basis for inspiring further rigorous research in CAD of various types of cancer. We invite you to explore this special issue and benefit from these papers. On behalf of the Editorial Committee, we take this opportunity to gratefully acknowledge the authors and the reviewers for their diligence in abiding by the editorial timeline. Our thanks also go to the Editors-in-Chief of Pattern Recognition, Dr. Robert S. Ledley and Dr. C. Y. Suen, for their encouragement and support for this special issue.
Source: Gümüs, A. Taskin, and A. Fuat Güneri. "Multi-echelon inventory management in supply chains with uncertain demand and lead times: literature review from an operational research perspective." Proceedings of the Institution of Mechanical Engineers, Part B: Journal of Engineering Manufacture 221.10 (2007): 1553-1570.

Original text

Multi-echelon inventory management in supply chains with uncertain demand and lead times: literature review from an operational research perspective

A. Taskin Gümüs and A. Fuat Güneri

Abstract: Historically, the echelons of the supply chain (warehouses, distributors, retailers, etc.) have been managed independently, buffered by large inventories. Increasing competitive pressures and market globalization are forcing firms to develop supply chains that can quickly respond to customer needs. To remain competitive and decrease inventory, these firms must use multi-echelon inventory management interactively, while reducing operating costs and improving customer service. The current paper reviews the literature addressing multi-echelon inventory management in supply chains from 1996 to 2005. The behavior of the papers with respect to demand and lead-time uncertainty is the key analysis point of the literature review presented here, and it is conducted from an operational research point of view. Finally, directions for future research are suggested.

Keywords: supply chain, multi-echelon inventory management, demand uncertainty, lead-time uncertainty

1 INTRODUCTION

Supply chain management (SCM) is an integrative approach for planning and control of materials and information flows with suppliers and customers, as well as between different functions within a company. This area has drawn considerable attention in recent years and is seen as a tool that provides competitive power. SCM is a set of approaches to integrate suppliers, manufacturers, warehouses, and stores efficiently, so that merchandise is produced and distributed in the right quantities, to the right locations, and at the right time, in order to minimize system-wide costs while satisfying service-level requirements. So the supply chain consists of various members or stages. A supply chain is a dynamic, stochastic, and complex system that might involve hundreds of participants.

Inventory usually represents from 20 to 60 per cent of the total assets of manufacturing firms. Therefore inventory management policies prove critical in determining the profit of such firms [4]. Inventory management is, to an even greater extent, relevant when a whole supply chain (SC), namely a network of procurement, transformation, and delivering firms, is considered. Inventory management is indeed a major issue in SCM, i.e. an approach that addresses SC issues under an integrated perspective.

Inventories exist throughout the SC in various forms for various reasons [6]. The lack of coordinated inventory management throughout the SC often causes the bullwhip effect, namely an amplification of demand variability moving towards the upstream stages. This causes excessive inventory investments, lost revenues, misguided capacity plans, ineffective transportation, missed production schedules, and poor customer service [5].

Many scholars have studied these problems, as well as emphasized the need for integration among SC stages, to make the chain effectively and efficiently satisfy customer requests (e.g. reference [7]). Beside the integration issue, uncertainty has to be dealt with in order to define an effective SC inventory policy. In addition to the uncertainty on supply (e.g.
lead times) and demand, information delays associated with the manufacturing and distribution processes characterize SCs.

Inventory management in multi-echelon SCs is an important issue, because there are many elements that have to coordinate with each other. They must also arrange their inventories in a coordinated way. There are many factors that complicate successful inventory management, e.g. uncertain demands, lead times, production times, product prices, and costs, especially the uncertainty in demand and lead times, under which inventory cannot be managed optimally between echelons.

In the current paper, a detailed literature review is presented, addressing multi-echelon inventory management in SCs from 1996 to 2005. Here, the behavior of the papers with respect to demand and lead-time uncertainty is emphasized. First, the echelon concept and multi-echelon inventory management in SCs are defined. Then, the literature review conducted from an operational research point of view between 1996 and 2005 is presented. Finally, directions for future research are suggested.

2 MULTI-ECHELON INVENTORY MANAGEMENT IN SUPPLY CHAINS

Most manufacturing enterprises are organized into networks of manufacturing and distribution sites that procure raw material, process it into finished goods, and distribute the finished goods to customers. The terms 'multi-echelon' or 'multilevel' production/distribution networks are also synonymous with such networks (or SCs), when an item moves through more than one step before reaching the final customer. Inventories exist throughout the SC in various forms for various reasons. At any manufacturing point, they may exist as raw materials, work in progress, or finished goods. They exist at the distribution warehouses, and they exist in transit, or 'in the pipeline', on each path linking these facilities.

Manufacturers procure raw material from suppliers and process it into finished goods, sell the finished goods to distributors, and then to retailers and/or customers. When an item moves through more than one stage before reaching the final customer, it forms a 'multi-echelon' inventory system [8]. The echelon stock of a stock point equals all stock at this stock point, plus in-transit to or on-hand at any of its downstream stock points, minus the backorders at its downstream stock points.

The analysis of multi-echelon inventory systems that pervade the business world has a long history [10]. Multi-echelon inventory systems are widely employed to distribute products to customers over extensive geographical areas. Given the importance of these systems, many researchers have studied their operating characteristics under a variety of conditions and assumptions [11]. Since the development of the economic order quantity (EOQ) formula by Harris (1913), researchers and practitioners have been actively concerned with the analysis and modelling of inventory systems under different operating parameters and modelling assumptions [2]. Research on multi-echelon inventory models has gained importance over the last decade, mainly because integrated control of SCs consisting of several processing and distribution stages has become feasible through modern information technology [8, 9, 12].

Clark and Scarf [13] were the first to study the two-echelon inventory model [8, 9, 10, 14-17]. They proved the optimality of a base-stock policy for the pure-serial inventory system and developed an efficient decomposing method to compute the optimal base-stock ordering policy.
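The echelon-stock bookkeeping defined above is mechanical enough to state precisely. The following Python sketch (a hypothetical data model, not taken from any of the reviewed papers) computes the echelon stock of a stock point recursively from its local figures and those of its downstream points:

    # Sketch of the echelon-stock definition (hypothetical data model):
    # echelon stock = local on-hand stock, plus in-transit to or on-hand at
    # downstream points, minus backorders at downstream points.
    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class StockPoint:
        on_hand: float
        in_transit_in: float          # stock currently in transit to this point
        backorders: float
        downstream: List["StockPoint"] = field(default_factory=list)

    def echelon_stock(p: StockPoint) -> float:
        total = p.on_hand
        for d in p.downstream:
            # Recursion covers on-hand stock at every downstream point;
            # in-transit and backorders are counted per downstream point.
            total += d.in_transit_in - d.backorders + echelon_stock(d)
        return total

For a two-echelon chain, this reduces to the warehouse's own stock plus everything on hand at, or on the way to, its retailers, net of their backorders.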
Bessler and Veinott [18] extended the Clark and Scarf [13] model to include general arborescent structures. The depot warehouse problem described above was addressed by Eppen and Schrage [19] who analysed a model with a stockless central depot [20]. They derived a closed-form expression for the order-up-to-level under the equal fractile allocation assumption. Several authors have also considered this problem in various forms [11, 14–17, 20–30]. Owing to the complexity and intractability of the multi-echelon problem, Hadley and Whitin [31] recommend the adoption of single-location, single-echelon models for the inventory systems .Sherbrooke considered an ordering policy of a two-echelon model for warehouse and retailer. It is assumed that stockouts at the retailers arecompletely backlogged [8]. Also, Sherbrooke [32] constructed the METRIC (multi-echelon technique for recoverable item control) model, which identifies the stock levels that minimize the expected number of backorders at the lower-echelon subject to a budget constraint. This model is the first multi-echelon inventory model for managing the inventory of service parts [6, 10, 33]. Thereafter, a large set of models, which generally seek to identify optimal lot sizes and safety stocks in a multi-echelon framework, were produced by many researchers [27, 34–37]). In addition to analytical models, simulation models have also been developed to capture the complex interactions of the multi-echelon inventory problems.Figure 1 shows a multi-echelon system consisting of a number of suppliers, plants, warehouses, distribution centres, and customers [27, 42, 43].So far literature has devoted major attention to the forecasting of lumpy demand, and to the development of stock policies for multi-echelon SCs [13]. Inventory control policy for multi-echelon systems with stochastic demand has been a widely researched area. More recent papers have been covered by Silver and Pyke. The advantage of centralized planning, available in periodic review policies, can be obtained in continuous review policies, by defining the reorder levels of different stages, in terms of echelon stock rather than installation stock3 LITERATURE REVIEW: FROM 1996 TO 2005In this section, a detailed literature review, conducted from an operational research point of view, is presented. It addresses multi-echelon inventory management in SCs, from 1996 to 2005. The selection criteria of the papers that are reviewed are: using operational research techniques to overcome multiechelon inventory management problems, and being demand and lead time sensitive (there are uncertain demand and lead times). Here, the behavior of the papers against demand and lead time uncertainty is emphasized.The papers reviewed here are categorized into groups on the basis ofthe research techniques in which they are used. These techniques can be grouped as:(a) mathematic modelling (only);(b) mathematic modelling and other techniques (in the same paper);(c) METRIC modelling;(d) Markov decision process;(e) simulation (only);(f) Stackelberg game;(g) literature review;(h) other techniques (vari-METRIC method, heuristics, scenario analysis, fuzzy logic, etc.).While the research techniques are common for papers that are grouped according to their research techniques, the number of echelons they consider, inventory/system policies, demand and lead time assumptions, the objectives, and the solutions’ exact ness may be different. Therefore these factors are also analysed.Mathematic modelling techniqueRau et al. 
[8], Diks and de Kok [9], Dong and Lee, Mitra and Chatterjee [45], Hariga [46], Chen, Axsater and Zhang [48], Nozick and Turnquist, and So and Zheng [50] use a mathematic modelling technique in their studies to manage multi-echelon inventory in SCs. Diks and de Kok’s study [9] con siders a divergent multi-echelon inventory system, such as a distribution system or a production system, and assumes that the order arrives after a fixed lead time. Hariga [46], presents a stochastic model for a single-period production system composed of several assembly/processing and storage facilities in series.Chen [47], Axsater and Zhang [48], and Nozick and Turnquist [49] consider a two-stage inventory system in their papers. Axsater and Zhang [48] and Nozick and Turnquist [49] assume that the retailers face stationary and independent Poisson demand. Mitra and Chatterjee [45] examine De Bodt andGraves’model(1985),which they developed in their paper ‘Continuous-review policies for a multi-echelon inventory problem wi th stochastic demand’, for fast moving items from the implementation point of view. The proposed modification of the model can be extended to multi-stage serial and two-echelon assembly systems. In Rau et al.’s [8] model, shortage is not allowed, lead time is assumed to be negligible, and demand rate and production rate is deterministic and constant. So and Zheng [50] used an analytical model to analyse two important factors that can contribute to the high degree of order-quantity variability experienced by semiconductor manufacturers: supplier’s lead time and forecast demand updating. They assume that the external demands faced by the retailer are correlated between two successive time periods and that the retailer uses the latest demand information to update its future demand forecasts.Furthermore, they assu me that the supplier’s deli very lead times are variable and are affected by the retailer’s order quantities. Dong and Lee’s paper revisits the serial multi-echelon inventory system of Clark and Scarf [13] and develops three key results. First, they provide a simple lower-bound approximation to the optimal echelon inventory levels and an upper bound to the total system cost for the basic model of Clark and Scarf [13]. Second, they show that the structure of the optimal stocking policy of Clark and Scarf [13] holds under time-correlated demand processing using a Martingale model of forecast evolution. Third, they extend the approximation to the time-correlated demand process and study, in particular for an autoregressive demand model, the impact of lead times, and autocorrelation on the performance of the serial inventory system.After reviewing the literature about multiechelon inventory management in SCs using mathematic modelling technique, it can be said that, in summary, these papers consider two, three, or N-echelon systems with stochastic or deterministic demand. They assume lead times to be fixed, zero, constant, deterministic, or negligible. They gain exact or approximate solutions.Mathematic modelling and other techniques togetherDekker et al. analyses the effect of the break quantity rule on the inventory costs. The break quantity rule is to deliver large orders from the warehouse, and small orders from the nearest retailer, where a so-called break quantity determines whether an order is small or large. In most l-warehouse N-retailers distribution systems, it is assumed that all customer demand takes place at the retailers [19, 22, 24, 70, 71]. 
However, it was shown by Dekker et al. that delivering large orders from the warehouse can lead to a consid erable reduction in the retailer’s inventory costs. In Dekker et al. [54] the results of Dekker et al. [72] were extended by also including the inventory costs at the warehouse. The study by Mohebbi and Posner’s [53] contains a cost analysis in the context of a continuous-review inventory system with replenishment orders and lost sales. The policy considered in the paper by van der Haiden et al. [56] is an echelon stock, periodic review, order up-to (R,S) policy, under both stochastic demand and lead times.Andersson and Markland’s [57] approach is based on an approximate cost-evaluation technique. Axsater presents a method for exact evaluation of control policies that provides the complete probability distributions of the retailer inventory levels. Mitra and Chatterjee [65] examine the effect of utilizing demand information in a multi-echelon system. Seferlis and Giannelos [41] present an optimization-based control approach that applies multivariable model-predictive control principles to the entire network. The invent ory system under Seifbarghy and Jokar’s [68] consideration uses continuous review inventory policy (R,Q) and assumes constant lead times. In Moinzadeh’s paper [62], each retailer places their order to the supplier according to the well-known ‘Q,R’ policy. It is assumed that the supplier has online information about the demand, as well as inventory activities of the product at each retailer, and uses this information when making order/replenishment decisions. Tang formulae aredeveloped for solving the optimal planned lead times with the objective of minimizing total stock out and invent or holding costs. Axsater [43] assumes that the system is controlled by continuous review installation stock (R,Q) policies with given batch quantities and presents a simple technique for approximate optimization of the reorder points.Cachon and Fisher [58] and Tsiakis et al. [61] use mathematical modelling and scenario analysis in their studies. Cachon and Fisher [58] consider a twoechelon inventory system with stochastic demand, while Tsiakis et al.[61] consider a four-echelon inventory system with time-invariant demand, differently from most studies. Cachon and Fisher [58] study the value of sharing demand and inventory data in a two-echelon inventory system, while Tsiakis et al.’s objective is the minimization of the total annualized cost of the network Chiu and Huang [64] use mathematical modelling and simulated annealing algorithm in their studies and consider an N-echelon serial SC. Their paper proposes a multi-echelon integrated just-in-time inventory (MEIJITI) model with random-delivery lead times for a serial SC in which members exchange information to make purchase, production, and delivery decisions jointly.Parker and Kapuscinski [30] use mathematical modelling and Markov decision processes in their paper, and consider a two-echelon inventor system with stochastic demand. Extending the Clark and Scarf [13] model to include installations with production capacity limits, they demonstrate that a modified echelon base-stock policy is optimal in a twostage system when there is a smaller capacity at the downstream facility.A multi-product, multi-stage, and multi-period production and distribution planning model is proposed in Chen and Lee [66] to tackle the compromised sales prices and the total profit problem of a multi-echelon SC network with uncertain sales prices. 
They use mathematical modelling (mixed integer non-linear programming) and fuzzy optimization in their study.Jalbar et al. [67] use mathematical modelling, Schwarz heuristic, Graves and Schwarz procedure, Muckstadt and Roundy approach, and O(N log N) heuristic in their paper, and consider a two-echelon inventory system with one-warehouse and N-retailers.The goal is to determine single-cycle policies that minimize the average cost per unit time, that is, the sum of the average holding and set-up costs per unit time at the retailers and at the warehouse.In Routroy and Koda li’s paper [2] mathematical mod elling and differential evolution algorithms are used. A three-echelon inventory system is considered consisting of a retailer, a warehouse, and a manufacturer.Han and Damrongwongsiri’s [69] purpose is establishing a strategic resource allocation model to capture and encapsulate the complexity of the modern global SC management problem. A mathematical model is constructed to describe the stochastic multi-period two-echelon inventory with the many to-many demand–supplier network problem. Genetic algorithm (GA) is applied to derive near optimal solutions through a two-stage optimization process. Demand in each period can be represented by the probability distribution, such as normal distribution or exponential distribution.Most of the papers reviewed here use simulation with mathematical modelling. They consider intensively two-echelon inventory system with stochastic demand, 1, 3, or N-echelon systems are rarely considered. They gain exact or approximate solutions.METRIC modelling techniqueMoinzadeh and Aggarwal [11] use METRIC modelling and simulation techniques in their study, while Andersson and Melchiors [42] and Wang et al. [73] use METRIC modelling only. The three of them consider a two-echelon inventory system with stochastic demand, and obtain approximate solutions.Moinzadeh and Aggarwal [11] study a (S-1,S)-type multi-echelon inventory system where all the stocking locations have the option toreplenish their inventory through either a normal or a more expensive emergency resupply channel. Wang et al. [73] study the impact of such centre-dependent depotreplenishment lead times (DRLTs) on system performance. Andersson and Melchiors [42] evaluate and optimize S-1,S-policies for a two-echelon inventory system consisting of one central warehouse and an arbitrary number of retailers.Markov decision process techniqueIida [74], Chen and Song [75], Chen et al. [76], and Minner et al. [77] use the Markov decision process in their studies, while Chiang and Monahan [10] use Markov decision process and scenario analysis, and Johansen [78] uses Markov decision process, simulation, and Erlang’s loss formula together. Iida [74] and Chen and Song [75] consider an N-echelon inventory system, but under stochastic demand in the first study and Markov-modulated demand in the second one, respectively. Chen et al. [76], Minner et al. [77], and Chiang and Monahan [10] consider a two-echelon inventory system with stochastic demand. Johansen [78] considers a single-item inventory system and a sequential supply system with stochastic demand.The main purpose of Iida’s [74] paper is to show that near-myopic policies are acceptable for a multiechelon inventory problem. It is assumed that lead times at each echelon are constant. Chen and Song’s[75] objective is to minimize the long-run average costs in the system. In the system by Chen et al. 
[76], each location employs a periodic-review (R,nQ), or lot-size reorder point, inventory policy. They show that each location's inventory position is stationary and that the stationary distribution is uniform and independent of any other. In the study by Minner et al. [77], the impact of manufacturing flexibility on inventory investments in a distribution network consisting of a central depot and a number of local warehouses is examined.

Other techniques

In multi-echelon inventory management there are some other research techniques used in the literature, such as heuristics, the vari-METRIC method, fuzzy sets, model predictive control, scenario analysis, statistical analysis, and GAs. These methods are used rarely and only by a few authors.

The paper by Chandra and Grabis quantifies the bullwhip effect in the case of serially correlated external demand, if autoregressive models are applied to obtain multiple-step demand forecasts. Here, under autoregressive demand, inventory management of a two-echelon SC consisting of a retailer and a distributor is considered. It is assumed that the lead time is deterministic. The papers using the other techniques consider one-, two-, three-, four-, five-, or N-echelon systems and assume stochastic, constant, fuzzy, or deterministic demand and lead times. All of them obtain approximate solutions.

4 FINDINGS OF THE LITERATURE REVIEW

A limited number of echelons of a multi-echelon inventory system is usually considered in the literature. The studies rarely generalize their models to N echelons. Similarly, they usually consider serial systems instead of a tree conformation.

The authors generally assume demand and lead times to be stochastic, deterministic, constant, or negligible. There are only a few studies that determine these variables with heuristics, fuzzy logic, or GAs. These techniques have not yet been examined adequately in inventory management in multi-echelon SCs. In addition, the papers present mostly approximate models; only a small number of papers give exact solutions.
Vol. 42, No. 5, Acta Armamentarii, May 2021

Loitering Munition Penetration Control Decision Based on Deep Reinforcement Learning
GAO Ang(1), DONG Zhiming(1), YE Hongbing(2), SONG Jinghua(1), GUO Qisheng(1)
(1. Military Exercise and Training Center, Army Academy of Armored Forces, Beijing 100072, China; 2. Xiangnan University, Chenzhou 423099, Hunan, China)

Abstract: The loitering munition penetration control decision (LMPCD) problem is an important research direction under the "multi-domain operations" warfighting concept.
To address this problem, an LMPCD model based on the Markov decision process is established.
The LMPCD function and the flight state-action value function are fitted, an LMPCD framework based on the actor-critic method is constructed, a deep reinforcement learning solution method based on the deep deterministic policy gradient algorithm is given, and an optimal decision network for loitering munition penetration control is generated.
In 1,000 simulated loitering munition penetration tests, the results show that the mission success rate of the loitering munition is 82.1% and the average decision time is 1.48 ms, which verifies the effectiveness of the LMPCD model and its solution process.
Keywords: loitering munition; deep reinforcement learning; Markov decision process; penetration; control decision
CLC number: E911    Document code: A    Article ID: 1000-1093(2021)05-1101-10
DOI: 10.3969/j.issn.1000-1093.2021.05.023

Loitering Munition Penetration Control Decision Based on Deep Reinforcement Learning
GAO Ang(1), DONG Zhiming(1), YE Hongbing(2), SONG Jinghua(1), GUO Qisheng(1)
(1. Military Exercise and Training Center, Army Academy of Armored Forces, Beijing 100072, China; 2. Xiangnan University, Chenzhou 423099, Hunan, China)

Abstract: Loitering munition penetration control decision (LMPCD) is an important research direction under the concept of "multi-domain war". Research on real-time route planning for loitering munition penetration has important military significance. Traditional knowledge, reasoning, and planning methods do not have the ability to explore and discover new knowledge outside their framework. Bionic optimization methods are suitable for solving path-planning problems in a static environment, such as the traveling salesman problem, and are difficult to apply to the loitering munition penetration problem, which places high demands on environmental dynamics and real-time decision-making. Given the limitations of the first two kinds of methods, the applicability of the deep reinforcement learning method is analyzed, and loitering munition domain knowledge is introduced into each element of the deep reinforcement learning algorithm. The flight motion model of the loitering munition is analyzed; the state space, action space, and reward function of the loitering munition are designed; the algorithm framework of the loitering munition penetration control decision is analyzed; and the training process of the loitering munition penetration control decision algorithm is designed. A penetration simulation test of 1,000 rounds of loitering munition shows that the penetration success rate is 82.1% and the average decision time is 1.48 ms, which verifies the effectiveness of the algorithm training process and of the LMPCD model.

Received: 2020-07-21
Foundation items: Military Scientific Research Program projects (41405030302, 41401020301)
About the author: GAO Ang (b. 1988), male, Ph.D. candidate.
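The abstract above describes an actor-critic learner trained with the deep deterministic policy gradient. As a rough, generic illustration only (standard DDPG machinery with made-up network sizes; nothing here is taken from the paper), the core update for one batch can be sketched as:

    # Generic DDPG-style update sketch (standard algorithm outline with
    # hypothetical dimensions; not code from the cited paper).
    import torch
    import torch.nn as nn

    state_dim, action_dim = 6, 2          # assumed flight-state / control dims
    actor = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                          nn.Linear(64, action_dim), nn.Tanh())
    critic = nn.Sequential(nn.Linear(state_dim + action_dim, 64), nn.ReLU(),
                           nn.Linear(64, 1))
    actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)
    critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)
    gamma = 0.99

    def ddpg_update(s, a, r, s2, target_actor, target_critic):
        # Critic: regress Q(s, a) toward r + gamma * Q'(s', mu'(s')).
        with torch.no_grad():
            y = r + gamma * target_critic(torch.cat([s2, target_actor(s2)], dim=1))
        q = critic(torch.cat([s, a], dim=1))
        critic_loss = nn.functional.mse_loss(q, y)
        critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()
        # Actor: ascend the deterministic policy gradient, i.e. maximize Q(s, mu(s)).
        actor_loss = -critic(torch.cat([s, actor(s)], dim=1)).mean()
        actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

The paper's contribution, per the abstract, lies in the domain-specific state space, action space, and reward design around such an update, not in the update itself.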
Image retrieval based on a hybrid Markov chain model and diffusion
FENG Qiuyan

Abstract: This paper presents a simple and effective framework for general image retrieval based on graphs and multi-feature fusion. For each feature, given a query object and the initially retrieved images, an easy-to-handle graph is constructed whose nodes represent the images and whose edges carry the pairwise similarity scores between images. Based on a multi-graph stochastic path-selection model, a hybrid Markov chain model and multi-feature fusion are used to fuse the multiple easy-to-handle graphs into a fusion graph, and diffusion is applied to the fusion graph to reduce noise. Experimental comparison with existing methods demonstrates the validity of the approach and improves benchmark performance. The components, parameters, and feature combinations of the method are evaluated through experiments, which verify the rationality of the method and its robustness to parameter changes.

Journal: Computer Applications and Software
Year (volume), issue: 2018, 35(5)
Pages: 7 (pp. 258-263, 286)
Keywords: multi-feature fusion; image retrieval; easy-to-handle graph; pairwise similarity score; fusion graph; diffusion
Author: FENG Qiuyan
Affiliation: Henan University of Economics and Law, Zhengzhou 453800, Henan, China
Language: Chinese
CLC number: TP391

0 Introduction

Image retrieval occupies an increasingly important position in information retrieval.
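The fusion-and-diffusion pipeline in the abstract above can be made concrete with a small sketch. This is an interpretive illustration with assumed inputs (per-feature similarity matrices), using a uniform mixture as a stand-in for the paper's hybrid Markov chain and a standard diffusion iteration; it is not the author's implementation:

    # Interpretive sketch of multi-graph fusion plus diffusion (assumed inputs;
    # not the author's code). Each S in similarity_matrices is an n-by-n
    # pairwise similarity matrix for one feature, with nonneg rows summing > 0.
    import numpy as np

    def transition_matrix(S):
        # Row-normalize a similarity matrix into a Markov chain.
        S = np.maximum(S, 0)
        return S / S.sum(axis=1, keepdims=True)

    def fuse_and_diffuse(similarity_matrices, steps=3, alpha=0.8):
        # Fuse the per-feature chains into one (uniform mixture stand-in).
        P = np.mean([transition_matrix(S) for S in similarity_matrices], axis=0)
        # Diffuse: W <- alpha * P W P^T + (1 - alpha) * I spreads similarity
        # mass along the fused graph and damps noisy edges.
        W = P.copy()
        I = np.eye(P.shape[0])
        for _ in range(steps):
            W = alpha * P @ W @ P.T + (1 - alpha) * I
        return W

Ranking the database by the query's row of the returned matrix W would then give the diffused retrieval scores.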
Michael L. Littman
Brown University / Bellcore
Department of Computer Science, Brown University
Providence, RI 02912-1910
mlittman@

Abstract

In the Markov decision process (MDP) formalization of reinforcement learning, a single adaptive agent interacts with an environment defined by a probabilistic transition function. In this solipsistic view, secondary agents can only be part of the environment and are therefore fixed in their behavior. The framework of Markov games allows us to widen this view to include multiple adaptive agents with interacting or competing goals. This paper considers a step in this direction in which exactly two agents with diametrically opposed goals share an environment. It describes a Q-learning-like algorithm for finding optimal policies and demonstrates its application to a simple two-player game in which the optimal policy is probabilistic.

1 INTRODUCTION

No agent lives in a vacuum; it must interact with other agents to achieve its goals. Reinforcement learning is a promising technique for creating agents that co-exist [Tan, 1993; Yanco and Stein, 1993], but the mathematical framework that justifies it is inappropriate for multi-agent environments. The theory of Markov decision processes (MDP's) [Barto et al., 1989; Howard, 1960], which underlies much of the recent work on reinforcement learning, assumes that the agent's environment is stationary and as such contains no other adaptive agents.

The theory of games [von Neumann and Morgenstern, 1947] is explicitly designed for reasoning about multi-agent systems. Markov games (see, e.g., [Van Der Wal, 1981]) are an extension of game theory to MDP-like environments. This paper considers the consequences of using the Markov game framework in place of MDP's in reinforcement learning. Only the specific case of two-player zero-sum games is addressed, but even in this restricted version there are insights that can be applied to open questions in the field of reinforcement learning.

2 DEFINITIONS

An MDP [Howard, 1960] is defined by a set of states, $S$, and actions, $A$. A transition function, $T : S \times A \to PD(S)$, defines the effects of the various actions on the state of the environment ($PD(S)$ represents the set of discrete probability distributions over the set $S$). The reward function, $R : S \times A \to \mathbb{R}$, specifies the agent's task. In broad terms, the agent's objective is to find a policy mapping its interaction history to a current choice of action so as to maximize the expected sum of discounted reward, $E\{\sum_{j=0}^{\infty} \gamma^j r_j\}$, where $r_j$ is the reward received $j$ steps into the future. A discount factor, $0 \le \gamma < 1$, controls how much effect future rewards have on the optimal decisions, with small values of $\gamma$ emphasizing near-term gain and larger values giving significant weight to later rewards.

In its general form, a Markov game, sometimes called a stochastic game [Owen, 1982], is defined by a set of states, $S$, and a collection of action sets, $A_1, \ldots, A_k$, one for each agent in the environment. State transitions are controlled by the current state and one action from each agent: $T : S \times A_1 \times \cdots \times A_k \to PD(S)$. Each agent $i$ also has an associated reward function, $R_i : S \times A_1 \times \cdots \times A_k \to \mathbb{R}$, and attempts to maximize its expected sum of discounted rewards, $E\{\sum_{j=0}^{\infty} \gamma^j r_{i,j}\}$, where $r_{i,j}$ is the reward received $j$ steps into the future by agent $i$.

In this paper, we consider a well-studied specialization in which there are only two agents and they have diametrically opposed goals. This allows us to use a single reward function that one agent tries to maximize and the other, called the opponent, tries to minimize. In this paper, we use $A$ to denote the agent's action set, $O$ to denote the opponent's action set, and $R(s, a, o)$ to denote the immediate reward to the agent for taking action
$a \in A$ in state $s$ when its opponent takes action $o \in O$.

Adopting this specialization, which we call a two-player zero-sum Markov game, simplifies the mathematics but makes it impossible to consider important phenomena such as cooperation. However, it is a first step and can be considered a strict generalization of both MDP's (when $|O| = 1$) and matrix games (when $|S| = 1$).

As in MDP's, the discount factor, $\gamma$, can be thought of as the probability that the game will be allowed to continue after the current move. It is possible to define a notion of undiscounted rewards [Schwartz, 1993], but not all Markov games have optimal strategies in the undiscounted case [Owen, 1982]. This is because, in many games, it is best to postpone risky actions indefinitely. For current purposes, the discount factor has the desirable effect of goading the players into trying to win sooner rather than later.

3 OPTIMAL POLICIES

The previous section defined the agent's objective as maximizing the expected sum of discounted reward. There are subtleties in applying this definition to Markov games, however. First, we consider the parallel scenario in MDP's.

In an MDP, an optimal policy is one that maximizes the expected sum of discounted reward and is undominated, meaning that there is no state from which any other policy can achieve a better expected sum of discounted reward. Every MDP has at least one optimal policy, and of the optimal policies for a given MDP, at least one is stationary and deterministic. This means that, for any MDP, there is a policy $\pi : S \to A$ that is optimal. The policy is called stationary since it does not change as a function of time, and it is called deterministic since the same action is always chosen whenever the agent is in state $s$, for all $s \in S$.

For many Markov games, there is no policy that is undominated because performance depends critically on the choice of opponent. In the game theory literature, the resolution to this dilemma is to eliminate the choice and evaluate each policy with respect to the opponent that makes it look the worst. This performance measure prefers conservative strategies that can force any opponent to a draw to more daring ones that accrue a great deal of reward against some opponents and lose a great deal to others. This is the essence of minimax: behave so as to maximize your reward in the worst case.

Given this definition of optimality, Markov games have several important properties. Like MDP's, every Markov game has a non-empty set of optimal policies, at least one of which is stationary. Unlike MDP's, there need not be a deterministic optimal policy. Instead, the optimal stationary policy is sometimes probabilistic, mapping states to discrete probability distributions over actions, $\pi : S \to PD(A)$. A classic example is "rock, paper, scissors," in which any deterministic policy can be consistently defeated.

The idea that optimal policies are sometimes stochastic may seem strange to readers familiar with MDP's or games with alternating turns like backgammon or tic-tac-toe, since in these frameworks there is always a deterministic policy that does no worse than the best probabilistic one. The need for probabilistic action choice stems from the agent's uncertainty of its opponent's current move and its requirement to avoid being "second guessed."

                            Agent
                    rock    paper   scissors
            rock      0       1       -1
    Opponent paper   -1       0        1
            scissors  1      -1        0

Table 1: The matrix game for "rock, paper, scissors" (entries are rewards to the agent; rows give the opponent's choice, columns the agent's).

    pi_paper    - pi_scissors >= V      (vs. rock)
    pi_scissors - pi_rock     >= V      (vs. paper)
    pi_rock     - pi_paper    >= V      (vs. scissors)
    pi_rock + pi_paper + pi_scissors = 1

Table 2: Linear constraints on the solution to a matrix game.
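To make the constraints of Table 2 concrete, here is a small Python check (illustrative only; SciPy's linprog is used as the solver, with the reward matrix oriented as in Table 1, rows indexed by the opponent's choice). It recovers the values quoted in the next section: $V = 0$ and $\pi = (1/3, 1/3, 1/3)$.

    # Illustrative solve of the Table 2 linear program with SciPy.
    import numpy as np
    from scipy.optimize import linprog

    # R[o, a]: reward to the agent when the opponent plays o and the agent plays a.
    # Order: rock, paper, scissors.
    R = np.array([[ 0,  1, -1],
                  [-1,  0,  1],
                  [ 1, -1,  0]])

    # Decision variables x = (pi_rock, pi_paper, pi_scissors, V); maximize V
    # by minimizing -V.
    c = np.array([0.0, 0.0, 0.0, -1.0])

    # For each opponent action o: sum_a pi_a * R[o, a] >= V,
    # rewritten as V - sum_a pi_a * R[o, a] <= 0.
    A_ub = np.hstack([-R, np.ones((3, 1))])
    b_ub = np.zeros(3)

    # The policy must be a probability distribution.
    A_eq = np.array([[1.0, 1.0, 1.0, 0.0]])
    b_eq = np.array([1.0])

    bounds = [(0, 1), (0, 1), (0, 1), (None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    pi, V = res.x[:3], res.x[3]
    print(pi, V)   # approximately [1/3, 1/3, 1/3] and 0.0

Any deterministic policy, by contrast, has a worst-case value of -1 in this game, which is exactly why the optimal policy must randomize.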
4 FINDING OPTIMAL POLICIES

This section reviews methods for finding optimal policies for matrix games, MDP's, and Markov games. It uses a uniform notation that is intended to emphasize the similarities between the three frameworks. To avoid confusion, function names that appear more than once appear with different numbers of arguments each time.

4.1 MATRIX GAMES

At the core of the theory of games is the matrix game defined by a matrix, $R$, of instantaneous rewards. Component $R_{o,a}$ is the reward to the agent for choosing action $a$ when its opponent chooses action $o$. The agent strives to maximize its expected reward while the opponent tries to minimize it. Table 1 gives the matrix game corresponding to "rock, paper, scissors."

The agent's policy is a probability distribution over actions, $\pi \in PD(A)$. For "rock, paper, scissors," $\pi$ is made up of 3 components: $\pi_{rock}$, $\pi_{paper}$, and $\pi_{scissors}$. According to the notion of optimality discussed earlier, the optimal agent's minimum expected reward should be as large as possible. How can we find a policy that achieves this?

Imagine that we would be satisfied with a policy that is guaranteed an expected score of $V$ no matter which action the opponent chooses. The inequalities in Table 2, with $V = 0$, constrain the components of $\pi$ to represent exactly those policies: any solution to the inequalities would suffice. For $\pi$ to be optimal, we must identify the largest $V$ for which there is some value of $\pi$ that makes the constraints hold. Linear programming (see, e.g., [Strang, 1980]) is a general technique for solving problems of this kind. In this example, linear programming finds a value of 0 for $V$ and (1/3, 1/3, 1/3) for $\pi$. We can abbreviate this linear program as

    $V = \max_{\pi \in PD(A)} \min_{o \in O} \sum_{a \in A} R_{o,a} \, \pi_a$,

where $\sum_{a \in A} R_{o,a} \pi_a$ expresses the expected reward to the agent for using policy $\pi$ against the opponent's action $o$.

4.2 MDP's

There is a host of methods for solving MDP's. This section describes a general method known as value iteration [Bertsekas, 1987].

The value of a state, $V(s)$, is the total expected discounted reward attained by the optimal policy starting from state $s$. States for which $V(s)$ is large are "good" in that a smart agent can collect a great deal of reward starting from those states. The quality of a state-action pair, $Q(s, a)$, is the total expected discounted reward attained by the non-stationary policy that takes action $a$ from state $s$ and then follows the optimal policy from then on. These functions satisfy the following recursive relationship for all $s$ and $a$:

    $Q(s, a) = R(s, a) + \gamma \sum_{s'} T(s, a, s') \, V(s')$        (1)
    $V(s) = \max_{a'} Q(s, a')$                                        (2)

This says that the quality of a state-action pair is the immediate reward plus the discounted value of all succeeding states weighted by their likelihood. The value of a state is the quality of the best action for that state. It follows that knowing $Q$ is enough to specify an optimal policy, since in each state we can choose the action with the highest $Q$-value.

The method of value iteration starts with estimates for $V$ and $Q$ and generates new estimates by treating the equal signs in Equations 1-2 as assignment operators. It can be shown that the estimated values for $V$ and $Q$ converge to their true values [Bertsekas, 1987].

4.3 MARKOV GAMES

Given $Q(s, a)$, an agent can maximize its reward using the "greedy" strategy of always choosing the action with the highest $Q$-value. This strategy is greedy because it treats $Q(s, a)$ as a surrogate for immediate reward and then acts to maximize its immediate gain. It is optimal because the $Q$-function is an accurate summary of future rewards.

A similar observation can be used for Markov games once we redefine $V(s)$ to be the expected reward for the optimal policy starting from state $s$, and $Q(s, a, o)$ as the expected reward for taking action $a$ when the opponent chooses $o$ from state $s$ and continuing optimally
thereafter. We can then treat the $Q(s, \cdot, \cdot)$ values as immediate payoffs in an unrelated sequence of matrix games (one for each state, $s$), each of which can be solved optimally using the techniques of Section 4.1. Thus, the value of a state $s$ in a Markov game is

    $V(s) = \max_{\pi \in PD(A)} \min_{o \in O} \sum_{a \in A} Q(s, a, o) \, \pi_a$,

and the quality of action $a$ against action $o$ in state $s$ is

    $Q(s, a, o) = R(s, a, o) + \gamma \sum_{s'} T(s, a, o, s') \, V(s')$.

The resulting recursive equations look much like Equations 1-2, and indeed the analogous value iteration algorithm can be shown to converge to the correct values [Owen, 1982]. It is worth noting that in games with alternating turns, the value function need not be computed by linear programming since there is an optimal deterministic policy. In this case we can write $V(s) = \max_{a \in A} \min_{o \in O} Q(s, a, o)$.

5 LEARNING OPTIMAL POLICIES

Traditionally, solving an MDP using value iteration involves applying Equations 1-2 simultaneously over all $s \in S$. Watkins [Watkins, 1989] proposed an alternative approach that involves performing the updates asynchronously without the use of the transition function, $T$. In this Q-learning formulation, an update is performed by an agent whenever it receives a reward of $r$ when making a transition from $s$ to $s'$ after taking action $a$. The update is

    $Q(s, a) := (1 - \alpha) \, Q(s, a) + \alpha \, (r + \gamma V(s'))$,

which takes the place of Equation 1. The probability with which this happens is precisely $T(s, a, s')$, which is why it is possible for an agent to carry out the appropriate update without explicitly using $T$. This learning rule converges to the correct values for $V$ and $Q$, assuming that every action is tried in every state infinitely often and that new estimates are blended with previous ones using a slow enough exponentially weighted average [Watkins and Dayan, 1992].

It is straightforward, though seemingly novel, to apply the same technique to solving Markov games. A completely specified version of the algorithm is given in Figure 1. The variables in the figure warrant explanation since some are given to the algorithm as part of the environment, others are internal to the algorithm, and still others are parameters of the algorithm itself. Variables from the environment are: the state set, S; the action set, A; the opponent's action set, O; and the discount factor, gamma. The variables internal to the learner are: a learning rate, alpha, which is initialized to 1.0 and decays over time; the agent's estimate of the Q-function, Q; the agent's estimate of the V-function, V; and the agent's current policy for state s, pi[s,.]. The remaining variables are parameters of the algorithm: explor controls how often the agent will deviate from its current policy to ensure that the state space is adequately explored, and decay controls the rate at which the learning rate decays. This algorithm is called minimax-Q since it is essentially identical to the standard Q-learning algorithm with a minimax replacing the max.

    Initialize:
        For all s in S, a in A, and o in O, let Q[s,a,o] := 1.
        For all s in S, let V[s] := 1.
        For all s in S and a in A, let pi[s,a] := 1/|A|.
        Let alpha := 1.0.
    Choose an action:
        With probability explor, return an action uniformly at random.
        Otherwise, if current state is s, return action a with probability pi[s,a].
    Learn:
        After receiving reward rew for moving from state s to s'
        via action a and opponent's action o:
            Let Q[s,a,o] := (1 - alpha) * Q[s,a,o] + alpha * (rew + gamma * V[s']).
            Use linear programming to find pi[s,.] maximizing
                min over o' of sum over a' of pi[s,a'] * Q[s,a',o'].
            Let V[s] := min over o' of sum over a' of pi[s,a'] * Q[s,a',o'].
            Let alpha := alpha * decay.

Figure 1: The minimax-Q algorithm (only the action-selection step of this listing survived extraction; the remainder is reconstructed from the description in the text).

6 EXPERIMENTS

This section demonstrates the minimax-Q learning algorithm using a simple two-player zero-sum Markov game modeled after the game of soccer.

6.1 SOCCER

The game is played on a 4x5 grid as depicted in Figure 2.
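Before detailing the game, the loop of Figure 1 can be rendered concretely in Python. This is a sketch under assumed interfaces: env.reset/env.step and all names are hypothetical, not the paper's code, and each state's matrix game is solved with the linear program of Section 4.1.

    # Minimax-Q sketch (assumed tabular environment interface; parallels Figure 1).
    import numpy as np
    from scipy.optimize import linprog

    def solve_state_game(Q_s):
        """Q_s[a, o] -> (policy over actions, game value) via linear programming."""
        n_a, n_o = Q_s.shape
        c = np.zeros(n_a + 1); c[-1] = -1.0                 # maximize V
        A_ub = np.hstack([-Q_s.T, np.ones((n_o, 1))])       # V <= sum_a pi_a Q[a,o], all o
        A_eq = np.ones((1, n_a + 1)); A_eq[0, -1] = 0.0     # pi sums to 1
        res = linprog(c, A_ub=A_ub, b_ub=np.zeros(n_o), A_eq=A_eq, b_eq=[1.0],
                      bounds=[(0, 1)] * n_a + [(None, None)])
        return res.x[:n_a], res.x[-1]

    def minimax_q(env, n_s, n_a, n_o, gamma=0.9, explor=0.2,
                  decay=0.9999954, steps=10**6):
        Q = np.ones((n_s, n_a, n_o)); V = np.ones(n_s)
        pi = np.full((n_s, n_a), 1.0 / n_a); alpha = 1.0
        rng = np.random.default_rng(0)
        s = env.reset()
        for _ in range(steps):
            # Choose an action: explore, else sample from the current policy.
            if rng.random() < explor:
                a = rng.integers(n_a)
            else:
                a = rng.choice(n_a, p=pi[s])
            s2, o, rew = env.step(a)            # opponent's action o is observed
            # Learn: minimax replaces the max of ordinary Q-learning.
            Q[s, a, o] = (1 - alpha) * Q[s, a, o] + alpha * (rew + gamma * V[s2])
            pi[s], V[s] = solve_state_game(Q[s])
            alpha *= decay
            s = s2
        return pi, V

The linear-programming call in the inner loop is the expensive step the Discussion section returns to; an alternating-turns game would replace it with a cheap max-min over the table.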
The two players, A and B, occupy distinct squares of the grid and can choose one of 5 actions on each turn: N, S, E, W, and stand. Once both players have selected their actions, the two moves are executed in random order.

The circle in the figures represents the "ball." When the player with the ball steps into the appropriate goal (left for A, right for B), that player scores a point and the board is reset to the configuration shown in the left half of the figure. Possession of the ball goes to one or the other player at random.

When a player executes an action that would take it to the square occupied by the other player, possession of the ball goes to the stationary player and the move does not take place. A good defensive maneuver, then, is to stand where the other player wants to go. Goals are worth one point and the discount factor is set to 0.9, which makes scoring sooner somewhat better than scoring later.

For an agent on the offensive to do better than breaking even against an unknown defender, the agent must use a probabilistic policy. For instance, in the example situation shown in the right half of Figure 2, any deterministic choice for A can be blocked indefinitely by a clever opponent. Only by choosing randomly between stand and S can the agent guarantee an opening and therefore an opportunity to score.

6.2 TRAINING AND TESTING

Four different policies were learned, two using the minimax-Q algorithm and two using Q-learning. For each learning algorithm, one learner was trained against a random opponent and the other against another learner of identical design. The resulting policies were named MR, MM, QR, and QQ for minimax-Q trained against random, minimax-Q trained against minimax-Q, Q trained against random, and Q trained against Q.

The minimax-Q algorithm (MR, MM) was as described in Figure 1 with $explor = 0.2$ and $decay = 10^{\log_{10}(0.01)/10^6} = 0.9999954$, and learning took place for one million steps.
(The value of decay was chosen so that the learning rate reached 0.01 at the end of the run.) The Q-learning algorithm (QR, QQ) was identical except that a "max" operator was used in place of the minimax and the Q-table did not keep information about the opponent's action. Parameters were set identically to the minimax-Q case.

For MR and QR, the opponent for training was a fixed policy that chose actions uniformly at random. For MM and QQ, the opponent was another learner identical to the first but with separate Q- and V-tables.

The resulting policies were evaluated in three ways. First, each policy was run head-to-head with a random policy for one hundred thousand steps. To emulate the discount factor, every step had a 0.1 probability of being declared a draw. Wins and losses against the random opponent were tabulated.

The second test was a head-to-head competition with a hand-built policy. This policy was deterministic and had simple rules for scoring and blocking. In 100,000 steps, it completed 5600 games against the random opponent and won 99.5% of them.

The third test used Q-learning to train a "challenger" opponent for each of MR, MM, QR and QQ. The training procedure for the challengers followed that of QR, where the "champion" policy was held fixed while the challenger was trained against it. The resulting policies were then evaluated against their respective champions. This test was repeated three times to ensure stability, with only the first reported here. All evaluations were repeated three times and averaged.

6.3 RESULTS

Table 3 summarizes the results. The columns marked "games" list the number of completed games in 100,000 steps and the columns marked "%won" list the percentage won by the associated policy. Percentages close to 50 indicate that the contest was nearly a draw.

All the policies did quite well when tested against the random opponent. The QR policy's performance was quite remarkable, however, since it completed more games than the other policies and won nearly all of them. This might be expected since QR was trained specifically to beat this opponent, whereas MR, though trained in competition with the random policy, chooses actions with an idealized opponent in mind.

Against the hand-built policy, MM and MR did well, roughly breaking even. The MM policy did marginally better. In the limit, this should not be the case since an agent trained by the minimax-Q algorithm should be insensitive to the opponent against which it was trained and always behave so as to maximize its score in the worst case.
6.3 RESULTS

Table 3 summarizes the results. The columns marked "games" list the number of completed games in 100,000 steps and the columns marked "% won" list the percentage won by the associated policy. Percentages close to 50 indicate that the contest was nearly a draw.

                         MM               QQ
                     % won   games    % won   games
vs. random            99.3    7200     99.5    8600
vs. hand-built        53.7    5300     76.3    3300
vs. MR-challenger
vs. MM-challenger     37.5    4400
vs. QR-challenger
vs. QQ-challenger                       0.0    1200

Table 3: Results for policies trained by minimax-Q (MR and MM) and Q-learning (QR and QQ).

All the policies did quite well when tested against the random opponent. The QR policy's performance was quite remarkable, however, since it completed more games than the other policies and won nearly all of them. This might be expected since QR was trained specifically to beat this opponent, whereas MR, though trained in competition with the random policy, chooses actions with an idealized opponent in mind.

Against the hand-built policy, MM and MR did well, roughly breaking even. The MM policy did marginally better. In the limit, this should not be the case, since an agent trained by the minimax-Q algorithm should be insensitive to the opponent against which it was trained and always behave so as to maximize its score in the worst case. The fact that there was a difference suggests that the algorithm had not yet converged on the optimal policy. Prior to convergence, the opponent can make a big difference to the behavior of a minimax-Q agent, since playing against a strong opponent means the training will take place in important parts of the state space.

The performance of the QQ and QR policies against the hand-built policy was strikingly different. This points out an important consequence of not using a minimax criterion. A close look at the two policies indicated that QQ, by luck, implemented a defense that was perfect against the hand-built policy. The QR policy, on the other hand, happened to converge on a strategy that was not appropriate. Against a slightly different opponent, the tables would have been turned.

The fact that the QQ policy did so well against the random and hand-built opponents, especially compared to the minimax policies, was somewhat surprising. Simultaneously training two adaptive agents using Q-learning is not mathematically justified, and in practice it is prone to "locking up," that is, reaching a mutual local maximum in which both agents stop learning prematurely (see, e.g., [Boyan, 1992]). In spite of this, some researchers have reported amazing success with this approach [Tesauro, 1992, Boyan, 1992] and it seemed to have been successful in this instance as well.

The third experiment was intended to measure the worst-case performance of each of the policies. The learned policies were held fixed while a challenger was trained to beat them. This is precisely the scenario that the minimax policies were designed for because, in a fair game such as this, a minimax policy should break even against even the strongest challenger. The MR policy did not quite achieve this level of performance, indicating that one million steps against a random opponent was insufficient for convergence to the optimal strategy. The MM policy did slightly better, winning against its challenger more often than did MR. It was beatable, but only barely so.

The algorithms trained by Q-learning did significantly worse and were incapable of scoring at all against their challengers. This was due in great part to the fact that Q-learning is designed to find deterministic policies, and every deterministic offense in this game has a perfect defense, much like rock, paper, scissors. Correctly extending Q-learning to find optimal probabilistic policies is exactly what minimax-Q was designed for.

7 DISCUSSION

This paper explores the Markov game formalism as a mathematical framework for reasoning about multi-agent environments. In particular, the paper describes a reinforcement learning approach to solving two-player zero-sum games in which the "max" operator in the update step of a standard Q-learning algorithm is replaced by a "minimax" operator that can be evaluated by solving a linear program.

The use of linear programming in the innermost loop of a learning algorithm is somewhat problematic, since the computational complexity of each step is large and typically many steps will be needed before the system reaches convergence. It is possible that approximate solutions to the linear programs would suffice. Iterative methods are also quite promising, since the relevant linear programs change slowly over time.
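Concretely, the minimax operator picks a mixed strategy pi over own actions that maximizes the worst-case expected payoff min_o sum_a pi(a) Q[a, o], which is a linear program. Below is a sketch using scipy's linprog; the paper predates such libraries, so the solver and the function name are assumptions.

```python
import numpy as np
from scipy.optimize import linprog

def matrix_game_value(Q):
    """Value and optimal mixed strategy of the zero-sum matrix game Q,
    where Q[a, o] is the payoff to the maximizing player for own
    action a and opponent action o."""
    n_a, n_o = Q.shape
    # Decision variables: [pi_0, ..., pi_{n_a-1}, v]. linprog minimizes,
    # so minimize -v in order to maximize v.
    c = np.zeros(n_a + 1)
    c[-1] = -1.0
    # For every opponent action o: v - sum_a pi[a] * Q[a, o] <= 0.
    A_ub = np.hstack([-Q.T, np.ones((n_o, 1))])
    b_ub = np.zeros(n_o)
    # The pi[a] must form a probability distribution.
    A_eq = np.hstack([np.ones((1, n_a)), np.zeros((1, 1))])
    b_eq = np.array([1.0])
    bounds = [(0, None)] * n_a + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[-1], res.x[:n_a]

# Sanity check on rock-paper-scissors: value 0, uniform mixed strategy.
rps = np.array([[0.0, -1.0, 1.0], [1.0, 0.0, -1.0], [-1.0, 1.0, 0.0]])
print(matrix_game_value(rps))
```

On rock-paper-scissors this returns value 0 and the uniform strategy, exactly the kind of probabilistic policy the Q-learned agents could not represent; in the minimax-Q update sketched earlier, the routine would be run on the payoff matrix of action pairs at each visited next state.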
For most applications of reinforcement learning to zero-sum games, this is not an impediment. Games such as checkers [Samuel, 1959], tic-tac-toe [Boyan, 1992], backgammon [Tesauro, 1992], and Go [Schraudolph et al., 1994] consist of a series of alternating moves, and in such games the minimax operator can be implemented extremely efficiently.

The strength of the minimax criterion is that it allows the agent to converge to a fixed strategy that is guaranteed to be "safe" in that it does as well as possible against the worst possible opponent. It can be argued that this is unnecessary if the agent is allowed to adapt continually to its opponent. This is certainly true to some extent, but any such agent will in principle be vulnerable to a devious form of trickery in which the opponent leads the agent to learn a poor policy and then exploits it. Identifying an opponent of this type for the Q-learning agent described in this paper would be an interesting topic for future research.

The use of the minimax criterion and probabilistic policies is closely connected to other current research. First, a minimax criterion can be used in single-agent environments to produce more risk-averse behavior [Heger, 1994]. Here, the random transitions of the environment play the role of the opponent. Secondly, probabilistic policies have been used in the context of acting optimally in environments where the agent's perception is incomplete [Singh et al., 1994]. In these environments, random actions are used to combat the agent's uncertainty as to the true state of its environment, much as random actions in games help deal with the agent's uncertainty of the opponent's move.

Although two-player Markov games are a fairly restricted class of multi-agent environments, they are of independent interest and include Markov decision processes as a special case. Applying insights from the theory of cooperative and multi-player games could also prove fruitful, although finding useful connections may be challenging.

Acknowledgments

Thanks to David Ackley, Justin Boyan, Tony Cassandra, and Leslie Kaelbling for ideas and suggestions.

References

[Barto et al., 1989] Barto, A. G.; Sutton, R. S.; and Watkins, C. J. C. H. 1989. Learning and sequential decision making. Technical Report 89-95, Department of Computer and Information Science, University of Massachusetts, Amherst, Massachusetts. Also published in Learning and Computational Neuroscience: Foundations of Adaptive Networks, Michael Gabriel and John Moore, editors. The MIT Press, Cambridge, Massachusetts, 1991.

[Bertsekas, 1987] Bertsekas, D. P. 1987. Dynamic Programming: Deterministic and Stochastic Models. Prentice-Hall.

[Boyan, 1992] Boyan, Justin A. 1992. Modular neural networks for learning context-dependent game strategies. Master's thesis, Department of Engineering and Computer Laboratory, University of Cambridge, Cambridge, England.

[Heger, 1994] Heger, Matthias 1994. Consideration of risk in reinforcement learning. In Proceedings of the Machine Learning Conference. To appear.

[Howard, 1960] Howard, Ronald A. 1960. Dynamic Programming and Markov Processes. The MIT Press, Cambridge, Massachusetts.

[Owen, 1982] Owen, Guillermo 1982. Game Theory: Second Edition. Academic Press, Orlando, Florida.

[Samuel, 1959] Samuel, A. L. 1959. Some studies in machine learning using the game of checkers. IBM Journal of Research and Development 3:211–229. Reprinted in E. A. Feigenbaum and J. Feldman, editors, Computers and Thought, McGraw-Hill, New York, 1963.
[Schraudolph et al., 1994] Schraudolph, Nicol N.; Dayan, Peter; and Sejnowski, Terrence J. 1994. Using the TD(lambda) algorithm to learn an evaluation function for the game of Go. In Advances in Neural Information Processing Systems 6, San Mateo, CA. Morgan Kaufmann. To appear.

[Schwartz, 1993] Schwartz, Anton 1993. A reinforcement learning method for maximizing undiscounted rewards. In Proceedings of the Tenth International Conference on Machine Learning, Amherst, Massachusetts. Morgan Kaufmann. 298–305.

[Singh et al., 1994] Singh, Satinder Pal; Jaakkola, Tommi; and Jordan, Michael I. 1994. Model-free reinforcement learning for non-Markovian decision problems. In Proceedings of the Machine Learning Conference. To appear.

[Strang, 1980] Strang, Gilbert 1980. Linear Algebra and Its Applications: Second Edition. Academic Press, Orlando, Florida.

[Tan, 1993] Tan, M. 1993. Multi-agent reinforcement learning: independent vs. cooperative agents. In Proceedings of the Tenth International Conference on Machine Learning, Amherst, Massachusetts. Morgan Kaufmann.

[Tesauro, 1992] Tesauro, G. J. 1992. Practical issues in temporal difference learning. In Moody, J. E.; Lippmann, R. P.; and Hanson, S. J., editors 1992, Advances in Neural Information Processing Systems 4, San Mateo, CA. Morgan Kaufmann. 259–266.

[Van Der Wal, 1981] Van Der Wal, J. 1981. Stochastic dynamic programming. In Mathematical Centre Tracts 139. Morgan Kaufmann, Amsterdam.

[von Neumann and Morgenstern, 1947] von Neumann, J. and Morgenstern, O. 1947. Theory of Games and Economic Behavior. Princeton University Press, Princeton, New Jersey.

[Watkins and Dayan, 1992] Watkins, C. J. C. H. and Dayan, P. 1992. Q-learning. Machine Learning 8(3):279–292.

[Watkins, 1989] Watkins, C. J. C. H. 1989. Learning from Delayed Rewards. Ph.D. Dissertation, Cambridge University.

[Yanco and Stein, 1993] Yanco, Holly and Stein, Lynn Andrea 1993. An adaptive communication protocol for cooperating mobile robots. In Meyer, Jean-Arcady; Roitblat, H. L.; and Wilson, Stewart W., editors 1993, From Animals to Animats: Proceedings of the Second International Conference on the Simulation of Adaptive Behavior. MIT Press/Bradford Books. 478–485.