The Maximum Likelihood Degree
Maximum likelihood detection
Maximum likelihood (ML) detection, also known as maximum likelihood sequence estimation (MLSE), is strictly speaking not an equalization scheme but a receiver approach in which the detection processing at the receiver explicitly accounts for the time dispersion of the radio channel.
Fundamentally, the ML detector takes the effect of time dispersion on the received signal into account and uses the entire received signal to determine the sequence most likely to have been transmitted.
Maximum likelihood detection is usually implemented with the Viterbi algorithm.
However, although Viterbi-based ML detection is widely used in 2G systems such as GSM, the algorithm is too complex to apply to LTE, because the wider transmission bandwidth leads to more pronounced channel frequency selectivity and a higher sampling rate.
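To make the MLSE idea concrete, the sketch below runs the Viterbi algorithm over a toy two-tap intersymbol-interference channel with BPSK symbols. The channel taps, noise level, and sequence length are illustrative assumptions, not an LTE or GSM configuration.

```python
import numpy as np

# A minimal MLSE sketch: Viterbi detection of BPSK over a hypothetical
# 2-tap ISI channel r[k] = h0*s[k] + h1*s[k-1] + noise (illustrative values).
rng = np.random.default_rng(0)
h = np.array([1.0, 0.5])                       # assumed known channel taps
symbols = np.array([-1.0, 1.0])                # BPSK alphabet
true_s = rng.choice(symbols, size=20)
r = np.convolve(true_s, h)[:len(true_s)] + 0.1 * rng.standard_normal(len(true_s))

n_states = len(symbols)                        # state = index of the previous symbol
cost = np.zeros(n_states)                      # any starting "previous symbol" allowed
back = np.zeros((len(r), n_states), dtype=int) # survivor pointers
for k in range(len(r)):
    new_cost = np.full(n_states, np.inf)
    for cur in range(n_states):                # state after step k is s[k]
        for prev in range(n_states):
            pred = h[0] * symbols[cur] + (h[1] * symbols[prev] if k > 0 else 0.0)
            c = cost[prev] + (r[k] - pred) ** 2    # Euclidean branch metric
            if c < new_cost[cur]:
                new_cost[cur], back[k, cur] = c, prev
    cost = new_cost

# Trace back the most likely transmitted sequence.
state = int(np.argmin(cost))
est = []
for k in range(len(r) - 1, -1, -1):
    est.append(symbols[state])
    state = back[k, state]
est = np.array(est[::-1])
print("symbol errors:", int(np.sum(est != true_s)))
```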
In general, after channel estimation and equalization, the signal is passed through resource demapping onto the different physical channels for further processing.
1.1 The principle of maximum likelihood estimation
Given a probability distribution D with probability density function (for a continuous distribution) or probability mass function (for a discrete distribution) f_D, and a distribution parameter θ, we can draw a sample of n values x_1, x_2, ..., x_n from this distribution and, using f_D, compute its probability:
P(x_1, x_2, ..., x_n) = f_D(x_1, ..., x_n | θ).
However, we may not know the value of θ, even though we know that the sample comes from the distribution D.
How, then, can we estimate θ? A natural idea is to draw a sample of n values from the distribution and use these sample data to estimate θ: once we have x_1, x_2, ..., x_n, we can obtain an estimate of θ from them.
Maximum likelihood estimation looks for the most plausible value of θ; that is, among all possible values of θ it seeks the one that maximizes the "likelihood" of the observed sample.
This distinguishes the method from some other estimation approaches, such as unbiased estimation of θ: an unbiased estimator does not necessarily produce the most plausible value, but rather a value that neither systematically overestimates nor underestimates θ.
To carry out maximum likelihood estimation mathematically, we first define the likelihood function
lik(θ) = f_D(x_1, ..., x_n | θ)
and maximize this function over all values of θ.
The value that maximizes the likelihood is called the maximum likelihood estimate of θ.
1.2 Application of the maximum likelihood decoding algorithm in LTE
Assuming that all signals in the modulation constellation are equally likely, the maximum likelihood decoder searches over all possible values of x̃_1 and x̃_2 and selects from the constellation the signal pair (x̂_1, x̂_2) that minimizes the distance metric in (1). Simplification yields the maximum likelihood decision rule (2), where C is the set of all possible modulation-symbol pairs, and x̃_1 and x̃_2 are two decision statistics constructed by combining the received signals with the channel state information. (The displayed expressions (1) and (2) did not survive extraction.)
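As a concrete illustration of this style of decision rule, the sketch below performs a brute-force maximum likelihood search over all candidate symbol pairs and picks the pair minimizing the Euclidean distance to the received vector. The 2x2 channel, QPSK alphabet, and noise level are illustrative assumptions rather than the exact LTE signal model.

```python
import numpy as np
from itertools import product

# Brute-force ML detection: among all candidate symbol pairs from the
# constellation, pick the pair minimizing the distance metric to the
# received vector (toy 2x2 channel, assumed perfectly known).
qpsk = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)
rng = np.random.default_rng(1)
H = (rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))) / np.sqrt(2)
s_true = rng.choice(qpsk, size=2)
r = H @ s_true + 0.05 * (rng.standard_normal(2) + 1j * rng.standard_normal(2))

best_pair, best_metric = None, np.inf
for pair in product(qpsk, repeat=2):            # exhaustive search over C
    s = np.array(pair)
    metric = np.linalg.norm(r - H @ s) ** 2     # distance metric to minimize
    if metric < best_metric:
        best_metric, best_pair = metric, s
print("decoded pair matches transmitted:", np.allclose(best_pair, s_true))
```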
Parameter estimation methods for Markov networks
A Markov network is a mathematical tool for describing stochastic processes; it can be used to model time-series data, natural language processing tasks, and other applications.
In practice, we usually need to estimate the parameters of a Markov network in order to simulate and predict the behavior of the system more accurately.
In this article we discuss several common parameter estimation methods for Markov networks and compare their advantages and disadvantages.
1. Maximum likelihood estimation (MLE). Maximum likelihood estimation is a common parameter estimation method that estimates parameter values by maximizing the likelihood function of the observed data.
For a Markov chain model, we can estimate the state transition matrix from the transitions observed in the data.
Specifically, for a Markov chain model we define the likelihood of the observed data as the joint probability of all state transitions and then estimate the entries of the transition matrix by maximizing this likelihood, as in the sketch below.
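A minimal sketch of this estimator, assuming a fully observed state sequence: the MLE of each transition probability is the corresponding transition count divided by the total number of transitions out of that state. The three-state chain and the sequence below are made up for illustration.

```python
import numpy as np

# MLE of a Markov chain transition matrix from an observed state sequence:
# the estimate is the row-normalized matrix of transition counts.
states = ["A", "B", "C"]
seq = ["A", "B", "B", "C", "A", "A", "B", "C", "C", "A", "B"]
idx = {s: i for i, s in enumerate(states)}

counts = np.zeros((3, 3))
for prev, cur in zip(seq, seq[1:]):
    counts[idx[prev], idx[cur]] += 1            # n_ij = observed i -> j transitions

P_hat = counts / counts.sum(axis=1, keepdims=True)   # p_ij = n_ij / n_i
print(P_hat)
```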
Although maximum likelihood estimation is intuitive and simple, it also has some drawbacks.
First, when the observed data are scarce, the likelihood function may have several local optima, making the estimates unstable.
Second, when the model has many parameters, maximum likelihood estimation can overfit, which harms the model's ability to generalize.
2. Bayesian estimation. Bayesian estimation is a parameter estimation method based on Bayesian statistics; it estimates parameters by introducing a prior probability distribution.
For a Markov chain model, we can estimate the state transition matrix by placing a prior distribution on the transition probabilities.
Specifically, we choose a suitable prior distribution, update the posterior distribution of the parameters using the observed data, and obtain parameter estimates from the posterior, as in the sketch below.
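A minimal sketch of this idea under one common (illustrative) choice of prior: an independent symmetric Dirichlet prior on each row of the transition matrix. The posterior for each row is again Dirichlet, and its mean gives a smoothed point estimate that, unlike the MLE, assigns no zero probabilities.

```python
import numpy as np

# Bayesian estimate of the transition matrix with a symmetric Dirichlet prior
# on each row; the posterior mean adds the prior strength to every count.
alpha = 1.0                                     # illustrative hyperparameter, not a recommendation
counts = np.array([[1., 3., 0.],                # transition counts n_ij from the MLE sketch above
                   [0., 1., 2.],
                   [2., 0., 1.]])
posterior_mean = (counts + alpha) / (counts + alpha).sum(axis=1, keepdims=True)
print(posterior_mean)                           # no zero probabilities, unlike the raw MLE
```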
The advantage of Bayesian estimation is that it can make effective use of prior information, improving the stability of the parameter estimates and the model's generalization ability.
In addition, Bayesian estimation provides information about the uncertainty of the parameter estimates, which is very helpful for model evaluation and selection.
However, Bayesian estimation also has drawbacks: the choice of prior distribution can influence the resulting estimates, and the computational cost is relatively high.
3. Maximum a posteriori estimation (MAP). Maximum a posteriori estimation is a special case of Bayesian estimation that estimates the parameters by maximizing the posterior probability.
Machine learning problem set
I. Maximum likelihood
1. ML estimation of exponential model (10 points). A Gaussian distribution is often used to model data on the real line, but is sometimes inappropriate when the data are often close to zero but constrained to be nonnegative.
In such cases one can fit an exponential distribution, whose probability density function is given by
p(x) = (1/b) exp(-x/b).
Given N observations x_i drawn from such a distribution:
(a) Write down the likelihood as a function of the scale parameter b.
(b) Write down the derivative of the log likelihood.
(c) Give a simple expression for the ML estimate for b.
2. The same question with a Poisson distribution:
p(x | θ) = θ^x exp(-θ) / x!,  x = 0, 1, 2, ...
l(θ) = Σ_{i=1..N} log p(x_i | θ) = Σ_{i=1..N} [x_i log θ - θ - log(x_i!)] = log θ · Σ_{i=1..N} x_i - Nθ - Σ_{i=1..N} log(x_i!).
II. Bayes
1. Application of Bayes' rule. Suppose that in a multiple-choice exam a student knows the correct answer with probability p and guesses with probability 1 - p. Assume that a student who knows the answer answers correctly with probability 1, and that a student who guesses answers correctly with probability 1/m, where m is the number of choices.
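For the two maximum likelihood problems above, setting the derivative of the log likelihood to zero gives a simple closed form: the ML estimate is the sample mean in both cases, b̂ = (1/N) Σ x_i for the exponential model and θ̂ = (1/N) Σ x_i for the Poisson model. The sketch below is an optional numerical check of these answers on simulated data.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Numerical check: the ML estimates for the exponential scale b and the
# Poisson rate theta both equal the sample mean.
rng = np.random.default_rng(2)
x_exp = rng.exponential(scale=3.0, size=1000)
x_poi = rng.poisson(lam=4.5, size=1000)

neg_ll_exp = lambda b: len(x_exp) * np.log(b) + x_exp.sum() / b
neg_ll_poi = lambda t: -(np.log(t) * x_poi.sum() - len(x_poi) * t)

b_hat = minimize_scalar(neg_ll_exp, bounds=(1e-6, 100), method="bounded").x
t_hat = minimize_scalar(neg_ll_poi, bounds=(1e-6, 100), method="bounded").x
print(b_hat, x_exp.mean())   # numerically optimized vs. closed form
print(t_hat, x_poi.mean())
```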
A worked example of maximum likelihood estimation in Stata, step by step
Introduction: maximum likelihood estimation (MLE) is a widely used parameter estimation method whose basic idea is to find the parameter values that maximize the likelihood function of the given data.
Stata, as a popular statistical package, provides a rich set of features and commands for maximum likelihood estimation.
This article uses an example to show how to carry out maximum likelihood estimation in Stata, explaining the relevant steps and concepts along the way.
Background of the example: suppose we have a sample drawn from a binomial distribution and want to estimate the distribution's parameter by maximum likelihood.
Step 1: Prepare the data. First, we need to prepare the data.
Suppose we have a binomial sample of size 100 with 40 successes and 60 failures.
Step 2: Construct the likelihood function. Before running the maximum likelihood estimation, we need to write down the likelihood function.
For the binomial distribution the likelihood has the form L(p) = C(n, k) · p^k · (1 - p)^(n - k), where n is the sample size, k is the number of successes, and p is the success probability.
In Stata, the "ml model" command is used to specify the model and the form of the likelihood function.
In this example we use the binomial likelihood, where p is the parameter we want to estimate.
Step 3: Specify the model and the likelihood function. In Stata, the model and likelihood can be specified with commands along the following lines:

    clear
    set seed 12345
    input success failure
    40 60
    end
    ml model d2 (success = failure, noweight)
    ml maximize

These commands clear any existing data, set the random number seed, enter the sample data, and then use the "ml model" command to specify the model and likelihood function. Here d2 refers to the type of likelihood evaluator being used, success and failure are the data variables, and noweight indicates that no weights are applied.
Finally, the "ml maximize" command maximizes the likelihood function.
Step 4: Inspect the results. After the maximization, Stata reports the estimated parameter values and related statistics.
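As a cross-check of the example in plain Python (this is not a translation of the ml commands above, just the same likelihood maximized numerically): with 40 successes in 100 trials, the maximizer of the binomial log likelihood should come out at k/n = 0.4.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Maximize the binomial log likelihood for 40 successes in 100 trials.
n, k = 100, 40
neg_log_lik = lambda p: -(k * np.log(p) + (n - k) * np.log(1 - p))
p_hat = minimize_scalar(neg_log_lik, bounds=(1e-6, 1 - 1e-6), method="bounded").x
print(round(p_hat, 4))   # ~0.4
```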
Cutoff (truncation) values in maximum likelihood estimation
Maximum likelihood estimation (MLE) is a statistical method for estimating parameter values, commonly used to estimate the parameters of probability distributions.
In maximum likelihood estimation, the parameter estimate is the value at which the likelihood function attains its maximum.
A cutoff (truncation) value refers to the situation where a continuous random variable is constrained to take values above or below some specific threshold.
When performing maximum likelihood estimation, such a cutoff may be taken into account to restrict the admissible range.
Concretely, a cutoff can be imposed by setting an upper or lower bound on the admissible values.
Depending on the distribution and the problem, the appropriate cutoff may differ.
For example, for a normal distribution, if we know the variable must exceed some value, we can impose a lower bound so that all estimation is carried out subject to that bound.
Note that the choice of cutoff should be based on the specific problem and the characteristics of the sample data, and requires sound judgment and investigation.
There is no universal formula or rule for determining a cutoff; it is usually chosen from practical needs and domain knowledge.
Therefore, when a cutoff must be taken into account in maximum likelihood estimation, it should be determined from the specific problem and the characteristics of the data, as in the sketch below.
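A minimal sketch of what such an estimator can look like when the data are only observed above a known lower cutoff a: each observation's density is renormalized by the probability mass above a, and the resulting log likelihood is maximized numerically. The cutoff and the generating parameters are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

# MLE for a normal model observed only above a known lower cutoff a:
# log f(x | x > a) = log phi((x-mu)/sigma) - log sigma - log(1 - Phi((a-mu)/sigma))
a = 0.0                                          # known lower truncation point (assumed)
rng = np.random.default_rng(3)
raw = rng.normal(loc=1.0, scale=2.0, size=5000)
x = raw[raw > a]                                 # only values above the cutoff are observed

def neg_log_lik(params):
    mu, log_sigma = params
    sigma = np.exp(log_sigma)                    # keep sigma positive
    return -(norm.logpdf(x, mu, sigma) - norm.logsf(a, mu, sigma)).sum()

res = minimize(neg_log_lik, x0=[x.mean(), np.log(x.std())])
print(res.x[0], np.exp(res.x[1]))                # estimates of mu and sigma
```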
The method of maximum likelihood
The method of maximum likelihood, also called maximum likelihood estimation, was first proposed by Gauss and later reintroduced by the British statistician Fisher in a 1912 paper, in which he also proved some of its properties. The name "maximum likelihood estimate" is likewise due to Fisher. It is a statistical method built on the maximum likelihood principle. To gain an intuitive feel for this principle, consider an example. Example: there are two boxes of identical appearance. Box A contains 99 white balls and 1 black ball; box B contains 1 white ball and 99 black balls. A box is chosen at random and a ball is drawn from it; the ball turns out to be white. Is the chosen box A or B? Analysis: note that what we are doing here is statistical inference, not logical deduction.
Statistical inference means estimating properties of a population from partial data.
Inferring the whole from a part necessarily carries some probability of error.
So if one insists on strict logic, statistical inference may seem insufficiently rigorous to count as "scientific inference."
But in everyday life, reasoning only by strict logical deduction would cause a great deal of trouble.
For example, going outdoors always carries some probability of an accident, so "getting home safely" is never logically guaranteed, and strict logic would leave you no choice but to stay home.
Return to the example.
The difficulty is that the mere fact that the drawn ball is white does not allow us to determine logically whether the box is A or B.
But if we must make a choice, we can reason as follows: given that a white ball was drawn, which box, A or B, looks more like the box the ball was actually drawn from? If the box is A, the probability of drawing a white ball is 0.99; if the box is B, that probability is 0.01. Therefore "the box is A" explains the observation of a white ball far more convincingly, and we judge box A to be the more likely one.
We conclude that the ball was drawn from box A. In fact, the original English term "maximum likelihood" carries exactly this meaning of "looks most like."
"Looks most like" is, in many situations, precisely the basis on which we make decisions.
A population typically has several important parameters.
Chapter 1: Introduction
1. What kind of discipline is econometrics? Answer: econometrics studies not only methods for measuring economic quantities but also the quantitative laws governing how economic phenomena evolve.
Econometrics can be regarded as a branch of applied economics that, guided by economic theory and grounded in economic data, uses mathematical and statistical methods to build, estimate, and test economic models in order to reveal the stochastic causal relationships in economic activity.
2. What are the connections and differences between econometrics and economic theory, mathematics, and statistics? Answer: econometrics combines economic theory, mathematics, and statistics; it is an interdisciplinary (or boundary) field of economics, mathematics, and statistics.
6. What aspects does the testing of an econometric model include, and why is testing necessary? Answer: model testing usually covers four aspects: economic-meaning checks, statistical inference tests, econometric tests, and model prediction tests.
8. How are explained and explanatory variables, and endogenous and exogenous variables, distinguished in econometric models? Answer: in simultaneous-equation econometric models, variables are divided into endogenous variables and exogenous variables according to whether they are determined by the model system.
Endogenous variables are determined by the model system and may also influence it; they are random variables with some probability distribution. Exogenous variables are not determined by the model system but do influence it; they are treated as deterministic variables.
9. What are the main types of relationships among the variables in an econometric model? Answer: the relationships are chiefly causal relationships between explanatory and explained variables, including one-way causation, mutual influence, and identities.
12. What types of data are commonly used in econometrics? Answer: according to how they are generated and structured, the data used in econometrics can be divided into time series data, cross sectional data, panel data, and dummy variable data.
13. What do completeness, accuracy, comparability, and consistency of data mean? Answer: 1) Completeness means that every variable in the model has an observation at every sample point, so all variables have the same number of observations.
Methods for estimating parameters from incomplete data
In statistics, incomplete data refers to samples in which some observations are missing or corrupted.
In such cases, special methods are needed to estimate the parameters.
Several common methods for estimating parameters from incomplete data are described below.
1. Maximum likelihood estimation (MLE): maximum likelihood estimation is a widely used parameter estimation method; in this setting it assumes that the missingness is random and unrelated to the observed values of the complete data.
Given the data that have been observed, the method seeks the parameter values under which the observed data are most probable,
that is, it finds the optimal parameters by maximizing the likelihood of the observed data.
2. Exponential empirical likelihood estimation (EEL): exponential empirical likelihood estimation is a robust parameter estimation method for handling incomplete data.
It uses an exponential distribution to model the probability density of the unobserved data and maximizes the empirical likelihood of the observed data.
3. Multiple imputation: multiple imputation is a common approach to incomplete data that estimates parameters by generating several completed data sets.
First the missing values are imputed stochastically; then the parameters are estimated on each imputed data set; finally the separate estimates are combined into one final estimate.
This approach yields more reliable estimates and more accurate standard errors.
4. The expectation-maximization (EM) algorithm: the EM algorithm is an iterative method for estimating the parameters of models with incomplete data.
The algorithm iterates between two steps: the expectation step (E-step) and the maximization step (M-step).
In the E-step, given the current parameter estimates, the expected values associated with the missing data are computed; in the M-step, the parameters are updated by maximizing the complete-data log likelihood.
The method performs well for parameter estimation in missing-data models.
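A minimal sketch of the EM iteration on a toy problem where the "incomplete" part of the data is the unobserved component label of a two-component one-dimensional Gaussian mixture. The mixture settings and starting values are illustrative; the E-step computes responsibilities and the M-step re-estimates weights, means, and standard deviations.

```python
import numpy as np
from scipy.stats import norm

# EM for a two-component 1-D Gaussian mixture (labels are the missing data).
rng = np.random.default_rng(4)
x = np.concatenate([rng.normal(-2, 1, 300), rng.normal(3, 1, 700)])

w, mu, sigma = np.array([0.5, 0.5]), np.array([-1.0, 1.0]), np.array([1.0, 1.0])
for _ in range(200):
    # E-step: posterior responsibility of each component for each point
    dens = w * norm.pdf(x[:, None], mu, sigma)        # shape (n, 2)
    resp = dens / dens.sum(axis=1, keepdims=True)
    # M-step: re-estimate parameters from the expected complete-data log likelihood
    nk = resp.sum(axis=0)
    w = nk / len(x)
    mu = (resp * x[:, None]).sum(axis=0) / nk
    sigma = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
print(w, mu, sigma)
```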
The methods listed above are among those commonly used to estimate parameters from incomplete data.
Choosing a suitable method according to the situation and the characteristics of the data can substantially improve the accuracy and reliability of the parameter estimates.
Let λ(αi | ωj) be the loss incurred for taking action αi when the state of nature is ωj, where action αi assigns the sample to one of the classes. The conditional risk, for i = 1, ..., a, is
R(αi | x) = Σ_{j=1..c} λ(αi | ωj) P(ωj | x).
Select the action αi for which R(αi | x) is minimum; the minimized R is called the Bayes risk, the best reasonable result that can be achieved. Writing λij for the loss incurred for deciding ωi when the true state of nature is ωj, discriminant functions can be defined so that the maximum discriminant corresponds to the minimum risk, gi(x) = -R(αi | x), or to the maximum posterior, gi(x) = P(ωi | x), or equivalently gi(x) ≡ p(x | ωi) P(ωi) and gi(x) = ln p(x | ωi) + ln P(ωi). The problem then changes from estimating the likelihood directly to estimating the parameters of a normal distribution. Maximum likelihood estimation and Bayesian estimation give nearly the same results here, but their underlying concepts differ. Question: please present the basic ideas of the maximum likelihood estimation method and the Bayesian estimation method. When do these two methods have similar results?
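A small numeric illustration of the minimum-risk rule defined above, with a made-up loss matrix and posterior: the conditional risk of each action is the loss-weighted sum over states, and the Bayes decision takes the action with the smallest risk.

```python
import numpy as np

# Conditional risk and minimum-risk (Bayes) decision for illustrative numbers.
loss = np.array([[0.0, 2.0, 4.0],      # rows: actions alpha_i, cols: states omega_j
                 [1.0, 0.0, 3.0],
                 [5.0, 1.0, 0.0]])
posterior = np.array([0.2, 0.5, 0.3])  # P(omega_j | x)

risk = loss @ posterior                # R(alpha_i | x) = sum_j loss_ij * P_j
print(risk, "-> choose action", int(np.argmin(risk)))
```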
Summary of second-year statistics knowledge points
Statistics is a fundamental subject for students majoring in mathematics, economics, and many other fields. In the second year of college, students delve deeper into statistical concepts and methods. This article summarizes the key knowledge points of second-year statistics.
1. Data and Variables. In statistics, data refers to the facts or observations that have been collected. Variables are the characteristics or attributes of interest in a study. Data can be categorized into quantitative data and qualitative data. Quantitative data is usually expressed in numerical form, while qualitative data describes non-quantitative features.
2. Descriptive Statistics. Descriptive statistics comprises methods for summarizing, presenting, and analyzing data that have already been collected.
The maximum likelihood (ML) method is a theoretically grounded point-estimation approach whose basic idea is that, after n sample observations are drawn at random from the model population, the most reasonable parameter estimates are those that maximize the likelihood of producing the observed sample from the model.
In phylogenetics, the method considers the likelihood of the residues observed at each site, summing the substitution probabilities over all residues that could occur at a given position to produce the likelihood of that site.
The likelihood function is evaluated for every candidate phylogenetic tree, and the tree with the largest likelihood is taken as the most probable phylogeny.
The theoretical basis rests on two assumptions: different characters evolve independently, and lineages evolve independently after they diverge.
Maximum likelihood estimation
Maximum likelihood estimation provides a way to assess model parameters from observed data: the model is fixed, the parameters are unknown.
As a simple illustration, suppose we want to characterize the height of the national population. We first assume that height follows a normal distribution, but the mean and variance of that distribution are unknown.
We do not have the resources to measure everyone's height, but we can take a sample, measure the heights of those people, and then use maximum likelihood estimation to obtain the mean and variance of the assumed normal distribution.
A key assumption in maximum likelihood estimation is that all the samples are drawn independently from the same distribution (i.i.d.).
Let us describe maximum likelihood estimation more precisely. Suppose x_1, x_2, ..., x_n is an i.i.d. sample, θ is the model parameter, and f is the model we are using, consistent with the i.i.d. assumption above.
The probability that the model f with parameter θ produces the above sample can be written as
f(x_1, x_2, ..., x_n | θ) = f(x_1 | θ) · f(x_2 | θ) ··· f(x_n | θ).
Returning to "the model is fixed, the parameters are unknown": here x_1, ..., x_n are known and θ is unknown, so the likelihood is defined as
L(θ | x_1, ..., x_n) = f(x_1, ..., x_n | θ) = ∏_{i=1..n} f(x_i | θ).
In practice it is usually convenient to take logarithms of both sides, giving
ln L(θ | x_1, ..., x_n) = Σ_{i=1..n} ln f(x_i | θ),
where ln L is called the log likelihood and ℓ̂ = (1/n) Σ_{i=1..n} ln f(x_i | θ) is called the average log likelihood.
What we usually call the maximum likelihood estimate is the maximizer of the average log likelihood:
θ̂_MLE = argmax_θ ℓ̂(θ | x_1, ..., x_n).
Consider an example borrowed from another blog: suppose a jar contains black and white balls, with the numbers and the proportions of the two colors unknown.
We want to know the proportions of white and black balls in the jar, but we cannot take out all the balls and count them.
Instead, we can repeatedly draw one ball at a time from the well-mixed jar, record its color, and put it back.
Repeating this process, we can use the recorded colors to estimate the proportion of black and white balls in the jar.
If in one hundred such draws seventy balls were white, what is the most likely proportion of white balls in the jar? Many people will answer immediately: 70%.
What is the theoretical justification? Suppose the proportion of white balls in the jar is p; then the proportion of black balls is 1 - p.
Because each ball is returned to the jar and the jar is mixed after its color is recorded, the colors of successive draws are independent and identically distributed.
We call each drawn ball's color one sampling.
The probability of observing seventy white balls in one hundred draws is P(Data | M), where Data is all the observations and M is the model stating that each drawn ball is white with probability p.
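A quick check of the intuitive answer: evaluating the binomial log likelihood 70·ln p + 30·ln(1 - p) over a grid of p shows that it peaks at p = 0.7.

```python
import numpy as np

# The log likelihood for 70 white draws out of 100 is maximized at p = 0.7.
p = np.linspace(0.01, 0.99, 981)
log_lik = 70 * np.log(p) + 30 * np.log(1 - p)
print(p[np.argmax(log_lik)])   # ~0.70
```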
General steps of maximum likelihood estimation
Maximum likelihood estimation (MLE) is one of the most basic, important, and useful probability-based estimation methods in statistics, used to determine the parameter values of an uncertain system or process. MLE assumes that the observations come from a common probability distribution and, given the observed data, chooses the parameters so that the fitted model is most likely to have generated those data; the result is the maximum likelihood estimate.
The general steps of MLE are as follows.
1. Collect the data. First, we obtain the relevant observational data to be used in the estimation. The data may come from our own experimental measurements or be collected from the literature.
2. Specify the model. Before carrying out MLE, we must specify the model to be estimated, such as a probability density function or a regression model, which encodes the relationships among the variables so that the parameters can be solved for.
3. Solve for the parameters. In the MLE procedure we solve for the parameter values that, under the chosen model, make the observed data as probable as possible.
4. Check the results. Finally, methods such as the chi-square test, Bayesian correction, likelihood-based tests, or the Wald test can be applied to the maximum likelihood estimates to verify whether they are valid.
In summary, maximum likelihood estimation (MLE) has become an important basic technique in statistics. Its general steps are to collect the data, specify the model, solve for the parameters, and check the results. MLE is thus a reliable statistical estimation technique that can solve for parameters effectively. A short worked example of these four steps follows.
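A minimal walk-through of the four steps for a normal model on simulated data (so the data-collection step is simulated): the closed-form ML estimates are the sample mean and the 1/n standard deviation, cross-checked against scipy's built-in ML fit.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(5)
data = rng.normal(loc=10.0, scale=2.0, size=500)        # 1. collect data (simulated here)

# 2. model: x_i ~ N(mu, sigma^2), i.i.d.
# 3. solve: the normal log likelihood has closed-form maximizers
mu_hat = data.mean()
sigma_hat = data.std(ddof=0)                            # MLE uses 1/n, not 1/(n-1)

# 4. check: compare against scipy's built-in ML fit of the same model
mu_fit, sigma_fit = norm.fit(data)
print(mu_hat, sigma_hat)
print(mu_fit, sigma_fit)                                # should agree
```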
Patent title: Maximum-likelihood decoding method and device
Inventors: Takao Sugawara, Yoshifumi Mizoshita, Takashi Aikawa, Hiroshi Mutoh, Kiichirou Kasai
Application number: US07/910311
Filing date: 1993-07-26
Publication number: US05432820A
Publication date: 1995-07-11
Abstract: A maximum-likelihood decoding method for decoding an input signal subject to intersymbol interference. An assumption is made of a measure of the interference caused by a future signal that is later in the sequence than an assumed data sequence, on the basis of a predetermined number of bits of sample values of said input signal that are earlier in the sequence than the assumed data sequence stored in an assumed path memory (104). An assumed sample value of said input signal is obtained by referring to this measure of interference. Maximum-likelihood decoding is conducted on said input signal on the basis of this assumed sample value and a sample value of said input signal; a plurality of survivor paths are generated and stored in a path memory (102); and data for the most likely one of the survivor paths is output as a decoded data sequence.
Applicant: FUJITSU LIMITED
Agent: Staas & Halsey
Bayesian Probability and Statistics in Management Research: A New Horizon
Special Issue Purpose
This special issue is focused on how a Bayesian approach to estimation, inference, and reasoning in organizational research might supplement—and in some cases supplant—traditional frequentist approaches. Bayesian methods are well suited to address the increasingly complex phenomena and problems faced by 21st-century researchers and organizations, where very complex data abound and the validity of knowledge and methods are often seen as contextually driven and constructed. Traditional modeling techniques and a frequentist view of probability and method are challenged by this new reality.
Keywords: Bayes, Bayesian, probability, statistics, research methods, philosophy of science, frequentist
Background
Since probability in its modern form emerged in the 1600s, it has had two faces (Hacking, 2006). They are often called frequentist and Bayesian (Fienberg, 2006). There is a long history of debate among philosophers, scientists, and statisticians over which is better suited for estimation and inference (Efron, 2005, 2010; Gigerenzer, 1987, 1993; Good, 1989; Little, 2006). However, history shows that both types of probability used to exist harmoniously to solve problems. For example, Gauss and Laplace used the central limit theorem (frequentist) and posterior probabilities (Bayesian) to justify aggregating observations with the method of least squares (Stigler, 1986). Unfortunately, beginning in the mid-1800s, the distinction between frequentist and Bayesian probability became divisive (Daston, 1994, 1995), with continued wrangling in the 20th century (Howie, 2002) as Fisher and other statisticians popularized frequentist methods, such as null hypothesis significance testing (NHST; Aldrich, 2008; Danziger, 1994; Gigerenzer, Swijtink, Porter, Daston, Beatty, & Krüger, 1989; MacKenzie, 1981; Zabell, 1989). Organization science researchers followed the path of most other quantitative social sciences and adopted the frequentist approach to probability in the latter half of the 20th century. The adoption was wholesale, so that today quantitative research in the organization sciences relies on p values, maximum likelihood (ML) estimation, and other frequentist statistical tools. Although frequentist methods are capable of answering certain types of questions, a "Bayesian revolution" is currently under way in statistics and elsewhere. Indeed, Bayesian analysis finds widespread use in a sweeping array of scientific disciplines, such as physics, chemistry, biology, computer science, genetics, bioinformatics, atmospheric science, and economics. To date, however, the organization sciences have hardly participated in or evaluated the benefit of the Bayesian revolution.
Source: Michael J. Zyphur (University of Melbourne) and Frederick L. Oswald (Rice University), Journal of Management, 2013, 39(1): 5-13.
Data, Models, and Decisions (Operations Research): answers to end-of-chapter problems and cases, supplement 12s
CD SUPPLEMENT TO CHAPTER 12: DECISION CRITERIA
Review Questions
12s-1 It might be desirable to use a decision criterion that doesn't rely on the prior probabilities if these probabilities are not reliable.
12s-2 The maximax criterion is a very optimistic criterion that focuses on the best that can happen by choosing the alternative that can yield the maximum of the maximum payoffs.
12s-3 The maximin criterion is a very pessimistic criterion that focuses on the worst that can happen by choosing the alternative that can yield the maximum of the minimum payoffs.
12s-4 The pessimism-optimism index measures where the decision-maker falls on a scale from totally pessimistic to totally optimistic. This index is used with the realism criterion to combine the maximax and maximin criteria.
12s-5 The maximax and maximin criteria are special cases of the realism criterion where the decision-maker is totally optimistic (index = 1) or totally pessimistic (index = 0).
12s-6 The regret from having chosen a particular decision alternative is the maximum payoff minus the actual payoff.
12s-7 The regret that can be felt afterward if the decision does not turn out well is being minimized with the minimax regret criterion.
12s-8 A totally optimistic person would find the maximax criterion appealing, while a totally pessimistic person would find the maximin criterion appealing. The realism criterion would appeal to someone who wants to be able to choose how aggressive to be. The minimax regret criterion would appeal to someone who spends time regretting past decisions.
12s-9 There is no uniformly reasonable criterion that doesn't use prior probabilities.
12s-10 The maximum likelihood criterion focuses on the most likely state of nature.
12s-11 The main criticism of the maximum likelihood criterion is that it does not consider the payoffs for the other states of nature besides the most likely one.
12s-12 The equally likely criterion assumes that the states of nature are equally likely.
12s-13 The main criticism of the equally likely criterion is that it ignores any prior information about the relative likelihood of the various states of nature.
Problems
[The worked solutions 12s.1 through 12s.6 consisted mainly of payoff tables and figures that did not survive extraction; only the fragments below remain.]
12s.5 c) The above answers demonstrate the objection that, under this criterion, choices between serious alternatives can depend on irrelevant alternatives.
12s.6 b) Choose either the conservative or the counter-cyclical investment.
12s.6 e) Choose the speculative investment (maximum payoff when stable economy = $10…).
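The criteria reviewed above are easy to state computationally. The sketch below applies each of them to a made-up payoff table (rows are decision alternatives, columns are states of nature); the prior probabilities and the pessimism-optimism index are illustrative choices.

```python
import numpy as np

# Decision criteria on a made-up payoff table (rows = alternatives, cols = states).
payoff = np.array([[80, 30, -20],
                   [60, 40,  10],
                   [30, 30,  30]])
prior = np.array([0.3, 0.5, 0.2])   # prior probabilities of the states
alpha = 0.6                          # pessimism-optimism index for the realism criterion

maximax   = payoff.max(axis=1).argmax()                      # best of the best payoffs
maximin   = payoff.min(axis=1).argmax()                      # best of the worst payoffs
realism   = (alpha * payoff.max(axis=1) + (1 - alpha) * payoff.min(axis=1)).argmax()
regret    = payoff.max(axis=0) - payoff                      # regret table
minimax_r = regret.max(axis=1).argmin()                      # minimize the maximum regret
max_like  = payoff[:, prior.argmax()].argmax()               # best payoff under the most likely state
eq_likely = payoff.mean(axis=1).argmax()                     # best average payoff (states equally likely)

print(maximax, maximin, realism, minimax_r, max_like, eq_likely)
```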
a r X i v:mat h /46533v1[mat h.AG ]25J un24The Maximum Likelihood Degree Fabrizio Catanese,Serkan Ho¸s ten,Amit Khetan,and Bernd Sturmfels Abstract Maximum likelihood estimation in statistics leads to the problem of maximizing a product of powers of polynomials.We study the algebraic degree of the critical equations of this optimization prob-lem.This degree is related to the number of bounded regions in the corresponding arrangement of hypersurfaces,and to the Euler charac-teristic of the complexified complement.Under suitable hypotheses,the maximum likelihood degree equals the top Chern class of a sheaf of logarithmic differential forms.Exact formulae in terms of degrees and Newton polytopes are given for polynomials with generic coefficients.1Introduction In algebraic statistics [13,21,22],a model for discrete data is a map f :R d →R n whose coordinates f 1,...,f n are polynomial functions in the parameters (θ1,...,θd )=:θ.The parameter vector θranges over an open subset U of R d such that f (θ)lies in the positive orthant R n >0.The image f (U )represents a family of probability distributions on an n -element state space,provided we make the extra assumption that f 1+···+f n −1is the zero polynomial.A given data set is a vector u =(u 1,...,u n )of positive integers.The problem of maximum likelihood estimation is to find parameters θwhich best explain the data u .This leads to the following optimization problem:Maximize f 1(θ)u 1f 2(θ)u 2···f n (θ)u n subject to θ∈U .(1)Under suitable assumptions we have an optimal solution ˆθto the problem(1),which is an algebraic function of the data u .Our goal is to compute the degree of that algebraic function.We call this number the maximum1likelihood degree of the model f.Equivalently,the ML degree is the number of complex solutions to the critical equations of(1),for a general data vector u.In this paper we prove results of the following form:Theorem 1.Let f1,...,f n be polynomials of degrees b1,...,b n in d un-knowns.If the maximum likelihood degree of the model f=(f1,...,f n) isfinite then it is less than or equal to the coefficient of z d in the generating function(1−z)df1∂f1f2∂f2f3∂f3f4∂f4f1∂f1f2∂f2f3∂f3f4∂f4 (1−2z)4=1+6z+25u1+u2+u3+u4andˆθ2=u1+u2of the top Chern class ofΩ1(log D).If X is projective d-space then this leads to Theorem1.In Section3we study the case when X is a smooth toric variety,and we derive a formula for the ML degree when the f i’s are Laurent polynomials which are generic relative to their Newton polytopes. 
For instance,Example8shows that the ML degree is13if we replace(3)byf i=αi+βiθ1+γiθ2+δiθ1θ2(i=1,2,3,4).Section4is concerned with the relationship of the ML degree to the bounded regions of the complement of{f i=0}in R d.The number of these regions is a lower bound to the number of real solutions of the critical equa-tions,and therefore a lower bound to the ML degree.We show that for plane quadrics all three numbers can be equal.However,for other combinations of plane curves the ML degree and the number of bounded regions diverge,and we prove a tight upper bound on the latter in Theorem12.Also,following work of Terao[24]and Varchenko[25],we show in Theorem13that the ML degree coincides with the number of bounded regions of the arrangement of hyperplanes{f i=0}when the f i’s are(not necessarily generic)linear forms.Section5revisits the ML degree for toric varieties,replacing the smooth-ness assumption by a much milder condition.Theorem15gives a purely combinatorial formula for the ML degree in terms of the Newton polytopes of the polynomials f i.This section also discusses how resolution of singular-ities can be used to compute the ML degree for nongeneric polynomials.Section6deals with topological methods for determining the ML degree. Theorem19shows that,under certain restrictive hypotheses,it coincides with the Euler characteristic of the complex manifold X\D,and Theorem 22offers a general version of the semi-continuity principle which underlies the inequality in Theorem1.In Section7we relate the ML degree to the sheaf of logarithmic vectorfields along D,which is the sheaf dual toΩ1(logD).This paper was motivated by recent appearances of the concept of ML degree in statistics and computational biology.Chor,Khetan and Snir[7] showed that the ML degree of a phylogenetic model equals9,and Geiger, Meek and Sturmfels[14]proved that an undirected graphical model has ML degree one if and only if it is decomposable.The notion of ML degree also makes sense for certain parametrized models for continuous data:Drton and Richardson[10]showed that the ML degree of a Gaussian graphical model equals5,and Bout and Richards[5]studied the ML degree of certain mixture models.The ML degree always provides an upper bound on the number of3local maxima of the likelihood function.Our ultimate hope is that a better understanding of the ML degree will lead to the development of custom-tailored algorithms for solving the critical equations dlog(f)=0.There is a need for such new algorithms,given that methods currently used in statistics (notably the EM-algorithm)often produce only local maxima in(1).2Critical Points of Rational FunctionsIn this section we work in the following general set-up of algebraic geometry. Let X be a complete factorial algebraic variety over the complex numbers C. 
We also assume that X is irreducible of dimension d≥1.In applications to statistics,the variety X will often be a smooth projective toric variety.Suppose that f∈C(X)is a rational function on X.Since X is factorial, the local rings O X,x are unique factorization domains.This means that the function f has a global factorization which is unique up to constants:f=F u11F u22···F u r r.(4) Here F i is a prime section of an invertible sheaf O X(D i)where D i is the divisor on X defined by F i.In our applications we usually assume that r≥n where n is the number considered in the Introduction.For instance, if f1,...,f n are polynomials and X=P d then r=n+1;namely,F1,...,F n are the homogenizations of f1,...,f n usingθ0,and F n+1=θ0(see the proof of Theorem1for details).By(4),we can write the divisor of the rational function f uniquely asdiv(f)=r i=1u i D i,where the u i’s are(possibly negative)integers.Let D be the reduced union of the codimension one subvarieties D i⊂X,or,as a divisor,D:=Σr i=1D i.We are interested in computing the critical points of the rational function f on the open set V:=X\D complementary to the divisor D.Especially, we wish to know the number of critical points,counted with multiplicities.A critical point is by definition a point x∈X where the differential 1-form d f vanishes.If x is a smooth point on X,and x1,...,x d are local coordinates,then d f=Σd j=1(∂f/∂x j)dx j.Hence x is a critical point of f if4and only if∂f∂x2=···=∂ff =ri=1u i dlog(F i)=r i=1u i dF iinvertible in this local ring for j=i,andωis regular if and only if F i di-videsψi.This implies that the homomorphism which sendsωto the vector ψi(mod F i )i=1,...,r is well defined,and it induces an isomorphism from the quotientΩ1X(logD)/Ω1X onto⊕r i=1O Di.Assume now that X is smooth.Then both sheavesΩ1X(D)andΩ1X are locally free of rank d=dim(X).Hence the intermediate sheafΩ1X(logD)is torsion free of the same rank.Our next result shows thatΩ1X(logD)is locally free if and only if the divisors D i are smooth and intersect transversally. Proposition3.Let x∈X be a smooth point,x1,...,x d local coordinates at x and D1,...,D h the divisors which contain x.Then the sheafΩ1X(logD)is locally free at x if and only if the h×d-matrix(∂F i/∂x j)has rank h at x. Proof.Any local section ofΩ1X(logD)can be written in the formω=ri=1ψi·dlog(F i)+η=h i=1ψi·dlog(F i)+d j=1ηj·dx j.(8)This observation gives rise to a local exact sequence0→O h X,x→O h X,x⊕O d X,x→Ω1X,x(logD)→0.(9) The surjective map on the right takes((ψi),(ηj))to the sum on the right hand side of(8).The injective map on the left takes the h-tuple(A1,...,A h)to((ψi),(ηj))withψi=F i A i andηj=−h l=1A l∂F lIn the above situation where X is smooth andΩ1X(logD)is locally free we shall say that the divisor D has global normal crossings(or GNC). 
Theorem4.Let X be smooth and assume that D is a GNC divisor.Then1.the section dlog(f)ofΩ1X(logD)does not vanish at any point of D,2.if the divisor D intersects every curve in X(in particular,if D isample)then dlog(f)vanishes only on afinite subset of V=X\D,3.if the above conclusions hold,then the number of critical points of f onV,counted with multiplicities,equals the degree of the top Chern classc d(Ω1X(logD)).Proof.We abbreviateσ:=dlog(f)=Σr i=1u i dlog(F i).By the proof of Proposition3it follows that if(∂F i/∂x j)i=1,...h,j=1,...d has rank h at x,then Ω1X(logD)is locally free of rank d with generators dlog(F i)and some choice of d−h of the dx j.If we writeσin this basis,the coefficients of dlog(F i)are the constants u i while the coefficients of the dx j are some regular functions. Thefirst assertion follows immediately since the exponents u i are all nonzero.The second assertion follows from thefirst:let Zσbe the zero set of the sectionσ.Since Zσdoes not intersect D,it follows that dim(Zσ)=0.Thirdly,if F is a locally free sheaf of rank d on a smooth variety X of dimension d,andσis a section of H0(F)with a zero scheme Zσof dimension 0,then the length of Zσequals the degree of the top Chern class c d(F).The total Chern class of a sheaf F is the sum c tot(F)=Σd i=0c i(F)z i.This is a polynomial in z whose coefficients are elements in the Chow ring A∗(X). Recall that every element in A∗(X)has a well-defined degree which is the image of its degree d part under the degree homomorphism A d(X)→Z. Corollary5.Suppose that X is smooth and D is a GNC divisor on X which intersects every curve.Then the number of critical points of f,counted with multiplicities,is the degree of the coefficient of z d in the following polynomial:c tot(Ω1X)·Πr i=1(1−zD i)−1∈A∗(X)[z].(10) Proof.The total Chern class c tot(F)is multiplicative with respect to exact sequences,i.e.,if0→A→B→C→0is an exact sequence of sheaves, then c tot(B)=c tot(A)·c tot(C).Hence the sequence(7)implies the result.7In the next section,we apply the formula(10)inthecase whenXis asmooth projective toric variety.The Chow group A d(X)has rank one and is generated by the class of any point.This canonically identifies A d(X)with Z and so any top Chern class can be considered to be a number.Corollary6.Suppose X is a smooth toric variety with boundary divisors ∆1,...,∆s and D is GNC and meets every curve.The number of critical points of f,counted with multiplicity,equals the coefficient of z d inΠs j=1(1−z∆j)θ0,θ2θ0 .The global factorization(4)of this F has r=n+1prime factors,namely,F i=θb i0·f i(θ1θ0)for i=1,...,n,8and F n+1=θ0with u n+1=−b1u1−b2u2−···−b n u n.The Chow ring of X=P d is Z[H]/ H d+1 ,where H represents the hyperplane class.By our genericity hypothesis,the r=n+1prime factors of F are smooth and global normal crossing.They correspond to the following divisor classes:D1=b1H,D2=b2H,...,D n=b n H and D n+1=H. Projective space P d is a smooth toric variety with d+1torus-invariant divisors ∆j,each having the same class H.Hence the formula in(11)specializes to (1−zH)d+1.(1−zb1H)···(1−zb n H) Since we work in the Chow ring of projective space P d,the coefficient of(zH)d is the same as the coefficient of z d in the generating function in(2).We now generalize our results from polynomials offixed degrees to Lau-rent polynomials withfixed Newton polytopes.Recall that the Newton poly-tope of a Laurent polynomial f(θ1,...,θd)is the convex hull of the set of exponent vectors of the monomials appearing in f with nonzero coefficient. 
Given a convex polytope P⊂R d with vertices in Z d,by a generic Lau-rent polynomial with Newton polytope P we will mean a sufficiently general C-linear combination of monomials with exponent vectors in P∩Z d.In the next theorem we consider n Laurent polynomials f1,f2,...,f n hav-ing respective Newton polytopes P1,P2,...,P n.Because the f i’s are Laurent polynomials,i.e.,their monomials may have negative exponents,we only consider those critical points of f=f u11f u22···f u n n which lie in the algebraic torus(C∗)d.The number of such critical points(counted with multiplicity) will be called the toric ML degree of the rational function f.Let P=P1+P2+···+P n denote the Minkowski sum of the given Newton polytopes,and let X be the projective toric variety defined by P. Letη1,...,ηs∈Z d be the primitive inner normal vectors of the facets of P. They span the rays of the fan of X.Let∆1,...,∆s denote the corresponding torus-invariant divisors on X.Each of the Newton polytopes P i is the solution set of a system of linear inequalities of the specific formP i={x∈R d| x,ηj ≥−a ij for j=1,...,s}.The divisor on X defined by the Laurent polynomial f i is linearly equivalent to D i= s j=1a ij∆j.The a ij are integers which can be positive or negative.9The divisor on X defined by f=f u11f u22···f u n n is linearly equivalent toni=1u i D i=s j=1(n i=1u i a ij)·∆j.(12) We abbreviate the support of this divisor byI= j∈{1,...,s}|n i=1u i a ij=0 .(13) A toric variety X is smooth if all the cones in its normal fan are unimodular. Theorem7.If the toric variety X is smooth and the toric ML degree of the rational function f isfinite then it is bounded above by the coefficient of z d in the following generating function with coefficients in the Chow ring of X:j/∈I(1−z∆j)P it is ample on X by construction.So D i meets every curve on X and therefore so does D and we can apply Corollary6.A variable x j appears as a factor in F if and only if j∈I,in which case1−z∆j appears in both the numerator and denominator of(11),and we get the expression(14).Consider now arbitrary Laurent polynomials f1,...,f n inθ1,...,θd such that f= f u i i has onlyfinitely many critical points in(C∗)d.Letνbe the coefficient of z d in(14).Let C m be the space of all n-tuples of Laurent poly-nomials with the given Newton polytopes.Consider the critical equations of f= f u i i and clear denominators.The resulting collection of d Laurent polynomials defines an algebraic subset˜W in the product space C m×(C∗)d. Saturate˜W to remove any components along the hypersurfaces{f i=0}and get a new algebraic subset W.The map from W onto C m is dominant and genericallyfinite,and the genericfiber of this map consists ofνpoints.Our given Laurent polynomials f1,...,f n represent a pointφin C m.Let θ(1),...,θ(κ)be the isolated critical points of f.For each i,consider any irreducible component W(i)of W containing the point(φ,θ(i))in W⊂C m×(C∗)d.By Krull’s Principal Ideal Theorem,the component W(i)of W has codimension≤d and hence it has dimension≥m.As the genericfiber isfinite,the dimension of W i is exactly m and the projection to C m is dominant.Sinceθ(i)is an isolated solution of the critical equations,the projection map to C m is open[19,(3.10)],so the intersection of W(i)with an open neighborhood of(φ,θ(i))maps onto an open neighborhood ofφ. Hence every generic point˜φnearφhas a preimage(˜φ,˜θ(i))near(φ,θ(i)), and these preimages are distinct for i=1,...,κ.We conclude thatκ≤ν. 
This semicontinuity argument is called the“specialization principle”stated in Mumford’s book[19,(3.26)]and also works when theθ(i)have multiplicities, as shown in Theorem22below.We illustrate Theorem7with two examples which we revisit in Section5. Example8.Consider n generic polynomials f1(θ1,θ2),...,f n(θ1,θ2)where the support of f i consists of monomialsθp1θq2with0≤p≤s i and0≤q≤t i, and suppose the u i’s are generic.The Newton polytope of f i is the rectangle P i=conv{(0,0),(s i,0),(0,t i),(s i,t i)}.The Minkowski sum of these rectangles is another rectangle,and X=P1×P1.In the numerator of(14),the contribution of the two torus-invariant divisors D and E corresponding to the left and the bottom edge of this11rectangle survives.The denominator comes from the product of the divisors of f1,...,f n:(1−zD)(1−zE)Figure2:The fan of a smooth projective toric surfaceThe three divisors corresponding to the polygons P1,P2,P3in Figure1areD1=2x3+2x4+2x5+x6D2=2x3+2x4+2x5+x6+x7+x8D3=x4+3x5+2x6+x7If all u i are positive,then the support of the divisor u1D1+u2D2+u3D3is I={3,...,8}.It follows that the toric ML degree is the coefficient of z2in (1−zx1)(1−zx2)(1−zD1)−1(1−zD2)−1(1−zD3)−1.This coefficient is14x1x2,which means that the toric ML degree is14.The toric ML degree of the model f is the toric ML degree defined above for generic u.In this case,there is no cancellation among the coefficients in (13),and I is the set of all indices j such that for some P i the supporting hy-perplane normal toηj does not pass through the origin.The toric ML degree of f is a numerical invariant of the polytopes P1,...,P n.A combinatorial formula for this invariant will be presented in Theorem15of Section5.4Bounded Regions in ArrangementsAs in the Introduction,we consider n polynomials f1,...,f n in d unknowns θ1,...,θd.We now assume that all coefficients of the f i’s are real numbers, and we also assume that u1,...,u n are positive integers.However,we do not assume that the union of the divisors of the f i’s has global normal cross-ings.This is the case of interest in statistics.Consider the arrangement of13hypersurfaces defined by the f i’s and let V R=R d\ n i=1{f i=0}be the complement of this arrangement.A connected component of V R is a bounded region if it is bounded as a subset of R d.Then the following observation holds.Proposition10.For any polynomial map f:R d→R n and any u∈N n>0,#{bounded regions of V R}≤#{critical points of f u11···f u n n in R d}≤ML degree of f.Proof.The function f=f u11···f u n n is continuous,and on the boundary of the closure of each bounded region its value is zero.Hence it has to have at least one(real)critical point in the interior of each region.The second inequality holds trivially,since the ML degree was defined as the number of critical points of f u11···f u n n in C d,counted with multiplicities.This observation raises the question whether the inequalities above could be realized as equalities.We next show that this is the case when f1,...,f n are quadrics in the plane.Here the ML degree is2n2−2n+1by Theorem1. Proposition11.For each n,there are n quadrics f1,...,f n in R2such that #{bounded regions of V R}=ML degree of f=2n2−2n+1. 
Hence all critical points are real.Proof.We will take n quadrics that define“nested”ellipses with center at the origin,as suggested by Figure3.The proof follows by induction:assume we have2(n−1)2−2(n−1)+1bounded regions with n−1ellipses.Observe that the(n−1)st ellipse contains2n−3bounded regions.Then we add a new long and skinny ellipse which replaces the2n−3regions with3(2n−3)+2 regions.The total count comes out to be2n2−2n+1.We will see such an equality holding for n linear hyperplanes in R d below. However,even in the plane R2,the number of critical points and the number of bounded regions of V R diverge for curves of degree≥3.Theorem1implies that for n generic plane curves of degrees b1,...,b n the ML degree isni=1b i(b i−2)+ i<j b i b j+1.14Figure3:The“nested”ellipse constructionThe optimal upper bound for the number of bounded regions of V R is smaller than the ML degree,by the following unpublished result due to Oleg Viro. Theorem12.(Viro)Let f1,...,f n be real plane curves of degrees b1,...,b n, and let K be the number of odd degree curves among them.Then#{bounded regions of V R}≤ni=1(b i−1)(b i−2)In order to get any meaningful lower bound on the number of bounded regions of V R one needs to make some assumptions.Without any assumptions the lower bound is zero:for f i of even degree we take an empty(real)curve, and for f i of odd degree we take the union of an empty curve with a line.If we let all the lines intersect in a single point there will not be any bounded region.If we insist on at least having a GNC configuration,then by the same construction the lower bound we get is the number of bounded regions in a generic arrangement of K lines where K is the number of odd degrees b i. This idea leads us to studying the ML degree of a hyperplane arrangement. Theorem13.Let f be given by n linear polynomials f1,...,f n with real coefficients.Then the ML degree of f is equal to the number of the bounded regions of V R,and all critical points of the optimization problem(1)are real.This theorem does not assume any hypothesis such as global normal cross-ing.Under the GNC hypothesis,the hyperplanes would be in general position and the number of bounded regions equals n−1d ,as predicted by Theorem1. Theorem13is essentially due to Varchenko[25].We shall give a new proof.Proof.In light of Proposition10,we need to show that the number of bounded regions of V R equals the number of complex solutions of the ML equations.Let f i= d j=1a ijθj+c i for i=1,...,n.The ML equations areni=1u i a i1f i=0.(16)Consider the mapψ:C d+1→C n given byψ(θ0,...,θd)=(1/F1,...,1/F n). 
Here F i=c iθ0+ d j=1a ijθj is the homogenization of f i.We let¯H be the central hyperplane arrangement in R d+1given by the F i.We assume that the intersection of all the hyperplanes in¯H is just the origin;otherwise,the linear forms F i depend on fewer than d coordinates,and then we get infinitely many critical points.The Zariski closure of im(ψ)in P n−1is a d-dimensional complex variety V.The solution set on V of the d linear equationsni=1(u i a i1)y i=···=n i=1(u i a id)y i=0consists offinitely many points provided u1,...,u n are generic.Obviously, the solutions to(16)lift to such complex solutions.In other words,the degree of the projective variety V is an upper bound on the ML degree of f.16Now we will compute the degree of V.This variety is the projective spec-trum of the N-graded algebra R=C[1/F i:i=1,...,n]where deg(1/F i)= 1.Terao[24,Theorem1.4]showed that the Hilbert series of R is equal toX∈L(−1)codim(X)µ(X) td!.We conclude that the degree of the projective variety V is(−1)d+1µ(0).By Zaslavsky[26],this number equals the number of bounded regions of V R. Example14.A family of important statistical models where Theorem13 applies is the linear polynomial model of[22].Such a model is given by a polynomial in r unknowns x=(x1,...,x r)with indeterminate coefficients,p(x)=dj=1θj x a j(with a j∈N r),together with n data points v1,...,v n∈R r.The model is parametrized byf1(θ)=dj=1θj v a j1,...,f n(θ)=d j=1θj v a j n.The ML degree is the number of bounded regions of this arrangement.175Polytopes and Resolution of Singularities We now return to the setting of Section3,with the aim of relaxing the restrictive smoothness hypothesis in Theorem7.Our aim is to derive a combinatorial formula for the toric ML degree of any model f defined by generic Laurent polynomials satisfying a mild hypothesis.The derivation of Theorem15involves resolution of singularities in the toric category.In the end of the section we shall comment on using resolution of singularities for bounding the ML degree in general.Given a polytope P in R d and a linear functional v on R d,we writeP v= p∈P|∀p′∈P: v,p ≤ v,p′for the face of P at which v attains its minimum.Two linear functionals v and v′are equivalent if P v=P v′.The equivalence classes are the relative interiors of cones of the inner normal fanΣP.Ifσis a cone inΣP,orσis a cone in any fan which refinesΣP,then we write Pσ=P v for v in the relative interior ofσ.If f is a polynomial with Newton polytope P then fσdenotes the leading form consisting of all terms of f which are supported on Pσ.As in Section3,let f1,...,f n be Laurent polynomials with Newton poly-topes P1,...,P n⊂R d.Consider any fanΣwhich is a common refinementof the inner normal fansΣP1,...,ΣPn.Supposeτis a cone inΣand let kbe the dimension of(P1+···+P n)τ.There exists a k-dimensional linear subspace L of R d and vectors q1,...,q n∈R d such that q i+Pτi lies in L for all i=1,...,n.The subspace L is unique and satisfies L∩Z d≃Z k.Let V(·,...,·)denote the normalized mixed volume on the subspace L.Here “normalized”refers to the lattice L∩Z d,as is customary in toric geometry [12].For any k-element subset{i1,...,i k}of{1,2,...,n}we abbreviateV(P i1,...,P ik;τ)=V(q i1+Pτi1,...,q ik+Pτik)if codim(τ)=k,(19)and V(P i1,...,P ik;τ)=0if codim(τ)>k.If k=d andτ={0}then wesimply write V(P i1,...,P id)for the mixed volume in(19).If k=0andτisfull-dimensional then(19)equals1;this happens in the last sum of(20).We are now ready to state our more general toric ML degree formula.As in Section3,let X be the toric variety 
corresponding to the Minkowski sum P=P1+···+P n andΣX the normal fan with raysη1,...,ηs.We consider the function f=f u11···f u n n.Each polytope P i corresponds to a divisor D i so the divisor of f is D= u i D i.Let I be the support of D as in(13). Label the rays ofΣX so that{1,...,r}are the indices not in I.18For each subset J of{1,...,r}letτJ denote the smallest cone ofΣwhich contains the vectorsηj for j∈J.If no such cone exists thenτJ is just a formal symbol and the expression(19)is declared to be zero forτ=τJ.The mild smoothness hypothesis we need is that every singular cone ofΣcontains at least one ray from I.Equivalently all conesτJ are smooth.Theorem15.Suppose every singular cone ofΣX contains some ray in the support of the divisor D.Then,the toric ML degree of the rational function f is bounded above by the following alternating sum of mixed volumes:1≤i1≤···≤i d≤n V(P i1,...,P id)− j∈{1,...,r}1≤i1≤···≤i d−1≤nV(P i1,...,P id−1;τ{j})+{j1,j2}⊂{1,...,r}1≤i1≤···≤i d−2≤n V(P i1,...,P id−2;τ{j1,j2})+···+(−1)d{j1,...,j d}⊂{1,...,r}V(∅;τ{j1,...,j d}).(20)Equality holds if each f i is generic relative to its Newton polytope P i. Proof.In order to apply Corollary6we must resolve the singularities of X. For toric varieties this is done in two steps.First we get a simplicial toric variety without adding any new rays to the fan.Second we resolve the re-maining singular(but simplicial)cones by adding new rays.This procedure is described in detail in[12].Typically thefirst step involves taking the pulling subdivision at each ray in the fan.However,under the given hypothesis it is enough to perform pulling subdivisions only at the rays in the support of Dto obtain a simplicial fanΣ˜X .Thisfine detail will be important below.Ourhypothesis holds for this intermediate fan as well,and subsequently we take a smooth refinementΣX′ofΣ˜X by adding new rays in the relative interiors of each of the singular cones.Letπ:X′→X be the induced map.We will show that we get no new critical points under the resolution. Hence the number of critical points can be computed on X′.Wefinally claim that the Chern class formula expands into the given combinatorial formula.We investigate critical points of the pullback of our rational function: F′=π∗(F)=(x− u iπ∗(D i)) π∗(F i(x)).19。