AAI-Lecture4A-Probability theory
概率论与数理统计 Probability Theory and Mathematical Statistics

第一章 概率论的基本概念 Chapter 1: Introduction of Probability Theory
- 不确定性 indeterminacy
- 必然现象 certain phenomenon
- 随机现象 random phenomenon
- 试验 experiment
- 结果 outcome
- 频率数 frequency number
- 样本空间 sample space
- 出现次数 frequency of occurrence
- n维样本空间 n-dimensional sample space
- 样本空间的点 point in sample space
- 随机事件 random event / random occurrence
- 基本事件 elementary event
- 必然事件 certain event
- 不可能事件 impossible event
- 等可能事件 equally likely event
- 事件运算律 operational rules of events
- 事件的包含 inclusion of events
- 并事件 union of events
- 交事件 intersection of events
- 互不相容事件、互斥事件 mutually exclusive events / incompatible events
- 互逆的 mutually inverse
- 加法定理 addition theorem
- 古典概率 classical probability
- 古典概率模型 classical probabilistic model
- 几何概率 geometric probability
- 乘法定理 product theorem
- 概率乘法 multiplication of probabilities
- 条件概率 conditional probability
- 全概率公式、全概率定理 formula of total probability
- 贝叶斯公式、逆概率公式 Bayes formula / inverse probability formula
- 后验概率 posterior probability
- 先验概率 prior probability
- 独立事件 independent event
- 独立随机事件 independent random event
- 独立实验 independent experiment
- 两两独立 pairwise independent
- 两两独立事件 pairwise independent events

第二章 随机变量及其分布 Chapter 2: Random Variables and Distributions
- 随机变量 random variables
- 离散随机变量 discrete random variables
- 概率分布律 law of probability distribution
- 一维概率分布 one-dimensional probability distribution
- 概率分布 probability distribution
- 两点分布 two-point distribution
- 伯努利分布 Bernoulli distribution
- 二项分布/伯努利分布 binomial distribution
- 超几何分布 hypergeometric distribution
- 三项分布 trinomial distribution
- 多项分布 multinomial distribution
- 泊松分布 Poisson distribution
- 泊松参数 Poisson parameter
- 分布函数 distribution function
- 概率分布函数 probability distribution function
- 连续随机变量 continuous random variable
- 概率密度 probability density
- 概率密度函数 probability density function
- 概率曲线 probability curve
- 均匀分布 uniform distribution
- 指数分布 exponential distribution
- 指数分布密度函数 exponential distribution density function
- 正态分布、高斯分布 normal distribution / Gaussian distribution
- 标准正态分布 standard normal distribution
- 正态概率密度函数 normal probability density function
- 正态概率曲线 normal probability curve
- 标准正态曲线 standard normal curve
- 柯西分布 Cauchy distribution
- 分布密度 density of distribution

第三章 多维随机变量及其分布 Chapter 3: Multivariate Random Variables and Distributions
- 二维随机变量 two-dimensional random variable
- 联合分布函数 joint distribution function
- 二维离散型随机变量 two-dimensional discrete random variable
- 二维连续型随机变量 two-dimensional continuous random variable
- 联合概率密度 joint probability density
- n维随机变量 n-dimensional random variable
- n维分布函数 n-dimensional distribution function
- n维概率分布 n-dimensional probability distribution
- 边缘分布 marginal distribution
- 边缘分布函数 marginal distribution function
- 边缘分布律 law of marginal distribution
- 边缘概率密度 marginal probability density
- 二维正态分布 two-dimensional normal distribution
- 二维正态概率密度 two-dimensional normal probability density

第四章 随机变量的数字特征 Chapter 4: Numerical Characteristics of Random Variables
- 数学期望、均值 mathematical expectation / mean
- 期望值 expectation value
- 方差 variance
- 标准差 standard deviation
- 随机变量的方差 variance of a random variable
- 均方差 mean square deviation
- 相关关系 dependence relation
- 相关系数 correlation coefficient
- 协方差 covariance
- 协方差矩阵 covariance matrix
- 切比雪夫不等式 Chebyshev inequality

第五章 大数定律及中心极限定理 Chapter 5: Law of Large Numbers and Central Limit Theorem
- 大数定律 law of large numbers
- 切比雪夫定理的特殊形式 special form of the Chebyshev theorem
- 依概率收敛 convergence in probability
- 伯努利大数定律 Bernoulli law of large numbers
- 同分布 identically distributed
- 列维-林德伯格定理、独立同分布中心极限定理 Lévy-Lindeberg theorem (central limit theorem for i.i.d. sequences)
- 辛钦大数定律 Khinchin law of large numbers
- 利亚普诺夫定理 Lyapunov theorem
- 棣莫弗-拉普拉斯定理 De Moivre-Laplace theorem
Probability

Introduction: Probability is an essential concept in mathematics and statistics that allows us to quantify uncertainty and make informed predictions. It is widely used in fields such as physics, finance, and AI. Understanding probability lays the foundation for statistical analysis and decision-making in many real-world situations. In this document, we explore the fundamental principles of probability, including its definition, basic rules, and applications.

Definition of Probability: Probability is a measure of the likelihood that an event will occur. It is expressed as a number between 0 and 1, where 0 represents impossibility and 1 represents absolute certainty. The concept of probability is based on the idea of a random experiment, a process that generates a set of possible outcomes. For example, flipping a coin and rolling a die are random experiments; the possible outcomes are heads or tails and the faces of the die, respectively.

Basic Concepts: In probability theory, we use certain terms to describe the components of an experiment. Let's delve into the basic concepts:
1. Sample Space: The sample space, denoted by S, is the set of all possible outcomes of an experiment. For example, when flipping a coin, the sample space is {Heads, Tails}.
2. Event: An event is a subset of the sample space, consisting of one or more outcomes. Events are denoted by capital letters such as A, B, or C. For example, in the coin-flip experiment, the event A could be {Heads}.
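To make the sample-space and event vocabulary above concrete, here is a minimal Python sketch (not from the original text) that estimates the probability of the event A = {Heads} by relative frequency over repeated simulated coin flips; the helper name `estimate_probability` and the trial count are illustrative assumptions.

```python
import random

def estimate_probability(event, sample_space, n_trials=10_000):
    """Estimate P(event) as the relative frequency of outcomes falling in `event`.

    Each trial draws one outcome uniformly at random from `sample_space`,
    mimicking a fair coin flip; `event` is a subset of the sample space.
    """
    hits = sum(1 for _ in range(n_trials)
               if random.choice(sample_space) in event)
    return hits / n_trials

if __name__ == "__main__":
    S = ["Heads", "Tails"]   # sample space of the coin-flip experiment
    A = {"Heads"}            # the event "the coin shows Heads"
    print("Estimated P(A):", estimate_probability(A, S))  # close to 0.5
```

As the number of trials grows, the estimate settles near the true value 0.5, which matches the frequency interpretation of probability described above.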
Fermat-Pascal Probability Theory

Abstract: The Fermat-Pascal probability theory, also known as binomial probability theory, is a fundamental concept in probability theory. It grew out of the 17th-century correspondence between Pierre de Fermat and Blaise Pascal. The theory provides a mathematical framework for calculating the probabilities of events with two mutually exclusive outcomes, such as heads or tails in a coin flip.
The binomial probability distribution describes the probability of a given number of successes in a fixed number of independent Bernoulli trials, each with the same probability of success. The binomial probability formula is

P(X = k) = C(n, k) * p^k * (1 - p)^(n - k),

where P(X = k) is the probability of getting exactly k successes, C(n, k) is the number of combinations of n items taken k at a time, p is the probability of success in a single trial, and n is the number of trials.
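A short Python sketch of the formula above, using the standard library's `math.comb` for C(n, k); the example values n = 10 trials, p = 0.5, and k = 3 successes are illustrative assumptions rather than numbers from the text.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) = C(n, k) * p**k * (1 - p)**(n - k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

if __name__ == "__main__":
    # Probability of exactly 3 heads in 10 fair coin flips.
    print(binomial_pmf(3, 10, 0.5))                            # ~0.1172
    # Sanity check: the probabilities over k = 0..n sum to 1.
    print(sum(binomial_pmf(k, 10, 0.5) for k in range(11)))    # ~1.0
```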
Statistical Machine Learning
Lecture Notes 9: Probability Inequalities
Professor: Zhihua Zhang    Scribe:

9.1 Jensen's Inequality

If $g$ is convex, then $\mathbb{E}[g(X)] \ge g(\mathbb{E}X)$.

Proof. Since $g$ is convex, we can find a linear function $L(x) = a + bx$ whose graph meets that of $g$ only at $\mathbb{E}X$, with $L(\mathbb{E}X) = g(\mathbb{E}X)$ and $g(x) \ge L(x)$ for all $x$. Hence
$$\mathbb{E}[g(X)] \ge \mathbb{E}[L(X)] = a + b\,\mathbb{E}X = L(\mathbb{E}X) = g(\mathbb{E}X).$$

9.2 Cauchy-Schwarz Inequality

If $X$ and $Y$ have finite variances, then
$$\mathbb{E}|XY| \le \sqrt{\mathbb{E}(X^2)\,\mathbb{E}(Y^2)}.$$

Proof. Consider the random vector $(X, Y)^\top$; its covariance matrix is
$$\operatorname{var}\begin{pmatrix} X \\ Y \end{pmatrix} = \begin{pmatrix} \operatorname{var}(X) & \operatorname{cov}(X,Y) \\ \operatorname{cov}(Y,X) & \operatorname{var}(Y) \end{pmatrix}.$$
Since a covariance matrix is positive semidefinite, $\operatorname{var}(X)\operatorname{var}(Y) \ge \operatorname{cov}(X,Y)\operatorname{cov}(Y,X)$. Taking $\mathbb{E}X = \mathbb{E}Y = 0$ yields the inequality.

9.3 Markov's Inequality

For all $t > 0$, $Y\,\mathbf{1}\{Y \ge t\} \ge t\,\mathbf{1}\{Y \ge t\}$, so $\mathbb{E}[Y\,\mathbf{1}\{Y \ge t\}] \ge t\,\Pr(Y \ge t)$, i.e.
$$\Pr(Y \ge t) \le \frac{\mathbb{E}[Y\,\mathbf{1}\{Y \ge t\}]}{t}.$$
If $Y \ge 0$, then $\Pr(Y \ge t) \le \mathbb{E}Y / t$.

Corollary 9.1. Let $Y = |Z - \mathbb{E}Z|$; then $\Pr(|Z - \mathbb{E}Z| \ge t) \le \mathbb{E}|Z - \mathbb{E}Z| / t$.

Corollary 9.2. If $\phi$ denotes a nondecreasing and nonnegative function on a (possibly infinite) interval $I \subset \mathbb{R}$, and $Y$ and $t$ take values in $I$, then
$$\Pr(Y \ge t) \le \Pr(\phi(Y) \ge \phi(t)) \le \frac{\mathbb{E}[\phi(Y)]}{\phi(t)}.$$

Example 9.1. Let $\phi(t) = t^2$, $I = (0, +\infty)$, $Y = |Z - \mathbb{E}Z|$. Then
$$\Pr(|Z - \mathbb{E}Z| \ge t) \le \frac{\operatorname{var}(Z)}{t^2},$$
which is Chebyshev's inequality. More generally, with $\phi(t) = t^q$ for some $q > 0$,
$$\Pr(|Z - \mathbb{E}Z| \ge t) \le \frac{\mathbb{E}\big[|Z - \mathbb{E}Z|^q\big]}{t^q}.$$

Example 9.2. If $Z$ is a sum of independent random variables, $Z = X_1 + X_2 + \dots + X_n$, then $\operatorname{var}(Z) = \sum_{i=1}^n \operatorname{var}(X_i)$, and
$$\Pr\Big(\frac{1}{n}\Big|\sum_{i=1}^n (X_i - \mathbb{E}X_i)\Big| \ge t\Big) \le \frac{\sigma^2}{n t^2},
\qquad \text{where } \sigma^2 = \frac{1}{n}\sum_{i=1}^n \operatorname{var}(X_i).$$
Taking $\phi(t) = e^{\lambda t}$ with $\lambda > 0$ gives
$$\Pr(Z \ge t) \le \frac{\mathbb{E}[e^{\lambda Z}]}{e^{\lambda t}}.$$
Note: $M(\lambda) = \mathbb{E}e^{\lambda Z}$, $\lambda \in \mathbb{R}$, is called the moment generating function.

9.4 The Cramér-Chernoff Method

Let $Z$ be a real-valued random variable. For all $\lambda \ge 0$,
$$\Pr(Z \ge t) \le e^{-\lambda t}\,\mathbb{E}[e^{\lambda Z}].$$
To obtain the tightest bound we minimize over $\lambda$:
$$\inf_{\lambda \ge 0} e^{-\lambda t}\,\mathbb{E}[e^{\lambda Z}] \iff \inf_{\lambda \ge 0}\big(-\lambda t + \log \mathbb{E}[e^{\lambda Z}]\big).$$
Define $\psi_Z(\lambda) = \log \mathbb{E}e^{\lambda Z}$ and $\psi_Z^*(t) \triangleq \sup_{\lambda \ge 0}\big(\lambda t - \psi_Z(\lambda)\big)$, called the Cramér transform of $Z$. Since $\psi_Z(0) = \log \mathbb{E}e^{0} = 0$, we have $\psi_Z^* \ge 0$.

1. $\mathbb{E}Z \le t < +\infty$. By Jensen's inequality, $\psi_Z(\lambda) = \log \mathbb{E}e^{\lambda Z} \ge \log e^{\lambda \mathbb{E}Z} = \lambda\,\mathbb{E}Z$. If $\lambda < 0$, then $\lambda t - \psi_Z(\lambda) \le \lambda(t - \mathbb{E}Z) \le 0$, so
$$\sup_{\lambda \ge 0}\big(\lambda t - \psi_Z(\lambda)\big) = \sup_{\lambda \in \mathbb{R}}\big(\lambda t - \psi_Z(\lambda)\big).$$
Note: $\psi_Z^*(t) = \sup_{\lambda \in \mathbb{R}}(\lambda t - \psi_Z(\lambda))$ is called the Fenchel-Legendre dual function (convex conjugate). So if $t \ge \mathbb{E}Z$, we only need to compute the dual function.

2. $t \le \mathbb{E}Z$. To find the maximum of $\lambda t - \psi_Z(\lambda)$, compute its derivatives:
$$\psi_Z'(\lambda) = \frac{\mathbb{E}[Z e^{\lambda Z}]}{\mathbb{E}e^{\lambda Z}}, \qquad
\psi_Z''(\lambda) = \frac{\mathbb{E}[Z^2 e^{\lambda Z}]\,\mathbb{E}[e^{\lambda Z}] - \mathbb{E}[Z e^{\lambda Z}]\,\mathbb{E}[Z e^{\lambda Z}]}{\big(\mathbb{E}[e^{\lambda Z}]\big)^2}.$$
By the Cauchy-Schwarz inequality, $\psi_Z''(\lambda) \ge 0$, so for $\lambda \ge 0$,
$$\psi_Z'(\lambda) \ge \psi_Z'(0) = \mathbb{E}Z, \qquad t - \psi_Z'(\lambda) \le t - \mathbb{E}Z \le 0.$$
Hence $\lambda t - \psi_Z(\lambda)$ attains its maximum at $\lambda = 0$, giving $\psi_Z^*(t) = 0$, which only says $\Pr(Z \ge t) \le 1$.

In what follows we therefore only care about $t \ge \mathbb{E}Z$. Then
$$\psi_Z^*(t) = \lambda_t t - \psi_Z(\lambda_t),$$
where $\lambda_t$ is the solution of $t - \psi_Z'(\lambda) = 0$, i.e. $\lambda_t = (\psi_Z')^{-1}(t)$.

Example 9.3. Let $Z \sim N(0, \sigma^2)$. Then
$$\psi_Z(\lambda) = \log \int e^{\lambda z}\,\frac{1}{(2\pi\sigma^2)^{1/2}} \exp\Big(-\frac{z^2}{2\sigma^2}\Big)\,dz
= \log \int \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\Big(-\frac{z^2 - 2\lambda\sigma^2 z}{2\sigma^2}\Big)\,dz
= \log \int \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\Big(-\frac{(z - \lambda\sigma^2)^2 - \lambda^2\sigma^4}{2\sigma^2}\Big)\,dz
= \frac{\lambda^2\sigma^2}{2}.$$
Then $\psi_Z^*(t) = \sup_{\lambda}(\lambda t - \psi_Z(\lambda))$; setting $t - \lambda\sigma^2 = 0$ gives $\lambda_t = t/\sigma^2$, so
$$\Pr(Z \ge t) \le \exp\Big(-\frac{t^2}{2\sigma^2}\Big), \qquad t \ge \mathbb{E}Z = 0.$$
Note: if $\psi_Y(\lambda) \le \lambda^2\sigma^2/2$, we call $Y$ sub-Gaussian.
Homework: given that $\psi_Y(\lambda) \le \lambda^2\sigma^2/2$, prove $\operatorname{var}(Y) \le \sigma^2$.

Example 9.4. A random variable $Y$ has the Poisson distribution with parameter $\nu$: $\Pr(Y = k) = e^{-\nu}\nu^k / k!$, $k = 0, 1, 2, \dots$. Let $Z = Y - \nu$, so $\mathbb{E}Z = 0$.
$$\mathbb{E}e^{\lambda Z} = e^{-\lambda\nu}\sum_{k=0}^{\infty} e^{\lambda k}\,\frac{e^{-\nu}\nu^k}{k!}
= e^{-\lambda\nu - \nu}\sum_{k=0}^{\infty} \frac{(\nu e^{\lambda})^k}{k!}
= e^{-\lambda\nu - \nu}\,e^{\nu e^{\lambda}}.$$
Hence $\psi_Z(\lambda) = \nu(e^{\lambda} - \lambda - 1)$, and $t - \psi_Z'(\lambda) = 0$ gives $\lambda_t = \log(1 + t/\nu)$, so
$$\psi_Z^*(t) = \nu\Big[\Big(1 + \frac{t}{\nu}\Big)\log\Big(1 + \frac{t}{\nu}\Big) - \frac{t}{\nu}\Big].$$

Example 9.5. A random variable $Y$ has the Bernoulli distribution with parameter $p$: $\Pr(Y = 1) = 1 - \Pr(Y = 0) = p$. Let $Z = Y - p$.
$$\psi_Z(\lambda) = \log \mathbb{E}e^{\lambda Z} = \log\big(p e^{\lambda(1-p)} + (1-p)e^{-\lambda p}\big) = -\lambda p + \log\big(p e^{\lambda} + 1 - p\big).$$
Setting $(\lambda t - \psi_Z(\lambda))' = 0$ gives $p e^{\lambda}(1 - t - p) = (t + p)(1 - p)$, which requires $0 \le t \le 1 - p$. Then
$$\psi_Z^*(t) = (1 - p - t)\log\frac{1 - p - t}{1 - p} + (p + t)\log\frac{p + t}{p}.$$
Let $a = p + t$ with $p \le a \le 1$; then
$$\psi_Z^*(t) = (1 - a)\log\frac{1 - a}{1 - p} + a\log\frac{a}{p} = D(P_a \,\|\, P_p),$$
the KL divergence between $P_a$ and $P_p$, where $P_a$ and $P_p$ denote the Bernoulli distributions with parameters $a$ and $p$.

Example 9.6. Let $Y \sim \mathrm{Binomial}(n, p)$, so $Y = Z_1 + Z_2 + \dots + Z_n$ with the $Z_i \sim \mathrm{Bernoulli}(p)$ independent.
$$\psi_Y(\lambda) = \log \mathbb{E}e^{\lambda \sum_{i=1}^n Z_i} = \log \prod_{i=1}^n \mathbb{E}e^{\lambda Z_i} = \sum_{i=1}^n \log \mathbb{E}e^{\lambda Z_i} = n\,\psi_Z(\lambda).$$
$$\lambda t - \psi_Y(\lambda) = \lambda t - n\,\psi_Z(\lambda) = n\Big(\lambda\,\frac{t}{n} - \psi_Z(\lambda)\Big),$$
so $\psi_Y^*(t) = n\,\psi_Z^*(t/n)$.

9.5 Hoeffding's Inequality

If $X_1, X_2, \dots, X_n$ are independent random variables with finite means such that $\mathbb{E}e^{\lambda X_i}$ is finite for $\lambda$ in some non-empty interval $I$, define
$$S = \sum_{i=1}^n (X_i - \mathbb{E}X_i),$$
and assume that each $X_i$ takes its values in a bounded interval $[a_i, b_i]$. Then, for all $t > 0$,
$$\Pr(S \ge t) \le \exp\Big(-\frac{2t^2}{\sum_{i=1}^n (b_i - a_i)^2}\Big).$$

Definition 9.1. If $\Pr(\epsilon_i = 1) = \Pr(\epsilon_i = -1) = \tfrac{1}{2}$, then $\epsilon_i$ is called a Rademacher random variable.

Let $X_i = \epsilon_i a_i$, where $a_i$ is a real number. Then $X_i \in [\min\{-a_i, a_i\}, \max\{-a_i, a_i\}]$, and the inequality above becomes
$$\Pr(S \ge t) \le \exp\Big(-\frac{t^2}{2\sum_{i=1}^n a_i^2}\Big).$$
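As a numerical sanity check on Hoeffding's inequality in the Rademacher case of Definition 9.1, the following Python sketch compares a Monte Carlo estimate of Pr(S >= t) with the bound exp(-t^2 / (2 * sum of a_i^2)); the choice a_i = 1 for i = 1..100, the grid of t values, and the simulation size are arbitrary assumptions made only for illustration.

```python
import math
import random

def rademacher_sum(a):
    """S = sum_i eps_i * a_i, where each eps_i is +1 or -1 with probability 1/2."""
    return sum(random.choice((-1.0, 1.0)) * ai for ai in a)

def empirical_tail(a, t, n_sim=50_000):
    """Monte Carlo estimate of Pr(S >= t)."""
    return sum(rademacher_sum(a) >= t for _ in range(n_sim)) / n_sim

if __name__ == "__main__":
    a = [1.0] * 100                              # X_i = eps_i * a_i takes values in [-1, 1]
    sum_sq = sum(ai ** 2 for ai in a)
    for t in (5, 10, 20, 30):
        bound = math.exp(-t ** 2 / (2 * sum_sq))   # Hoeffding bound for Rademacher sums
        print(f"t={t:3d}  empirical={empirical_tail(a, t):.4f}  bound={bound:.4f}")
```

In every row the empirical tail probability stays below the bound, and both decay rapidly in t, as the inequality predicts.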
Probability Theory and Examples: Solutions to Exercises

1. Introduction

1.1 Overview

Probability theory is the branch of mathematics that studies how likely random events are to occur and the laws governing them.
Randomness is present in every aspect of daily life, from weather forecasts to stock-market fluctuations, from the probability of a plane crash to the odds of winning the lottery. Understanding and applying probability theory is therefore essential for making sound decisions, inferences, and predictions. This article explains the basic theory and the main computational methods of probability in depth and illustrates them with practical examples. It introduces the basic concepts and definitions of probability theory and discusses the axiomatic system of probability and its role in the study of random variables and probability distributions. It also covers, in detail, counting methods for combinations and permutations, conditional probability and the law of total probability, and independent events and the multiplication rule. In addition, it examines common probability distribution models, including the binomial, Poisson, and normal distributions for discrete and continuous random variables, together with their characteristic parameters and methods of statistical inference. Finally, real-world cases show how to apply this knowledge to data analysis and inference.
1.2 Structure of the Article

This article consists of five main parts. First, the introduction describes the background and purpose of the article and explains the importance of probability theory in everyday life. Second, the overview of probability theory discusses its basic definitions and concepts and introduces the relationship between random variables and probability distributions. Third, the section on probability computation explores counting methods for combinations and permutations, conditional probability and the law of total probability, and independent events and the multiplication rule. Fourth, the section on common probability distribution models presents the binomial, Poisson, and normal distributions and explains their significance in practical applications. Finally, the conclusion summarizes the main points and findings and offers an outlook and suggestions for future research.
1.3 Purpose

This article aims to help readers gain a comprehensive understanding of probability theory and its methods of application. After studying the material presented here, readers should be able to understand the relevant terminology and theorems and apply them to statistical analysis, inference, and prediction. Whether in scientific research, financial investment, or decision analysis, probability theory provides a powerful and indispensable tool. We believe that, after reading this article, readers will be able to approach probability-related problems with greater confidence and achieve better results in practice.