A Multivariate Poisson-Lognormal Regression Model for Prediction of Crash Counts by Severit
- 格式:pdf
- 大小:209.68 KB
- 文档页数:25
概率密度估计
1 概率密度估计
概率密度估计(Probability Density Estimation,简称PDE)也称为密度函数估计,旨在描述一个随机变量X的概率密度函数,从而
帮助准确定量分析研究变量X的特征。
通常,概率密度估计的过程可以分解为两个步骤。
第一步是从样
本中提取该变量的直方图,然后以某种函数形式拟合该直方图,得到
其对应的概率密度函数。
其中,最常用的函数形式为高斯分布(Gaussian Distribution)的普通分布、泊松分布(Poisson Distribution)、多元正态分布(Multivariate Normal Distribution)、双截止分布(Binomial Distribution)、逻辑正态
分布(Log-normal Distribution)等。
第二步就是根据拟合出概率密度函数形状,运用其特点和参数,
得到该变量的最佳估计,便于对样本进行更有效率的分析。
比如,在
高斯分布模型下,样本拟合出的方差可以帮助我们判断数据的稳定性。
概率密度估计被广泛应用于贝叶斯统计分析、学习理论、社会科
学研究等,是发现重要模式并探寻变量分布的重要工具。
总之,概率密度估计是一项核心重要的数据分析技术,其解释力、拟合能力和模型大小的理论基础为研究者们收集总结数据,比较复杂
的变量特征提供了可靠信息。
统计学复试专业词汇汇总population 总体sampling unit 抽样单元sample 样本observed value 观测值descriptive statistics 描述性统计量random sample 随机样本simple random sample 简单随机样本statistics 统计量order statistic 次序统计量sample range 样本极差mid-range 中程数estimator 估计量sample median 样本中位数sample moment of order k k阶样本矩sample mean 样本均值average 平均数arithmetic mean 算数平均值sample variance 样本方差sample standard deviation 样本标准差sample coefficient of variation 样本变异系数standardized sample random variable 标准化样本随机变量sample coefficient of skewness (歪斜)样本偏度系数sample coefficient of kurtosis (峰态) 样本峰度系数sample covariance 样本协方差sample correclation coefficient 样本相关系数standard error 标准误差interval estimator 区间估计statistical tolerance interval 统计容忍区间statistical tolerance limit 统计容忍限confidence interval 置信区间one-sided confidence interval 单侧置信区间prediction interval 预测区间estimate 估计值error of estimation 估计误差bias 偏倚unbiased estimator 无偏估计量maximum likelihood estimator 极大似然估计量estimation 估计maximum likelihood estimation 极大似然估计likelihood function 似然函数profile likelihood funtion 剖面函数hypothesis 假设null hypothesis 原假设alternative hypothesis 备择假设simple hypothesis 简单假设composite hypothesis 复合假设significance level 显著性水平type I error 第一类错误type II error 第二类错误statistical test 统计检验significance test 显著性检验p-value p值power of a test 检验功效power curve 功效曲线test statistic 检验统计量graphical descriptive statistics 图形描述性统计量numerical descriptive statistics 数值描述性统计量classes 类(组)class 类class 组class limits; class boundaries 组限mid-point of class 组中值class width 组距frequency 频数frequency distribution 频数分布histogram 直方图bar chart 条形图cumulative frequency 累积频数relative frequency 频率cumulative relative frequency 累积频率sample space 样本空间event 事件complementary event 对立事件independent events 独立事件probability [of an event A] [事件A的]概率conditional probability 条件概率distribution function [of a random variable X] [随机变量X的]分布函数family of distributions 分布族parameter 参数random variable 随机变量probability distribution 概率分布distribution 分布expectation 期望p-quantile; p-fractile p分位数median 中位数quartile 四分位数univariate probability distribution 一维概率分布univariate distribution 一维分布multivariate probability distribution 多维概率分布multivariate distribution 多维分布marginal probability distrubition 边缘概率分布marginal distribution 边缘分布conditional probability distribution 条件概率分布conditional distribution 条件分布regression curve 回归曲线regression surface 回归曲面discrete probability distribution 离散概率分布discrete distribution 离散分布continuous probability distribution 连续概率分布continuous distribution 连续分布probability [mass] function 概率函数mode of probability [mass] function 概率函数的众数probability density function 概率密度函数mode of probability density function 概率密度函数的众数discrete random variable 离散随机变量continuous random variable 连续随机变量centred probability distribution 中心化概率分布centred random variable 中心化随机变量standardized probability distribution 标准化概率分布standardized random variable 标准化随机变量moment of order r r阶[原点]矩means 均值moment of order r = 1 一阶矩mean 均值variance 方差standard deviation 标准差coefficient of variation 变异系数coefficient of skewness 偏度系数coefficient of kurtosis 峰度系数joint moment of order r and s (r,s)阶联合[原点]矩joint central moment of order r and s (r,s)阶联合中心矩covariance 协方差correlation coefficient 相关系数multinomial distribution 多项分布binomial distribution 二项分布Poisson distribution 泊松分布hypergeometric distibution 超几何分布negative binomial distribution 负二项分布normal distribution, Gaussian distribution 正态分布standard normal distribution, standard Gaussian distribution 标准正态分布lognormal distribution 对数正态分布t distribution; Student's distribution t分布degrees of freedom 自由度F distribution F分布gamma distribution 伽玛分布, Γ分布chi-squared distribution 
卡方分布,χ2分布exponential distribution 指数分布beta distribution 贝塔分布,β分布uniform distribution, rectangular distribution 均匀分布type I value distribution; Gumbel distribution I型极值分布type II value distribution; Gumbel distribution II型极值分布Weibull distribution 威布尔分布type III value distribution; Gumbel distribution III型极值分布multivariate normal distribution 多维正态分布bivariate normal distribution 二维正态分布standard bivariate normal distribution 标准二维正态分布sampling distribution 抽样分布probability space 概率空间。
二维高斯混合分布参数估计python -回复二维高斯混合分布是一种常用的概率模型,常用于数据聚类、模式识别等领域。
在实际应用中,估计二维高斯混合分布的参数是十分重要的一项任务。
本文以Python为工具,并结合实例,介绍了如何一步一步地估计二维高斯混合分布的参数。
1. 引言二维高斯混合分布常用于将数据集拟合到一组由多个二维高斯分布组成的模型中。
通过估计混合分布的参数,我们可以了解数据集的特征和结构,从而进行数据分类、聚类和模式识别等任务。
在本文中,我们将通过以下步骤来估计二维高斯混合分布的参数:1. 初始化参数2. Expectation-Maximization(EM)算法3. 参数优化4. 数据聚类让我们一起开始吧!2. 初始化参数在估计二维高斯混合分布的参数之前,我们首先需要初始化一些参数。
常用的参数包括:- K:混合分布的组数- μ:各组的均值向量- Σ:各组的协方差矩阵- α:各组的权重我们可以使用随机数来初始化均值向量μ和协方差矩阵Σ,而权重α可以初始化为均匀分布。
下面是一个Python函数来实现参数的初始化:pythonimport numpy as npdef initialize_parameters(data, num_clusters):num_dimensions = data.shape[1]mu = np.random.randn(num_clusters, num_dimensions)sigma = [np.eye(num_dimensions)] * num_clustersalpha = np.ones(num_clusters) / num_clustersreturn mu, sigma, alpha注意,我们假设数据集已经加载到名为`data`的Numpy数组中,并且`num_clusters`是我们想要估计的高斯混合分布组数。
3. Expectation-Maximization算法Expectation-Maximization(EM)算法是一种常用的参数估计方法,特别适用于高斯混合分布。
Distributions.Parameter estimation.betafit - Beta parameter estimation.binofit - Binomial parameter estimation.dfittool - Distribution fitting tool.evfit - Extreme value parameter estimation.expfit - Exponential parameter estimation.fitdist - Distribution fitting.gamfit - Gamma parameter estimation.gevfit - Generalized extreme value parameter estimation.gmdistribution - Gaussian mixture model estimation.gpfit - Generalized Pareto parameter estimation.lognfit - Lognormal parameter estimation.mle - Maximum likelihood estimation (MLE).mlecov - Asymptotic covariance matrix of MLE.nbinfit - Negative binomial parameter estimation.normfit - Normal parameter estimation.paretotails - Empirical cdf with generalized Pareto tails.poissfit - Poisson parameter estimation.raylfit - Rayleigh parameter estimation.unifit - Uniform parameter estimation.wblfit - Weibull parameter estimation. Probability density functions (pdf).betapdf - Beta density.binopdf - Binomial density.chi2pdf - Chi square density.evpdf - Extreme value density.exppdf - Exponential density.fpdf - F density.gampdf - Gamma density.geopdf - Geometric density.gevpdf - Generalized extreme value density. gppdf - Generalized Pareto density.hygepdf - Hypergeometric density.lognpdf - Lognormal density.mnpdf - Multinomial probability density function. mvnpdf - Multivariate normal density.mvtpdf - Multivariate t density.nbinpdf - Negative binomial density.ncfpdf - Noncentral F density.nctpdf - Noncentral t density.ncx2pdf - Noncentral Chi-square density. normpdf - Normal (Gaussian) density.pdf - Density function for a specified distribution.poisspdf - Poisson density.raylpdf - Rayleigh density.tpdf - T density.unidpdf - Discrete uniform density.unifpdf - Uniform density.wblpdf - Weibull density.Cumulative Distribution functions (cdf).betacdf - Beta cumulative distribution function.binocdf - Binomial cumulative distribution function.cdf - Specified cumulative distribution function.chi2cdf - Chi square cumulative distribution function.ecdf - Empirical cumulative distribution function (Kaplan-Meier estimate).evcdf - Extreme value cumulative distribution function.expcdf - Exponential cumulative distribution function.fcdf - F cumulative distribution function.gamcdf - Gamma cumulative distribution function.geocdf - Geometric cumulative distribution function.gevcdf - Generalized extreme value cumulative distribution function.gpcdf - Generalized Pareto cumulative distribution function.hygecdf - Hypergeometric cumulative distribution function.logncdf - Lognormal cumulative distribution function.mvncdf - Multivariate normal cumulative distribution function. mvtcdf - Multivariate t cumulative distribution function. nbincdf - Negative binomial cumulative distribution function. ncfcdf - Noncentral F cumulative distribution function.nctcdf - Noncentral t cumulative distribution function.ncx2cdf - Noncentral Chi-square cumulative distribution function. normcdf - Normal (Gaussian) cumulative distribution function. poisscdf - Poisson cumulative distribution function.raylcdf - Rayleigh cumulative distribution function.tcdf - T cumulative distribution function.unidcdf - Discrete uniform cumulative distribution function. unifcdf - Uniform cumulative distribution function.wblcdf - Weibull cumulative distribution function.Critical Values of Distribution functions.betainv - Beta inverse cumulative distribution function.binoinv - Binomial inverse cumulative distribution function.chi2inv - Chi square inverse cumulative distribution function. 
evinv - Extreme value inverse cumulative distribution function. expinv - Exponential inverse cumulative distribution function. finv - F inverse cumulative distribution function.gaminv - Gamma inverse cumulative distribution function.geoinv - Geometric inverse cumulative distribution function.gevinv - Generalized extreme value inverse cumulative distribution function.gpinv - Generalized Pareto inverse cumulative distribution function.hygeinv - Hypergeometric inverse cumulative distribution function.icdf - Specified inverse cumulative distribution function.logninv - Lognormal inverse cumulative distribution function.nbininv - Negative binomial inverse distribution function.ncfinv - Noncentral F inverse cumulative distribution function.nctinv - Noncentral t inverse cumulative distribution function.ncx2inv - Noncentral Chi-square inverse distribution function.norminv - Normal (Gaussian) inverse cumulative distribution function.poissinv - Poisson inverse cumulative distribution function.raylinv - Rayleigh inverse cumulative distribution function.tinv - T inverse cumulative distribution function.unidinv - Discrete uniform inverse cumulative distribution function.unifinv - Uniform inverse cumulative distribution function.wblinv - Weibull inverse cumulative distribution function.Random Number Generators.betarnd - Beta random numbers.binornd - Binomial random numbers.chi2rnd - Chi square random numbers.evrnd - Extreme value random numbers.exprnd - Exponential random numbers.frnd - F random numbers.gamrnd - Gamma random numbers.geornd - Geometric random numbers.gevrnd - Generalized extreme value random numbers.gprnd - Generalized Pareto inverse random numbers.hygernd - Hypergeometric random numbers.iwishrnd - Inverse Wishart random matrix.johnsrnd - Random numbers from the Johnson system of distributions. lognrnd - Lognormal random numbers.mhsample - Metropolis-Hastings algorithm.mnrnd - Multinomial random vectors.mvnrnd - Multivariate normal random vectors.mvtrnd - Multivariate t random vectors.nbinrnd - Negative binomial random numbers.ncfrnd - Noncentral F random numbers.nctrnd - Noncentral t random numbers.ncx2rnd - Noncentral Chi-square random numbers.normrnd - Normal (Gaussian) random numbers.pearsrnd - Random numbers from the Pearson system of distributions. poissrnd - Poisson random numbers.randg - Gamma random numbers (unit scale). random - Random numbers from specified distribution. randsample - Random sample from finite population. raylrnd - Rayleigh random numbers.slicesample - Slice sampling method.trnd - T random numbers.unidrnd - Discrete uniform random numbers.unifrnd - Uniform random numbers.wblrnd - Weibull random numbers.wishrnd - Wishart random matrix.Quasi-Random Number Generators.haltonset - Halton sequence point set.qrandstream - Quasi-random stream.sobolset - Sobol sequence point set.Statistics.betastat - Beta mean and variance.binostat - Binomial mean and variance.chi2stat - Chi square mean and variance.evstat - Extreme value mean and variance.expstat - Exponential mean and variance.fstat - F mean and variance.gamstat - Gamma mean and variance.geostat - Geometric mean and variance.gevstat - Generalized extreme value mean and variance. gpstat - Generalized Pareto inverse mean and variance. hygestat - Hypergeometric mean and variance. lognstat - Lognormal mean and variance.nbinstat - Negative binomial mean and variance. ncfstat - Noncentral F mean and variance.nctstat - Noncentral t mean and variance.ncx2stat - Noncentral Chi-square mean and variance. 
normstat - Normal (Gaussian) mean and variance. poisstat - Poisson mean and variance.raylstat - Rayleigh mean and variance.tstat - T mean and variance.unidstat - Discrete uniform mean and variance.unifstat - Uniform mean and variance.wblstat - Weibull mean and variance.Likelihood functions.betalike - Negative beta log-likelihood.evlike - Negative extreme value log-likelihood. explike - Negative exponential log-likelihood. gamlike - Negative gamma log-likelihood.gevlike - Generalized extreme value log-likelihood.gplike - Generalized Pareto inverse log-likelihood. lognlike - Negative lognormal log-likelihood.nbinlike - Negative binomial log-likelihood.normlike - Negative normal likelihood.wbllike - Negative Weibull log-likelihood.Probability distribution objects.ProbDistUnivKernel - Univariate kernel smoothing distributions. ProbDistUnivParam - Univariate parametric distributions. Descriptive Statistics.bootci - Bootstrap confidence intervals.bootstrp - Bootstrap statistics.corr - Linear or rank correlation coefficient.corrcoef - Linear correlation coefficient (in MATLAB toolbox). cov - Covariance (in MATLAB toolbox).crosstab - Cross tabulation.geomean - Geometric mean.grpstats - Summary statistics by group.harmmean - Harmonic mean.iqr - Interquartile range.jackknife - Jackknife statistics.kurtosis - Kurtosis.mad - Median Absolute Deviation.mean - Sample average (in MATLAB toolbox).median - 50th percentile of a sample (in MATLAB toolbox).mode - Mode, or most frequent value in a sample (in MATLAB toolbox). moment - Moments of a sample.nancov - Covariance matrix ignoring NaNs.nanmax - Maximum ignoring NaNs.nanmean - Mean ignoring NaNs.nanmedian - Median ignoring NaNs.nanmin - Minimum ignoring NaNs.nanstd - Standard deviation ignoring NaNs.nansum - Sum ignoring NaNs.nanvar - Variance ignoring NaNs.partialcorr - Linear or rank partial correlation coefficient.prctile - Percentiles.quantile - Quantiles.range - Range.skewness - Skewness.std - Standard deviation (in MATLAB toolbox).tabulate - Frequency table.trimmean - Trimmed mean.var - Variance (in MATLAB toolbox).Linear Models.addedvarplot - Created added-variable plot for stepwise regression.anova1 - One-way analysis of variance.anova2 - Two-way analysis of variance.anovan - n-way analysis of variance.aoctool - Interactive tool for analysis of covariance.dummyvar - Dummy-variable coding.friedman - Friedman's test (nonparametric two-way anova).glmfit - Generalized linear model fitting.glmval - Evaluate fitted values for generalized linear model.invpred - Inverse prediction for simple linear regression.kruskalwallis - Kruskal-Wallis test (nonparametric one-way anova).leverage - Regression diagnostic.lscov - Ordinary, weighted, or generalized least-squares (in MATLAB toolbox).lsqnonneg - Non-negative least-squares (in MATLAB toolbox).manova1 - One-way multivariate analysis of variance.manovacluster - Draw clusters of group means for manova1.mnrfit - Nominal or ordinal multinomial regression model fitting.mnrval - Predict values for nominal or ordinal multinomial regression.multcompare - Multiple comparisons of means and other estimates. mvregress - Multivariate regression with missing data.mvregresslike - Negative log-likelihood for multivariate regression. polyconf - Polynomial evaluation and confidence interval estimation. polyfit - Least-squares polynomial fitting (in MATLAB toolbox).polyval - Predicted values for polynomial functions (in MATLAB toolbox). 
rcoplot - Residuals case order plot.regress - Multiple linear regression using least squares.regstats - Regression diagnostics.ridge - Ridge regression.robustfit - Robust regression model fitting.rstool - Multidimensional response surface visualization (RSM). stepwise - Interactive tool for stepwise regression.stepwisefit - Non-interactive stepwise regression.x2fx - Factor settings matrix (x) to design matrix (fx).Nonlinear Models.coxphfit - Cox proportional hazards regression.nlinfit - Nonlinear least-squares data fitting.nlintool - Interactive graphical tool for prediction in nonlinear models. nlmefit - Nonlinear mixed-effects data fitting.nlpredci - Confidence intervals for prediction.nlparci - Confidence intervals for parameters.Design of Experiments (DOE).bbdesign - Box-Behnken design.candexch - D-optimal design (row exchange algorithm for candidate set). candgen - Candidates set for D-optimal design generation.ccdesign - Central composite design.cordexch - D-optimal design (coordinate exchange algorithm). daugment - Augment D-optimal design.dcovary - D-optimal design with fixed covariates.fracfactgen - Fractional factorial design generators.ff2n - Two-level full-factorial design.fracfact - Two-level fractional factorial design.fullfact - Mixed-level full-factorial design.hadamard - Hadamard matrices (orthogonal arrays) (in MATLAB toolbox). lhsdesign - Latin hypercube sampling design.lhsnorm - Latin hypercube multivariate normal sample.rowexch - D-optimal design (row exchange algorithm).Statistical Process Control (SPC).capability - Capability indices.capaplot - Capability plot.controlchart - Shewhart control chart.controlrules - Control rules (Western Electric or Nelson) for SPC data.gagerr - Gage repeatability and reproducibility (R&R) study. histfit - Histogram with superimposed normal density. normspec - Plot normal density between specification limits. runstest - Runs test for randomness.Multivariate Statistics.Cluster Analysis.cophenet - Cophenetic coefficient.cluster - Construct clusters from LINKAGE output. clusterdata - Construct clusters from data.dendrogram - Generate dendrogram plot.gmdistribution - Gaussian mixture model estimation. inconsistent - Inconsistent values of a cluster tree.kmeans - k-means clustering.linkage - Hierarchical cluster information.pdist - Pairwise distance between observations. silhouette - Silhouette plot of clustered data.squareform - Square matrix formatted distance. Classification.classify - Linear discriminant analysis.NaiveBayes - Naive Bayes classification.Dimension Reduction T echniques.factoran - Factor analysis.nnmf - Non-negative matrix factorization.pcacov - Principal components from covariance matrix. pcares - Residuals from principal components.princomp - Principal components analysis from raw data. rotatefactors - Rotation of FA or PCA loadings.Copulascopulacdf - Cumulative probability function for a copula. copulafit - Fit a parametric copula to data.copulaparam - Copula parameters as a function of rank correlation. copulapdf - Probability density function for a copula. copularnd - Random vectors from a copula.copulastat - Rank correlation for a copula.Plotting.andrewsplot - Andrews plot for multivariate data.biplot - Biplot of variable/factor coefficients and scores. interactionplot - Interaction plot for factor effects. maineffectsplot - Main effects plot for factor effects.glyphplot - Plot stars or Chernoff faces for multivariate data. gplotmatrix - Matrix of scatter plots grouped by a common variable. 
multivarichart - Multi-vari chart of factor effects.parallelcoords - Parallel coordinates plot for multivariate data.Other Multivariate Methods.barttest - Bartlett's test for dimensionality.canoncorr - Canonical correlation analysis.cmdscale - Classical multidimensional scaling.mahal - Mahalanobis distance.manova1 - One-way multivariate analysis of variance.mdscale - Metric and non-metric multidimensional scaling. mvregress - Multivariate regression with missing data.plsregress - Partial least squares regression.procrustes - Procrustes analysis.Decision Tree Techniques.classregtree - Classification and regression tree.TreeBagger - Ensemble of bagged decision trees. CompactTreeBagger - Lightweight ensemble of bagged decision trees. Hypothesis Tests.ansaribradley - Ansari-Bradley two-sample test for equal dispersions. dwtest - Durbin-Watson test for autocorrelation in linear regression. linhyptest - Linear hypothesis test on parameter estimates.ranksum - Wilcoxon rank sum test (independent samples). runstest - Runs test for randomness.sampsizepwr - Sample size and power calculation for hypothesis test. signrank - Wilcoxon sign rank test (paired samples).signtest - Sign test (paired samples).ttest - One sample t test.ttest2 - Two sample t test.vartest - One-sample test of variance.vartest2 - Two-sample F test for equal variances.vartestn - Test for equal variances across multiple groups.ztest - Z test.Distribution Testing.chi2gof - Chi-square goodness-of-fit test.jbtest - Jarque-Bera test of normality.kstest - Kolmogorov-Smirnov test for one sample.kstest2 - Kolmogorov-Smirnov test for two samples.lillietest - Lilliefors test of normality.Nonparametric Functions.friedman - Friedman's test (nonparametric two-way anova). kruskalwallis - Kruskal-Wallis test (nonparametric one-way anova). ksdensity - Kernel smoothing density estimation.ranksum - Wilcoxon rank sum test (independent samples). signrank - Wilcoxon sign rank test (paired samples).signtest - Sign test (paired samples).Hidden Markov Models.hmmdecode - Calculate HMM posterior state probabilities.hmmestimate - Estimate HMM parameters given state information.hmmgenerate - Generate random sequence for HMM.hmmtrain - Calculate maximum likelihood estimates for HMM parameters.hmmviterbi - Calculate most probable state path for HMM sequence.Model Assessment.confusionmat - Confusion matrix for classification algorithms.crossval - Loss estimate using cross-validation.cvpartition - Cross-validation partition.perfcurve - ROC and other performance measures for classification algorithms.Model Selection.sequentialfs - Sequential feature selection.stepwise - Interactive tool for stepwise regression.stepwisefit - Non-interactive stepwise regression.Statistical Plotting.andrewsplot - Andrews plot for multivariate data.biplot - Biplot of variable/factor coefficients and scores.boxplot - Boxplots of a data matrix (one per column).cdfplot - Plot of empirical cumulative distribution function. ecdf - Empirical cdf (Kaplan-Meier estimate).ecdfhist - Histogram calculated from empirical cdf.fsurfht - Interactive contour plot of a function.gline - Point, drag and click line drawing on figures. glyphplot - Plot stars or Chernoff faces for multivariate data. gname - Interactive point labeling in x-y plots.gplotmatrix - Matrix of scatter plots grouped by a common variable. gscatter - Scatter plot of two variables grouped by a third.hist - Histogram (in MATLAB toolbox).hist3 - Three-dimensional histogram of bivariate data. 
ksdensity - Kernel smoothing density estimation.lsline - Add least-square fit line to scatter plot.normplot - Normal probability plot.parallelcoords - Parallel coordinates plot for multivariate data. probplot - Probability plot.qqplot - Quantile-Quantile plot.refcurve - Reference polynomial curve.refline - Reference line.scatterhist - 2D scatter plot with marginal histograms.surfht - Interactive contour plot of a data grid.wblplot - Weibull probability plot.Data Objectsdataset - Create datasets from workspace variables or files. nominal - Create arrays of nominal data.ordinal - Create arrays of ordinal data.Statistics Demos.aoctool - Interactive tool for analysis of covariance.disttool - GUI tool for exploring probability distribution functions. polytool - Interactive graph for prediction of fitted polynomials. randtool - GUI tool for generating random numbers.rsmdemo - Reaction simulation (DOE, RSM, nonlinear curve fitting). robustdemo - Interactive tool to compare robust and least squares fits. File Based I/O.tblread - Read in data in tabular format.tblwrite - Write out data in tabular format to file.tdfread - Read in text and numeric data from tab-delimited file. caseread - Read in case names.casewrite - Write out case names to file.Utility Functions.cholcov - Cholesky-like decomposition for covariance matrix.combnk - Enumeration of all combinations of n objects k at a time. corrcov - Convert covariance matrix to correlation matrix.grp2idx - Convert grouping variable to indices and array of names. hougen - Prediction function for Hougen model (nonlinear example). statget - Get STATS options parameter value.statset - Set STATS options parameter value.tiedrank - Compute ranks of sample, adjusting for ties.zscore - Normalize matrix columns to mean 0, variance 1. Overloaded methods:xregusermod/statsxregunispline/statsxregnnet/statsxregmultilin/statsxregmodel/statsxreglinear/statsxreginterprbf/stats。
Distributions.Parameter estimation.betafit - Beta parameter estimation.binofit - Binomial parameter estimation.dfittool - Distribution fitting tool.evfit - Extreme value parameter estimation.expfit - Exponential parameter estimation.fitdist - Distribution fitting.gamfit - Gamma parameter estimation.gevfit - Generalized extreme value parameter estimation.gmdistribution - Gaussian mixture model estimation.gpfit - Generalized Pareto parameter estimation.lognfit - Lognormal parameter estimation.mle - Maximum likelihood estimation (MLE).mlecov - Asymptotic covariance matrix of MLE.nbinfit - Negative binomial parameter estimation.normfit - Normal parameter estimation.paretotails - Empirical cdf with generalized Pareto tails.poissfit - Poisson parameter estimation.raylfit - Rayleigh parameter estimation.unifit - Uniform parameter estimation.wblfit - Weibull parameter estimation.Probability density functions (pdf).betapdf - Beta density.binopdf - Binomial density.chi2pdf - Chi square density.evpdf - Extreme value density.exppdf - Exponential density.fpdf - F density.gampdf - Gamma density.geopdf - Geometric density.gevpdf - Generalized extreme value density.gppdf - Generalized Pareto density.hygepdf - Hypergeometric density.lognpdf - Lognormal density.mnpdf - Multinomial probability density function.mvnpdf - Multivariate normal density.mvtpdf - Multivariate t density.nbinpdf - Negative binomial density.ncfpdf - Noncentral F density.nctpdf - Noncentral t density.ncx2pdf - Noncentral Chi-square density.normpdf - Normal (Gaussian) density.pdf - Density function for a specified distribution.poisspdf - Poisson density.raylpdf - Rayleigh density.tpdf - T density.unidpdf - Discrete uniform density.unifpdf - Uniform density.wblpdf - Weibull density.Cumulative Distribution functions (cdf).betacdf - Beta cumulative distribution function.binocdf - Binomial cumulative distribution function.cdf - Specified cumulative distribution function.chi2cdf - Chi square cumulative distribution function.ecdf - Empirical cumulative distribution function (Kaplan-Meier estimate). 
evcdf - Extreme value cumulative distribution function.expcdf - Exponential cumulative distribution function.fcdf - F cumulative distribution function.gamcdf - Gamma cumulative distribution function.geocdf - Geometric cumulative distribution function.gevcdf - Generalized extreme value cumulative distribution function.gpcdf - Generalized Pareto cumulative distribution function.hygecdf - Hypergeometric cumulative distribution function.logncdf - Lognormal cumulative distribution function.mvncdf - Multivariate normal cumulative distribution function.mvtcdf - Multivariate t cumulative distribution function.nbincdf - Negative binomial cumulative distribution function.ncfcdf - Noncentral F cumulative distribution function.nctcdf - Noncentral t cumulative distribution function.ncx2cdf - Noncentral Chi-square cumulative distribution function.normcdf - Normal (Gaussian) cumulative distribution function.poisscdf - Poisson cumulative distribution function.raylcdf - Rayleigh cumulative distribution function.tcdf - T cumulative distribution function.unidcdf - Discrete uniform cumulative distribution function.unifcdf - Uniform cumulative distribution function.wblcdf - Weibull cumulative distribution function.Critical Values of Distribution functions.betainv - Beta inverse cumulative distribution function.binoinv - Binomial inverse cumulative distribution function.chi2inv - Chi square inverse cumulative distribution function.evinv - Extreme value inverse cumulative distribution function.expinv - Exponential inverse cumulative distribution function.finv - F inverse cumulative distribution function.gaminv - Gamma inverse cumulative distribution function.geoinv - Geometric inverse cumulative distribution function.gevinv - Generalized extreme value inverse cumulative distribution function. gpinv - Generalized Pareto inverse cumulative distribution function. hygeinv - Hypergeometric inverse cumulative distribution function.icdf - Specified inverse cumulative distribution function.logninv - Lognormal inverse cumulative distribution function.nbininv - Negative binomial inverse distribution function.ncfinv - Noncentral F inverse cumulative distribution function.nctinv - Noncentral t inverse cumulative distribution function.ncx2inv - Noncentral Chi-square inverse distribution function.norminv - Normal (Gaussian) inverse cumulative distribution function. poissinv - Poisson inverse cumulative distribution function.raylinv - Rayleigh inverse cumulative distribution function.tinv - T inverse cumulative distribution function.unidinv - Discrete uniform inverse cumulative distribution function.unifinv - Uniform inverse cumulative distribution function.wblinv - Weibull inverse cumulative distribution function.Random Number Generators.betarnd - Beta random numbers.binornd - Binomial random numbers.chi2rnd - Chi square random numbers.evrnd - Extreme value random numbers.exprnd - Exponential random numbers.frnd - F random numbers.gamrnd - Gamma random numbers.geornd - Geometric random numbers.gevrnd - Generalized extreme value random numbers.gprnd - Generalized Pareto inverse random numbers.hygernd - Hypergeometric random numbers.iwishrnd - Inverse Wishart random matrix.johnsrnd - Random numbers from the Johnson system of distributions. 
lognrnd - Lognormal random numbers.mhsample - Metropolis-Hastings algorithm.mnrnd - Multinomial random vectors.mvnrnd - Multivariate normal random vectors.mvtrnd - Multivariate t random vectors.nbinrnd - Negative binomial random numbers.ncfrnd - Noncentral F random numbers.nctrnd - Noncentral t random numbers.ncx2rnd - Noncentral Chi-square random numbers.normrnd - Normal (Gaussian) random numbers.pearsrnd - Random numbers from the Pearson system of distributions.poissrnd - Poisson random numbers.randg - Gamma random numbers (unit scale). random - Random numbers from specified distribution. randsample - Random sample from finite population. raylrnd - Rayleigh random numbers.slicesample - Slice sampling method.trnd - T random numbers.unidrnd - Discrete uniform random numbers.unifrnd - Uniform random numbers.wblrnd - Weibull random numbers.wishrnd - Wishart random matrix.Quasi-Random Number Generators.haltonset - Halton sequence point set. qrandstream - Quasi-random stream.sobolset - Sobol sequence point set.Statistics.betastat - Beta mean and variance.binostat - Binomial mean and variance.chi2stat - Chi square mean and variance.evstat - Extreme value mean and variance.expstat - Exponential mean and variance.fstat - F mean and variance.gamstat - Gamma mean and variance.geostat - Geometric mean and variance.gevstat - Generalized extreme value mean and variance. gpstat - Generalized Pareto inverse mean and variance. hygestat - Hypergeometric mean and variance.lognstat - Lognormal mean and variance.nbinstat - Negative binomial mean and variance. ncfstat - Noncentral F mean and variance.nctstat - Noncentral t mean and variance.ncx2stat - Noncentral Chi-square mean and variance. normstat - Normal (Gaussian) mean and variance. poisstat - Poisson mean and variance.raylstat - Rayleigh mean and variance.tstat - T mean and variance.unidstat - Discrete uniform mean and variance.unifstat - Uniform mean and variance.wblstat - Weibull mean and variance.Likelihood functions.betalike - Negative beta log-likelihood.evlike - Negative extreme value log-likelihood.explike - Negative exponential log-likelihood.gamlike - Negative gamma log-likelihood.gevlike - Generalized extreme value log-likelihood.gplike - Generalized Pareto inverse log-likelihood.lognlike - Negative lognormal log-likelihood.nbinlike - Negative binomial log-likelihood.normlike - Negative normal likelihood.wbllike - Negative Weibull log-likelihood.Probability distribution objects.ProbDistUnivKernel - Univariate kernel smoothing distributions. ProbDistUnivParam - Univariate parametric distributions.Descriptive Statistics.bootci - Bootstrap confidence intervals.bootstrp - Bootstrap statistics.corr - Linear or rank correlation coefficient.corrcoef - Linear correlation coefficient (in MATLAB toolbox).cov - Covariance (in MATLAB toolbox).crosstab - Cross tabulation.geomean - Geometric mean.grpstats - Summary statistics by group.harmmean - Harmonic mean.iqr - Interquartile range.jackknife - Jackknife statistics.kurtosis - Kurtosis.mad - Median Absolute Deviation.mean - Sample average (in MATLAB toolbox).median - 50th percentile of a sample (in MATLAB toolbox).mode - Mode, or most frequent value in a sample (in MATLAB toolbox). 
moment - Moments of a sample.nancov - Covariance matrix ignoring NaNs.nanmax - Maximum ignoring NaNs.nanmean - Mean ignoring NaNs.nanmedian - Median ignoring NaNs.nanmin - Minimum ignoring NaNs.nanstd - Standard deviation ignoring NaNs.nansum - Sum ignoring NaNs.nanvar - Variance ignoring NaNs.partialcorr - Linear or rank partial correlation coefficient.prctile - Percentiles.quantile - Quantiles.range - Range.skewness - Skewness.std - Standard deviation (in MATLAB toolbox).tabulate - Frequency table.trimmean - Trimmed mean.var - Variance (in MATLAB toolbox).Linear Models.addedvarplot - Created added-variable plot for stepwise regression.anova1 - One-way analysis of variance.anova2 - Two-way analysis of variance.anovan - n-way analysis of variance.aoctool - Interactive tool for analysis of covariance.dummyvar - Dummy-variable coding.friedman - Friedman's test (nonparametric two-way anova).glmfit - Generalized linear model fitting.glmval - Evaluate fitted values for generalized linear model.invpred - Inverse prediction for simple linear regression.kruskalwallis - Kruskal-Wallis test (nonparametric one-way anova).leverage - Regression diagnostic.lscov - Ordinary, weighted, or generalized least-squares (in MATLAB toolbox). lsqnonneg - Non-negative least-squares (in MATLAB toolbox).manova1 - One-way multivariate analysis of variance.manovacluster - Draw clusters of group means for manova1.mnrfit - Nominal or ordinal multinomial regression model fitting.mnrval - Predict values for nominal or ordinal multinomial regression. multcompare - Multiple comparisons of means and other estimates.mvregress - Multivariate regression with missing data.mvregresslike - Negative log-likelihood for multivariate regression.polyconf - Polynomial evaluation and confidence interval estimation.polyfit - Least-squares polynomial fitting (in MATLAB toolbox).polyval - Predicted values for polynomial functions (in MATLAB toolbox). rcoplot - Residuals case order plot.regress - Multiple linear regression using least squares.regstats - Regression diagnostics.ridge - Ridge regression.robustfit - Robust regression model fitting.rstool - Multidimensional response surface visualization (RSM).stepwise - Interactive tool for stepwise regression.stepwisefit - Non-interactive stepwise regression.x2fx - Factor settings matrix (x) to design matrix (fx).Nonlinear Models.coxphfit - Cox proportional hazards regression.nlinfit - Nonlinear least-squares data fitting.nlintool - Interactive graphical tool for prediction in nonlinear models. nlmefit - Nonlinear mixed-effects data fitting.nlpredci - Confidence intervals for prediction.nlparci - Confidence intervals for parameters.Design of Experiments (DOE).bbdesign - Box-Behnken design.candexch - D-optimal design (row exchange algorithm for candidate set). candgen - Candidates set for D-optimal design generation.ccdesign - Central composite design.cordexch - D-optimal design (coordinate exchange algorithm). daugment - Augment D-optimal design.dcovary - D-optimal design with fixed covariates.fracfactgen - Fractional factorial design generators.ff2n - Two-level full-factorial design.fracfact - Two-level fractional factorial design.fullfact - Mixed-level full-factorial design.hadamard - Hadamard matrices (orthogonal arrays) (in MATLAB toolbox). 
lhsdesign - Latin hypercube sampling design.lhsnorm - Latin hypercube multivariate normal sample.rowexch - D-optimal design (row exchange algorithm).Statistical Process Control (SPC).capability - Capability indices.capaplot - Capability plot.controlchart - Shewhart control chart.controlrules - Control rules (Western Electric or Nelson) for SPC data.gagerr - Gage repeatability and reproducibility (R&R) study.histfit - Histogram with superimposed normal density.normspec - Plot normal density between specification limits.runstest - Runs test for randomness.Multivariate Statistics.Cluster Analysis.cophenet - Cophenetic coefficient.cluster - Construct clusters from LINKAGE output.clusterdata - Construct clusters from data.dendrogram - Generate dendrogram plot.gmdistribution - Gaussian mixture model estimation.inconsistent - Inconsistent values of a cluster tree.kmeans - k-means clustering.linkage - Hierarchical cluster information.pdist - Pairwise distance between observations.silhouette - Silhouette plot of clustered data.squareform - Square matrix formatted distance.Classification.classify - Linear discriminant analysis.NaiveBayes - Naive Bayes classification.Dimension Reduction Techniques.factoran - Factor analysis.nnmf - Non-negative matrix factorization.pcacov - Principal components from covariance matrix. pcares - Residuals from principal components.princomp - Principal components analysis from raw data. rotatefactors - Rotation of FA or PCA loadings.Copulascopulacdf - Cumulative probability function for a copula. copulafit - Fit a parametric copula to data.copulaparam - Copula parameters as a function of rank correlation. copulapdf - Probability density function for a copula. copularnd - Random vectors from a copula.copulastat - Rank correlation for a copula.Plotting.andrewsplot - Andrews plot for multivariate data.biplot - Biplot of variable/factor coefficients and scores. interactionplot - Interaction plot for factor effects. maineffectsplot - Main effects plot for factor effects.glyphplot - Plot stars or Chernoff faces for multivariate data. gplotmatrix - Matrix of scatter plots grouped by a common variable. multivarichart - Multi-vari chart of factor effects. parallelcoords - Parallel coordinates plot for multivariate data.Other Multivariate Methods.barttest - Bartlett's test for dimensionality.canoncorr - Canonical correlation analysis.cmdscale - Classical multidimensional scaling.mahal - Mahalanobis distance.manova1 - One-way multivariate analysis of variance. mdscale - Metric and non-metric multidimensional scaling. mvregress - Multivariate regression with missing data. plsregress - Partial least squares regression.procrustes - Procrustes analysis.Decision Tree Techniques.classregtree - Classification and regression tree.TreeBagger - Ensemble of bagged decision trees. CompactTreeBagger - Lightweight ensemble of bagged decision trees.Hypothesis Tests.ansaribradley - Ansari-Bradley two-sample test for equal dispersions.dwtest - Durbin-Watson test for autocorrelation in linear regression. linhyptest - Linear hypothesis test on parameter estimates.ranksum - Wilcoxon rank sum test (independent samples).runstest - Runs test for randomness.sampsizepwr - Sample size and power calculation for hypothesis test. 
signrank - Wilcoxon sign rank test (paired samples).signtest - Sign test (paired samples).ttest - One sample t test.ttest2 - Two sample t test.vartest - One-sample test of variance.vartest2 - Two-sample F test for equal variances.vartestn - Test for equal variances across multiple groups.ztest - Z test.Distribution Testing.chi2gof - Chi-square goodness-of-fit test.jbtest - Jarque-Bera test of normality.kstest - Kolmogorov-Smirnov test for one sample.kstest2 - Kolmogorov-Smirnov test for two samples.lillietest - Lilliefors test of normality.Nonparametric Functions.friedman - Friedman's test (nonparametric two-way anova). kruskalwallis - Kruskal-Wallis test (nonparametric one-way anova). ksdensity - Kernel smoothing density estimation.ranksum - Wilcoxon rank sum test (independent samples).signrank - Wilcoxon sign rank test (paired samples).signtest - Sign test (paired samples).Hidden Markov Models.hmmdecode - Calculate HMM posterior state probabilities. hmmestimate - Estimate HMM parameters given state information. hmmgenerate - Generate random sequence for HMM.hmmtrain - Calculate maximum likelihood estimates for HMM parameters. hmmviterbi - Calculate most probable state path for HMM sequence.Model Assessment.confusionmat - Confusion matrix for classification algorithms.crossval - Loss estimate using cross-validation.cvpartition - Cross-validation partition.perfcurve - ROC and other performance measures for classification algorithms.Model Selection.sequentialfs - Sequential feature selection.stepwise - Interactive tool for stepwise regression.stepwisefit - Non-interactive stepwise regression.Statistical Plotting.andrewsplot - Andrews plot for multivariate data.biplot - Biplot of variable/factor coefficients and scores.boxplot - Boxplots of a data matrix (one per column).cdfplot - Plot of empirical cumulative distribution function.ecdf - Empirical cdf (Kaplan-Meier estimate).ecdfhist - Histogram calculated from empirical cdf.fsurfht - Interactive contour plot of a function.gline - Point, drag and click line drawing on figures.glyphplot - Plot stars or Chernoff faces for multivariate data.gname - Interactive point labeling in x-y plots.gplotmatrix - Matrix of scatter plots grouped by a common variable.gscatter - Scatter plot of two variables grouped by a third.hist - Histogram (in MATLAB toolbox).hist3 - Three-dimensional histogram of bivariate data.ksdensity - Kernel smoothing density estimation.lsline - Add least-square fit line to scatter plot.normplot - Normal probability plot.parallelcoords - Parallel coordinates plot for multivariate data.probplot - Probability plot.qqplot - Quantile-Quantile plot.refcurve - Reference polynomial curve.refline - Reference line.scatterhist - 2D scatter plot with marginal histograms.surfht - Interactive contour plot of a data grid.wblplot - Weibull probability plot.Data Objectsdataset - Create datasets from workspace variables or files.nominal - Create arrays of nominal data.ordinal - Create arrays of ordinal data.Statistics Demos.aoctool - Interactive tool for analysis of covariance.disttool - GUI tool for exploring probability distribution functions.polytool - Interactive graph for prediction of fitted polynomials. randtool - GUI tool for generating random numbers.rsmdemo - Reaction simulation (DOE, RSM, nonlinear curve fitting). 
robustdemo - Interactive tool to compare robust and least squares fits.File Based I/O.tblread - Read in data in tabular format.tblwrite - Write out data in tabular format to file.tdfread - Read in text and numeric data from tab-delimited file. caseread - Read in case names.casewrite - Write out case names to file.Utility Functions.cholcov - Cholesky-like decomposition for covariance matrix. combnk - Enumeration of all combinations of n objects k at a time. corrcov - Convert covariance matrix to correlation matrix.grp2idx - Convert grouping variable to indices and array of names. hougen - Prediction function for Hougen model (nonlinear example). statget - Get STATS options parameter value.statset - Set STATS options parameter value.tiedrank - Compute ranks of sample, adjusting for ties.zscore - Normalize matrix columns to mean 0, variance 1.Overloaded methods:xregusermod/statsxregunispline/statsxregnnet/statsxregmultilin/statsxregmodel/statsxreglinear/statsxreginterprbf/statsxregarx/stats。
Package‘glmmfields’October20,2023Type PackageTitle Generalized Linear Mixed Models with Robust Random Fields for Spatiotemporal ModelingVersion0.1.8Description Implements Bayesian spatial and spatiotemporalmodels that optionally allow for extreme spatial deviations throughtime.'glmmfields'uses a predictive process approach with randomfields implemented through a multivariate-t distribution instead ofthe usual multivariate normal.Sampling is conducted with'Stan'.References:Anderson and Ward(2019)<doi:10.1002/ecy.2403>. License GPL(>=3)URL https:///seananderson/glmmfieldsBugReports https:///seananderson/glmmfields/issues Depends methods,R(>=3.4.0),Rcpp(>=0.12.18)Imports assertthat,broom,broom.mixed,cluster,dplyr(>=0.8.0), forcats,ggplot2(>=2.2.0),loo(>=2.0.0),mvtnorm,nlme,RcppParallel(>=5.0.1),reshape2,rstan(>=2.26.0),rstantools(>=2.1.1),tibbleSuggests bayesplot,coda,knitr,parallel,rmarkdown,testthat,viridisLinkingTo BH(>=1.66.0),Rcpp(>=0.12.8),RcppEigen(>=0.3.3.3.0), RcppParallel(>=5.0.1),rstan(>=2.26.0),StanHeaders(>=2.26.0)VignetteBuilder knitrEncoding UTF-8RoxygenNote7.2.3SystemRequirements GNU makeNeedsCompilation yesBiarch true12glmmfields-package Author Sean C.Anderson[aut,cre],Eric J.Ward[aut],Trustees of Columbia University[cph]Maintainer Sean C.Anderson<********************>Repository CRANDate/Publication2023-10-2017:50:02UTCR topics documented:glmmfields-package (2)format_data (3)glmmfields (4)lognormal (7)loo.glmmfields (8)nbinom2 (8)plot.glmmfields (9)predict (10)sim_glmmfields (12)stan_pars (13)student_t (14)tidy (14)Index16 glmmfields-package The’glmmfields’package.DescriptionImplements Bayesian spatial and spatiotemporal models that optionally allow for extreme spatial deviations through time.’glmmfields’uses a predictive process approach with randomfields im-plemented through a multivariate-t distribution instead of the usual multivariate normal.Sampling is conducted with’Stan’.ReferencesStan Development Team(2018).RStan:the R interface to Stan.R package version2.18.2.format_data3 format_data Format data forfitting a glmmfields modelDescriptionFormat data forfitting a glmmfields modelUsageformat_data(data,y,X,time,lon="lon",lat="lat",station=NULL,nknots=25L,covariance=c("squared-exponential","exponential","matern"),fixed_intercept=FALSE,cluster=c("pam","kmeans"))Argumentsdata A data frame to be formattedy A numeric vector of the responseX A matrix of the predictorstime A character object giving the name of the time columnlon A character object giving the name of the longitude columnlat A character object giving the name of the latitude columnstation A numeric vector giving the integer ID of the stationnknots The number of knotscovariance The type of covariance functionfixed_interceptShould the intercept befixed?cluster The type of clustering algorithm used to determine the not locations."pam"= pam.kmeans is faster for large datasets.glmmfields Fit a spatiotemporal randomfields GLMMDescriptionFit a spatiotemporal randomfields model that optionally uses the MVT distribution instead of a MVN distribution to allow for spatial extremes through time.It is also possible tofit a spatial randomfields model without a time 
component.Usageglmmfields(formula,data,lon,lat,time=NULL,nknots=15L,prior_gp_theta=half_t(3,0,5),prior_gp_sigma=half_t(3,0,5),prior_sigma=half_t(3,0,5),prior_rw_sigma=half_t(3,0,5),prior_intercept=student_t(3,0,10),prior_beta=student_t(3,0,3),prior_phi=student_t(1000,0,0.5),fixed_df_value=1000,fixed_phi_value=0,estimate_df=FALSE,estimate_ar=FALSE,family=gaussian(link="identity"),binomial_N=NULL,covariance=c("squared-exponential","exponential","matern"),matern_kappa=0.5,algorithm=c("sampling","meanfield"),year_re=FALSE,nb_lower_truncation=0,control=list(adapt_delta=0.9),save_log_lik=FALSE,df_lower_bound=2,cluster=c("pam","kmeans"),offset=NULL,...)Argumentsformula The model formula.data A data frame.lon A character object giving the name of the longitude column.lat A character object giving the name of the latitude column.time A character object giving the name of the time column.Leave as NULL tofit a spatial GLMM without a time element.nknots The number of knots to use in the predictive process model.Smaller values will be faster but may not adequately represent the shape of the spatial pattern. prior_gp_theta The prior on the Gaussian Process scale parameter.Must be declared with half_t().Here,and throughout,priors that are normal or half-normal can beimplemented by setting thefirst parameter in the half-t or student-t distributionto a large value.E.g.something greater than100.prior_gp_sigma The prior on the Gaussian Process eta parameter.Must be declared with half_t(). prior_sigma The prior on the observation process scale parameter.Must be declared with half_t().This acts as a substitute for the scale parameter in whatever obser-vation distribution is being used.E.g.the CV for the Gamma or the dispersionparameter for the negative binomial.prior_rw_sigma The prior on the standard deviation parameter of the random walk process(if specified).Must be declared with half_t().prior_interceptThe prior on the intercept parameter.Must be declared with student_t(). prior_beta The prior on the slope parameters(if any).Must be declared with student_t(). 
prior_phi The prior on the AR parameter.Must be declared with student_t().fixed_df_value Thefixed value for the student-t degrees of freedom parameter if the degrees of freedom parameter isfixed in the MVT.If the degrees of freedom parameteris estimated then this argument is ignored.Must be1or greater.Very largevalues(e.g.the default value)approximate the normal distribution.If the valueis>=1000then a true MVN distribution will befit.fixed_phi_valueThefixed value for temporal autoregressive parameter,between randomfieldsat time(t)and time(t-1).If the phi parameter is estimated then this argument isignored.estimate_df Logical:should the degrees of freedom parameter be estimated?estimate_ar Logical:should the AR(autoregressive)parameter be estimated?Here,this refers to a autoregressive process in the evolution of the spatialfield throughtime.family Family object describing the observation model.Note that only one link is implemented for each distribution.Gamma,negative binomial(specified vianbinom2()as nbinom2(link="log"),and Poisson must have a log link.Bi-nomial must have a logit link.Also implemented is the lognormal(specifiedvia lognormal()as lognormal(link="log").Besides the negative binomialand lognormal,other families are specified as shown in family.binomial_N A character object giving the optional name of the column containing Binomial sample size.Leave as NULL tofit a spatial GLMM with sample sizes(N)=1,equivalent to bernoulli model.covariance The covariance function of the Gaussian Process.One of"squared-exponential", "exponential",or"matern".matern_kappa Optional parameter for the Matern covariance function.Optional values are1.5 or2.5.Values of0.5are equivalent to exponential.algorithm Character object describing whether the model should befit with full NUTS MCMC or via the variational inference mean-field approach.See rstan::vb().Note that the variational inference approach should not be trusted forfinal infer-ence and is much more likely to give incorrect inference than MCMC.year_re Logical:estimate a random walk for the time variable?If TRUE,then nofixed effects(B coefficients)will be estimated.In this case,prior_intercept willbe used as the prior for the initial value in time.nb_lower_truncationFor NB2only:lower truncation value. 
E.g.0for no truncation,1for1andall values above.Note that estimation is likely to be considerably slower withlower truncation because the sampling is not vectorized.Also note that thelog likelihood values returned for estimating quantities like LOOIC will not becorrect if lower truncation is implemented.control List to pass to rstan::sampling().For example,increase adapt_delta if there are warnings about divergent transitions:control=list(adapt_delta=0.99).By default,glmmfields sets adapt_delta=0.9.save_log_lik Logical:should the log likelihood for each data point be saved so that informa-tion criteria such as LOOIC or W AIC can be calculated?Defaults to FALSE sothat the size of model objects is smaller.df_lower_bound The lower bound on the degrees of freedom parameter.Values that are too low,e.g.below2or3,it might affect chain convergence.Defaults to2.cluster The type of clustering algorithm used to determine the knot locations."pam"= cluster::pam().The"kmeans"algorithm will be faster on larger datasets.offset An optional offset vector....Any other arguments to pass to rstan::sampling().DetailsNote that there is no guarantee that the default priors are reasonable for your data.Also,there is no guarantee the default priors will remain the same in future versions.Therefore it is important that you specify any priors that are used in your model,even if they replicate the defaults in the package.It is particularly important that you consider that prior on gp_theta since it depends on the distance between your location points.You may need to scale your coordinate units so they are on a ballpark range of1-10by,say,dividing the coordinates(say in UTMs)by several order of magnitude.Examples#Spatiotemporal example:set.seed(1)s<-sim_glmmfields(n_draws=12,n_knots=12,gp_theta=1.5,gp_sigma=0.2,sd_obs=0.2)lognormal7 print(s$plot)#options(mc.cores=parallel::detectCores())#for parallel processing#should use4or more chains for real model fitsm<-glmmfields(y~0,time="time",lat="lat",lon="lon",data=s$dat,nknots=12,iter=1000,chains=2,seed=1)#Spatial example(with covariates)from the vignette and customizing#some priors:set.seed(1)N<-100#number of data pointstemperature<-rnorm(N,0,1)#simulated temperature dataX<-cbind(1,temperature)#design matrixs<-sim_glmmfields(n_draws=1,gp_theta=1.2,n_data_points=N,gp_sigma=0.3,sd_obs=0.1,n_knots=12,obs_error="gamma",covariance="squared-exponential",X=X,B=c(0.5,0.2))#B represents our intercept and sloped<-s$datd$temperature<-temperaturelibrary(ggplot2)ggplot(s$dat,aes(lon,lat,colour=y))+viridis::scale_colour_viridis()+geom_point(size=3)m_spatial<-glmmfields(y~temperature,data=d,family=Gamma(link="log"),lat="lat",lon="lon",nknots=12,iter=2000,chains=2,prior_beta=student_t(100,0,1),prior_intercept=student_t(100,0,5),control=list(adapt_delta=0.95))lognormal Lognormal familyDescriptionLognormal familyUsagelognormal(link="log")Argumentslink The link(must be log)Exampleslognormal()8nbinom2 loo.glmmfields Return LOO information criteriaDescriptionExtract the LOOIC(leave-one-out information criterion)using loo::loo().Usage##S3method for class glmmfieldsloo(x,...)Argumentsx Output from glmmfields().Must befit with save_log_lik=TRUE,which is not the default....Arguments for loo::relative_eff()and loo::loo.array().Examplesset.seed(1)s<-sim_glmmfields(n_draws=12,n_knots=12,gp_theta=1.5,gp_sigma=0.2,sd_obs=0.2)#options(mc.cores=parallel::detectCores())#for parallel processing#save_log_lik defaults to FALSE to save space but is needed for 
loo():m<-glmmfields(y~0,time="time",lat="lat",lon="lon",data=s$dat,nknots=12,iter=1000,chains=4,seed=1,save_log_lik=TRUE)loo(m)nbinom2Negative binomial familyDescriptionThis is the NB2parameterization where the variance scales quadratically with the mean.Usagenbinom2(link="log")plot.glmmfields9Argumentslink The link(must be log)Examplesnbinom2()plot.glmmfields Plot predictions from an glmmfields modelDescriptionPlot predictions from an glmmfields modelUsage##S3method for class glmmfieldsplot(x,type=c("prediction","spatial-residual","residual-vs-fitted"),link=TRUE,...)Argumentsx An object returned by glmmfieldstype Type of plotlink Logical:should the plots be made on the link scale or on the natural scale?...Other arguments passed to predict.glmmfieldsExamples#Spatiotemporal example:set.seed(1)s<-sim_glmmfields(n_draws=12,n_knots=12,gp_theta=1.5,gp_sigma=0.2,sd_obs=0.1)#options(mc.cores=parallel::detectCores())#for parallel processingm<-glmmfields(y~0,time="time",lat="lat",lon="lon",data=s$dat,nknots=12,iter=600,chains=1)x<-plot(m,type="prediction")xx+ggplot2::scale_color_gradient2()plot(m,type="spatial-residual")plot(m,type="residual-vs-fitted")10predict predict Predict from a glmmfields modelDescriptionThese functions extract posterior draws or credible intervals.The helper functions are named to match those in the rstanarm package and call the function predict()with appropriate argument values.Usage##S3method for class glmmfieldspredictive_interval(object,...)##S3method for class glmmfieldsposterior_linpred(object,...)##S3method for class glmmfieldsposterior_predict(object,...)##S3method for class glmmfieldspredict(object,newdata=NULL,estimate_method=c("median","mean"),conf_level=0.95,interval=c("confidence","prediction"),type=c("link","response"),return_mcmc=FALSE,offset=NULL,iter="all",...)Argumentsobject An object returned by glmmfields()....Ignored currentlynewdata Optionally,a data frame to predict onestimate_methodMethod for computing point estimate("mean"or"median") conf_level Probability level for the credible intervals.interval Type of interval calculation.Same as for stats::predict.lm().type Whether the predictions are returned on"link"scale or"response"scale(Same as for stats::predict.glm()).predict11return_mcmc Logical.Should the full MCMC draws be returned for the predictions?offset Optional offset vector to be used in prediction.iter Number of MCMC iterations to draw.Defaults to all.Exampleslibrary(ggplot2)#simulate:set.seed(1)s<-sim_glmmfields(n_draws=12,n_knots=12,gp_theta=2.5,gp_sigma=0.2,sd_obs=0.1)#fit:#options(mc.cores=parallel::detectCores())#for parallel processingm<-glmmfields(y~0,data=s$dat,time="time",lat="lat",lon="lon",nknots=12,iter=800,chains=1)#Predictions:#Link scale credible intervals:p<-predict(m,type="link",interval="confidence")head(p)#Prediction intervals on new observations(include observation error):p<-predictive_interval(m)head(p)#Posterior prediction draws:p<-posterior_predict(m,iter=100)dim(p)#rows are iterations and columns are data elements#Draws from the linear predictor(not in link space):p<-posterior_linpred(m,iter=100)dim(p)#rows are iterations and columns are data elements#Use the tidy method to extract parameter estimates as a data frame:head(tidy(m,conf.int=TRUE,conf.method="HPDinterval"))#Make predictions on a fine-scale spatial 
# Make predictions on a fine-scale spatial grid:
pred_grid <- expand.grid(
  lat = seq(min(s$dat$lat), max(s$dat$lat), length.out = 25),
  lon = seq(min(s$dat$lon), max(s$dat$lon), length.out = 25),
  time = unique(s$dat$time)
)
pred_grid$prediction <- predict(m,
  newdata = pred_grid, type = "response", iter = 100,
  estimate_method = "median", offset = rep(0, nrow(pred_grid))
)$estimate
ggplot(pred_grid, aes(lon, lat, fill = prediction)) +
  facet_wrap(~time) +
  geom_raster() +
  scale_fill_gradient2()

sim_glmmfields    Simulate a random field with an MVT distribution

Description
Simulate a random field with an MVT distribution

Usage
sim_glmmfields(n_knots = 15, n_draws = 10, gp_theta = 0.5,
  gp_sigma = 0.2, mvt = TRUE, df = 1e+06, seed = NULL,
  n_data_points = 100, sd_obs = 0.1,
  covariance = c("squared-exponential", "exponential", "matern"),
  matern_kappa = 0.5,
  obs_error = c("normal", "gamma", "poisson", "nb2", "binomial", "lognormal"),
  B = c(0), phi = 0, X = rep(1, n_draws * n_data_points),
  g = data.frame(lon = runif(n_data_points, 0, 10),
    lat = runif(n_data_points, 0, 10)))

Arguments
n_knots  The number of knots
n_draws  The number of draws (for example, the number of years)
gp_theta  The Gaussian Process scale parameter
gp_sigma  The Gaussian Process variance parameter
mvt  Logical: MVT? (vs. MVN)
df  The degrees of freedom parameter for the MVT distribution
seed  The random seed value
n_data_points  The number of data points per draw
sd_obs  The observation process scale parameter
covariance  The covariance function of the Gaussian process ("squared-exponential", "exponential", "matern")
matern_kappa  The optional matern parameter. Can be 1.5 or 2.5. Values of 0.5 are equivalent to the exponential model.
obs_error  The observation error distribution
B  A vector of parameters. The first element is the intercept
phi  The autoregressive parameter on the mean of the random field knots
X  The model matrix
g  Grid of points

Examples
s <- sim_glmmfields(
  n_draws = 12, n_knots = 12, gp_theta = 1.5,
  gp_sigma = 0.2, sd_obs = 0.2
)
names(s)

stan_pars    Return a vector of parameters

Description
Return a vector of parameters

Usage
stan_pars(obs_error, estimate_df = TRUE, est_temporalRE = FALSE,
  estimate_ar = FALSE, fixed_intercept = FALSE, save_log_lik = FALSE)

Arguments
obs_error  The observation error distribution
estimate_df  Logical indicating whether the degrees of freedom parameter should be estimated
est_temporalRE  Logical: estimate a random walk for the time variable?
estimate_ar  Logical indicating whether the ar parameter should be estimated
fixed_intercept  Should the intercept be fixed?
save_log_lik  Logical: should the log likelihood for each data point be saved so that information criteria such as LOOIC or WAIC can be calculated? Defaults to FALSE so that the size of model objects is smaller.

student_t    Student-t and half-t priors

Description
Student-t and half-t priors. Note that this can be used to represent an effectively normal distribution prior by setting the first argument (the degrees of freedom parameter) to a large value (roughly 50 or above).

Usage
student_t(df = 3, location = 0, scale = 1)
half_t(df = 3, location = 0, scale = 1)

Arguments
df  Degrees of freedom parameter
location  Location parameter
scale  Scale parameter

Examples
student_t(3, 0, 1)
half_t(3, 0, 1)

tidy    Tidy model output

Description
Tidy model output

Usage
tidy(x, ...)

## S3 method for class 'glmmfields'
tidy(x, ...)

Arguments
x  Output from glmmfields()
...  Other arguments
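To see why a large degrees-of-freedom value makes the student_t() prior described above behave like a normal prior, the two densities can be compared directly in base R (an illustrative sketch, not part of the package manual; the location-scale t density is written out by hand):

# Location-scale Student-t density built from the standard t density:
dt_ls <- function(x, df, location = 0, scale = 1) {
  dt((x - location) / scale, df) / scale
}
x_grid <- seq(-4, 4, length.out = 200)
max(abs(dt_ls(x_grid, df = 100) - dnorm(x_grid))) # tiny: t(100) is nearly normal
max(abs(dt_ls(x_grid, df = 3) - dnorm(x_grid)))   # noticeably larger for heavy-tailed t(3)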
Poisson autoregressive models in MATLAB. The full article contains four example write-ups for readers' reference.

Example 1: The Poisson autoregressive model is a statistical model for analyzing count data, commonly used for count variables in time-series data. It describes how the value of a count variable at a given time point relates to its values at earlier time points, while respecting the discreteness and non-negativity of count data. In practice, Poisson autoregressive models are applied to data on disease incidence, environmental pollution, population growth, and similar topics, to capture the temporal dependence of counts and to forecast future values. This article describes how to implement a Poisson autoregressive model in MATLAB.

1. The Poisson distribution
The Poisson distribution is a commonly used distribution in probability theory that describes the number of random events occurring per unit time or unit area. Its probability mass function is

P(X = k) = (λ^k · e^(−λ)) / k!

where λ is the average rate at which the event occurs per unit time or area, and k is the number of occurrences.

2. Definition of the Poisson autoregressive model
The Poisson autoregressive model is a time-series model based on the Poisson distribution that describes the autoregressive relationship of a count variable over time. Its general form is

Y(t) = α + β1·Y(t−1) + β2·Y(t−2) + ... + βp·Y(t−p) + ε(t)

where Y(t) is the value of the count variable at time t, α is the intercept, β1, β2, ..., βp are the regression coefficients, p is the autoregressive order, and ε(t) is the error term.

3. Implementing the model in MATLAB
In MATLAB, the generalized linear model function fitglm() can be used to fit a Poisson regression. A simple example:

```matlab
% Generate simulated data
t = 1:100;
Y = poissrnd(5, 100, 1);
% Fit a Poisson regression of the counts on time
% (note: this simple call regresses Y on t only; lagged counts would be
% added as extra columns of the design matrix for a true autoregression)
mdl = fitglm(t', Y, 'linear', 'Distribution', 'poisson');
% Inspect the estimated model
disp(mdl)
```

In the code above, a simulated series Y of 100 counts is generated first, and fitglm() is then used to fit the model, with the distribution type set to poisson.
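For readers working in R rather than MATLAB (an illustrative sketch, not part of the original article), the same kind of count autoregression can be approximated with glm() by including lagged counts as predictors under a log link:

```r
# Illustrative only: Poisson regression with lagged counts as covariates.
set.seed(1)
n <- 200
y <- rpois(n, lambda = 5)  # placeholder series; replace with real count data
d <- data.frame(y = y[3:n], lag1 = y[2:(n - 1)], lag2 = y[1:(n - 2)])
fit <- glm(y ~ lag1 + lag2, family = poisson(link = "log"), data = d)
summary(fit)
```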
Commonly used R functions, organized. Tip: if you come across a function you do not know, type "?function_name". You must first install the relevant package, either with install.packages("package_name") or through the menus. Then load the package: apart from a few base packages, packages must be loaded with library(package_name).

Common econometric functions (function: purpose (package))

Linear regression and relaxing its assumptions:
lm: fit a linear regression (stats)
summary(): return coefficient t-tests, the F-test and other regression summaries (stats)
glm: generalized linear models (probit, logit and Poisson regression, WLS estimation, etc.) (stats)
maxLik: maximum likelihood estimation, linear and nonlinear (maxLik)
predict: predictions from a fitted regression (applicable to most models) (stats)
coef: extract estimated coefficients (stats)
cor: Pearson correlation and Spearman rank correlation between variables (stats)
resid: regression residuals (stats)
fitted: fitted values (stats)
scale: standardize data (stats)
lm.ridge: ridge regression (MASS)
plsr: partial least squares (pls)
pcr: principal components regression (pls)
bptest: Breusch-Pagan heteroskedasticity test (lmtest)
bartlett.test: test of homogeneity of variances across groups (stats)
dwtest: Durbin-Watson test (lmtest)
AIC: AIC value of a model (stats)
var.test: test for equality of variances (stats)
vif: variance inflation factors (car)
apropos("test"): list commonly used statistical tests by name (stats)
confint(): confidence intervals for regression parameters (stats)

Nonlinear optimization and nonlinear regression:
optimize: one-dimensional nonlinear optimization (stats)
optim: multivariate nonlinear optimization (stats)
constrOptim: constrained nonlinear optimization (stats)
nls: nonlinear (weighted) least squares (stats)
maxLik: nonlinear maximum likelihood estimation (maxLik)
logLik: log-likelihood of a fitted model (stats)
expand.grid: generate a grid of points (stats)
nls2: similar to nls, with an added brute-force algorithm (nls2)
selfStart: create self-starting functions that supply initial values (stats)

Common time-series functions: descriptive statistics:
exp(): exponential (stats)
log(): natural logarithm; log10() common logarithm; log2() base-2 logarithm (stats)
mean(): mean of a vector (stats)
var(): variance of a vector (stats)
sd(): standard deviation of a vector (stats)
skewness: skewness of a vector (e1071)
kurtosis: kurtosis of a vector (e1071)
FinTS.stats: descriptive statistics for a time series (mean, standard deviation, skewness, kurtosis, etc.) (FinTS)
t.test: test whether the mean of a series is zero (also usable as a one- or two-sample test) (stats)

ARMA-related functions:
ts: convert data to time-series format (stats)
ts.plot: time-series plot (stats)
diff.ts: difference a time series (stats)
getInitial: extract initial values from a self-starting function (stats)
as.Date: convert a non-date vector to dates (stats)
acf: compute and plot the autocorrelation function (stats)
pacf: compute and plot the partial autocorrelation function (stats)
Box.test: Box-Pierce and Ljung-Box tests for serial correlation (stats)
ar: autoregressive models (including ar.ols, ar.mle, ar.yw, ar.burg) (stats)
arima: ARMA and ARIMA models (stats)
ARIMA: calls arima and adds a Ljung-Box test of residual autocorrelation (FinTS)
arma: estimation by conditional least squares, with arbitrary lag orders (tseries)
predict: forecasting (stats and others)
tsdiag: time-series diagnostic checks (stats)
adf.test: ADF unit-root test (tseries)
urdfTest: ADF test (recommended) (fUnitRoots)
kpss.test: KPSS stationarity test (tseries)
pp.test: Phillips-Perron unit-root test (tseries)
arima.sim: simulate data from a given ARIMA model (stats)
FitAR: fit AR models, including AR models with specified lags (FitAR)

Dynamic economic models:
ts: convert data to time-series format (stats)
ts.union: merge (bind) time series (stats)
lag: lag data in time-series format (stats)
grangertest: Granger causality test (lmtest)

Simultaneous equation systems:
systemfit: 2SLS, 3SLS, SUR and related estimators for simultaneous equations (systemfit)
cbind: combine data by columns (base)
rbind: combine data by rows (base)

Discrete dependent variables:
glm with family = binomial(link = "probit"): binary probit model (stats)
glm with family = binomial(link = "logit"): binary logit model (stats)
glm with family = poisson: Poisson regression (stats)
mlogit: multinomial logit model (mlogit)
polr: ordered multinomial (ordered response) model (MASS (VR))
stepAIC: stepwise regression using the AIC criterion (MASS)
tobit: tobit model (AER)

Panel data analysis:
plm: fixed- and random-effects panel models (individual, time and two-way effects) (plm)
phtest: Hausman test for panel data (plm)
pvcm: variable-coefficients panel estimation (plm)

Autoregressive conditional heteroskedasticity (GARCH) functions:
garch: GARCH models (tseries)
garchOxFit: convenient for GARCH-family models (GARCH, IGARCH, EGARCH, GARCH-M, T-GARCH, etc.); calls the G@ARCH package of the Ox software.
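As a brief illustration of how a few of the functions listed above are used together (a sketch on R's built-in mtcars data, not part of the original list):

fit <- lm(mpg ~ wt + hp, data = mtcars)  # linear regression (stats)
summary(fit)                             # coefficient t-tests and overall F-test
confint(fit)                             # confidence intervals for the coefficients
AIC(fit)                                 # AIC value of the fitted model
head(resid(fit))                         # first few residuals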
The lognormal probability density function in numpy - a reply

First, let us look at the lognormal distribution in NumPy. The lognormal distribution is a continuous probability distribution whose probability density function has the form

f(x, s) = (1 / (s * x * sqrt(2 * pi))) * exp(-(log(x) - m)^2 / (2 * s^2))

where m is the mean and s the standard deviation of the logarithm. In other words, the density describes a random variable whose logarithm follows a normal distribution.

So how can NumPy be used to work with the lognormal probability density? The first step is to import the NumPy library:

import numpy as np

Next, the np.random.lognormal() function can be used to generate random samples from a lognormal distribution. Its parameters are the log-mean m, the log-standard deviation s, and the number of samples size. For example, to generate 1000 lognormal samples with log-mean 0 and log-standard deviation 1:

samples = np.random.lognormal(0, 1, 1000)

Then np.histogram() can be used to compute a histogram estimate of the probability density. Its first argument is the sample and its second argument the number of bins:

hist, bin_edges = np.histogram(samples, bins=50, density=True)

Note that the density parameter is set to True so that the histogram is normalized as a probability density. Now the centre of each bin can be computed to serve as the x-axis coordinates of the density estimate:

bin_centers = (bin_edges[:-1] + bin_edges[1:]) / 2

Next, the matplotlib library can be used to plot the estimated lognormal density. First import matplotlib:

import matplotlib.pyplot as plt

Then use plt.plot() to draw the density estimate:

plt.plot(bin_centers, hist)

Finally, plt.xlabel() and plt.ylabel() can be used to label the x and y axes, and plt.title() to add a title.
A Multivariate Poisson-Lognormal Regression Model for Prediction of Crash Counts by Severity, using Bayesian Methods

Jianming Ma, Ph.D., The University of Texas at Austin, 6.9 E. Cockrell Jr. Hall, Austin, TX 78712-1076, mjming@
Kara M. Kockelman, Associate Professor & William J. Murray Jr. Fellow of Civil Engineering, The University of Texas at Austin, 6.9 E. Cockrell Jr. Hall, Austin, TX 78712-1076, kkockelm@, Phone: 512-471-0210, FAX: 512-475-8744
Paul Damien, B. M. (Mack) Rankin, Jr. Professor in Business Administration, The University of Texas at Austin, CBA 5.242, Austin, TX 78712, paul.damien@, Phone: 512-232-9461, FAX: 512-471-0587

To be presented at the 86th Annual Meeting of the Transportation Research Board, January 2007. Submitted for publication consideration by Accident Analysis and Prevention, July 2006. Resubmitted, for final review, December 2006.

ABSTRACT
Numerous efforts have been devoted to investigating crash occurrence as related to roadway design features, environmental and traffic conditions. However, most of the research has relied on univariate count models; that is, traffic crash counts at different levels of severity are estimated separately, which may neglect shared information in unobserved error terms, reduce efficiency in parameter estimates, and lead to potential biases in sample databases. This paper offers a multivariate Poisson-lognormal (MVPLN) specification that simultaneously models injuries by severity. The MVPLN specification allows for a more general correlation structure as well as overdispersion. This approach addresses some questions that are difficult to answer by estimating them separately. With recent advancements in crash modeling and Bayesian statistics, the parameter estimation is done within the Bayesian paradigm, using a Gibbs Sampler and the Metropolis-Hastings (M-H) algorithms for crashes on Washington State rural two-lane highways. The estimation results from the MVPLN approach did show statistically significant correlations between crash counts at different levels of injury severity. The non-zero diagonal elements suggested overdispersion in crash counts at all levels of severity. The results lend themselves to several recommendations for highway safety treatments and design policies. For example, wide lanes and shoulders are key for reducing crash frequencies, as are longer vertical curves.

KEY WORDS
Bayesian inference, Bayes' theorem, crash severity, Gibbs sampler, highway safety, Metropolis-Hastings algorithm, Markov chain Monte Carlo (MCMC) simulation, multivariate Poisson-lognormal regression

INTRODUCTION
Roadway safety is a major concern for the general public and public agencies. Roadway crashes claim many lives and cause substantial economic losses each year. In the U.S., traffic crashes bring about more loss of human life (as measured in human-years) than almost any other cause, falling behind only cancer and heart disease (NHTSA, 2005). The situation is of particular interest on rural two-lane roadways, which experience significantly higher fatality rates than urban roads. The annual cost of traffic crashes is estimated to be $231 billion, or $820 per capita in 2000 (Blincoe et al., 2000). These costs do not include the cost of delays imposed on other travelers, which also are significant, particularly when crashes occur on busy roadways.
Schrank and Lomax (2002) estimate that over half of all traffic delays are due to non-recurring events, such as crashes, costing on the order of $1,000 per peak-period driver per year, particularly in urban areas. Thus, while vehicle and roadway design are improving, and growing congestion may be reducing impact speeds, crashes are becoming more critical in many ways, particularly in societies that continue to motorize.

Given the importance of roadway safety, there has been considerable crash prediction research (see, e.g., Hauer, 1986, 1997, and 2001; Abdel-Aty and Radwan, 2000; Ulfarsson and Shankar, 2003; Kweon and Kockelman, 2000; Lord and Persaud, 2000; Lord et al., 2005; Ma and Kockelman, 2006; Karlaftis and Tarko, 1998; Shankar et al., 1998; Khattak et al., 2006). Crash frequencies are commonly collected by severity on relatively homogenous roadway segments, supporting the development of crash count models. However, such research has relied on univariate count models; that is, traffic crash counts at different levels of severity are estimated separately. The widely used univariate count data models ignore an important issue: interdependence due to latent factors is likely to exist across crash rates at different levels of severity for a specific segment of roadway. Recently, Ma and Kockelman (2006) applied a multivariate Poisson (MVP) specification to model crash counts at different levels of severity simultaneously. However, this MVP specification allows only for a common added Poisson error term, resulting in equal positive correlations across crash counts and a very specific data pattern where all counts are equally shifted. In addition, this MVP specification does not allow for overdispersion.

Using a multivariate Poisson-lognormal (MVPLN) specification, as well as Bayesian estimation techniques, this work models correlated traffic crash counts simultaneously at different levels of severity. The MVPLN specification allows for a more general correlation structure as well as overdispersion. This approach addresses some questions that are difficult to answer by estimating them separately. With recent advancements in crash modeling and Bayesian statistics, the parameter estimation is done within the Bayesian paradigm, using a Gibbs Sampler and the Metropolis-Hastings (M-H) algorithms. The data come from Washington State rural two-lane highways in 2002, using the Highway Safety Information System (HSIS) database. The results lend themselves to recommendations for highway safety treatments and general design policies.

This paper is organized as follows: Related research studies are reviewed first. The model's formulation and data sets are then discussed, followed by estimation results, concluding remarks, and future research directions.

LITERATURE REVIEW
Models of crash (or injury) counts can be classified into two major streams: (1) the conventional univariate Poisson and related models, such as the negative binomial (NB); (2) potentially more realistic specifications, like the MVP and MVPLN. The first stream has provided a means for investigating associations between crash frequency and many crucial factors, such as traffic volume, access density, posted speed limit and number of lanes (see, e.g., Miaou et al., 1993; Miaou and Lum, 1993; Miaou 1994, 1996 and 2001; Fridstrøm et al., 1995; Johansson, 1996; Vogt and Bared, 1998; Vogt, 1999; Balkin and Ord, 2001; Zegeer et al., 2002; Pernia, 2004).
There also has been considerable interest in models that allow for excessive zeros, such as zero-inflated Poisson (ZIP) and zero-inflated negative binomial (ZINB) regression approaches (see, e.g., Lord et al., 2005; Shankar et al., 1997; Garber and Wu, 2001; Lee and Mannering, 2002; Kumara and Chin, 2003; Miaou and Lord, 2003; Rodriguez et al., 2003; Shankar et al., 2003; Noland and Quddus, 2004; Qin et al., 2004).

Due to computational and statistical advances, panel data (in which a cross-section of segments, intersections, etc. is observed over time) have become more amenable to rigorous analysis. In traffic crash analyses, there are a great many unobserved explanatory variables that affect frequencies and severities. Panel data can be used to deal with heterogeneity among individuals. To address the heterogeneity, many recent studies have used (univariate) panel count data models, such as random-effect negative binomial (RENB) and fixed-effect negative binomial (FENB) regression models (Kweon and Kockelman, 2000; Karlaftis and Tarko, 1998; Shankar et al., 1998; Chin and Quddus, 2003).

Such past research endeavors, however, have neglected the role of unobserved factors across different types of counts (e.g., the number of fatalities and the number of debilitating injuries). Recognizing the need for such considerations, Ladron de Guevara and Washington (2004) investigated the simultaneity of fatality and injury crash outcomes. Bijleveld (2005) also examined the correlation structure between crash and injury counts. As expected, he found significant correlations. However, he did not control for any covariates. Multivariate models (of count data), like Ma and Kockelman's MVP (2006) or Li et al.'s MVZIP (1999), can help correct for this.

This work models correlated traffic crash counts simultaneously at different levels of severity using an MVPLN specification, allowing for a very general correlation structure as well as overdispersion. Such specifications are challenging to estimate. Karlis (2003) developed an EM algorithm for an MVP model, and Ma and Kockelman (2006) used Gibbs sampling, as well as Metropolis-Hastings algorithms, within an MCMC simulation framework.

In recent years, Bayesian methods have found several applications in traffic crash analysis. Christiansen et al. (1992) and MacNab (2003) developed hierarchical Poisson models for crash counts and surveillance data. Miaou and Song (2005) developed a Bayesian multivariate spatial generalized linear mixed model (GLMM) to rank sites for safety improvements using Texas' county-level crash data. And Liu et al. (2005) used a hierarchical Bayesian framework to estimate ZIP regression models and develop safety performance functions (SPFs) for two-lane highways. Pawlovich et al. (2006) employed a Bayesian approach to assess impacts of road design measures on crash frequencies and rates. And Washington and Oh (2006) developed a Bayesian methodology for incorporating expert judgment in ranking countermeasure effectiveness under uncertainty.

Bayesian estimation methods generate a multivariate posterior distribution across all parameters of interest, as opposed to the traditional maximum likelihood estimation approach, which emphasizes and offers only the modal values of parameters (and relies on asymptotic properties to ascertain covariance).

This paper introduces an MVPLN approach to simultaneously model injury counts by severity. A Gibbs sampler and a Metropolis-Hastings (M-H) algorithm are used to estimate the parameters of interest using Bayesian methods. For comparison purposes, a series of independent (univariate) Poisson models for injury counts also are estimated.
A Gibbs sampler and a Metropolis-Hastings (M-H) algorithm are used to estimate theparameters of interest using Bayesian methods. For comparison purposes, a series of independent (univariate) Poisson models for injury counts also are estimated.MODEL STRUCTURE AND ESTIMATION Mathematical FormulationUnivariate Poisson regression models cannot account for correlations for different levels of severity; instead, one needs multivariate count data models. For instance, in practice, omitted variables (such as driveway density and sight distances) may simultaneously affect all crash counts at different levels of severity for a particular roadway segment, thus introducing correlation. Several such models have been developed (see, e.g., Karlis, 2003; Arbous andKerrich, 1951; King, 1989; Winkelmann, 2000; Kockelman, 2001; Tsionas, 2001). However,these specifications support only a common unobserved error term among counts.Here, the focus is placed on the correlated counts within individual roadway segments. Crash counts across roadway segments are assumed to be independent (e.g., there is no spatial correlation 1). The variance-covariance matrix of y can be expressed as below:()121nS n Var ×Ω⎡⎤⎢⎥Ω⎢⎥=⎢⎥⎢⎥Ω⎣⎦0000y 00L L M L (1) where 111212122212i i i S i i i S i i i i S S SS ωωωωωωωωω⎡⎤⎢⎥⎢⎥Ω=⎢⎥⎢⎥⎢⎥⎣⎦L L M L for 1,2,,i n =K (2) Let ()12,,i i i iS εεε′=εrK denote the severity-level-specific unobserved heterogeneity forroadway segment i [1,2,,i n =K , where n is the number of roadway segments], s denote theseverity level [1,2,,s S =K , where S is the number of severity levels], and ()12,,,n ′′′′=εεεεr r r K denote the severity-level-specific unobserved heterogeneity across roadway segments.Assume that crash counts is y , conditioned on i εr, the severity-level-specific explanatoryvariables isx ′ and their coefficients of s β, are independent Poisson distributed. (),,~is i s is is y x Poisson βλεr(3) where ()exp is iss is x λβε′=+. The unobserved heterogeneity terms i εrare assumed to be uncorrelated with the control (i.e., explanatory) variables.Let ()i i diag Λ=λr . This is an S×S matrix, where ()12,,,i i i iS λλλ=λrK and is is is u λξ=.Let ()exp i i =u εr r , where ()12,,,i i i iS u u u ′=u rK . Conditioning on β and Σ, the mean andcovariance matrix of the marginal distribution of i y rcan be obtained as follows:1In reality, spatial correlation may exist and be significant. For example, zoning and design policies createcorrelation across sites within a city; access management and other policies may simply shift the location of certain crash types. The former leads to positive correlation, the latter to negative.()()()()(),,,,,i i i ii i i i i i i i E x E E x E diag ββΣ=Σ==u u y u y y u ξu λr r r r r r r r rr (4) ()()()()(),,,,,,,,ii i i i i i i i i i i i i Var x E Var x Var E x βββΣ=Σ+Σu u y u y u y y u y u r r r r r r r r r r r(Greene, 2003) ()()()()()iii ii i E diag diag Var diag =+uu ξu ξu r r r r rr()exp i i i ′=Λ+ΛΣ−Λ⎡⎤⎣⎦11(5)where ()12,,,S ββββ′=K , ()12,,,i i i iS x x x x ′=K and ()12,,,i i i iS ξξξ′=ξrK . 
The length of β is k = k_1 + k_2 + \cdots + k_S, where k_s is the length of β_s.

From Equation (5), the variance-covariance terms, across counts, can be obtained as follows:

Cov(y_{is}, y_{il}) = λ_{is}[\exp(σ_{sl}) - 1]λ_{il} = ξ_{is}\exp(σ_{ss}/2)[\exp(σ_{sl}) - 1]\exp(σ_{ll}/2)ξ_{il},  for s \neq l
Var(y_{is}) = λ_{is} + λ_{is}[\exp(σ_{ss}) - 1]λ_{is}     (6)

where here λ_{is} = E(y_{is}) = ξ_{is}\exp(σ_{ss}/2) denotes the marginal mean. The correlation between crash counts within segments is obtained as follows:

Corr(y_{is}, y_{il}) = \frac{λ_{is}λ_{il}[\exp(σ_{sl}) - 1]}{\sqrt{\{λ_{is} + λ_{is}^2[\exp(σ_{ss}) - 1]\}\{λ_{il} + λ_{il}^2[\exp(σ_{ll}) - 1]\}}},  where s \neq l.     (7)

This correlation is unrestricted and can be positive or negative, depending on the sign of σ_{sl}, the (s, l) element of Σ. Moreover, this specification implies overdispersion (Footnote 2), since σ_{ss} > 0 for s = 1, 2, \ldots, S.

Based on Equation (3), the likelihood of observation i can be represented by the following equation:

P(y_i \mid x_i, β, ε_i) = \prod_{s=1}^{S} f_{Poisson}(y_{is}; λ_{is})     (8)

where λ_{is} = u_{is}ξ_{is} = \exp(x_{is}′β_s + ε_{is}). Unfortunately, the marginal distribution of the crash counts y_i cannot be obtained by direct computation. Obtaining the marginal distribution requires the evaluation of an S-variate integral of the Poisson distribution with respect to the distribution of ε_i:

P(y_i \mid x_i, β, Σ) = \int \prod_{s=1}^{S} f_{Poisson}(y_{is}; x_{is}, β_s, ε_{is}) \, φ_S(ε_i \mid \mathbf{0}, Σ) \, dε_i     (9)

where φ_S is the S-variate normal distribution. This S-dimensional integral cannot be evaluated in closed form for arbitrary Σ.

Footnote 2: Overdispersion refers to the situation in which the variance is greater than the mean.
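As an aside (not part of the original paper), the overdispersion and flexible correlation implied by Equations (5) through (7) are easy to verify by simulation. The R sketch below uses made-up linear predictors and a made-up Σ for S = 2 severity levels:

library(MASS)
set.seed(42)
n <- 50000
Sigma <- matrix(c(0.5, 0.3, 0.3, 0.8), 2, 2)   # hypothetical covariance of eps
xb    <- c(1.0, 0.2)                           # hypothetical linear predictors x'beta
eps   <- mvrnorm(n, mu = c(0, 0), Sigma = Sigma)
lambda <- exp(sweep(eps, 2, xb, "+"))          # conditional Poisson means
y <- matrix(rpois(2 * n, lambda), ncol = 2)
colMeans(y)      # close to exp(xb + diag(Sigma)/2), the marginal means
apply(y, 2, var) # exceed the means: overdispersion
cor(y)[1, 2]     # positive, as implied by sigma_12 > 0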
Therefore, the joint posterior density of Σ, ε, and β is written as follows:()()()()()0011,,,,,,,,n Sk W Poisson is is s is S i i s y X V f V f y x βπβφβνβεφΣΣ==Σ∝Σ∏∏εε0r(11)Thanks to this technique, the parameters can be “blocked” as Σ, ε, and β, after which the joint posterior is simulated by iteratively sampling from the following three conditionaldistributions: 1p π−⎡⎤Σ⎣⎦ε, ,,,p y X πβΣ⎡⎤⎣⎦ε, and ,,,p y X πβΣ⎡⎤⎣⎦ε, where ()pπ⋅⋅ denotes the posterior conditional density function.The draws are sampled sequentially using the most recent values of the conditioning variables at each step.Gibbs Sampler with Embedded M-H AlgorithmsAfter manipulating the posterior equation (11), the posterior of 1−Σ conditional on data and other parameters can be written as3Estimation of β in the panel count data models is similar to estimation of s β in the MVPLN model. 4Data augmentation views unobserved or latent variables as unknown parameters (to be estimated), in order to establish iterative algorithms.()()()111,,nW S i i f V πνφ−−ΣΣ=Σ∝ΣΣ∏εε0r(12)where W f denotes the Wishart density with νΣ degrees of freedom and scale matrix V Σ.After manipulating Equation (12), this density can be written as a Wishart kernel withdegrees of freedom n νΣ+ and scale matrix ()111n i i i V −−Σ=⎡⎤′+⎢⎥⎣⎦∑εεr r . In other words,()1111~,n W i i i f n V εν−−−ΣΣ=⎛⎞⎡⎤′Σ++⎜⎟⎢⎥⎜⎟⎣⎦⎝⎠∑εεr r(13) This is a known parametric distribution and thus can be sampled using a Gibbs sampler.In order to sample ε from its posterior density ()()1,,,,ni i i y πβπβ=Σ=Σ∏εεy r r, considersimply the i th posterior kernel density of i εr, thanks to an assumption of no spatial correlation across segments.()()()()1,,,exp ,,,is Sy p i i i i S i is is i i i i s x C C x πβφλλπβ=Σ=Σ−=Σ∏εy εεy r r r r r, (14)where ()exp is iss is x λβε′=+. Draws from this conditional density can be obtained by developing an M-H algorithm, as described below.Following Chib et al. (1998), the multivariate t distribution is used as the proposaldensity. Let ()ˆln ,,,arg max ip i i i i x πβ⎡⎤=Σ⎣⎦εεεy rr r r and ()1i i V H εε−=− be the inverse of the Hessian of ()ln ,,,p i i i x πβΣεy r r at the mode ˆi εr . The mode ˆiεr and variance-covariance matrix i V ε can be obtained using the Newton-Raphson algorithm with the gradient vector()1exp i ii x εβ−=−Σ+−+⎡⎤⎣⎦i i g εy εr r r r and Hessian matrix ()1exp i i i H diag x εβ−=−Σ−+⎡⎤⎣⎦εr , where 12000000i i i iS x x x x ′⎡⎤⎢⎥′⎢⎥=⎢⎥⎢⎥′⎣⎦K K MM O M K and 12S ββββ⎡⎤⎢⎥⎢⎥=⎢⎥⎢⎥⎣⎦M . Then, the proposal density is given by ()ˆ,,i T i i f V εενεεr , a multivariate-t distribution with ενdegrees of freedom (where εν can be used as a tuning parameter in the M-H algorithms to make sure that the acceptance rate 5 lies between 20 and 45percent 6). A proposal value *i εr is drawn from ()ˆ,,i T i i f V εενεεr , and the chain moves to *i εrfrom the current point i εrwith probability5The acceptance rate is the fraction of proposed samples that is accepted. If the proposal steps are too small, the chain will move around the space slowly and thus converge slowly on the true posterior density. If the proposal steps are too large, the acceptance rate will be very low because the proposals are likely to land in regions of much lower probability density. 
6Chib and Greenberg (1995) believe that an acceptance rate of 23 percent is desirable as the number of dimensions approaches infinity, and an acceptance rate of 45 percent is desirable for a one-dimensional random-walk chain.()()()()()***ˆ,,,,,,,,,min ,1ˆ,,,,,i i p i i i T i i i i i i p i i i T i i x f V x x f V εεεεπβναβπβν⎧⎫Σ⎪⎪Σ=⎨⎬Σ⎪⎪⎩⎭εy εεεεy εy εεr r r r r r r r r (15) If ()*,,,,i i i i x αβΣεεy r r ris greater than U (where U is uniformly distributed on []0,1), theproposal value *i εr is accepted; otherwise, the current value i εris kept as the new draw for the Markov chain.The samples of s β, conditional on ε, y , X , Σ, and, s β− (where[]1211,,,,,,s s s S ββββββ−−+=K K ) are drawn from the posterior distribution, which isproportional to()()()1,,,,,,,,,,Spppsss s jj j j j sy X y X y X πβπβεπβε⋅⋅⋅⋅=≠Σ=ΣΣ∏ε(),,,p s s s s C y X πβε−⋅⋅=Σ ()()()001,exp exp exp iss s ny k s s is s is is s is i V x x βφβββεβε=′′∝−++⎡⎤⎡⎤⎣⎦⎣⎦∏()()00,,ssk ss ss s V p yβφβββε⋅⋅∝(16)where ()1,,,,Sps j j j j j sC y X πβε−⋅⋅=≠=Σ∏ (which does not involve s β and thus serves as a constant),and ()()()1,,exp exp exp isny s s s is is is is i p y X x x βεβεβε⋅⋅=′′=−++⎡⎤⎡⎤⎣⎦⎣⎦∏ is the probability massfunction of ()12,,,s s s ns y y y y ⋅=K given s β, X and ()12,,,s s s ns εεεε⋅=K . Note that the s β’s ({}1,2,,s S ∈K ) are assumed to be independent of one another. A scheme similar to the one sampling i εris developed here to sample s β. The multivariate -t once again serves as the proposal density. Let()ˆln ,,,,arg max sp s s s s s y X ββπβεβ⋅⋅−⎡⎤=Σ⎣⎦ be the mode, and ()1s s V H ββ−=− the inverse of theHessian of ()ln ,,,,p s s y X πββ−Σε at the mode ˆs β. The mode ˆsβ and variance-covariance matrix s V β can be obtained using the Newton-Raphson algorithm with the gradientvector ()010s s s s V ββββ−=−−+g r ()1exp nis is s is is i y x x βε=′−+⎡⎤⎣⎦∑ and Hessian matrix 01s s H V ββ−=−−()1exp nis s is is isi x x x β=′′+⎡⎤⎣⎦∑εr . Then, the proposal density is given by ()ˆ,,s T s s f V ββββν, a multivariate-t distribution with βνdegrees of freedom (where βν can be used as a tuning parameter in the M-H algorithms to make sure that the acceptance rate lies between 20 and 45percent). A proposal value *sβ is drawn from ()ˆ,,sTs sf V ββββν, and the chain moves to *sβ from the current point s β with probability()()()()()***ˆ,,,,,,,,,,,min ,1ˆ,,,,,,s s p s s T s s s s s p s s T s s y X f V y X y X f V ββββπββββναβββπββββν−−−⎧⎫Σ⎪⎪Σ⎨⎬Σ⎪⎪⎩⎭εεε (17)If ()*,,,,,s s s y X αβββ−Σε is greater than U (where U is uniformly distributed on []0,1), the proposal value *s βis accepted; otherwise, the current value s β is kept as the new draws for the Markov chain.DATA DESCRIPTIONThe crash data sets used here were collected from Washington State through the Highway Safety Information System (HSIS). In order to examine traffic crashes patterns on rural two-laneroadways, this research considers crashes in the Puget Sound region. A random sample of 60% of all rural two-lane road segments in this region was used for model estimation. A total of 7,773 rural two-lane highway segments (with an average segment length of 0.0655 miles 7 and a total of 510 miles) are available for analysis. This sample contains 16 fatal crashes, 50 disabling-injury crashes, 180 non-disabling-injury crashes, 175 possible-injury crashes and 532 property-damage-only (PDO). Table 1 reports summary statistics for the dependent and independent variables employed in the analysis. 
A variety of readily available variables are controlled for in the model, including design features, traffic intensity, location information, and roadway functional classification.MODEL ESTIMATION AND RESULTS Model EstimationThe MVPLN regression model was estimated using a Bayesian approach. The starting values for β came from distinct univariate Poisson models (using the method of maximum likelihoodestimation (MLE)). The starting values for Σ are 51000001000001000001000001I ⎡⎤⎢⎥⎢⎥=⎢⎥⎢⎥⎢⎥⎢⎥⎣⎦. The MLE estimatesfor the five univariate Poisson models can be found in Ma (2006). A Gibbs sampler and two M-H algorithms were coded in the R language (an open-source statistical computing environment described at /). The prior distributions for the estimation are defined by the hyperparameters νΣ=10, 15V I −Σ=, ()00,0,,0sβ′=K , and 014100sV I β=×. The Gibbs samplerwas implemented to obtain M = 8,000 draws for Σ. The two M-H algorithms were implemented to obtain M = 8,000 draws for each of the 70145=× s 'β and each of the 865,385773,7=× ε’s, respectively. The initial 1,000 draws were discarded as “burn-ins.” An adequate burn-in period eliminates the influence of the starting values. To help ensure chain convergence, the7It is quite possible that very short segments do not faithfully represent the actual location of crashes, since police officers may locate crashes only to the nearest tenth of a mile. Cluster analysis, wherein similar segments/conditions are merged (providing higher crash counts) can address some of this bias in reporting. Ma and Kockelman (2006) conducted such an analysis with Washington State data.Gibbs sampler and the two M-H algorithms were implemented using two sets of starting values8 and both converged at the same posterior distribution of parameters. Estimation results are presented in Tables 2 through 6.Based on the posterior density of Σ, positive correlations between crash counts at different levels of severity within the segment do appear to exist, in a statistically significant way. The univariate models are a special case of the MVPLN, with off-diagonal elements of Σ equalto zero. Given the MVPLN predictions’ added flexibility to represent such pattern, it is expected that they offer somewhat better predictions.Interpretation of ResultsThe following discussion of results emphasizes disabling and fatal injuries (Tables 5 and 6), since these arguably are of greatest concern to agencies and policymakers. Moreover, the data on such outcomes are more likely to be reported and more reliably recorded than that for other crash outcomes (Blincoe et al., 2000). Tables 2 through 4 provide crash count model estimates for the other three severity levels. The signs of most coefficients are consistent throughout the models, indicating robust directions of effect for most control variables.Parameter estimates shown in Tables 2 through 6 suggest that roadway design plays an important role in predicting crash counts. For example, holding all other factors fixed, more severe injury crashes are expected on sharper horizontal curves, while wider shoulders tend to reduce rates of less severe crashes (perhaps by offering added maneuverability space for crash avoidance). Based on an average road segment’s attributes and the MVPLN model’s average parameter estimates, Table 7 provides estimates of percentage changes in crash rates as a function of various design details. 
For example, a 5-feet increase in (average) right shoulder width (from 2 to 7 feet) is predicted to result in 7.04% fewer crashes (total) per 100 million VMT.A 26.6% higher average annual daily traffic level (rising from 3757 to 4757 vehicles) is predicted to increase total crash count by 16.4% — while reducing the total crash rate by 5.51%. In this way, the MVPLN model results offer statistically (and practically) significant insights into crash counts’ dependence on roadway design.The magnitudes of the parameter estimates for the MVPLN specification are not directly comparable to those of univariate Poisson models (shown in Ma, 2006) or those of univariate negative binomial (UVNB) models (also shown in Ma, 2006). The reason for this is that the MVPLN model accounts for correlations across crash counts (by severity), and is therefore somewhat different from the univariate cases. However, a comparison of parameter signs shows that sharper curves are associated with more fatal crashes in all three models (MVPLN, UVP, and UVNB). The rest of control variables are not statistically significant in both the UVP and UVNB models; however, some of these control variables remain showing a statistically significant effect on fatal crash occurrence in the MVPLN model. For example, speed limit isnot statistically significant in the univariate models but is expected to increase fatal crash rates in the MVPLN model. Vertical curve length and segment grade show the same pattern of effectson disabling-injury crashes in all three models. For example, long vertical curves are predictedto reduce disabling-injury crashes, but steeper segments are associated more disabling-injury crashes. The coefficient signs for remaining control variables are not in agreement across all three models, indicating that specification choice is important to a proper understanding of crash count relationships.8 Zeros were used as the starting values for β in the second chain.。