UNSUPERVISED LEARNING OF GAMMA MIXTURE MODELS USING MINIMUM MESSAGE LENGTH
Electrical Capacitance Tomography Image Reconstruction Based on Non-Convex Entropy Minimization and Gaussian Mixture Model Clustering
ZHANG Lifeng; LU Dongchen; LIU Weiliang
[Journal] Acta Metrologica Sinica (《计量学报》)
[Year (volume), issue] 2024, 45(2)
[Abstract] Based on the principle of compressed sensing, a method is proposed that constructs a non-convex entropy (NE) function as the regularization term; it effectively alleviates the ill-posed inverse problem of electrical capacitance tomography (ECT) while guaranteeing the sparsity of the solution, and the fast iterative shrinkage-thresholding algorithm (FISTA) is used to solve it and speed up convergence.
A Gaussian mixture model (GMM) is then used to search for the optimal threshold for the obtained solution, with the expectation-maximization (EM) algorithm updating the model parameters, which yields the NE-GMM algorithm.
Both simulation and experimental results show that, compared with the LBP, Landweber, iterative hard thresholding (IHT), ADMM-L1 and NE algorithms, the proposed algorithm gives the best reconstructed image quality and further improves the fidelity of centre and multi-object distributions; the average relative error and correlation coefficient of the images reconstructed in simulation are 0.4611 and 0.8827 respectively, better than the other five algorithms.
[Total pages] 7 (P207-213)
[Authors] ZHANG Lifeng; LU Dongchen; LIU Weiliang
[Affiliation] Department of Automation, North China Electric Power University
[Language] Chinese
[CLC number] TB937
[Related literature]
1. ECT image reconstruction based on a double-loop improved Landweber algorithm
2. ECT image reconstruction based on convolutional neural networks and finite element simulation
3. ECT image reconstruction based on convolutional sparse coding
4. ECT image reconstruction based on an improved ResNet-18 network
5. An ECT image reconstruction method based on a colour-depth sensor
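The NE-GMM step described in the abstract above uses a Gaussian mixture model, fitted with the EM algorithm, to pick a threshold for the reconstructed pixel values. The Python sketch below shows only that thresholding idea on synthetic values, using scikit-learn's EM-based GaussianMixture; the synthetic data and the two-component choice are assumptions made for illustration, not the paper's settings.

import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Stand-in for reconstructed ECT grey values: a low background mode and a
# smaller high-valued "object" mode.
pixels = np.concatenate([rng.normal(0.1, 0.05, 800), rng.normal(0.7, 0.10, 200)])

# Fit a two-component GMM by expectation-maximization.
gmm = GaussianMixture(n_components=2, random_state=0).fit(pixels.reshape(-1, 1))

# Treat the component with the larger mean as "object" and binarise.
object_component = int(np.argmax(gmm.means_.ravel()))
is_object = gmm.predict(pixels.reshape(-1, 1)) == object_component
print("pixels labelled as object:", int(is_object.sum()))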
因版权原因,仅展示原文概要,查看原文内容请购买。
Lossless Compression of Capsule Endoscopy Images Adaptive to Irregular Textures
HUANG Sheng; XIANG Sihao; HU Feng; MA Ting; LU Bing
[Journal] Computer Engineering (《计算机工程》)
[Year (volume), issue] 2022, 48(10)
[Abstract] To relieve the storage pressure that wireless capsule endoscopy images place on electronic devices and servers, a lossless compression algorithm adaptive to irregular textures is proposed. Within an image block, an extended angular prediction mode is used to find the five reference pixels nearest to the pixel to be predicted, and three of them are assigned different weights; at the same time, according to the gradient variation of neighbouring pixel values, the range of candidate prediction values along irregular texture directions is enlarged, the optimal prediction value is selected by the minimum information entropy of the image block, and the prediction residual is obtained as the difference between the true and predicted values, so that the method adapts to irregularly textured images. A cross-component prediction mode then selects the optimal prediction coefficients and builds a linear relationship that matches the distribution of prediction residuals within the block, removing redundancy among the three components of the current coding pixel. The residuals remaining after the multi-angle and cross-component prediction modes are entropy coded with the Deflate algorithm. Experimental results show that the algorithm achieves an average lossless compression ratio of 5.81 on the Kvasir-Capsule dataset, outperforming WebP, SAP, MDIP and other algorithms, and effectively improves the redundancy removal rate, which is about 1.9% higher than that of WebP.
[Total pages] 8 (P21-27)
[Authors] HUANG Sheng; XIANG Sihao; HU Feng; MA Ting; LU Bing
[Affiliations] School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications; Jiangsu Shitong Biotechnology Co., Ltd.; Key Laboratory of Optical Communication and Networks, Chongqing University of Posts and Telecommunications
[Language] Chinese
[CLC number] TN919.8
[Related literature]
1. Texture feature extraction and classification of pancreatic endoscopic ultrasound images
2. Endoscopic image retrieval based on a colour-texture autocorrelation algorithm
3. Capsule endoscopy image classification based on colour and texture features
4. Capsule endoscopy image recognition based on fuzzy texture spectra
5. Topic-model-based screening of capsule endoscopy image sequences
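The abstract above codes prediction residuals with Deflate after angular and cross-component prediction. Those predictors are not reproduced here; purely as an illustration of the residual-then-Deflate idea, the sketch below applies a simple left-neighbour predictor to a synthetic 8-bit image and compresses the residuals with Python's zlib (a Deflate implementation). The predictor, the image and the parameters are assumptions made for illustration.

import numpy as np
import zlib

rng = np.random.default_rng(0)
# Synthetic smooth 8-bit "image": neighbouring pixels are strongly correlated.
img = (128 + 40 * np.sin(np.linspace(0, 8, 256))[None, :]
       + rng.normal(0, 3, (256, 256))).clip(0, 255).astype(np.uint8)

# Left-neighbour prediction; residuals are taken modulo 256 so the transform
# is exactly invertible (lossless).
pred = np.empty_like(img)
pred[:, 0] = img[:, 0]
pred[:, 1:] = img[:, :-1]
residual = ((img.astype(np.int16) - pred.astype(np.int16)) % 256).astype(np.uint8)

raw_bytes = zlib.compress(img.tobytes(), 9)
res_bytes = zlib.compress(residual.tobytes(), 9)
print("compression ratio, raw image:      %.2f" % (img.size / len(raw_bytes)))
print("compression ratio, residual image: %.2f" % (img.size / len(res_bytes)))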
Supervised learning and unsupervised learning
The common methods of machine learning are mainly divided into supervised learning and unsupervised learning.
Supervised learning is what people usually mean by classification: from existing training samples (known data together with their corresponding outputs) a model is trained that is optimal under some evaluation criterion (the model belongs to some set of candidate functions). This model is then used to map every input to a corresponding output, and by making a simple judgement on the output the goal of classification is achieved, so the model can classify data it has never seen.
In human learning, from childhood adults tell us "this is a bird", "that is a pig", "that is a house", and so on.
The scenes we see are the input data, and the adults' judgements about them (house or bird) are the corresponding outputs.
Once we have seen enough, our minds gradually form generalized models - these are the trained function(s) - so that even without an adult's guidance we can tell for ourselves which things are houses and which are birds.
Typical examples of supervised learning are KNN and SVM.
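As a concrete illustration of the two supervised methods just named, the scikit-learn sketch below trains a KNN and an SVM classifier on labelled data; the Iris dataset, the split and the hyperparameters are illustrative choices, not part of the original text.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Labelled training samples: inputs (measurements) with their known outputs (classes).
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Train two supervised classifiers on the same labelled samples.
knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
svm = SVC(kernel="rbf").fit(X_train, y_train)

# The trained models map unseen inputs to outputs, i.e. they classify new data.
print("KNN accuracy:", knn.score(X_test, y_test))
print("SVM accuracy:", svm.score(X_test, y_test))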
Unsupervised learning (some people call it non-supervised learning; the terms mean the same thing) is another heavily studied approach. It differs from supervised learning in that no training samples are available in advance; we have to model the data directly.
That may sound implausible, but in our own process of getting to know the world we use unsupervised learning in many places.
For example, suppose we visit an art exhibition knowing nothing about art; after viewing many works we can still divide them into different schools - say, which ones are more hazy and which are more realistic - even if we do not know what "impressionist" or "realist" means, we can at least split them into two groups.
The typical example of unsupervised learning is clustering.
The aim of clustering is to group similar things together, without caring what each group actually is.
Hence a clustering algorithm usually only needs to know how to compute similarity before it can start working.
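Continuing the gallery analogy, the sketch below groups unlabelled points with k-means in scikit-learn; only a similarity measure (Euclidean distance) and a number of groups are supplied, and the data and cluster count are illustrative assumptions.

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Unlabelled data: two blobs, but the algorithm is never told which point is which.
X = np.vstack([rng.normal(0.0, 1.0, (50, 2)),
               rng.normal(5.0, 1.0, (50, 2))])

# k-means clusters purely from similarity; the labels have no meaning beyond
# "these points belong together".
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels[:10], labels[-10:])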
So when should supervised learning be used and when unsupervised learning? I only began to think seriously about the answer after being asked this in an interview.
DOI: 10.11992/tis.202007003
Abnormal event detection method based on deep auto-encoder and self-updating sparse combination
WANG Qianqian, MIAO Duoqian, ZHANG Yuanjian (Key Laboratory of Embedded System and Service Computing, Ministry of Education, Tongji University, Shanghai 201804, China)
Abstract: In deep-learning models for abnormal event detection, the input is usually video frames or optical flow images, and the resulting detection accuracy and speed are not satisfactory. To address these problems, we present an algorithm based on convolutional auto-encoders and self-updating sparse combination learning (CASSC), centred on moving foreground blocks. First, an adaptive Gaussian mixture model (GMM) is used to extract the video foreground, and noise is filtered with a sliding window according to the proportion of foreground pixels. Second, three convolutional auto-encoders are constructed to extract the temporal and spatial features of the moving foreground blocks. Lastly, self-updating sparse combination learning is applied to reconstruct the features, and abnormality is judged from the reconstruction error. The experimental results show that, compared with existing algorithms, the proposed method not only effectively improves the accuracy of abnormal event detection but also meets real-time detection requirements.
Keywords: deep learning; sparse combination; auto-encoder; self-updating; abnormal event detection; convolutional neural network; unsupervised learning; sparse representation
CLC number: TP391   Document code: A   Article ID: 1673-4785(2020)06-1197-07
Citation: WANG Qianqian, MIAO Duoqian, ZHANG Yuanjian. Abnormal event detection method based on deep auto-encoder and self-updating sparse combination[J]. CAAI Transactions on Intelligent Systems, 2020, 15(6): 1197-1203.
Abnormal event detection refers to analysing the useful information in a video by means of image processing, pattern recognition and computer vision techniques in order to judge whether an abnormal event has occurred.
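The CASSC pipeline above (adaptive GMM foreground extraction, three convolutional auto-encoders, self-updating sparse combinations) is not reproduced here. As a minimal illustration of its central idea - flagging inputs whose auto-encoder reconstruction error is large - the PyTorch sketch below trains one small convolutional auto-encoder on "normal" patches and scores new patches by mean squared reconstruction error; the architecture, synthetic data and threshold rule are assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvAutoEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 8, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(8, 16, 3, stride=2, padding=1), nn.ReLU())
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(16, 8, 2, stride=2), nn.ReLU(),
            nn.ConvTranspose2d(8, 1, 2, stride=2), nn.Sigmoid())

    def forward(self, x):
        return self.decoder(self.encoder(x))

torch.manual_seed(0)
normal_patches = torch.rand(256, 1, 32, 32) * 0.1       # stand-in "normal" foreground patches
model = ConvAutoEncoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(20):                                      # brief training on normal data only
    loss = F.mse_loss(model(normal_patches), normal_patches)
    optimizer.zero_grad(); loss.backward(); optimizer.step()

with torch.no_grad():
    test = torch.cat([torch.rand(8, 1, 32, 32) * 0.1,    # normal-like patches
                      torch.rand(8, 1, 32, 32)])          # patches with different statistics
    err = ((model(test) - test) ** 2).mean(dim=(1, 2, 3))
    threshold = err[:8].mean() + 3 * err[:8].std()        # simple assumed decision rule
    print("flagged as abnormal:", (err > threshold).tolist())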
A Density Inversion Method for High-Energy Flash Radiography Based on Compressed Sensing Theory
LU Cunbo; SHENG Yunxiao
[Journal] Electronic Science and Technology (《电子科技》)
[Year (volume), issue] 2023, 36(1)
[Abstract] High-energy flash radiography requires studying the density inversion problem of non-axisymmetric objects when only a small number of projections are available.
Existing total-variation (TV) algorithms based on the idea of compressed sensing consider the local similarity of the image but not its non-local similarity.
To address this problem, a total-variation reconstruction technique with group sparse regularization, TV-GSR, is proposed.
The technique integrates a group sparsity model into the TV framework, taking into account both the local similarity and the non-local self-similarity of the object image and making full use of its prior sparse information; the four-point (up, down, left, right) symmetry of the object is used to reduce the scale of the image reconstruction, so reconstruction accuracy is increased and reconstruction is faster.
Simulation experiments show that the proposed TV-GSR algorithm improves reconstruction accuracy in both noise-free and noisy conditions and works well for high-energy flash images as well as CT images rich in texture detail, showing good generality.
[Total pages] 7 (P1-6)
[Authors] LU Cunbo; SHENG Yunxiao
[Affiliations] Beijing Institute of Computer Technology and Application; Training Centre, Army Academy of Armored Forces
[Language] Chinese
[CLC numbers] TP751; TN99
[Related literature]
1. Application of the Monte Carlo method to image edge detection in high-energy flash radiography
2. A deblurred image reconstruction algorithm for high-energy flash radiography based on total-variation regularization
3. An edge detection method using the Sobel operator in high-energy flash radiography
4. A theoretical study of the X-ray spectrum in high-energy flash radiography
5. The LEPM method for density measurement in high-energy flash radiography
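The TV-GSR algorithm in the abstract combines total variation with group-sparse regularization on projection data and is not reproduced here. Purely to illustrate what the total-variation term does, the numpy sketch below performs smoothed-TV denoising of a piecewise-constant phantom by gradient descent; the phantom, step size and regularization weight are assumptions, and the boundary handling is deliberately crude.

import numpy as np

def tv_denoise(y, lam=0.15, step=0.05, iters=200, eps=1e-6):
    """Minimise 0.5*||x - y||^2 + lam * sum sqrt(dx^2 + dy^2 + eps) by gradient descent."""
    x = y.copy()
    for _ in range(iters):
        dx = np.diff(x, axis=1, append=x[:, -1:])
        dy = np.diff(x, axis=0, append=x[-1:, :])
        mag = np.sqrt(dx ** 2 + dy ** 2 + eps)
        px, py = dx / mag, dy / mag
        # Discrete divergence of the normalised gradient field approximates the TV gradient.
        div = (px - np.roll(px, 1, axis=1)) + (py - np.roll(py, 1, axis=0))
        x -= step * ((x - y) - lam * div)
    return x

rng = np.random.default_rng(0)
truth = np.zeros((64, 64)); truth[20:44, 20:44] = 1.0     # piecewise-constant phantom
noisy = truth + rng.normal(0, 0.2, truth.shape)
clean = tv_denoise(noisy)
print("mean abs error, noisy: %.3f  after TV: %.3f"
      % (np.abs(noisy - truth).mean(), np.abs(clean - truth).mean()))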
Research on image super-resolution reconstruction algorithms based on unsupervised learning
Image super-resolution reconstruction is one of the important research directions in computer vision.
With the spread of high-definition display devices and the continuous development of image processing technology, the demand for super-resolution reconstruction of low-resolution images keeps growing.
Traditional methods based on interpolation and filtering have clear limitations in increasing image resolution, so introducing unsupervised learning has become a research hotspot.
This article studies unsupervised-learning-based image super-resolution reconstruction algorithms in depth, discusses their principles, methods and applications, and looks ahead to future directions.
Part 1: Introduction. This part introduces the background and significance of image super-resolution reconstruction and briefly describes the problems of traditional methods in this field.
It then introduces unsupervised learning methods and explains their value for super-resolution reconstruction.
Part 2: Survey of related techniques. This part surveys related techniques, including traditional interpolation- and filtering-based methods, supervised learning methods and unsupervised learning methods.
The principles, characteristics and application scenarios of each class of methods are described in detail, and their strengths and weaknesses are compared.
Part 3: Unsupervised-learning-based image super-resolution reconstruction algorithms. This part focuses on algorithms based on unsupervised learning.
First, the ideas and basic principles of applying unsupervised learning to super-resolution reconstruction are introduced.
Then several classic unsupervised methods, such as auto-encoders and generative adversarial networks, are described in detail, compared and evaluated.
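As one concrete instance of the network families just mentioned, the PyTorch sketch below defines a very small super-resolution generator that upsamples by a factor of two using sub-pixel convolution (PixelShuffle); the layer sizes are illustrative assumptions and no training, adversarial or otherwise, is shown.

import torch
import torch.nn as nn

class TinySRGenerator(nn.Module):
    """Maps a low-resolution image to one with twice the spatial resolution."""
    def __init__(self, channels=3, features=32):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, features, 3, padding=1), nn.ReLU(),
            nn.Conv2d(features, features, 3, padding=1), nn.ReLU(),
            # Produce 4x the channels, then rearrange them into a 2x2 spatial upscale.
            nn.Conv2d(features, channels * 4, 3, padding=1),
            nn.PixelShuffle(2))

    def forward(self, x):
        return self.body(x)

low_res = torch.rand(1, 3, 32, 32)        # stand-in low-resolution input
print(TinySRGenerator()(low_res).shape)    # torch.Size([1, 3, 64, 64])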
Part 4: Experimental design and analysis of results. In this part a series of experiments is designed to verify the effectiveness of unsupervised-learning-based super-resolution reconstruction algorithms.
First, the experimental datasets and evaluation metrics are introduced, and the experimental settings and parameter choices are explained in detail.
Then the experimental results are presented and assessed both qualitatively and quantitatively.
Finally, the results are analysed in depth and their strengths and weaknesses are discussed.
Part 5: Discussion and outlook. In this part unsupervised-learning-based super-resolution algorithms are discussed and future work is considered.
First, existing research results are summarised and their problems and shortcomings are pointed out.
Then some ideas and directions for improvement are proposed and future research directions are outlined.
Finally, the development prospects of unsupervised-learning-based image super-resolution reconstruction are considered.
Part 6: Conclusion. This part summarises the whole article and states the main contributions and remaining limitations of the study.
Data scientist interview: 30 questions
1. What is the role of a data scientist?
2. Explain what data cleaning is.
3. What are the steps of data cleaning?
4. Why is feature selection an important step in data preprocessing?
5. Explain the normal distribution.
6. What is regression analysis? What are the commonly used regression methods?
7. Explain what a decision tree is.
8. What is the Naive Bayes algorithm? In which scenarios is it suitable?
9. Explain the Support Vector Machine (SVM).
10. Explain cluster analysis.
11. What is a neural network? What are its application areas?
12. Explain deep learning.
13. What is anomaly detection? What are common anomaly detection methods?
15. Explain natural language processing (NLP).
16. What is big data technology? What are common big data processing frameworks?
18. What is machine learning? How does it work?
19. Explain supervised learning and unsupervised learning.
20. What is cross-validation?
21. Explain feature engineering.
22. What are model evaluation and model selection?
23. Explain overfitting and underfitting.
24. What is data mining? How does it differ from data science?
25. Explain time series analysis.
UNSUPERVISED LEARNING OF GAMMA MIXTURE MODELS USING MINIMUM MESSAGE LENGTH

YUDI AGUSTA
Computer Science and Software Engineering, Monash University, Clayton, VIC 3800, Australia
email: yagusta@.au

DAVID L. DOWE
Computer Science and Software Engineering, Monash University, Clayton, VIC 3800, Australia

ABSTRACT
Mixture modelling or unsupervised classification is a problem of identifying and modelling components in a body of data. Earlier work in mixture modelling using Minimum Message Length (MML) includes the multinomial and Gaussian distributions (Wallace and Boulton, 1968), the von Mises circular and Poisson distributions (Wallace and Dowe, 1994, 2000) and the t distribution (Agusta and Dowe, 2002a, 2002b). In this paper, we extend this research by considering MML mixture modelling using the Gamma distribution. The point estimation of the distribution was performed using the MML approximation proposed by Wallace and Freeman (1987) and gives impressive results compared to Maximum Likelihood (ML). We then considered mixture modelling on artificially generated datasets and compared the results with two other criteria, AIC and BIC. In terms of the resulting number of components, the results were again impressive. Application to the Heming Pike dataset was then examined and the results were compared in terms of the probability bit-costings, showing that the proposed MML method performs better than AIC and BIC. A further application also shows that our method works well with datasets containing left-skewed components such as the Palm Valley (Australia) image dataset.

KEY WORDS
Unsupervised Classification, Mixture Modelling, MML, Gamma

1 Introduction
Mixture modelling [1,2,3] - generally known as clustering or unsupervised classification - models, as well as partitions, an unknown number of components (or classes or clusters) of a dataset into a finite number of components. In this paper, we discuss, in particular, a classification problem which models a statistical distribution by a mixture (a weighted sum) of other distributions. This type of classification results in a model described by four elements: (1a) the number of components, (1b) the relative abundances (or mixing proportions) of each component, (1c) their distribution parameters and (1d) the members (or things) that belong to the components.
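As a purely illustrative way to hold the four elements just listed, the sketch below defines a small container for a univariate Gamma mixture; all names are assumptions introduced here, not notation from the paper.

from dataclasses import dataclass, field
from typing import List

@dataclass
class GammaMixtureModel:
    n_components: int                 # (1a) the number of components
    weights: List[float]              # (1b) the relative abundances (mixing proportions)
    scales: List[float]               # (1c) component distribution parameters: scale mu ...
    shapes: List[float]               #      ... and shape nu
    assignments: List[int] = field(default_factory=list)  # (1d) component of each thing

model = GammaMixtureModel(n_components=2, weights=[0.4, 0.6],
                          scales=[1.0, 2.5], shapes=[3.0, 8.0])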
In selecting the most appropriate number of components in a dataset, the problem we often face is keeping the balance between model complexity and goodness of fit. In other words, the best model for a dataset must be sufficiently complex in order to cover all information in the dataset, but not so complex as to over-fit. A series of papers by Wallace and co-authors [1,4,5] dealt with model selection and parameter estimation problems using the Minimum Message Length (MML) principle, which provides a fair comparison between models by stating each of them as a two-part message which encodes both the model and the data in light of the model stated. Various related principles have also been stated independently by Solomonoff [6], Kolmogorov [7], Chaitin [8], and subsequently by Rissanen [9]. For an overview, see [5].

The MML mixture modelling proposed in [1] dealt with classification problems of discrete multinomial and continuous Gaussian distributions. It was extended by Wallace and Dowe [10,11,12] to accommodate two other distributions - Poisson and von Mises circular. The method was further broadened by Agusta and Dowe [13,14] to accommodate the t distribution with known and unknown degrees of freedom.

Beginning with parameter estimation, this paper extends the application of the MML principle to the problem of mixture modelling by considering the Gamma distribution. We compare the parameter estimation results with the Maximum Likelihood method in terms of their Kullback-Leibler distances from the true model to the inferred model. We also analyse mixture modelling results and compare them with the results of two other commonly used criteria, AIC and BIC, in terms of the resulting number of components. Applications to two real-world datasets, the Heming Pike dataset and the Palm Valley (Australia) image dataset, are also provided.

2 MML Parameter Estimation
The Minimum Message Length (MML) principle is an invariant Bayesian point estimation and model selection technique based on information theory. The basic idea of MML is to find a model that minimises the total length of a two-part message encoding the model, and the data in light of that model [1,4,5].

Letting D be the data and H be a model with prior probability P(H), using Bayes's theorem, the point estimation and model selection problems can be regarded simultaneously as a problem of maximising the posterior probability P(H|D). From the information-theoretic point of view, where an event with probability p is encoded by a message of length -log2(p) bits, the problem is then equivalent to minimising

    - log P(H) - log P(D|H)    (1)

where the first term is the message length of the model and the second term is the message length of the data in light of the model.

In applying the MML principle to the mixture problem of Gamma distributions, we need firstly to perform parameter estimation of the Gamma distribution. The parameter estimation used here utilises the MML approximation proposed by Wallace and Freeman [4]. Given the data x and parameters θ, let h(θ) be the prior probability distribution on θ, f(x|θ) the likelihood, L = -log f(x|θ) the negative log-likelihood and F(θ) the Fisher information.

The Gamma distribution considered here is given by

    f(x | μ, ν) = x^(ν-1) e^(-x/μ) / (Γ(ν) μ^ν)

where μ is the scale parameter, ν is the shape parameter and Γ(ν) is the Gamma function, given by Γ(ν) = ∫_0^∞ t^(ν-1) e^(-t) dt, where we let here that μ > 0 and ν > 0. For any positive integer ν, Γ(ν) = (ν-1)!. For large ν, direct calculation from the Gamma function definition above results in a very large value which cannot be calculated precisely. Instead, Stirling's asymptotic representation of the Gamma function can be used to approximate it:

    Γ(ν) ≈ sqrt(2π/ν) (ν/e)^ν.

In estimating the second parameter, ν, we consider two separate cases: firstly, ν as a known parameter and, secondly, ν as an unknown continuous parameter.
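As an illustration of the first (known ν) case, the following sketch derives the Fisher information and the resulting point estimates under the scale-shape parameterisation given above, assuming for simplicity a locally uniform prior on μ; the paper's own prior and expressions may differ. For n independent observations x_1, ..., x_n,

L(\mu) = \sum_{i=1}^{n}\Big[\log\Gamma(\nu) + \nu\log\mu - (\nu-1)\log x_i + \frac{x_i}{\mu}\Big],
\qquad
\frac{\partial^2 L}{\partial\mu^2} = -\frac{n\nu}{\mu^2} + \frac{2\sum_i x_i}{\mu^3},

and taking expectations with E[x_i] = \nu\mu gives F(\mu) = \frac{n\nu}{\mu^2}. With a locally uniform prior h(\mu), minimising -\log h(\mu) + \tfrac{1}{2}\log F(\mu) + L(\mu) over \mu then yields

\hat{\mu}_{\mathrm{MML}} = \frac{\sum_i x_i}{n\nu - 1},
\qquad\text{whereas}\qquad
\hat{\mu}_{\mathrm{ML}} = \frac{\sum_i x_i}{n\nu}.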
For the first case, the Fisher information is obtained from the negative log-likelihood of the density above. Here, we assume priors on μ (for both cases) and on ν (for the second case) over bounded ranges. Minimising the resulting message length gives the MML estimates of the parameters.

The MML estimates were compared with Maximum Likelihood in terms of the Kullback-Leibler distance, where, in this calculation, the generating distribution is regarded as the true model and the fitted distribution is the inferred model. The experiment datasets for Table 1 were generated repeatedly (100 times) from Gamma distributions with the values of μ and ν shown in the table. Both methods infer the two estimates as unknown continuous parameters.

Table 1. Kullback-Leibler distances of the ML and MML estimations of 100 datasets for Gamma distributions with μ = 1.0 and μ = 5.0 and a range of ν values from 1.0 to 200.0 (with standard errors).

As shown in Table 1, the MML estimations always resulted in estimates closer to the true model, with smaller Kullback-Leibler distances for all values of μ and ν. MML also performed a more robust estimation, with smaller or equal standard errors of the resulting Kullback-Leibler distances. This case of MML outclassing ML is reminiscent of the von Mises circular distribution [16] as well as the t distribution [13,14].

3 MML Mixture Modelling
In order to apply MML to a mixture modelling problem, a two-part message conveying the mixture model needs to be constructed. Recall that, from Section 2, the model for the mixture comprises several concatenated message fragments, stating in turn:

1a. The number of components: Assuming that all numbers are considered as equally likely up to some constant (say, 100), this part can be encoded using a uniform distribution over that range.

1b. The relative abundances (or mixing proportions) of each component: For a mixture with a given number of components, this is the same as the condition for a multinomial distribution with that many states. The parameter estimation and the message length calculation of the multi-state distribution have been elaborated upon in subsection 2.1.

1c. For each component, the distribution parameters of its attributes: In this case, the component attribute is inferred as a Gamma distribution, as elaborated in subsection 2.2.

1d. For each thing, the component to which the thing is estimated to belong.

For item (1d), instead of the total assignment proposed in [1], the partial assignment is utilised [17,10,11,12,13,14]. In the partial assignment, the data are assigned partially to each component with a certain probability, which costs the negative logarithm of the total probability of any component generating the datum. For further discussion, see [17, Sec. 3], [12, pp. 77-78], [13, Sec. 5].

Once the first part of the message is stated, the second part of the message will encode the data in light of the model stated in the first part. Since the objective of the MML principle is to find the model that minimises the message length, we do not need to actually encode the message. In other words, we only need to calculate the length of the message and find the model that gives the shortest/minimum message length.
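The partial-assignment cost just described is the second part of the message. As a hedged illustration (not the paper's implementation), the Python sketch below computes that data cost in bits for a candidate univariate Gamma mixture; the first part of the message (number of components, mixing proportions, component parameters) and the measurement-accuracy term for continuous data are omitted, and all parameter values are made-up examples.

import numpy as np
from scipy.special import logsumexp
from scipy.stats import gamma

def data_code_length_bits(x, weights, shapes, scales):
    """-sum_n log2 sum_c w_c f_c(x_n): the partial-assignment cost of the data."""
    log_joint = np.log(weights)[None, :] + np.array(
        [gamma.logpdf(x, a=a, scale=s) for a, s in zip(shapes, scales)]).T
    log_mix = logsumexp(log_joint, axis=1)     # log total probability of each datum
    return -np.sum(log_mix) / np.log(2.0)      # convert nats to bits

# Toy usage: 300 samples from a 1:1:1 three-component Gamma mixture.
rng = np.random.default_rng(0)
x = np.concatenate([rng.gamma(shape=k, scale=s, size=100)
                    for k, s in [(2.0, 1.0), (5.0, 2.0), (20.0, 1.5)]])
bits = data_code_length_bits(x, weights=[1/3, 1/3, 1/3],
                             shapes=[2.0, 5.0, 20.0], scales=[1.0, 2.0, 1.5])
print("data part of the message: %.1f bits" % bits)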
4 Alternative Model Selection Criteria - AIC and BIC
For comparison, two criteria are considered here. These are the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC). AIC was first developed by Akaike [18] in order to identify the model of a dataset and is given by

    AIC = -2 log L* + 2k

where log L* is the logarithm of the likelihood at the maximum likelihood solution for the investigated mixture model and k is the number of parameters estimated. Drawing on the work by Sclove [19], for the Gamma mixture model, k is set according to whether ν is known or unknown, as a function of the number of components and the number of variables in the dataset. Despite some drawbacks of this criterion in selecting the number of components for a mixture modelling problem, it is still often used to assess the order of a mixture model [3]. The model which results in the smallest AIC is the model selected for the dataset.

The second criterion, BIC, was first introduced by Schwarz [20] and is given by

    BIC = -2 log L* + k log N

where log L* is the logarithm of the likelihood at the maximum likelihood solution for the investigated mixture model, k is the number of parameters estimated and N is the number of data. Based on the discussion by Fraley and Raftery [21], for the Gamma mixture model, k is set in the same way, according to whether ν is known or unknown, the number of components and the number of variables in the dataset. (This model selection criterion is formally, not conceptually, the same as the Minimum Description Length (MDL) criterion proposed by Rissanen [9].) The model which results in the smallest BIC is selected as the best model for the dataset.

5 Experiments
We applied the proposed method to some artificial datasets and compared the results with two other criteria, AIC and BIC. Applications to the Heming Pike dataset and the Palm Valley (Australia) image dataset are also considered.

5.1 Artificial Datasets
In this experiment, we generated two artificial univariate three-component Gamma mixture datasets. The mixing proportions of the three components in both datasets were 1:1:1, out of 300 samples. The two datasets were modelled with the parameter ν set as a known parameter and as an unknown continuous parameter, respectively. The measurement accuracy, ε, is set equal to 0.01 for both cases. We repeated each modelling 20 times.

Comparison of the MML modelling results with those obtained using AIC and BIC can be seen in Figure 1 for the first case, and Figure 2 for the second case. As shown in both graphs, MML performed better, in most cases, than AIC and BIC. AIC performed badly, as none of the datasets were correctly modelled.

Figure 1. Comparison of MML, AIC and BIC, when modelling the first artificial dataset.

Figure 2. Comparison of MML, AIC and BIC, when modelling the second artificial dataset.

5.2 Heming Pike
The Heming Pike dataset consists of 523 data, measuring the length of northern pikes (Esox lucius). This was sampled from Heming Lake, Manitoba, Canada in 1965, and was reported earlier in [22,23]. The data comprises five different age-groups of pikes. In [23], the dataset was modelled by setting the second parameter, ν, as a known parameter.
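The AIC and BIC criteria that the experiments above compare against (Section 4) are simple functions of the maximised log-likelihood and the parameter count, so they are easy to state in code. The sketch below is only an illustration: k_gamma_mixture uses an assumed parameter count for an M-component univariate Gamma mixture with both μ and ν free in every component, which may differ from the counts used in the paper, and the log-likelihood in the example call is a made-up value.

import numpy as np

def aic(max_log_lik, k):
    # Akaike Information Criterion: smaller is better.
    return -2.0 * max_log_lik + 2.0 * k

def bic(max_log_lik, k, n):
    # Bayesian Information Criterion: the penalty grows with the sample size n.
    return -2.0 * max_log_lik + k * np.log(n)

def k_gamma_mixture(M):
    # Assumed count: (M - 1) mixing proportions plus 2 parameters per component.
    return (M - 1) + 2 * M

# Toy usage with an assumed fitted log-likelihood for N = 300 data.
print("AIC:", aic(-512.3, k_gamma_mixture(3)), " BIC:", bic(-512.3, k_gamma_mixture(3), 300))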
References
[1] C.S. Wallace and D.M. Boulton, An information measure for classification, Computer Journal, 11(2), 1968, 185-194.
[2] New Zealand Journal of Statistics, 41(2), 1999, 153-171.
[3] G.J. McLachlan and D. Peel, Finite Mixture Models (New York: John Wiley and Sons, 2000).
[4] C.S. Wallace and P.R. Freeman, Estimation and Inference by Compact Coding, Journal of the Royal Statistical Society (B), 49(3), 1987, 240-265.
[5] C.S. Wallace and D.L. Dowe, Minimum Message Length and Kolmogorov Complexity, Computer Journal, 42(4), 1999, 270-283. Special issue on Kolmogorov Complexity.
[6] R.J. Solomonoff, A formal theory of inductive inference, Information and Control, 7, 1964, 1-22, 224-254.
[7] A.N. Kolmogorov, Three approaches to the quantitative definition of information, Problems of Information Transmission, 1, 1965, 4-7.
[8] G.J. Chaitin, On the length of programs for computing finite sequences, Journal of the Association for Computing Machinery, 13, 1966, 547-569.
[9] J.J. Rissanen, Modeling by shortest data description, Automatica, 14, 1978, 465-471.
[10] C.S. Wallace and D.L. Dowe, Intrinsic classification by MML - the Snob program, Proc. 7th Australian Joint Conference on Artificial Intelligence, Armidale, Australia, 1994, 37-44.
[11] C.S. Wallace and D.L. Dowe, MML Mixture Modelling of multi-state, Poisson, von Mises Circular and Gaussian Distributions, Proc. Sixth International Workshop on Artificial Intelligence and Statistics, Florida, USA, 1997, 529-536.
[12] C.S. Wallace and D.L. Dowe, MML clustering of multi-state, Poisson, von Mises circular and Gaussian distributions, Statistics and Computing, 10, Jan. 2000, 73-83.
[13] Y. Agusta and D.L. Dowe, MML Clustering of Continuous-Valued Data Using Gaussian and t Distributions, in B. McKay and J. Slaney (Eds.), Lecture Notes in Artificial Intelligence, 2557 (Berlin: Springer-Verlag, 2002), 143-154.
[14] Y. Agusta and D.L. Dowe, Clustering of Gaussian and t Distributions using Minimum Message Length, Proc. International Conference on Knowledge Based Computer Systems (KBCS-2002), Mumbai, India, 2002, 289-299.
[15] P. Tan and D.L. Dowe, MML Inference of Decision Graphs with Multi-way Joins, in B. McKay and J. Slaney (Eds.), Lecture Notes in Artificial Intelligence, 2557 (Berlin: Springer-Verlag, 2002), 131-142.
[16] C.S. Wallace and D.L. Dowe, MML estimation of the von Mises concentration parameter, Technical Report TR93/19, Computer Science Department, Monash University, Clayton, 3800, Australia, 1993.
[17] C.S. Wallace, An improved program for classification, Proc. Ninth Australian Computer Science Conference (ACSC-9), 8, Monash University, Australia, 1986, 357-366.
[18] H. Akaike, A new look at the statistical model identification, IEEE Transactions on Automatic Control, AC-19(6), 1974, 716-723.
[19] S.L. Sclove, Application of model-selection criteria to some problems in multivariate analysis, Psychometrika, 52(3), 1987, 333-343.
[20] G. Schwarz, Estimating the dimension of a model, The Annals of Statistics, 6, 1978, 461-464.
[21] C. Fraley and A.E. Raftery, How Many Clusters? Which Clustering Method? Answers Via Model-Based Cluster Analysis, Computer Journal, 41(8), 1998, 578-588.
[22] P.D.M. Macdonald and T.J. Pitcher, Age-groups from size-frequency data: a versatile and efficient method of analysing distribution mixtures, Journal of the Fisheries Research Board of Canada, 36, 1979, 987-1001.
[23] P.D.M. Macdonald, Analysis of length-frequency distributions, in R.C. Summerfelt and G.E. Hall (Eds.), Age and Growth of Fish (Iowa: The Iowa State University Press, 1987), 371-384.