Learning Parts-Based Representations of Data
Learning Spatially Localized, Parts-Based Representation

Stan Z. Li, XinWen Hou, HongJiang Zhang, QianSheng Cheng
Microsoft Research China, Beijing Sigma Center, Beijing 100080, China
Institute of Mathematical Sciences, Peking University, Beijing, China
Contact: szli@
(The work presented in the paper was carried out at Microsoft Research, China.)

Abstract

In this paper, we propose a novel method, called local non-negative matrix factorization (LNMF), for learning spatially localized, parts-based subspace representations of visual patterns. An objective function is defined to impose a localization constraint, in addition to the non-negativity constraint in the standard NMF [1]. This gives a set of bases which not only allows a non-subtractive (parts-based) representation of images but also manifests localized features. An algorithm is presented for the learning of such basis components. Experimental results are presented to compare LNMF with the NMF and PCA methods for face representation and recognition, and demonstrate the advantages of LNMF.

1 Introduction

Subspace analysis helps to reveal low dimensional structures of patterns observed in high dimensional spaces. A specific pattern of interest can reside in a low dimensional sub-manifold in an original input data space of unnecessarily high dimensionality. Consider the case of image pixels, each taking a value in a fixed range; the number of possible configurations is huge. This space is capable of describing a wide variety of patterns or visual object classes. However, for a specific pattern, such as the human face, the number of admissible configurations is only a tiny fraction of that. In other words, the intrinsic dimension is much lower.

An observation can be considered as a consequence of linear or nonlinear fusion of a small number of hidden or latent variables. Subspace analysis is aimed at deriving a representation that results in such a fusion. In fact, the essence of feature extraction in pattern analysis can be considered as discovering and computing the intrinsic low dimension of the pattern from the observation. For these reasons, subspace analysis has been a major research issue in appearance-based imaging and vision, such as object detection and recognition [2, 3, 4, 5, 6, 7]. The significance of subspace analysis is twofold: effective characterization of the pattern and dimension reduction.

One approach for learning a subspace representation for a class of image patterns involves deriving a set of basis components for construction of the subspace. The eigenimage method [2, 3, 4] uses principal component analysis (PCA) [8], performed on a set of representative training data, to decorrelate second order moments corresponding to low frequency properties. Any image can be represented as a linear combination of these bases. Dimension reduction is achieved by discarding the least significant components. Due to the holistic nature of the method, the resulting components are global interpretations, and thus PCA is unable to extract basis components manifesting localized features.
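To make the eigenimage construction concrete, the following is a minimal PCA sketch using the column-per-image layout adopted in Section 2 below. The function names and the use of an SVD (rather than an explicit covariance eigendecomposition) are illustrative choices, not details taken from the paper.

```python
import numpy as np

def pca_basis(X, r):
    """Learn an eigenimage (PCA) basis.

    X : (n, m) array of training images, one image per column.
    r : number of principal components to keep.
    Returns the mean image (n, 1) and an (n, r) orthonormal basis.
    """
    mean = X.mean(axis=1, keepdims=True)
    Xc = X - mean                                 # center the data
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return mean, U[:, :r]                         # discard least significant components

def pca_code(x, mean, U):
    """Low-dimensional code of an image: projection onto the retained basis."""
    return U.T @ (x - mean.ravel())
```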
However, in many applications, localized features offer advantages in object recognition, including stability to local deformations, lighting variations, and partial occlusion. Several methods have been proposed recently for localized (spatially), parts-based (non-subtractive) feature extraction. Local feature analysis (LFA) [9], also based on second order statistics, is a method for extracting, from the holistic (global) PCA basis, a local topographic representation in terms of local features. Independent component analysis [10, 11] is a linear non-orthogonal transform. It yields a representation in which unknown linear mixtures of multi-dimensional random variables are made as statistically independent as possible. It not only decorrelates the second order statistics but also reduces higher-order statistical dependencies. It is found that the independent components of natural scenes are localized edge-like filters [12].

The projection coefficients for the linear combinations in the above methods can be either positive or negative, and such linear combinations generally involve complex cancellations between positive and negative numbers. Therefore, these representations lack the intuitive meaning of adding parts to form a whole.

Non-negative matrix factorization (NMF) [1] imposes non-negativity constraints in learning basis images. The pixel values of the resulting basis images, as well as the coefficients for reconstruction, are all non-negative. This way, only non-subtractive combinations are allowed. This ensures that the components are combined to form a whole in a non-subtractive way. For this reason, NMF is considered as a procedure for learning a parts-based representation [1]. However, the additive parts learned by NMF are not necessarily localized, and moreover, we found that the original NMF representation yields low recognition accuracy, as will be shown.

In this paper, we propose a novel subspace method, called local non-negative matrix factorization (LNMF), for learning a spatially localized, parts-based representation of visual patterns. Inspired by the original NMF [1], the aim of this work is to impose locality of features in the basis components and to make the representation suitable for tasks where feature localization is important. An objective function is defined to impose the localization constraint, in addition to the non-negativity constraint of [1]. A procedure is presented to optimize the objective and learn truly localized, parts-based components. A proof of the convergence of the algorithm is provided.

The rest of the paper is organized as follows: Section 2 introduces NMF in contrast to PCA. This is followed by the formulation of LNMF. An LNMF learning procedure is presented and its convergence proved. Section 3 presents experimental results illustrating properties of LNMF and its performance in face recognition as compared to PCA and NMF.

2 Constrained Non-Negative Matrix Factorization

Let a set of training images be given as an $n \times m$ matrix $X$, with each column consisting of the $n$ non-negative pixel values of one of the $m$ images. Denote a set of basis images by an $n \times r$ matrix $B$. Each image can be represented as a linear combination of the basis images using the approximate factorization

$$X \approx BH \qquad (1)$$

where $H$ is the $r \times m$ matrix of coefficients or weights. Dimension reduction is achieved when the number of basis components $r$ is small relative to the image dimension $n$.

The PCA factorization requires that the basis images (the columns of $B$) be orthonormal and the rows of $H$ be mutually orthogonal. It imposes no constraints other than orthogonality, and hence allows the entries of $B$ and $H$ to be of arbitrary sign. Many basis images, or eigenfaces in the case of face recognition, lack intuitive meaning, and a linear combination of the bases generally involves complex cancellations between positive and negative numbers. The NMF and LNMF representations allow only positive coefficients and thus non-subtractive combinations.

2.1 NMF

NMF imposes the non-negativity constraints instead of the orthogonality. As a consequence, the entries of $B$ and $H$ are all non-negative, and hence only non-subtractive combinations are allowed. This is believed to be compatible with the intuitive notion of combining parts to form a whole, and is how NMF learns a parts-based representation [1]. It is also consistent with the physiological fact that firing rates are non-negative.

NMF uses the divergence of $X$ from $BH$, defined as

$$D(X \,\|\, BH) = \sum_{i,j} \left( x_{ij} \log \frac{x_{ij}}{(BH)_{ij}} - x_{ij} + (BH)_{ij} \right) \qquad (2)$$

as the measure of cost for factorizing $X$ into $BH$. An NMF factorization is defined as

$$\min_{B,\,H} D(X \,\|\, BH) \quad \text{subject to} \quad B \ge 0, \; H \ge 0 \qquad (3)$$

where $B, H \ge 0$ means that all entries of $B$ and $H$ are non-negative. The divergence reduces to the Kullback-Leibler divergence when $X$ and $BH$ are normalized to sum to one. The above optimization can be done by using multiplicative update rules [13], for which a Matlab program is available online under the "Computational Neuroscience" discussion category.
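For reference, the multiplicative update rules of Lee and Seung [13] for the divergence objective (2) can be sketched as follows. This is a generic illustration assuming random non-negative initialization and a fixed iteration count; the per-iteration column normalization of $B$ is a common convention rather than something specified in the text.

```python
import numpy as np

def nmf(X, r, n_iter=500, eps=1e-9, seed=0):
    """Multiplicative updates for NMF under the divergence objective (2).

    X : (n, m) non-negative data, one image per column.
    Returns B (n, r) and H (r, m) with X approximately equal to B @ H.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    B = rng.random((n, r)) + eps
    H = rng.random((r, m)) + eps
    for _ in range(n_iter):
        R = X / (B @ H + eps)                      # elementwise ratio x_ij / (BH)_ij
        H *= (B.T @ R) / (B.sum(axis=0)[:, None] + eps)
        R = X / (B @ H + eps)
        B *= (R @ H.T) / (H.sum(axis=1)[None, :] + eps)
        B /= B.sum(axis=0, keepdims=True) + eps    # keep basis columns normalized
    return B, H
```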
2.2 LNMF

The NMF model defined by the constrained minimization of (2) does not impose any constraints on spatial locality, and therefore minimizing the objective function can hardly yield a factorization which reveals local features in the data $X$. Letting $U = B^\top B$ and $V = H H^\top$, both being $r \times r$ matrices, LNMF is aimed at learning local features by imposing the following three additional constraints on the NMF basis:

1. A basis component should not be further decomposed into more components, so as to minimize the number of basis components required to represent $X$. Let $b_i$ be a basis vector. Given the existing constraints, we wish that $u_{ii} = b_i^\top b_i$ should be as small as possible, so that $b_i$ contains as many non-zero elements as possible. This can be imposed by minimizing $\sum_i u_{ii}$.

2. Different bases should be as orthogonal as possible, so as to minimize redundancy between different bases. This can be imposed by minimizing $\sum_{i \neq j} u_{ij}$.

3. Only components giving the most important information should be retained. Given that every image in $X$ has been normalized into a certain range, the total "activity" on each retained component, defined as the total squared projection coefficients summed over all training images, should be maximized. This is imposed by maximizing $\sum_i v_{ii}$.

The incorporation of the above constraints leads to the following constrained divergence as the objective function for LNMF:

$$L(B, H) = D(X \,\|\, BH) + \alpha \sum_{i,j} u_{ij} - \beta \sum_i v_{ii} \qquad (4)$$

where $\alpha, \beta > 0$ are some constants. An LNMF factorization is defined as a solution to the constrained minimization of (4). A local solution to the above constrained minimization can be found by using the following three-step update rules:

$$h_{kj} \leftarrow \sqrt{\, h_{kj} \sum_i b_{ik} \frac{x_{ij}}{(BH)_{ij}} \,} \qquad (5)$$

$$b_{ik} \leftarrow b_{ik} \, \frac{\sum_j h_{kj}\, x_{ij} / (BH)_{ij}}{\sum_j h_{kj}} \qquad (6)$$

$$b_{ik} \leftarrow \frac{b_{ik}}{\sum_l b_{lk}} \qquad (7)$$
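A direct transcription of the three-step update rules (5)-(7), as reconstructed above, might look like the following. Initialization, iteration count, and the small epsilon guarding divisions are assumptions of this sketch; note that the constants $\alpha$ and $\beta$ do not appear explicitly in these multiplicative updates.

```python
import numpy as np

def lnmf(X, r, n_iter=1000, eps=1e-9, seed=0):
    """Three-step LNMF updates, Eqs. (5)-(7).

    X : (n, m) non-negative data, one image per column.
    Returns B (n, r) and H (r, m).
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    B = rng.random((n, r)) + eps
    H = rng.random((r, m)) + eps
    for _ in range(n_iter):
        R = X / (B @ H + eps)
        H = np.sqrt(H * (B.T @ R))                           # Eq. (5)
        R = X / (B @ H + eps)
        B *= (R @ H.T) / (H.sum(axis=1)[None, :] + eps)      # Eq. (6)
        B /= B.sum(axis=0, keepdims=True) + eps              # Eq. (7): column normalization
    return B, H
```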
2.3 Convergence Proof

The learning algorithm (5)-(7) alternates between updating $H$ and updating $B$, and is derived based on a technique in which an objective function is minimized by using an auxiliary function. $G(h, h')$ is said to be an auxiliary function for $F(h)$ if $G(h, h') \ge F(h)$ and $G(h, h) = F(h)$ are satisfied. If $G$ is an auxiliary function, then $F$ is non-increasing when $h$ is updated using $h^{(t+1)} = \arg\min_h G(h, h^{(t)})$ [14]. This is because $F(h^{(t+1)}) \le G(h^{(t+1)}, h^{(t)}) \le G(h^{(t)}, h^{(t)}) = F(h^{(t)})$.

Updating $H$: $H$ is updated by minimizing the objective with $B$ fixed. An auxiliary function for this sub-problem is constructed as in (8). It is easy to verify that the equality condition holds; the following proves the bounding condition. Because the negative logarithm is a convex function, the inequality (9) holds for all admissible arguments. Letting (10), we obtain (11), which establishes the bound. To minimize with respect to $H$, we can update it using (12). Such an update can be found by setting the partial derivatives of the auxiliary function to zero for all entries. Because of (13), we find (14). There exists a solution of the form (15), where (16) gives the corresponding normalizing factor. The result we want to derive from LNMF learning is the basis $B$; $H$ itself is not so important. Because $B$ will be normalized by Eq. (7), the normalized basis is unaffected by this factor as long as it is positive. Therefore, we simply replace Eq. (15) by (5).

Updating $B$: $B$ is updated by minimizing the objective with $H$ fixed. The auxiliary function for this sub-problem is (17); its two defining properties can be proved likewise. By setting the partial derivatives to zero, we find (18). Because $B$ is an approximately orthogonal basis with normalized columns, the relevant ratio can always be set to be not too large so that the resulting update remains valid, and thus we have Eq. (6).

From the above analysis, we conclude that the three-step update rules (5)-(7) result in a sequence of non-increasing values of the objective, and hence converge to a local minimum of it.

2.4 Face Recognition in Subspace

Face recognition in the PCA, NMF or LNMF linear subspace is performed as follows:

1. Feature extraction. The mean of the training images is computed first. Each training face image is mean-subtracted and projected into the linear subspace, giving a feature vector which is then used as a prototype feature point. A query face image to be classified is represented by its projection into the space in the same way.

2. Nearest neighbor classification. The Euclidean distance between the query and each prototype is calculated. The query is classified to the class to which the closest prototype belongs.
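The recognition procedure of Section 2.4 can be sketched as below. The recovered text does not specify how the projection is computed for the non-orthogonal NMF and LNMF bases, so the use of the pseudo-inverse of $B$ here is an assumption; for an orthonormal PCA basis, multiplying by the transpose of the basis gives the same result.

```python
import numpy as np

def subspace_features(X, B, mean):
    """Project mean-subtracted images (columns of X) into the learned subspace.

    The pseudo-inverse handles non-orthogonal NMF/LNMF bases (an assumption);
    for an orthonormal PCA basis, B.T @ (X - mean) is equivalent.
    """
    return np.linalg.pinv(B) @ (X - mean)

def classify_nearest(query, prototypes, labels):
    """Nearest-neighbor classification with Euclidean distance.

    query      : (r,) feature vector of the query image.
    prototypes : (r, p) features of the training (prototype) images.
    labels     : (p,) class labels of the prototypes.
    """
    d = np.linalg.norm(prototypes - query[:, None], axis=0)
    return labels[np.argmin(d)]
```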
3 Experiments

3.1 Data Preparation

The Cambridge ORL face database is used for deriving the PCA, NMF and LNMF bases. There are 400 images of 40 persons, 10 images per person (Fig. 1 shows the 10 images of one person). The images are taken at different times, with slightly varying lighting, facial expressions (open/closed eyes, smiling/non-smiling) and facial details (glasses/no glasses). All the images are taken against a dark homogeneous background. The faces are in an up-right position in frontal view, with slight left-right out-of-plane rotation. Each image is linearly stretched to the full range of pixel values of [0, 255].

Figure 1: Face examples from the ORL database.

The set of 10 images for each person is randomly partitioned into a training subset of 5 images and a test set of the other 5. The training set is then used to learn basis components, and the test set to evaluate them. All the compared methods take the same training and test data.

3.2 Learning Basis Components

LNMF, NMF and PCA representations with varying numbers of basis components are computed from the training set. A publicly available Matlab package is used for NMF. NMF converges about 5 times faster than LNMF. Fig. 2 shows the resulting LNMF and NMF components for subspaces of dimensions 25, 49 and 81. Higher pixel values are shown in darker color; the components in each LNMF basis set have been ordered (left to right, then top to bottom) according to their significance value. The NMF bases are as holistic as the PCA basis (eigenfaces) for the training set. We notice the result presented in [1] does not appear so, perhaps because the faces used for producing that result are well aligned. The LNMF procedure learns basis components which not only lead to non-subtractive representations, but also manifest localized features and thus truly parts-based representations. Also, we see that as the dimension (number of components) increases, the features formed in the LNMF components become more localized.

Figure 2: LNMF (left) and NMF (right) bases of dimensions 25 (row 1), 49 (row 2) and 81 (row 3). Every basis component is of the original image size, and the displayed images are resized to fit the paper format. The LNMF representation is both parts-based and local, whereas NMF is parts-based but holistic.

3.3 Reconstruction

Fig. 3 shows reconstructions in the LNMF, NMF and PCA subspaces of various dimensions for a face image in the test set, which corresponds to the one in the middle of row 1 of Fig. 1. As the dimension is increased, more details are recovered. We see that while NMF and PCA reconstructions look similar in terms of the smoothness and texture of the reconstructed images, PCA presents better reconstruction quality than NMF. Surprisingly, the LNMF representation, which is based on more localized features, provides smoother reconstructions than NMF and PCA.

Figure 3: Reconstructions of the face image in the (left to right) 25, 49, 81 and 121 dimensional (top to bottom) LNMF, NMF, and PCA subspaces.

3.4 Face Recognition

The LNMF, NMF and PCA representations are comparatively evaluated for face recognition using the images from the test set. The recognition accuracy, defined as the percentage of correctly recognized faces, is used as the performance measure. Tests are done with a varying number of basis components, with or without occlusion. The occlusion is simulated in an image by pasting a white patch of a given size at a random location; see Fig. 4 for examples.

Figure 4: Examples of random occluding patches of sizes (from left to right) 10x10, 20x20, ..., 50x50, 60x60.
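The occlusion test described above can be simulated with a few lines of NumPy. Pasting the patch at a uniformly random location is an assumption about how "a random location" is drawn; the recognition accuracy under occlusion is then simply the fraction of occluded test images whose nearest prototype has the correct identity, computed separately for each patch size.

```python
import numpy as np

def occlude(image, patch_size, rng=None):
    """Simulate partial occlusion with a white square patch at a random location.

    image : 2-D array with pixel values in [0, 255].
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape
    top = rng.integers(0, h - patch_size + 1)
    left = rng.integers(0, w - patch_size + 1)
    out = image.copy()
    out[top:top + patch_size, left:left + patch_size] = 255   # white patch
    return out
```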
Figs. 5 and 6 show recognition accuracy curves under various conditions. Fig. 5 compares the three representations in terms of recognition accuracy versus the number of basis components. LNMF yields the best recognition accuracy, slightly better than PCA, whereas the original NMF gives very low accuracy. Fig. 6 compares the three representations under varying degrees of occlusion and with a varying number of basis components, in terms of recognition accuracy versus the size of the occluding patch. As we see, although PCA yields more favorable results than LNMF when the patch size is small, the better stability of the LNMF representation under partial occlusion becomes clear as the patch size increases.

Figure 5: Recognition accuracies as a function of the number (in 5x5, 6x6, ..., 11x11) of basis components used, for the LNMF (solid), NMF (dashed) and PCA (dot-dashed) representations.

Figure 6: Recognition accuracies versus the size (in 10x10, 20x20, ..., 60x60) of occluding patches, with 25, 49, 81, 121 basis components (left to right, then top to bottom), for the LNMF (solid), NMF (dashed) and PCA (dot-dashed) representations.

4 Conclusion

In this paper, we have proposed a new method, local non-negative matrix factorization (LNMF), for learning a spatially localized, parts-based subspace representation of visual patterns. The work is aimed at learning localized features in NMF basis components suitable for tasks such as face recognition. An algorithm is presented for the learning and its convergence is proved. Experimental results have shown that we have achieved our objectives: LNMF derives bases which are better suited for a localized representation than PCA and NMF, and leads to better recognition results than the existing methods.

The LNMF and NMF learning algorithms are local minimizers. They give different basis components from different initial conditions. We will investigate how this affects the recognition rate. Further future work includes the following topics. The first is to develop algorithms for faster convergence and better solutions in terms of minimizing the objective function. The second is to investigate the ability of the model to generalize, i.e., how the constraints, the non-negativity and others, are satisfied for data not seen in the training set. The third is to compare with other methods for learning spatially localized features such as LFA [9] and ICA [12].

References

[1] D. D. Lee and H. S. Seung, "Learning the parts of objects by non-negative matrix factorization," Nature, vol. 401, pp. 788-791, 1999.
[2] L. Sirovich and M. Kirby, "Low-dimensional procedure for the characterization of human faces," Journal of the Optical Society of America A, vol. 4, no. 3, pp. 519-524, March 1987.
[3] M. Kirby and L. Sirovich, "Application of the Karhunen-Loeve procedure for the characterization of human faces," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 12, no. 1, pp. 103-108, January 1990.
[4] M. A. Turk and A. P. Pentland, "Face recognition using eigenfaces," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Hawaii, June 1991, pp. 586-591.
[5] D. Beymer, A. Shashua, and T. Poggio, "Example based image analysis and synthesis," A.I. Memo 1431, MIT, 1993.
[6] A. P. Pentland, B. Moghaddam, and T. Starner, "View-based and modular eigenspaces for face recognition," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 1994, pp. 84-91.
[7] H. Murase and S. K. Nayar, "Visual learning and recognition of 3-D objects from appearance," International Journal of Computer Vision, vol. 14, pp. 5-24, 1995.
[8] K. Fukunaga, Introduction to Statistical Pattern Recognition, Academic Press, Boston, 2nd edition, 1990.
[9] P. Penev and J. Atick, "Local feature analysis: A general statistical theory for object representation," Network: Computation in Neural Systems, vol. 7, no. 3, pp. 477-500, 1996.
[10] C. Jutten and J. Herault, "Blind separation of sources, part I: An adaptive algorithm based on neuromimetic architecture," Signal Processing, vol. 24, pp. 1-10, 1991.
[11] P. Comon, "Independent component analysis, a new concept?," Signal Processing, vol. 36, pp. 287-314, 1994.
[12] A. J. Bell and T. J. Sejnowski, "The 'independent components' of natural scenes are edge filters," Vision Research, vol. 37, pp. 3327-3338, 1997.
[13] D. D. Lee and H. S. Seung, "Algorithms for non-negative matrix factorization," in Proceedings of Neural Information Processing Systems, 2000.
[14] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society, Series B, vol. 39, pp. 1-38, 1977.
第47卷第1期Vol.47No.1计算机工程Computer Engineering2021年1月January2021一种融合主题特征的自适应知识表示方法陈文杰(中国科学院成都文献情报中心,成都610041)摘要:基于翻译的表示学习模型TransE被提出后,研究者提出一系列模型对其进行改进和补充,如TransH、TransG、TransR等。
然而,这类模型往往孤立学习三元组信息,忽略了实体和关系相关的描述文本和类别信息。
基于主题特征构建TransATopic模型,在学习三元组的同时融合关系中的描述文本信息,以增强知识图谱的表示效果。
采用基于主题模型和变分自编器的关系向量构建方法,根据关系上的主题分布信息将同一关系表示为不同的实值向量,同时将损失函数中的距离度量由欧式距离改进为马氏距离,从而实现向量不同维权重的自适应赋值。
实验结果表明,在应用于链路预测和三元组分类等任务时,TransATopic模型的MeanRank、HITS@5和HITS@10指标较TransE模型均有显著改进。
关键词:知识图谱;表示学习;主题模型;变分自编码器;马氏距离开放科学(资源服务)标志码(OSID):中文引用格式:陈文杰.一种融合主题特征的自适应知识表示方法[J].计算机工程,2021,47(1):87-93,100.英文引用格式:CHEN Wenjie.An adaptive approach for knowledge representation fused with topic feature[J].Computer Engineering,2021,47(1):87-93,100.An Adaptive Approach for Knowledge Representation Fused with Topic FeatureCHEN Wenjie(Chengdu Library and Information Center,Chinese Academy of Science,Chengdu610041,China)【Abstract】Since the emergence of the translation-based representation learning model,TransE,a series of models such as TransH,TransG and TransR have been proposed to improve and add functions to TransE.However,such models tend to learn triplet information in isolation,and ignore the descriptive text and category information related to entities and relations.Therefore,this paper fuses descriptive text information of relations while learning triples,and constructs the TransATopic model based on topic features to enhance the representation effect of the knowledge graph.The relation vector construction method based on the topic model and Variational Autoencoder(VAE)is used to map one relation to different real-valued vectors according to topic distribution information of relations.At the same time,the distance metric in the loss function is improved from Euclidean distance to a more flexible Mahalanobis distance,which realizes the adaptive assignment of vector weights in different dimensions.Experimental results show that when applied to link prediction and triple classification tasks,TransATopic’s indicators including MeanRank,HITS@5and HITS@10are significantly improved compared with the TransE model.【Key words】knowledge graph;representation learning;topic model;Variational Autoencoder(VAE);Mahalanobis distanceDOI:10.19678/j.issn.1000-3428.00566880概述知识图谱是由三元组构成的结构化语义知识库,其以符号的形式描述现实世界中实体和实体间的连接关系。
Journal of Machine Learning Research 7 (2006) 2369-2397    Submitted 2/05; Revised 10/06; Published 11/06

Learning Parts-Based Representations of Data

David A. Ross    DROSS@
Richard S. Zemel    ZEMEL@
Department of Computer Science
University of Toronto
6 King's College Road
Toronto, Ontario M5S 3H5, CANADA

Editor: Pietro Perona

Abstract

Many perceptual models and theories hinge on treating objects as a collection of constituent parts. When applying these approaches to data, a fundamental problem arises: how can we determine what are the parts? We attack this problem using learning, proposing a form of generative latent factor model, in which each data dimension is allowed to select a different factor or part as its explanation. This approach permits a range of variations that posit different models for the appearance of a part. Here we provide the details for two such models: a discrete and a continuous one. Further, we show that this latent factor model can be extended hierarchically to account for correlations between the appearances of different parts. This permits modeling of data consisting of multiple categories, and learning these categories simultaneously with the parts when they are unobserved. Experiments demonstrate the ability to learn parts-based representations, and categories, of facial images and user-preference data.

Keywords: parts, unsupervised learning, latent factor models, collaborative filtering, hierarchical learning

1. Introduction

Many collections of data exhibit a common underlying structure: they consist of a number of parts or factors, each with a range of possible states. When data are represented as vectors, parts manifest themselves as subsets of the data dimensions that take on values in a coordinated fashion. In the domain of digital images, these parts may correspond to the intuitive notion of the component parts of objects, such as the arms, legs, torso, and head of the human body. Prominent theories of computational vision, such as Biederman's Recognition-by-Components (Biederman, 1987), advocate the suitability of a parts-based approach for recognition in both humans and machines. Recognizing an object by first recognizing its constituent parts, then validating their geometric configuration, has several advantages:

1. Highly articulate objects, such as the human body, are able to appear in a wide range of configurations. It would be difficult to learn a holistic model capturing all of these variants.

2. Objects which are partially occluded can be identified as long as some of their parts are visible.

3. The appearances of certain parts may vary less under a change in pose than the appearance of the whole object. This can result in detectors which, for example, are more robust to rotations of the target object.

4. New examples from an object class may be recognized as simply a novel combination of familiar parts. For example, a parts-based face detection system could generalize to detect faces with both beards and sunglasses, having been trained only on faces containing one, but not both, of these features.

The principal difficulty in creating such systems is determining which parts should be used, and identifying examples of these parts in the training data.

In the part-based detectors created by Mohan et al. (2001) and Heisele et al. (2000), parts were chosen by the experimenters based on intuition, and the component detectors (support vector machines) were trained on image subwindows containing only the part in question. Obtaining these subwindows required that they be manually extracted from hundreds or thousands of training images.
In contrast, the parts-based detector created by Weber et al. (2000) proposed a way to automate this process. During training of the geometric model, parts were selected from an initial set of candidates to include only those which lead to the highest detection performance. The resulting detector relied on a very small number of parts (e.g., 3) corresponding to very small local features. Unlike the SVMs, which were trained on a range of appearances of the part, each of these part-detectors could identify only a single fixed appearance.

Parts-based representations of data can also be learned in an entirely unsupervised fashion. These parts can be used for subsequent supervised learning, but the models constructed can also be valuable on their own. A parts-based model provides an efficient, distributed representation, and can aid in the discovery of causal structure within data. For example, a model consisting of K parts, each with J possible states, can represent $J^K$ different objects. Inferring the state of each part can be done efficiently, as each part depends only on a fraction of the data dimensions.

These computational considerations make parts-based models particularly suitable for modeling high-dimensional data such as user preferences in collaborative filtering problems. In this setting, each data vector contains ratings given by a human subject to a number of items, such as books or movies. Typically there are thousands of unique items, but for each user we can only observe ratings for a small fraction of them. The goal in collaborative filtering is to make accurate predictions of the unobserved ratings. Parts can be formed from groups of related items, and the states of a part correspond to different attitudes towards the items. Unsupervised learning of a parts-based model allows us to learn the relationships between items, which allows for efficient online and active learning.

Here we propose a probabilistic generative approach to learning parts-based representations of high-dimensional data. Our key assumption is that the dimensions of the data can be separated into several disjoint subsets, or factors, which take on values independently of each other. Each factor has a range of possible states or appearances, and we investigate two ways of modeling this variability. First we address the case in which each factor has a small number of discrete states, and model it using a vector quantizer. In some situations, however, continually-varying descriptions of parts are more suitable. Thus, in our second approach we model part-appearances using factor analysis. Given a set of training examples, our approach learns the association of data dimensions with factors, as well as the states of each factor. Inference and learning are carried out efficiently via variational algorithms. The general approach, as well as details for the models, are given in Section 2. Experiments showing parts-based representations learned for real data sets follow in Section 3.

Although we initially assume that parts take on states independently, clearly in real-world situations there are dependencies. For example, consider the case of modeling images of human faces. A model could be constructed representing the various parts of the face (eyes, nose, mouth, etc.), and the various appearances of each part. If one part were to select an appearance with high pixel intensities, due to lighting conditions or skin tone, then it seems likely that the other parts should appear similarly bright. Realizing this, in Section 4 we propose a method of learning these dependencies between part selections hierarchically, by introducing an additional higher-level cause, or 'class' variable, on which state selections for the factors are conditioned. This allows us to model different categories of data using the same vocabulary of parts, and to induce the categories when they are not available in advance. We conclude with a comparison to related methods, and a final discussion in Sections 5 and 6.
2. An Approach to Learning Parts-Based Models

We approach the problem of learning parts by posing it as a stochastic generative model. We assume that there are K factors, each a probabilistic model for the range of appearances of one of the parts. To generate an observed data vector of D dimensions, $x \in \Re^D$, we stochastically select one state for each factor, and one factor for each data dimension, $x_d$. The first selection allows each part to independently choose its appearance, while the second dictates how the parts combine to produce the observed data vector.

This approach differs from a traditional mixture model in that each dimension of the data is generated by a different linear combination of the factors. Rather than clustering the data vectors based on which mixture component gives the highest probability, we are grouping the data dimensions based on which part provides the best explanation.

The selection of factors for each dimension are represented as binary latent variables, $R = \{r_{dk}\}$, for $d = 1 \ldots D$, $k = 1 \ldots K$. The variable $r_{dk} = 1$ if and only if factor k has been selected for data dimension d. These variables can be described equivalently as multinomials, $r_d \in 1 \ldots K$, and are drawn according to their respective prior distributions, $P(r_{dk}) = a_{dk}$. The choice of state for each factor is also a latent variable, which we will represent by $s_k$. Using $\theta_k$ to represent the parameters of the k-th factor, we arrive at the following complete likelihood function:

\[
P(x, R, S \mid \theta) = \prod_{d,k} \big( a_{dk}^{\,r_{dk}} \big) \prod_k P(s_k) \prod_{d,k} P(x_d \mid \theta_k, s_k)^{\,r_{dk}}. \tag{1}
\]

This probability model is depicted graphically in Figure 1.

Figure 1: Graphical representation of the parts-based learning model. We let $r_{d=1}$ represent all the variables $r_{d=1,k}$, which together select a factor for $x_1$. Similarly, $s_{k=1}$ selects a state for factor 1. The plates depict repetitions across the D input dimensions and the K factors. To extend this model to multiple data cases, we would include an additional plate over r, x, and s.

As described thus far, the approach is independent of the particular choice of model used for each of the factors. We now provide details for two particular choices: a discrete model of factors, vector quantization; and a continuous model, factor analysis.

2.1 Multiple Cause Vector Quantization

In Multiple Cause Vector Quantization, first proposed in Ross and Zemel (2003), we assume that each part has a small number of appearances, and model them using a vector quantizer (VQ) of J possible states. To generate an observed data example, we stochastically select one state for each VQ, and, as described above, one VQ for each dimension. Given these selections, a single state from a single VQ determines the value of each data dimension $x_d$. As before, we represent the selections using binary latent variables, $S = \{s_{kj}\}$, for $k = 1 \ldots K$, $j = 1 \ldots J$, where $s_{kj} = 1$ if and only if state j is selected for VQ k. Again we introduce prior selection probabilities $P(s_{kj}) = b_{kj}$, with $\sum_j b_{kj} = 1$. Assuming each VQ state specifies the mean as well as the standard deviation of a Gaussian distribution, and the noise in the data dimensions is conditionally independent, we have (where $\theta_k = \{\mu_{dkj}, \sigma_{dkj}\}$, and N is the Gaussian pdf)

\[
P(x \mid R, S, \theta) = \prod_{d,k,j} N(x_d; \mu_{dkj}, \sigma^2_{dkj})^{\,r_{dk} s_{kj}}.
\]

The resulting model can be thought of as a Gaussian mixture model over $J \times K$ possible states for each data dimension ($x_d$). The single state $(k, j)$ is selected if $s_{kj} r_{dk} = 1$. Note that this selection has two components. The selection in the j component is made jointly for the different data dimensions, and in the k component it is made independently for each dimension.
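To make this generative process concrete, the following minimal NumPy sketch (not code from the paper; sizes, priors, and Gaussian parameters are made-up toy values) samples a single data vector from an MCVQ-style model: one state is drawn per VQ, one VQ per data dimension, and each $x_d$ is then drawn from the selected Gaussian state.

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy sizes (illustrative only): data dimensions, factors (parts), states per factor.
    D, K, J = 8, 2, 3

    # Model parameters: factor-selection priors a, state priors b, Gaussian state parameters.
    a = rng.dirichlet(np.ones(K), size=D)     # a[d, k] = P(r_d = k)
    b = rng.dirichlet(np.ones(J), size=K)     # b[k, j] = P(s_k = j)
    mu = rng.normal(size=(D, K, J))           # state means mu_dkj
    sigma = 0.1 * np.ones((D, K, J))          # state standard deviations sigma_dkj

    def sample_x():
        """Draw one data vector following Equation (1): one state per factor,
        one factor per data dimension, then a Gaussian draw per dimension."""
        s = np.array([rng.choice(J, p=b[k]) for k in range(K)])   # state chosen by each factor
        r = np.array([rng.choice(K, p=a[d]) for d in range(D)])   # factor chosen by each dimension
        x = np.array([rng.normal(mu[d, r[d], s[r[d]]], sigma[d, r[d], s[r[d]]])
                      for d in range(D)])
        return x, r, s

    x, r, s = sample_x()
    print("factor per dimension:", r)
    print("state per factor:    ", s)
    print("sampled x:", np.round(x, 2))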
2.1.1 Learning and Inference

The joint distribution over the observed vector x and the latent variables is

\[
P(x, R, S \mid \theta) = P(R \mid \theta)\, P(S \mid \theta)\, P(x \mid R, S, \theta)
= \prod_{d,k} a_{dk}^{\,r_{dk}} \prod_{k,j} b_{kj}^{\,s_{kj}} \prod_{d,k,j} N(x_d; \theta_k)^{\,r_{dk} s_{kj}}.
\]

Given an input x, the posterior distribution over the latent variables, $P(R, S \mid x, \theta)$, cannot tractably be computed, since all the latent variables become dependent. We apply a variational EM algorithm to learn the parameters $\theta$, and infer latent variables given observations. For a given observation, we could approximate the posterior distribution using a factored distribution, where g and m are variational parameters related to r and s respectively:

\[
Q_0(R, S \mid x, \theta) = \prod_{d,k} g_{dk}^{\,r_{dk}} \prod_{k,j} m_{kj}^{\,s_{kj}}. \tag{2}
\]

The model is learned by optimizing the following objective function (Neal and Hinton, 1998), also known as the variational free energy:

\[
F(Q_0, \theta) = E_{Q_0}\big[ \log P(x, R, S \mid \theta) - \log Q_0(R, S \mid x, \theta) \big]
= E_{Q_0}\Big[ -\sum_{d,k} r_{dk} \log \frac{g_{dk}}{a_{dk}} - \sum_{k,j} s_{kj} \log \frac{m_{kj}}{b_{kj}} + \sum_{d,k,j} r_{dk}\, s_{kj} \log N(x_d; \theta) \Big]
= -\sum_{d,k} g_{dk} \log \frac{g_{dk}}{a_{dk}} - \sum_{k,j} m_{kj} \log \frac{m_{kj}}{b_{kj}} - \sum_{d,k,j} g_{dk}\, m_{kj}\, \varepsilon_{dkj},
\]

where $\varepsilon_{dkj} = \log \sigma_{dkj} + \frac{(x_d - \mu_{dkj})^2}{2 \sigma^2_{dkj}}$. The objective function F is a lower bound on the log likelihood of generating the observations, given the particular model parameters. The variational EM algorithm improves this bound by iteratively maximizing F with respect to $Q_0$ (E-step) and to $\theta$ (M-step).

Extending this to handle multiple observations, the columns of $X = [x^1 \ldots x^C]$, we must now consider approximating the posterior $P(R, S \mid X, \theta)$, where $R = \{r^c_{dk}\}$ and $S = \{s^c_{kj}\}$ are the latent selections for all training cases c. Our aim is to learn models which have a posterior distribution over factor selections for each data dimension that is consistent across all data (that is to say, regardless of the data case, each $x_d$ will typically be associated with the same part or parts). Thus, in the variational posterior we constrain the parameters $\{g_{dk}\}$ to be the same across all observations $x^c$, $c = 1 \ldots C$. In this general case, the variational approximation to the posterior becomes (cf. Equation (2))

\[
Q(R, S \mid X, \theta) = \prod_{c,d,k} g_{dk}^{\,r^c_{dk}} \prod_{c,k,j} \big( m^c_{kj} \big)^{s^c_{kj}}. \tag{3}
\]

It is important to point out that this choice of variational approximation is somewhat unorthodox; nonetheless it is perfectly valid and has produced good results in practice. A comparison to more conventional alternatives appears in Appendix A.

Under this formulation, only the $\{m^c_{kj}\}$ parameters are updated during the E step for each observation $x^c$:

\[
m^c_{kj} = \frac{b_{kj} \exp\!\big( -\sum_d g_{dk}\, \varepsilon^c_{dkj} \big)}{\sum_{\rho=1}^{J} b_{k\rho} \exp\!\big( -\sum_d g_{dk}\, \varepsilon^c_{dk\rho} \big)}.
\]

The M step updates the parameters, $\mu$ and $\sigma$, which relate each latent state $(k, j)$ to each input dimension d, the parameters of Q related to factor selection $\{g_{dk}\}$, and the priors $\{a_{dk}\}$ and $\{b_{kj}\}$:

\[
g_{dk} = \frac{a_{dk} \exp\!\big( -\frac{1}{C} \sum_{c,j} m^c_{kj}\, \varepsilon^c_{dkj} \big)}{\sum_{\rho=1}^{K} a_{d\rho} \exp\!\big( -\frac{1}{C} \sum_{c,j} m^c_{\rho j}\, \varepsilon^c_{d\rho j} \big)}, \tag{4}
\]

\[
\mu_{dkj} = \frac{\sum_c m^c_{kj}\, x^c_d}{\sum_c m^c_{kj}}, \qquad
\sigma^2_{dkj} = \frac{\sum_c m^c_{kj}\, (x^c_d - \mu_{dkj})^2}{\sum_c m^c_{kj}}, \qquad
a_{dk} = g_{dk}, \qquad
b_{kj} = \frac{1}{C} \sum_c m^c_{kj}.
\]

As can be seen from the update equations, an iteration of EM learning for MCVQ has computational complexity linear in each of C, D, J, and K.
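A minimal sketch of one variational EM iteration implementing the updates above, assuming a small dense toy data matrix and uniform initialization; the function name mcvq_em_step and all sizes are our own choices, and only a max-shift and small numerical floors are used as safeguards.

    import numpy as np

    rng = np.random.default_rng(1)

    # Toy problem sizes (illustrative only, not the paper's experimental settings).
    C, D, K, J = 50, 8, 2, 3               # cases, data dimensions, VQs, states per VQ
    X = rng.normal(size=(D, C))            # data matrix; column c is the case x^c

    # Initialization: uniform priors, state means seeded from random training columns.
    a = np.full((D, K), 1.0 / K)           # priors P(r_dk = 1)
    b = np.full((K, J), 1.0 / J)           # priors P(s_kj = 1)
    g = np.full((D, K), 1.0 / K)           # variational factor-selection posteriors
    mu = X[:, rng.choice(C, size=(K, J))]  # (D, K, J) state means
    sigma = np.ones((D, K, J))             # state standard deviations

    def mcvq_em_step(X, a, b, g, mu, sigma):
        """One variational EM iteration for MCVQ, following the updates above."""
        C = X.shape[1]
        diff = X[:, None, None, :] - mu[..., None]                    # (D, K, J, C)
        eps = np.log(sigma)[..., None] + diff ** 2 / (2 * sigma[..., None] ** 2)

        # E-step: per-case state posteriors m^c_kj, normalized over states j.
        log_m = np.log(b)[None] - np.einsum('dk,dkjc->ckj', g, eps)   # (C, K, J)
        m = np.exp(log_m - log_m.max(axis=2, keepdims=True))
        m /= m.sum(axis=2, keepdims=True)

        # M-step: tied factor-selection posteriors g_dk (Equation (4)).
        log_g = np.log(a) - np.einsum('ckj,dkjc->dk', m, eps) / C
        g = np.exp(log_g - log_g.max(axis=1, keepdims=True))
        g /= g.sum(axis=1, keepdims=True)

        # M-step: Gaussian state parameters and priors.
        resp = m.sum(axis=0) + 1e-12                                  # (K, J)
        mu = np.einsum('ckj,dc->dkj', m, X) / resp
        diff = X[:, None, None, :] - mu[..., None]
        sigma = np.sqrt(np.einsum('ckj,dkjc->dkj', m, diff ** 2) / resp)
        sigma = np.maximum(sigma, 1e-3)                               # guard against collapse
        a, b = g.copy(), resp / C
        return a, b, g, mu, sigma

    for _ in range(20):
        a, b, g, mu, sigma = mcvq_em_step(X, a, b, g, mu, sigma)
    print("factor-selection posteriors g:\n", np.round(g, 2))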
A variational approximation is just one of a number of possible approaches to performing the intractable inference (E) step in MCVQ. One alternative, known as Monte Carlo EM, is to approximate the posterior with a set of samples $\{R^n, S^n\}_{n=1}^{N}$ drawn from the true posterior $P(R, S \mid X, \theta)$ via Gibbs sampling. The M step now becomes a maximization of the approximate likelihood $\frac{1}{N}\sum_n P(X, R^n, S^n \mid \theta)$ with respect to the model parameters $\theta$. Note that the intractable marginalization over $\{r^c_{dk}, s^c_{kj}\}$ has been replaced with the less costly sum over N samples. Details of this approach for a related model can be found in Ghahramani (1995), and a general description in Andrieu et al. (2003).

2.2 Multiple Cause Factor Analysis

In MCVQ, each factor is modeled as a set of basis vectors, one of which is chosen at a time when generating data. A more general approach is to allow data to be generated by arbitrary linear combinations of the basis vectors in each factor. This extension (with the appropriate choice of prior distribution) amounts to modeling the range of appearances for each part with a factor analyzer.

A factor analysis (FA) model (e.g., Ghahramani and Hinton, 1996) proposes that the data vectors come from a low-dimensional linear subspace, represented by the basis vectors of the factor loading matrix, $\Lambda \in \Re^{D \times J}$, and an offset $\rho \in \Re^D$ from the origin. A data vector is produced by taking a linear combination $s \in \Re^J$ of the columns of $\Lambda$. The linear combination is treated as an unobserved latent variable, with a standard Gaussian prior $P(s) = N(s; 0, I)$. To this is added zero-mean Gaussian noise, independent along each dimension. The probability model is

\[
P(x, s \mid \theta) = N(x; \Lambda s + \rho, \Psi)\, N(s; 0, I), \tag{5}
\]

where $\Psi$ is a $D \times D$ diagonal covariance matrix.

As with MCVQ, we assume that the data contains K parts and model each using a factor analyzer $\theta_k = (\Lambda^k, \rho^k, \Psi^k)$. A data vector is again generated by stochastically selecting one state $s_k$ per factor k, and choosing one factor per data dimension. Under this model factor analyzer k proposes that the value of $x_d$ has a Gaussian distribution centered at $\Lambda^k_d s_k + \rho_{dk}$ with variance $\Psi^k_d$ (where $\Lambda^k_d$ indicates the d-th row of factor loading matrix k, and $\Psi^k_d$ the d-th entry on the diagonal of $\Psi^k$). Thus the likelihood is

\[
P(x \mid R, S, \theta) = \prod_{d,k} N(x_d; \Lambda^k_d s_k + \rho_{dk}, \Psi^k_d)^{\,r_{dk}}.
\]

2.2.1 Learning and Inference

Again this model produces an intractable posterior over latent variables S and R, so we resort to a variational approximation:

\[
Q(R, S \mid X, \theta) = \prod_{c,d,k} g_{dk}^{\,r^c_{dk}} \prod_{c,k} N(s^c_k; m^c_k, \Omega^c_k).
\]

This differs from the approximation used for MCVQ, Equation (3), in that here we assume the state variables $s_k$ have Gaussian posteriors. Thus, in the E step we must now estimate the first and second moments of the posterior over $s_k$. As before, we also tie the $\{g_{dk}\}$ parameters to be the same across all data cases. Setting up the objective function and differentiating gives us the following updates for the E-step [1]:

\[
m^c_k = \Omega^c_k\, (\Lambda^k)^T (\Psi^k)^{-1} \big( (x^c - \rho^k) .* g_k \big), \qquad
\Omega^c_k = \Big( (\Lambda^k)^T \tfrac{g_k}{\Psi^k}\, \Lambda^k + I \Big)^{-1}, \qquad
\langle s^c_k (s^c_k)^T \rangle = \Omega^c_k + m^c_k (m^c_k)^T,
\]

where $\tfrac{g_k}{\Psi^k}$ is a diagonal matrix with entries $g_{dk} / \Psi^k_d$. Note that the expression for $\Omega^c_k$ is independent of the index over training cases, c, thus we need only have one $\Omega_k = \Omega^c_k,\ \forall c$.

The M-step involves updating the prior and variational posterior factor-selection parameters, $\{a_{dk}\}$ and $\{g_{dk}\}$, as well as the parameters $(\Lambda^k, \rho^k, \Psi^k)$ of each factor analyzer. At each iteration the prior $a_{dk} = g_{dk}$ is set to the posterior at the previous iteration. The remaining updates are

\[
g_{dk} \propto \frac{a_{dk}}{|\Psi^k_d|^{1/2}} \exp\Big( -\frac{1}{2 C \Psi^k_d} \sum_c \big[ (x^c_d - \rho_{dk})^2 + \Lambda^k_d \langle s^c_k (s^c_k)^T \rangle (\Lambda^k_d)^T - 2 (x^c_d - \rho_{dk})\, \Lambda^k_d m^c_k \big] \Big),
\]

\[
\Lambda^k = \big( X - \rho^k \mathbf{1}^T \big)\, M_k^T \Big( \sum_c \langle s^c_k (s^c_k)^T \rangle \Big)^{-1}, \qquad
\rho^k = \frac{1}{C} \sum_c \big( x^c - \Lambda^k m^c_k \big),
\]

\[
\Psi^k_d = \frac{1}{C} \sum_c \big[ (x^c_d - \rho_{dk})^2 + \Lambda^k_d \langle s^c_k (s^c_k)^T \rangle (\Lambda^k_d)^T - 2 (x^c_d - \rho_{dk})\, \Lambda^k_d m^c_k \big],
\]

where $M_k$ is a $J \times C$ matrix in which the c-th column is $m^c_k$.

[1] The second uncentered moment $\langle s^c_k (s^c_k)^T \rangle$ need not be computed explicitly, since it can be expressed as a combination of the first and second centered moments, $m^c_k$ and $\Omega^c_k$. It is, however, a useful subexpression for computing the M-step updates and likelihood bound.
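The E-step moments for a single factor analyzer follow directly from the formulas above; the sketch below (toy sizes, randomly generated parameters, and the hypothetical function name mcfa_e_step are our own) computes $m^c_k$, $\Omega_k$, and $\langle s^c_k (s^c_k)^T \rangle$ for one data case.

    import numpy as np

    rng = np.random.default_rng(2)

    # One factor analyzer with toy sizes (illustrative values only).
    D, J = 8, 3                          # data dimensions, latent dimensions
    Lam = rng.normal(size=(D, J))        # factor loading matrix Lambda^k
    rho = rng.normal(size=D)             # offset rho^k
    Psi = 0.5 + rng.random(D)            # diagonal noise variances Psi^k_d
    g_k = rng.random(D)                  # posterior weight of this factor for each dimension

    def mcfa_e_step(x, Lam, rho, Psi, g_k):
        """Posterior moments of s_k for a single case, following the E-step above."""
        # Omega_k = (Lambda^kT diag(g_k / Psi^k) Lambda^k + I)^{-1}
        Omega = np.linalg.inv(Lam.T @ (Lam * (g_k / Psi)[:, None]) + np.eye(Lam.shape[1]))
        # m_k = Omega_k Lambda^kT Psi^{-1} ((x - rho^k) .* g_k)
        m = Omega @ (Lam.T @ ((x - rho) * g_k / Psi))
        second_moment = Omega + np.outer(m, m)   # <s_k s_k^T>
        return m, Omega, second_moment

    # Generate one case from this factor analyzer and infer its latent coordinates.
    x = Lam @ rng.normal(size=J) + rho + np.sqrt(Psi) * rng.normal(size=D)
    m, Omega, ss = mcfa_e_step(x, Lam, rho, Psi, g_k)
    print("E[s_k | x]:", np.round(m, 3))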
2.3 Related Algorithms

Here we present the details of two related algorithms, principal components analysis (PCA) and non-negative matrix factorization (NMF). A more detailed comparison of these algorithms with MCVQ and MCFA appears below, in Section 5.

The goal of PCA is to learn a factorization of the data matrix X into the product of a coefficient matrix S and an orthogonal basis $\Lambda$ so that $X \approx \Lambda S$. Typically $\Lambda$ has fewer columns than rows, so S can be thought of as a reduced-dimensionality approximation of X, and $\Lambda$ as the key features of the data. Using a squared-error cost function, the optimal solution is to let $\Lambda$ be the top eigenvectors of the sample covariance matrix, $\frac{1}{C-1} X X^T$, and let $S = \Lambda^T X$.

PCA can also be posed as a probabilistic generative model, closely related to factor analysis (Roweis, 1997; Tipping and Bishop, 1999). In fact, probabilistic PCA proposes the same generative model, Equation (5), except that the noise covariance, $\Psi$, is restricted to be a scalar times the identity matrix: $\Psi = \sigma^2 I$.

Non-negative matrix factorization (Lee and Seung, 1999) also learns a factorization $X \approx WH$ of the data matrix, but includes the additional restriction that X, W, and H must be entirely non-negative. By allowing only additive combinations of a non-negative basis, Lee & Seung propose to obtain basis vectors that correspond to the intuitive parts of the data. Instead of squared error, NMF seeks to minimize the divergence $D(X \,\|\, WH) = \sum_{d,c} \big[ (WH)_{dc} - x^c_d \log (WH)_{dc} \big]$, which is the negative log-probability of the data, assuming a Poisson density function with mean WH. A local minimum of the divergence is obtained by iterating the following multiplicative updates:

\[
w_{dj} \leftarrow w_{dj} \sum_c \frac{x^c_d}{(WH)_{dc}}\, h_{jc}, \qquad
w_{dj} \leftarrow \frac{w_{dj}}{\sum_{d'} w_{d'j}}, \qquad
h_{jc} \leftarrow h_{jc} \sum_d w_{dj} \frac{x^c_d}{(WH)_{dc}}.
\]

Despite the probabilistic interpretation, $X \sim \mathrm{Poisson}(WH)$, NMF is not a proper probabilistic generative model, since it does not specify a prior distribution over the latent variable H. Thus NMF does not specify how new data could be generated from a learned basis.

Like the above methods, MCVQ and MCFA can be viewed as matrix factorization methods, where the left-hand matrix is formed from the $\{g_{dk}\}$ and the collection of VQ/FA basis vectors, while the right-hand matrix is comprised of the latent variables $\{m^c_{kj}\}$. This view highlights the key contrast between the assumptions about the data embodied in these earlier models, as opposed to the proposed model. Here the data is viewed as a concatenation of components or parts, corresponding to particular subsets of data dimensions, each of which is modeled as a convex combination of appearances.
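For reference, the NMF multiplicative updates above are easy to run on a small non-negative toy matrix; the sketch below uses arbitrary sizes and random initialization (none of which come from the paper), and adds a small constant to avoid division by zero.

    import numpy as np

    rng = np.random.default_rng(3)

    D, C, J = 20, 30, 5                  # data dimensions, cases, basis size (toy values)
    X = rng.random((D, C))               # non-negative data matrix
    W = rng.random((D, J))               # non-negative basis
    H = rng.random((J, C))               # non-negative coefficients

    for _ in range(200):
        WH = W @ H + 1e-12                        # small constant avoids division by zero
        W *= (X / WH) @ H.T                       # w_dj <- w_dj sum_c (x_dc / (WH)_dc) h_jc
        W /= W.sum(axis=0, keepdims=True)         # w_dj <- w_dj / sum_d w_dj
        WH = W @ H + 1e-12
        H *= W.T @ (X / WH)                       # h_jc <- h_jc sum_d w_dj x_dc / (WH)_dc

    # Divergence D(X || WH), dropping terms constant in W and H.
    WH = W @ H + 1e-12
    print("divergence:", float(np.sum(WH - X * np.log(WH))))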
3. Experiments

In this section we examine the ability of MCVQ and MCFA to learn parts-based representations of data from two different problem domains. We begin by modeling sets of digital images, in this case images of human faces. The parts learned are fixed subsets of the data dimensions (pixels), corresponding to fixed regions of the images, that closely resemble intuitive notions of the parts of faces. The ability to learn parts is robust to partial occlusions in the training images.

The application of MCVQ and MCFA to image data assumes that the images are normalized, that is, that the head is in a similar pose in each image, and aligned with respect to position and scale. This constraint is standard for learning methods that attempt to learn visual features beyond low-level edges and corners, though, if desired, the model could be extended to perform automatic alignment of images (Jojic and Caspi, 2004). While normalization may require a preprocessing step for image applications, in many other types of applications the input representation is more stable. For instance, in collaborative filtering each data vector consists of a single user's ratings for a fixed set of items; each data dimension always corresponds to the same item. Thus, we also explore the application of MCVQ and MCFA to the problem of predicting ratings of unseen movies, given observed ratings for a large set of users. Here parts correspond to subsets of the movies which have correlated ratings.

Code implementing MCVQ and MCFA in MATLAB, as used for the following experiments, can be obtained at /~dross/mcvq/.

3.1 Face Images

The face data set consisted of 2429 images from the CBCL Face database #1 (MIT-CBCL, 2000). Each image contained a single frontal or near-frontal face, depicted in 19x19 pixel grayscale. The images were histogram equalized, and pixel values rescaled to lie in [-2, 2]. Sample training images are shown in Figure 2.

Figure 2: Sample training images from the CBCL Face database #1.

Using these images as input, we trained MCVQ and MCFA models, each containing K = 6 factors. The MCVQ model with J = 10 states converged in 120 iterations of Monte Carlo EM, while the MCFA model with J = 4 basis vectors converged in 15 variational EM iterations. In practice, the Monte Carlo approach to inference leads to better local maxima of the objective function, and better parts-based models of the data. For both MCVQ and MCFA, prior probabilities were initialized to uniform, and states/basis vectors to randomly selected training images.

Figure 3: The parts-based model of faces learned by MCVQ (panels VQ 1 through VQ 6). On the left are plots showing the posterior probability with which each pixel selects the indicated VQ. On the right are the means, for each state of each VQ, masked by the aforementioned selection probabilities ($\mu_{kj} .* g_k$).

The learned parts-based decompositions are depicted in Figures 3 and 4. On the left of each figure is a plot of posterior probabilities $\{g_{dk}\}$ that each pixel selects the indicated factor as its explanation. White indicates high probability of selection, and black low. As can be seen, each $g_k$ can be thought of as a mask indicating which pixels 'belong' to factor k. In the noise-free case, each image generated by one of these models is a sum of the contributions from the various factors, where each contribution is 'masked' by the probability of pixel selection.
For example, in Figure 3 the probability of selecting VQ #4 is non-zero only around the nose, thus the contribution of this VQ to the remaining areas of the face in any generated image is negligible.

On the right of each figure is a plot of the 10 states / 4 basis vectors for each factor. Each image has been masked (via element-wise multiplication) with the corresponding $g_k$. For MCVQ this is ($\mu_{dkj} .* g_{dk}$), and for MCFA this is ($\Lambda^k_{dj} .* g_{dk}$). In the case of MCVQ, each state is an alternative appearance for the corresponding part. For example, VQ #4 gives 10 alternative noses to select from when generating a face. These range from thin to wide, with shadows on the left or right, and with light or dark upper lips (possibly corresponding to moustaches). In MCFA, on the other hand, the basis vectors are not discrete alternatives. Rather, these vectors are combined via arbitrary linear combinations to generate an appearance of a part.

Figure 4: The parts-based model of faces learned by MCFA (panels FA 1 through FA 6).

To test the fidelity of the learned representations, the trained models were used to probabilistically classify held-out test images as either face or non-face. The test data consisted of the 472 face images and 472 randomly-selected non-face images from the CBCL database test set. To classify test images, we evaluated their probability under the model, labeling an image as face if its probability exceeded a threshold, and non-face if it did not. For each model the threshold was chosen to maximize classification performance. Evaluating the probability of a data vector under MCVQ and MCFA can be difficult, since it requires marginalizing over all possible values of the latent variables. Thus, in practice, we use a Monte Carlo approximation, obtained by averaging over a sample of possible selections. The results of this experiment are shown in Table 1. In addition to MCVQ and MCFA, we performed the same experiment with four other probabilistic models: probabilistic principal components analysis (PPCA), mixture of Gaussians, a single Gaussian distribution, and cooperative vector quantization (Hinton and Zemel, 1994) (which is described further in Section 5).
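One way to realize the Monte Carlo approximation just described, sketched here for an MCVQ model: sample the latent selections from their priors, average the resulting data likelihoods, and compare the log of that average to a threshold. The function name and the threshold are placeholders of ours, not part of the paper's released code.

    import numpy as np

    def mcvq_log_prob(x, a, b, mu, sigma, n_samples=500, rng=None):
        """Monte Carlo estimate of log P(x) under an MCVQ model, obtained by sampling
        the latent selections (r, s) from their priors and averaging P(x | r, s)."""
        rng = rng or np.random.default_rng()
        D, K = a.shape
        J = b.shape[1]
        log_ps = np.empty(n_samples)
        for n in range(n_samples):
            s = np.array([rng.choice(J, p=b[k]) for k in range(K)])  # one state per VQ
            r = np.array([rng.choice(K, p=a[d]) for d in range(D)])  # one VQ per dimension
            mean = mu[np.arange(D), r, s[r]]
            sd = sigma[np.arange(D), r, s[r]]
            log_ps[n] = np.sum(-np.log(sd) - 0.5 * np.log(2 * np.pi)
                               - (x - mean) ** 2 / (2 * sd ** 2))
        # Log of the average likelihood, computed stably via log-sum-exp.
        return np.logaddexp.reduce(log_ps) - np.log(n_samples)

    # Hypothetical usage: threshold chosen on validation data to maximize accuracy.
    # is_face = mcvq_log_prob(x, a, b, mu, sigma) > threshold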
It is important to note that in this experiment an increase in model size (using more VQs, states, or basis vectors) does not necessarily improve the ability to discriminate faces from non-faces. For example, PPCA achieves its highest accuracy when using only three basis vectors, and a mixture of Gaussians with 60 states outperforms one with 84 states. As can be seen, the highest performance is achieved by an MCVQ model with 6 VQs and 14 states per VQ.

    Model                                                     Accuracy
    Mixture of Gaussians (60 states)                          0.8072
    MCVQ (6 VQs, 10 states each)                              0.8030
    Probabilistic PCA (3 components)                          0.7903
    Gaussian distribution (diagonal covariance)               0.7871
    Mixture of Gaussians (84 states)                          0.7680
    MCFA (6 FAs, 3 basis vectors each)                        0.7415
    MCFA (6 FAs, 4 basis vectors each)                        0.7267
    Cooperative Vector Quantization (6 VQs, 10 states each)   0.6208

Table 1: Results of classifying test images as face or non-face, by computing their probabilities under trained generative models of faces.

Another way of validating image models is to use them to generate new examples and see how closely they resemble images from the observed data, in this case how much the generated images resemble actual faces. Examples of images generated from MCVQ and MCFA are shown in Figure 5.

Figure 5: Synthetic images generated using MCVQ (left) and MCFA (right).

The parts-based models learned by MCVQ and MCFA differ from those learned by NMF and PCA, as depicted in Figure 6, in two important ways. First, the basis is sparse: each factor contributes to only a limited region of the image, whereas in NMF and PCA basis vectors include more global effects. Secondly, MCVQ and MCFA learn a grouping of vectors into related parts, by explicitly modeling the sparsity via the $g_{dk}$ distributions.

A further point of comparison is that, in images generated by PPCA and NMF, a significant proportion of the pixels in the generated images lie outside the range of values appearing in the training data (7% for PPCA and 4.5% for NMF), requiring that the generated values be thresholded. On the other hand, MCVQ will never generate pixel values outside the observed range. This is a simple result, since each generated image is a convex combination of basis vectors, and each basis vector is a convex combination of data vectors. Although no such guarantee exists for MCFA, in practice the pixels it generates are all within range.
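The within-range guarantee for MCVQ can be spelled out as a short convexity argument; here $\alpha$ and $\beta$ denote the non-negative mixing weights implied by the generative process and the $\mu$ update (our notation, not the paper's).

\[
\hat{x}_d \;=\; \sum_{k,j} \alpha_{dkj}\, \mu_{dkj}
\;=\; \sum_{k,j} \sum_c \alpha_{dkj}\, \beta^{c}_{kj}\, x^c_d,
\qquad \alpha_{dkj},\ \beta^{c}_{kj} \ge 0,
\qquad \sum_{k,j} \alpha_{dkj} \;=\; \sum_c \beta^{c}_{kj} \;=\; 1,
\]
\[
\Longrightarrow\qquad \min_c x^c_d \;\le\; \hat{x}_d \;\le\; \max_c x^c_d,
\]

since the combined weights $\alpha_{dkj}\, \beta^{c}_{kj}$ are again non-negative and sum to one, and a convex combination can never exceed the extremes of the values it averages.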