Decentralized jointly sparse optimization by reweighted minimization

格式：pdf
大小：188.48 KB
文档页数：12

下载文档原格式

人工智能领域中英文专有名词汇总

名词解释中英文对比<using_information_sources> social networks 社会网络abductive reasoning 溯因推理action recognition(行为识别)active learning(主动学习)adaptive systems 自适应系统adverse drugs reactions(药物不良反应)algorithm design and analysis(算法设计与分析) algorithm(算法)artificial intelligence 人工智能association rule(关联规则)attribute value taxonomy 属性分类规范automomous agent 自动代理automomous systems 自动系统background knowledge 背景知识bayes methods(贝叶斯方法)bayesian inference(贝叶斯推断)bayesian methods(bayes 方法)belief propagation(置信传播)better understanding 内涵理解big data 大数据big data(大数据)biological network(生物网络)biological sciences(生物科学)biomedical domain 生物医学领域biomedical research(生物医学研究)biomedical text(生物医学文本)boltzmann machine(玻尔兹曼机)bootstrapping method 拔靴法case based reasoning 实例推理causual models 因果模型citation matching (引文匹配)classification (分类)classification algorithms(分类算法)clistering algorithms 聚类算法cloud computing(云计算)cluster-based retrieval (聚类检索)clustering (聚类)clustering algorithms(聚类算法)clustering 聚类cognitive science 认知科学collaborative filtering (协同过滤)collaborative filtering(协同过滤)collabrative ontology development 联合本体开发collabrative ontology engineering 联合本体工程commonsense knowledge 常识communication networks(通讯网络)community detection(社区发现)complex data(复杂数据)complex dynamical networks(复杂动态网络)complex network(复杂网络)complex network(复杂网络)computational biology 计算生物学computational biology(计算生物学)computational complexity(计算复杂性) computational intelligence 智能计算computational modeling(计算模型)computer animation(计算机动画)computer networks(计算机网络)computer science 计算机科学concept clustering 概念聚类concept formation 概念形成concept learning 概念学习concept map 概念图concept model 概念模型concept modelling 概念模型conceptual model 概念模型conditional random field(条件随机场模型) conjunctive quries 合取查询constrained least squares (约束最小二乘) convex programming(凸规划)convolutional neural networks(卷积神经网络) customer relationship management(客户关系管理) data analysis(数据分析)data analysis(数据分析)data center(数据中心)data clustering (数据聚类)data compression(数据压缩)data envelopment analysis (数据包络分析)data fusion 数据融合data generation(数据生成)data handling(数据处理)data hierarchy (数据层次)data integration(数据整合)data integrity 数据完整性data intensive computing(数据密集型计算)data management 数据管理data management(数据管理)data management(数据管理)data miningdata mining 数据挖掘data model 数据模型data models(数据模型)data partitioning 数据划分data point(数据点)data privacy(数据隐私)data security(数据安全)data stream(数据流)data streams(数据流)data structure( 数据结构)data structure(数据结构)data visualisation(数据可视化)data visualization 数据可视化data visualization(数据可视化)data warehouse(数据仓库)data warehouses(数据仓库)data warehousing(数据仓库)database management systems(数据库管理系统)database management(数据库管理)date interlinking 日期互联date linking 日期链接Decision analysis(决策分析)decision maker 决策者decision making (决策)decision models 决策模型decision models 决策模型decision rule 决策规则decision support system 决策支持系统decision support systems (决策支持系统) decision tree(决策树)decission tree 决策树deep belief network(深度信念网络)deep learning(深度学习)defult reasoning 默认推理density estimation(密度估计)design methodology 设计方法论dimension reduction(降维) dimensionality reduction(降维)directed graph(有向图)disaster management 灾害管理disastrous event(灾难性事件)discovery(知识发现)dissimilarity (相异性)distributed databases 分布式数据库distributed databases(分布式数据库) distributed query 分布式查询document clustering (文档聚类)domain experts 领域专家domain knowledge 领域知识domain specific language 领域专用语言dynamic databases(动态数据库)dynamic logic 动态逻辑dynamic network(动态网络)dynamic system(动态系统)earth mover's distance(EMD 距离) education 教育efficient algorithm(有效算法)electric commerce 电子商务electronic health records(电子健康档案) entity disambiguation 实体消歧entity recognition 实体识别entity recognition(实体识别)entity resolution 实体解析event detection 事件检测event detection(事件检测)event extraction 事件抽取event identificaton 事件识别exhaustive indexing 完整索引expert system 专家系统expert systems(专家系统)explanation based learning 解释学习factor graph(因子图)feature extraction 特征提取feature extraction(特征提取)feature extraction(特征提取)feature selection (特征选择)feature selection 特征选择feature selection(特征选择)feature space 特征空间first order logic 一阶逻辑formal logic 形式逻辑formal meaning prepresentation 形式意义表示formal semantics 形式语义formal specification 形式描述frame based system 框为本的系统frequent itemsets(频繁项目集)frequent pattern(频繁模式)fuzzy clustering (模糊聚类)fuzzy clustering (模糊聚类)fuzzy clustering (模糊聚类)fuzzy data mining(模糊数据挖掘)fuzzy logic 模糊逻辑fuzzy set theory(模糊集合论)fuzzy set(模糊集)fuzzy sets 模糊集合fuzzy systems 模糊系统gaussian processes(高斯过程)gene expression data 基因表达数据gene expression(基因表达)generative model(生成模型)generative model(生成模型)genetic algorithm 遗传算法genome wide association study(全基因组关联分析) graph classification(图分类)graph classification(图分类)graph clustering(图聚类)graph data(图数据)graph data(图形数据)graph database 图数据库graph database(图数据库)graph mining(图挖掘)graph mining(图挖掘)graph partitioning 图划分graph query 图查询graph structure(图结构)graph theory(图论)graph theory(图论)graph theory(图论)graph theroy 图论graph visualization(图形可视化)graphical user interface 图形用户界面graphical user interfaces(图形用户界面)health care 卫生保健health care(卫生保健)heterogeneous data source 异构数据源heterogeneous data(异构数据)heterogeneous database 异构数据库heterogeneous information network(异构信息网络) heterogeneous network(异构网络)heterogenous ontology 异构本体heuristic rule 启发式规则hidden markov model(隐马尔可夫模型)hidden markov model(隐马尔可夫模型)hidden markov models(隐马尔可夫模型) hierarchical clustering (层次聚类) homogeneous network(同构网络)human centered computing 人机交互技术human computer interaction 人机交互human interaction 人机交互human robot interaction 人机交互image classification(图像分类)image clustering (图像聚类)image mining( 图像挖掘)image reconstruction(图像重建)image retrieval (图像检索)image segmentation(图像分割)inconsistent ontology 本体不一致incremental learning(增量学习)inductive learning (归纳学习)inference mechanisms 推理机制inference mechanisms(推理机制)inference rule 推理规则information cascades(信息追随)information diffusion(信息扩散)information extraction 信息提取information filtering(信息过滤)information filtering(信息过滤)information integration(信息集成)information network analysis(信息网络分析) information network mining(信息网络挖掘) information network(信息网络)information processing 信息处理information processing 信息处理information resource management (信息资源管理) information retrieval models(信息检索模型) information retrieval 信息检索information retrieval(信息检索)information retrieval(信息检索)information science 情报科学information sources 信息源information system( 信息系统)information system(信息系统)information technology(信息技术)information visualization(信息可视化)instance matching 实例匹配intelligent assistant 智能辅助intelligent systems 智能系统interaction network(交互网络)interactive visualization(交互式可视化)kernel function(核函数)kernel operator (核算子)keyword search(关键字检索)knowledege reuse 知识再利用knowledgeknowledgeknowledge acquisitionknowledge base 知识库knowledge based system 知识系统knowledge building 知识建构knowledge capture 知识获取knowledge construction 知识建构knowledge discovery(知识发现)knowledge extraction 知识提取knowledge fusion 知识融合knowledge integrationknowledge management systems 知识管理系统knowledge management 知识管理knowledge management(知识管理)knowledge model 知识模型knowledge reasoningknowledge representationknowledge representation(知识表达) knowledge sharing 知识共享knowledge storageknowledge technology 知识技术knowledge verification 知识验证language model(语言模型)language modeling approach(语言模型方法) large graph(大图)large graph(大图)learning(无监督学习)life science 生命科学linear programming(线性规划)link analysis (链接分析)link prediction(链接预测)link prediction(链接预测)link prediction(链接预测)linked data(关联数据)location based service(基于位置的服务) loclation based services(基于位置的服务) logic programming 逻辑编程logical implication 逻辑蕴涵logistic regression(logistic 回归)machine learning 机器学习machine translation(机器翻译)management system(管理系统)management( 知识管理)manifold learning(流形学习)markov chains 马尔可夫链markov processes(马尔可夫过程)matching function 匹配函数matrix decomposition(矩阵分解)matrix decomposition(矩阵分解)maximum likelihood estimation(最大似然估计)medical research(医学研究)mixture of gaussians(混合高斯模型)mobile computing(移动计算)multi agnet systems 多智能体系统multiagent systems 多智能体系统multimedia 多媒体natural language processing 自然语言处理natural language processing(自然语言处理) nearest neighbor (近邻)network analysis( 网络分析)network analysis(网络分析)network analysis(网络分析)network formation(组网)network structure(网络结构)network theory(网络理论)network topology(网络拓扑)network visualization(网络可视化)neural network(神经网络)neural networks (神经网络)neural networks(神经网络)nonlinear dynamics(非线性动力学)nonmonotonic reasoning 非单调推理nonnegative matrix factorization (非负矩阵分解) nonnegative matrix factorization(非负矩阵分解) object detection(目标检测)object oriented 面向对象object recognition(目标识别)object recognition(目标识别)online community(网络社区)online social network(在线社交网络)online social networks(在线社交网络)ontology alignment 本体映射ontology development 本体开发ontology engineering 本体工程ontology evolution 本体演化ontology extraction 本体抽取ontology interoperablity 互用性本体ontology language 本体语言ontology mapping 本体映射ontology matching 本体匹配ontology versioning 本体版本ontology 本体论open government data 政府公开数据opinion analysis(舆情分析)opinion mining(意见挖掘)opinion mining(意见挖掘)outlier detection(孤立点检测)parallel processing(并行处理)patient care(病人医疗护理)pattern classification(模式分类)pattern matching(模式匹配)pattern mining(模式挖掘)pattern recognition 模式识别pattern recognition(模式识别)pattern recognition(模式识别)personal data(个人数据)prediction algorithms(预测算法)predictive model 预测模型predictive models(预测模型)privacy preservation(隐私保护)probabilistic logic(概率逻辑)probabilistic logic(概率逻辑)probabilistic model(概率模型)probabilistic model(概率模型)probability distribution(概率分布)probability distribution(概率分布)project management(项目管理)pruning technique(修剪技术)quality management 质量管理query expansion(查询扩展)query language 查询语言query language(查询语言)query processing(查询处理)query rewrite 查询重写question answering system 问答系统random forest(随机森林)random graph(随机图)random processes(随机过程)random walk(随机游走)range query(范围查询)RDF database 资源描述框架数据库RDF query 资源描述框架查询RDF repository 资源描述框架存储库RDF storge 资源描述框架存储real time(实时)recommender system(推荐系统)recommender system(推荐系统)recommender systems 推荐系统recommender systems(推荐系统)record linkage 记录链接recurrent neural network(递归神经网络) regression(回归)reinforcement learning 强化学习reinforcement learning(强化学习)relation extraction 关系抽取relational database 关系数据库relational learning 关系学习relevance feedback (相关反馈)resource description framework 资源描述框架restricted boltzmann machines(受限玻尔兹曼机) retrieval models(检索模型)rough set theroy 粗糙集理论rough set 粗糙集rule based system 基于规则系统rule based 基于规则rule induction (规则归纳)rule learning (规则学习)rule learning 规则学习schema mapping 模式映射schema matching 模式匹配scientific domain 科学域search problems(搜索问题)semantic (web) technology 语义技术semantic analysis 语义分析semantic annotation 语义标注semantic computing 语义计算semantic integration 语义集成semantic interpretation 语义解释semantic model 语义模型semantic network 语义网络semantic relatedness 语义相关性semantic relation learning 语义关系学习semantic search 语义检索semantic similarity 语义相似度semantic similarity(语义相似度)semantic web rule language 语义网规则语言semantic web 语义网semantic web(语义网)semantic workflow 语义工作流semi supervised learning(半监督学习)sensor data(传感器数据)sensor networks(传感器网络)sentiment analysis(情感分析)sentiment analysis(情感分析)sequential pattern(序列模式)service oriented architecture 面向服务的体系结构shortest path(最短路径)similar kernel function(相似核函数)similarity measure(相似性度量)similarity relationship (相似关系)similarity search(相似搜索)similarity(相似性)situation aware 情境感知social behavior(社交行为)social influence(社会影响)social interaction(社交互动)social interaction(社交互动)social learning(社会学习)social life networks(社交生活网络)social machine 社交机器social media(社交媒体)social media(社交媒体)social media(社交媒体)social network analysis 社会网络分析social network analysis(社交网络分析)social network(社交网络)social network(社交网络)social science(社会科学)social tagging system(社交标签系统)social tagging(社交标签)social web(社交网页)sparse coding(稀疏编码)sparse matrices(稀疏矩阵)sparse representation(稀疏表示)spatial database(空间数据库)spatial reasoning 空间推理statistical analysis(统计分析)statistical model 统计模型string matching(串匹配)structural risk minimization (结构风险最小化) structured data 结构化数据subgraph matching 子图匹配subspace clustering(子空间聚类)supervised learning( 有support vector machine 支持向量机support vector machines(支持向量机)system dynamics(系统动力学)tag recommendation(标签推荐)taxonmy induction 感应规范temporal logic 时态逻辑temporal reasoning 时序推理text analysis(文本分析)text anaylsis 文本分析text classification (文本分类)text data(文本数据)text mining technique(文本挖掘技术)text mining 文本挖掘text mining(文本挖掘)text summarization(文本摘要)thesaurus alignment 同义对齐time frequency analysis(时频分析)time series analysis( 时time series data(时间序列数据)time series data(时间序列数据)time series(时间序列)topic model(主题模型)topic modeling(主题模型)transfer learning 迁移学习triple store 三元组存储uncertainty reasoning 不精确推理undirected graph(无向图)unified modeling language 统一建模语言unsupervisedupper bound(上界)user behavior(用户行为)user generated content(用户生成内容)utility mining(效用挖掘)visual analytics(可视化分析)visual content(视觉内容)visual representation(视觉表征)visualisation(可视化)visualization technique(可视化技术) visualization tool(可视化工具)web 2.0(网络2.0)web forum(web 论坛)web mining(网络挖掘)web of data 数据网web ontology lanuage 网络本体语言web pages(web 页面)web resource 网络资源web science 万维科学web search (网络检索)web usage mining(web 使用挖掘)wireless networks 无线网络world knowledge 世界知识world wide web 万维网world wide web(万维网)xml database 可扩展标志语言数据库附录 2 Data Mining 知识图谱（共包含二级节点15 个，三级节点93 个）间序列分析)监督学习)领域二级分类三级分类。

算法态度文献综述、研究框架和未来展望

2023年5月（第37卷第5期）May，2023（Vol.37，No.5）East China Economic Management算法态度文献综述、研究框架和未来展望景怡（北京大学光华管理学院，北京100871）摘要：随着数字化和人工智能算法的发展，算法被越来越广泛地采用，算法决策越来越普遍。

但是人们对于算法决策的理解仍然存在不同态度。

通过梳理有关对算法决策态度和行为的研究文献，文章提出系统性研究框架，明确人们对算法决策产生不同态度和行为的主要原因、边界条件、主要发现和研究不足，同时构建算法态度形成过程理论框架，并确定可行的研究方向。

文章希望通过系统性回顾已有研究和整合新的模型，帮助人们更好把握对算法决策的理解，从而促使企业、政府、个人更积极地利用算法工具。

关键词：算法决策；人工智能；算法态度；文献综述中图分类号：TP301.6文献标识码：A文章编号：1007-5097（2023）05-0107-12 Literature Review,Research Framework,and Future Research Directions forAlgorithm AttitudeJING Yi（Guanghua School of Management，Peking University，Beijing100871，China）Abstract：With the development of digitization and artificial intelligence algorithms，algorithms are increasingly widely adopted and algorithmic decision-making is becoming more common.However，people still have different attitudes to⁃wards algorithmic decision-making.By reviewing research literature on attitudes and behaviors towards algorithmic decision-making，this paper proposes a systematic research framework，clarifying the main reasons，boundary conditions，main findings，and research shortcomings for people′s different attitudes and behaviors towards algorithmic decision-making.At the same time，a theoretical framework for the formation process of algorithmic attitudes is constructed，and feasible research directions are identified.This paper aims to systematically review existing research and integrate new models to help people better understand algorithmic decision-making，thereby promoting enterprises，governments，and individuals to actively utilize algorithmic tools.Key words：algorithmic decision-making；artificial intelligence；algorithm attitude；literature review一、研究背景在经历了两次发展浪潮之后，海量数据的积累和深度学习算法的发展带来了人工智能的第三次发展热潮［1］。

综述Representation learning a review and new perspectives

explanatory factors for the observed input. A good representation is also one that is useful as input to a supervised predictor. Among the various ways of learning representations, this paper focuses on deep learning methods: those that are formed by the composition of multiple non-linear transformations, with the goal of yielding more abstract – and ultimately more useful – representations. Here we survey this rapidly developing area with special emphasis on recent progress. We consider some of the fundamental questions that have been driving research in this area. Speciﬁcally, what makes one representation better than another? Given an example, how should we compute its representation, i.e. perform feature extraction? Also, what are appropriate objectives for learning good representations?

机器学习与数据挖掘笔试面试题

What is a decision tree? What are some business reasons you might want to use a decision tree model? How do you build a decision tree model? What impurity measures do you know? Describe some of the different splitting rules used by different decision tree algorithms. Is a big brushy tree always good? How will you compare aegression? Which is more suitable under different circumstances? What is pruning and why is it important? Ensemble models: To answer questions on ensemble models here is a :
Why do we combine multiple trees? What is Random Forest? Why would you prefer it to SVM? Logistic regression: Link to Logistic regression Here's a nice tutorial What is logistic regression? How do we train a logistic regression model? How do we interpret its coefficients? Support Vector Machines A tutorial on SVM can be found and What is the maximal margin classifier? How this margin can be achieved and why is it beneficial? How do we train SVM? What about hard SVM and soft SVM? What is a kernel? Explain the Kernel trick Which kernels do you know? How to choose a kernel? Neural Networks Here's a link to on Coursera What is an Artificial Neural Network? How to train an ANN? What is back propagation? How does a neural network with three layers (one input layer, one inner layer and one output layer) compare to a logistic regression? What is deep learning? What is CNN (Convolution Neural Network) or RNN (Recurrent Neural Network)? Other models: What other models do you know? How can we use Naive Bayes classifier for categorical features? What if some features are numerical? Tradeoffs between different types of classification models. How to choose the best one? Compare logistic regression with decision trees and neural networks. and What is Regularization? Which problem does Regularization try to solve? Ans. used to address the overfitting problem, it penalizes your loss function by adding a multiple of an L1 (LASSO) or an L2 (Ridge) norm of your weights vector w (it is the vector of the learned parameters in your linear regression). What does it mean (practically) for a design matrix to be "ill-conditioned"? When might you want to use ridge regression instead of traditional linear regression? What is the difference between the L1 and L2 regularization? Why (geometrically) does LASSO produce solutions with zero-valued coefficients (as opposed to ridge)? and What is the purpose of dimensionality reduction and why do we need it? Are dimensionality reduction techniques supervised or not? Are all of them are (un)supervised? What ways of reducing dimensionality do you know? Is feature selection a dimensionality reduction technique? What is the difference between feature selection and feature extraction? Is it beneficial to perform dimensionality reduction before fitting an SVM? Why or why not? and Why do you need to use cluster analysis? Give examples of some cluster analysis methods? Differentiate between partitioning method and hierarchical methods. Explain K-Means and its objective? How do you select K for K-Means?

人工智能基础(习题卷9)

人工智能基础(习题卷9)第1部分：单项选择题，共53题，每题只有一个正确答案,多选或少选均不得分。

1.[单选题]由心理学途径产生，认为人工智能起源于数理逻辑的研究学派是（）A)连接主义学派B)行为主义学派C)符号主义学派答案:C解析:2.[单选题]一条规则形如：，其中“←"右边的部分称为(___)A)规则长度B)规则头C)布尔表达式D)规则体答案:D解析:3.[单选题]下列对人工智能芯片的表述，不正确的是（）。

A)一种专门用于处理人工智能应用中大量计算任务的芯片B)能够更好地适应人工智能中大量矩阵运算C)目前处于成熟高速发展阶段D)相对于传统的CPU处理器，智能芯片具有很好的并行计算性能答案:C解析:4.[单选题]以下图像分割方法中，不属于基于图像灰度分布的阈值方法的是( )。

A)类间最大距离法B)最大类间、内方差比法C)p-参数法D)区域生长法答案:B解析:5.[单选题]下列关于不精确推理过程的叙述错误的是（）。

A)不精确推理过程是从不确定的事实出发B)不精确推理过程最终能够推出确定的结论C)不精确推理过程是运用不确定的知识D)不精确推理过程最终推出不确定性的结论答案:B解析:6.[单选题]假定你现在训练了一个线性SVM并推断出这个模型出现了欠拟合现象，在下一次训练时，应该采取的措施是（）0A)增加数据点D)减少特征答案:C解析:欠拟合是指模型拟合程度不高，数据距离拟合曲线较远，或指模型没有很好地捕捉到数据特征，不能够很好地拟合数据。

可通过增加特征解决。

7.[单选题]以下哪一个概念是用来计算复合函数的导数？A)微积分中的链式结构B)硬双曲正切函数C)softplus函数D)劲向基函数答案:A解析:8.[单选题]相互关联的数据资产标准，应确保()。

数据资产标准存在冲突或衔接中断时，后序环节应遵循和适应前序环节的要求，变更相应数据资产标准。

A)连接B)配合C)衔接和匹配D)连接和配合答案:C解析:9.[单选题]固体半导体摄像机所使用的固体摄像元件为( )。

中科院自动化所评测技术报告(SYSTEMII).pptx

中国科学院自动化所
技术说明-依存树到串系统
❖ 在源语言端运用依存结构进行统计翻译的新模型： Dependency-String Structure Model (DSS模型)
❖ 由于计算资源和时间的限制，参加本次评测的系统仍然沿用和分层短语相同的文法
❖ DSS解码算法的输入是一棵树，而不是一个串，所以线图是按照树节点来索引的，而不是按照串中的跨距(span)来建立索引的
Institute of Automation, Chinese Academy of Sciences
中国科学院自动化所
技术说明-分层短语系统
中国科学院自动化所
❖ 主要借鉴了Wei[5]的基本思想，引入分层短语的概念
❖ 有效地结合了短语模型和同步上下文无关文法 ❖ 分层短语模型将语序信息包含在模型之中，克服
❖ 后处理
▪ 大小写转换：未翻译词保留其原始格式 ▪ 格式转换：去除中文中的空格 ▪ 未登录词
Institute of Automation, Chinese Academy of Sciences
技术说明-系统融合[6][7]
MT1
……
MTm
中国科学院自动化所
N-best
N-best
MBR 解码器对齐参考
了传统短语翻译的调序问题 ❖ 沿用了统计线性对数方法进行概率计算，使用了
如下6个特征：
▪ 双向短语翻译概率p(e/f)和p(f/e) ▪ 双向词汇化概率l(e/f)和l(f/e) ▪ 4gram语言模型 ▪ 句子长度惩罚
Institute of Aห้องสมุดไป่ตู้tomation, Chinese Academy of Sciences
中科院自动化所评测技术报告(SYSTEM II)

《神经网络与深度学习综述DeepLearning15May2014

Draft:Deep Learning in Neural Networks:An OverviewTechnical Report IDSIA-03-14/arXiv:1404.7828(v1.5)[cs.NE]J¨u rgen SchmidhuberThe Swiss AI Lab IDSIAIstituto Dalle Molle di Studi sull’Intelligenza ArtiﬁcialeUniversity of Lugano&SUPSIGalleria2,6928Manno-LuganoSwitzerland15May2014AbstractIn recent years,deep artiﬁcial neural networks(including recurrent ones)have won numerous con-tests in pattern recognition and machine learning.This historical survey compactly summarises relevantwork,much of it from the previous millennium.Shallow and deep learners are distinguished by thedepth of their credit assignment paths,which are chains of possibly learnable,causal links between ac-tions and effects.I review deep supervised learning(also recapitulating the history of backpropagation),unsupervised learning,reinforcement learning&evolutionary computation,and indirect search for shortprograms encoding deep and large networks.PDF of earlier draft(v1):http://www.idsia.ch/∼juergen/DeepLearning30April2014.pdfLATEX source:http://www.idsia.ch/∼juergen/DeepLearning30April2014.texComplete BIBTEXﬁle:http://www.idsia.ch/∼juergen/bib.bibPrefaceThis is the draft of an invited Deep Learning(DL)overview.One of its goals is to assign credit to those who contributed to the present state of the art.I acknowledge the limitations of attempting to achieve this goal.The DL research community itself may be viewed as a continually evolving,deep network of scientists who have inﬂuenced each other in complex ways.Starting from recent DL results,I tried to trace back the origins of relevant ideas through the past half century and beyond,sometimes using“local search”to follow citations of citations backwards in time.Since not all DL publications properly acknowledge earlier relevant work,additional global search strategies were employed,aided by consulting numerous neural network experts.As a result,the present draft mostly consists of references(about800entries so far).Nevertheless,through an expert selection bias I may have missed important work.A related bias was surely introduced by my special familiarity with the work of my own DL research group in the past quarter-century.For these reasons,the present draft should be viewed as merely a snapshot of an ongoing credit assignment process.To help improve it,please do not hesitate to send corrections and suggestions to juergen@idsia.ch.Contents1Introduction to Deep Learning(DL)in Neural Networks(NNs)3 2Event-Oriented Notation for Activation Spreading in FNNs/RNNs3 3Depth of Credit Assignment Paths(CAPs)and of Problems4 4Recurring Themes of Deep Learning54.1Dynamic Programming(DP)for DL (5)4.2Unsupervised Learning(UL)Facilitating Supervised Learning(SL)and RL (6)4.3Occam’s Razor:Compression and Minimum Description Length(MDL) (6)4.4Learning Hierarchical Representations Through Deep SL,UL,RL (6)4.5Fast Graphics Processing Units(GPUs)for DL in NNs (6)5Supervised NNs,Some Helped by Unsupervised NNs75.11940s and Earlier (7)5.2Around1960:More Neurobiological Inspiration for DL (7)5.31965:Deep Networks Based on the Group Method of Data Handling(GMDH) (8)5.41979:Convolution+Weight Replication+Winner-Take-All(WTA) (8)5.51960-1981and Beyond:Development of Backpropagation(BP)for NNs (8)5.5.1BP for Weight-Sharing Feedforward NNs(FNNs)and Recurrent NNs(RNNs)..95.6Late1980s-2000:Numerous Improvements of NNs (9)5.6.1Ideas for Dealing with Long Time Lags and Deep CAPs (10)5.6.2Better BP Through Advanced Gradient Descent (10)5.6.3Discovering Low-Complexity,Problem-Solving NNs (11)5.6.4Potential Beneﬁts of UL for SL (11)5.71987:UL Through Autoencoder(AE)Hierarchies (12)5.81989:BP for Convolutional NNs(CNNs) (13)5.91991:Fundamental Deep Learning Problem of Gradient Descent (13)5.101991:UL-Based History Compression Through a Deep Hierarchy of RNNs (14)5.111992:Max-Pooling(MP):Towards MPCNNs (14)5.121994:Contest-Winning Not So Deep NNs (15)5.131995:Supervised Recurrent Very Deep Learner(LSTM RNN) (15)5.142003:More Contest-Winning/Record-Setting,Often Not So Deep NNs (16)5.152006/7:Deep Belief Networks(DBNs)&AE Stacks Fine-Tuned by BP (17)5.162006/7:Improved CNNs/GPU-CNNs/BP-Trained MPCNNs (17)5.172009:First Ofﬁcial Competitions Won by RNNs,and with MPCNNs (18)5.182010:Plain Backprop(+Distortions)on GPU Yields Excellent Results (18)5.192011:MPCNNs on GPU Achieve Superhuman Vision Performance (18)5.202011:Hessian-Free Optimization for RNNs (19)5.212012:First Contests Won on ImageNet&Object Detection&Segmentation (19)5.222013-:More Contests and Benchmark Records (20)5.22.1Currently Successful Supervised Techniques:LSTM RNNs/GPU-MPCNNs (21)5.23Recent Tricks for Improving SL Deep NNs(Compare Sec.5.6.2,5.6.3) (21)5.24Consequences for Neuroscience (22)5.25DL with Spiking Neurons? (22)6DL in FNNs and RNNs for Reinforcement Learning(RL)236.1RL Through NN World Models Yields RNNs With Deep CAPs (23)6.2Deep FNNs for Traditional RL and Markov Decision Processes(MDPs) (24)6.3Deep RL RNNs for Partially Observable MDPs(POMDPs) (24)6.4RL Facilitated by Deep UL in FNNs and RNNs (25)6.5Deep Hierarchical RL(HRL)and Subgoal Learning with FNNs and RNNs (25)6.6Deep RL by Direct NN Search/Policy Gradients/Evolution (25)6.7Deep RL by Indirect Policy Search/Compressed NN Search (26)6.8Universal RL (27)7Conclusion271Introduction to Deep Learning(DL)in Neural Networks(NNs) Which modiﬁable components of a learning system are responsible for its success or failure?What changes to them improve performance?This has been called the fundamental credit assignment problem(Minsky, 1963).There are general credit assignment methods for universal problem solvers that are time-optimal in various theoretical senses(Sec.6.8).The present survey,however,will focus on the narrower,but now commercially important,subﬁeld of Deep Learning(DL)in Artiﬁcial Neural Networks(NNs).We are interested in accurate credit assignment across possibly many,often nonlinear,computational stages of NNs.Shallow NN-like models have been around for many decades if not centuries(Sec.5.1).Models with several successive nonlinear layers of neurons date back at least to the1960s(Sec.5.3)and1970s(Sec.5.5). An efﬁcient gradient descent method for teacher-based Supervised Learning(SL)in discrete,differentiable networks of arbitrary depth called backpropagation(BP)was developed in the1960s and1970s,and ap-plied to NNs in1981(Sec.5.5).BP-based training of deep NNs with many layers,however,had been found to be difﬁcult in practice by the late1980s(Sec.5.6),and had become an explicit research subject by the early1990s(Sec.5.9).DL became practically feasible to some extent through the help of Unsupervised Learning(UL)(e.g.,Sec.5.10,5.15).The1990s and2000s also saw many improvements of purely super-vised DL(Sec.5).In the new millennium,deep NNs haveﬁnally attracted wide-spread attention,mainly by outperforming alternative machine learning methods such as kernel machines(Vapnik,1995;Sch¨o lkopf et al.,1998)in numerous important applications.In fact,supervised deep NNs have won numerous of-ﬁcial international pattern recognition competitions(e.g.,Sec.5.17,5.19,5.21,5.22),achieving theﬁrst superhuman visual pattern recognition results in limited domains(Sec.5.19).Deep NNs also have become relevant for the more generalﬁeld of Reinforcement Learning(RL)where there is no supervising teacher (Sec.6).Both feedforward(acyclic)NNs(FNNs)and recurrent(cyclic)NNs(RNNs)have won contests(Sec.5.12,5.14,5.17,5.19,5.21,5.22).In a sense,RNNs are the deepest of all NNs(Sec.3)—they are general computers more powerful than FNNs,and can in principle create and process memories of ar-bitrary sequences of input patterns(e.g.,Siegelmann and Sontag,1991;Schmidhuber,1990a).Unlike traditional methods for automatic sequential program synthesis(e.g.,Waldinger and Lee,1969;Balzer, 1985;Soloway,1986;Deville and Lau,1994),RNNs can learn programs that mix sequential and parallel information processing in a natural and efﬁcient way,exploiting the massive parallelism viewed as crucial for sustaining the rapid decline of computation cost observed over the past75years.The rest of this paper is structured as follows.Sec.2introduces a compact,event-oriented notation that is simple yet general enough to accommodate both FNNs and RNNs.Sec.3introduces the concept of Credit Assignment Paths(CAPs)to measure whether learning in a given NN application is of the deep or shallow type.Sec.4lists recurring themes of DL in SL,UL,and RL.Sec.5focuses on SL and UL,and on how UL can facilitate SL,although pure SL has become dominant in recent competitions(Sec.5.17-5.22). Sec.5is arranged in a historical timeline format with subsections on important inspirations and technical contributions.Sec.6on deep RL discusses traditional Dynamic Programming(DP)-based RL combined with gradient-based search techniques for SL or UL in deep NNs,as well as general methods for direct and indirect search in the weight space of deep FNNs and RNNs,including successful policy gradient and evolutionary methods.2Event-Oriented Notation for Activation Spreading in FNNs/RNNs Throughout this paper,let i,j,k,t,p,q,r denote positive integer variables assuming ranges implicit in the given contexts.Let n,m,T denote positive integer constants.An NN’s topology may change over time(e.g.,Fahlman,1991;Ring,1991;Weng et al.,1992;Fritzke, 1994).At any given moment,it can be described as aﬁnite subset of units(or nodes or neurons)N= {u1,u2,...,}and aﬁnite set H⊆N×N of directed edges or connections between nodes.FNNs are acyclic graphs,RNNs cyclic.Theﬁrst(input)layer is the set of input units,a subset of N.In FNNs,the k-th layer(k>1)is the set of all nodes u∈N such that there is an edge path of length k−1(but no longer path)between some input unit and u.There may be shortcut connections between distant layers.The NN’s behavior or program is determined by a set of real-valued,possibly modiﬁable,parameters or weights w i(i=1,...,n).We now focus on a singleﬁnite episode or epoch of information processing and activation spreading,without learning through weight changes.The following slightly unconventional notation is designed to compactly describe what is happening during the runtime of the system.During an episode,there is a partially causal sequence x t(t=1,...,T)of real values that I call events.Each x t is either an input set by the environment,or the activation of a unit that may directly depend on other x k(k<t)through a current NN topology-dependent set in t of indices k representing incoming causal connections or links.Let the function v encode topology information and map such event index pairs(k,t)to weight indices.For example,in the non-input case we may have x t=f t(net t)with real-valued net t= k∈in t x k w v(k,t)(additive case)or net t= k∈in t x k w v(k,t)(multiplicative case), where f t is a typically nonlinear real-valued activation function such as tanh.In many recent competition-winning NNs(Sec.5.19,5.21,5.22)there also are events of the type x t=max k∈int (x k);some networktypes may also use complex polynomial activation functions(Sec.5.3).x t may directly affect certain x k(k>t)through outgoing connections or links represented through a current set out t of indices k with t∈in k.Some non-input events are called output events.Note that many of the x t may refer to different,time-varying activations of the same unit in sequence-processing RNNs(e.g.,Williams,1989,“unfolding in time”),or also in FNNs sequentially exposed to time-varying input patterns of a large training set encoded as input events.During an episode,the same weight may get reused over and over again in topology-dependent ways,e.g.,in RNNs,or in convolutional NNs(Sec.5.4,5.8).I call this weight sharing across space and/or time.Weight sharing may greatly reduce the NN’s descriptive complexity,which is the number of bits of information required to describe the NN (Sec.4.3).In Supervised Learning(SL),certain NN output events x t may be associated with teacher-given,real-valued labels or targets d t yielding errors e t,e.g.,e t=1/2(x t−d t)2.A typical goal of supervised NN training is toﬁnd weights that yield episodes with small total error E,the sum of all such e t.The hope is that the NN will generalize well in later episodes,causing only small errors on previously unseen sequences of input events.Many alternative error functions for SL and UL are possible.SL assumes that input events are independent of earlier output events(which may affect the environ-ment through actions causing subsequent perceptions).This assumption does not hold in the broaderﬁelds of Sequential Decision Making and Reinforcement Learning(RL)(Kaelbling et al.,1996;Sutton and Barto, 1998;Hutter,2005)(Sec.6).In RL,some of the input events may encode real-valued reward signals given by the environment,and a typical goal is toﬁnd weights that yield episodes with a high sum of reward signals,through sequences of appropriate output actions.Sec.5.5will use the notation above to compactly describe a central algorithm of DL,namely,back-propagation(BP)for supervised weight-sharing FNNs and RNNs.(FNNs may be viewed as RNNs with certainﬁxed zero weights.)Sec.6will address the more general RL case.3Depth of Credit Assignment Paths(CAPs)and of ProblemsTo measure whether credit assignment in a given NN application is of the deep or shallow type,I introduce the concept of Credit Assignment Paths or CAPs,which are chains of possibly causal links between events.Let usﬁrst focus on SL.Consider two events x p and x q(1≤p<q≤T).Depending on the appli-cation,they may have a Potential Direct Causal Connection(PDCC)expressed by the Boolean predicate pdcc(p,q),which is true if and only if p∈in q.Then the2-element list(p,q)is deﬁned to be a CAP from p to q(a minimal one).A learning algorithm may be allowed to change w v(p,q)to improve performance in future episodes.More general,possibly indirect,Potential Causal Connections(PCC)are expressed by the recursively deﬁned Boolean predicate pcc(p,q),which in the SL case is true only if pdcc(p,q),or if pcc(p,k)for some k and pdcc(k,q).In the latter case,appending q to any CAP from p to k yields a CAP from p to q(this is a recursive deﬁnition,too).The set of such CAPs may be large but isﬁnite.Note that the same weight may affect many different PDCCs between successive events listed by a given CAP,e.g.,in the case of RNNs, or weight-sharing FNNs.Suppose a CAP has the form(...,k,t,...,q),where k and t(possibly t=q)are theﬁrst successive elements with modiﬁable w v(k,t).Then the length of the sufﬁx list(t,...,q)is called the CAP’s depth (which is0if there are no modiﬁable links at all).This depth limits how far backwards credit assignment can move down the causal chain toﬁnd a modiﬁable weight.1Suppose an episode and its event sequence x1,...,x T satisfy a computable criterion used to decide whether a given problem has been solved(e.g.,total error E below some threshold).Then the set of used weights is called a solution to the problem,and the depth of the deepest CAP within the sequence is called the solution’s depth.There may be other solutions(yielding different event sequences)with different depths.Given someﬁxed NN topology,the smallest depth of any solution is called the problem’s depth.Sometimes we also speak of the depth of an architecture:SL FNNs withﬁxed topology imply a problem-independent maximal problem depth bounded by the number of non-input layers.Certain SL RNNs withﬁxed weights for all connections except those to output units(Jaeger,2001;Maass et al.,2002; Jaeger,2004;Schrauwen et al.,2007)have a maximal problem depth of1,because only theﬁnal links in the corresponding CAPs are modiﬁable.In general,however,RNNs may learn to solve problems of potentially unlimited depth.Note that the deﬁnitions above are solely based on the depths of causal chains,and agnostic of the temporal distance between events.For example,shallow FNNs perceiving large“time windows”of in-put events may correctly classify long input sequences through appropriate output events,and thus solve shallow problems involving long time lags between relevant events.At which problem depth does Shallow Learning end,and Deep Learning begin?Discussions with DL experts have not yet yielded a conclusive response to this question.Instead of committing myself to a precise answer,let me just deﬁne for the purposes of this overview:problems of depth>10require Very Deep Learning.The difﬁculty of a problem may have little to do with its depth.Some NNs can quickly learn to solve certain deep problems,e.g.,through random weight guessing(Sec.5.9)or other types of direct search (Sec.6.6)or indirect search(Sec.6.7)in weight space,or through training an NNﬁrst on shallow problems whose solutions may then generalize to deep problems,or through collapsing sequences of(non)linear operations into a single(non)linear operation—but see an analysis of non-trivial aspects of deep linear networks(Baldi and Hornik,1994,Section B).In general,however,ﬁnding an NN that precisely models a given training set is an NP-complete problem(Judd,1990;Blum and Rivest,1992),also in the case of deep NNs(S´ıma,1994;de Souto et al.,1999;Windisch,2005);compare a survey of negative results(S´ıma, 2002,Section1).Above we have focused on SL.In the more general case of RL in unknown environments,pcc(p,q) is also true if x p is an output event and x q any later input event—any action may affect the environment and thus any later perception.(In the real world,the environment may even inﬂuence non-input events computed on a physical hardware entangled with the entire universe,but this is ignored here.)It is possible to model and replace such unmodiﬁable environmental PCCs through a part of the NN that has already learned to predict(through some of its units)input events(including reward signals)from former input events and actions(Sec.6.1).Its weights are frozen,but can help to assign credit to other,still modiﬁable weights used to compute actions(Sec.6.1).This approach may lead to very deep CAPs though.Some DL research is about automatically rephrasing problems such that their depth is reduced(Sec.4). In particular,sometimes UL is used to make SL problems less deep,e.g.,Sec.5.10.Often Dynamic Programming(Sec.4.1)is used to facilitate certain traditional RL problems,e.g.,Sec.6.2.Sec.5focuses on CAPs for SL,Sec.6on the more complex case of RL.4Recurring Themes of Deep Learning4.1Dynamic Programming(DP)for DLOne recurring theme of DL is Dynamic Programming(DP)(Bellman,1957),which can help to facili-tate credit assignment under certain assumptions.For example,in SL NNs,backpropagation itself can 1An alternative would be to count only modiﬁable links when measuring depth.In many typical NN applications this would not make a difference,but in some it would,e.g.,Sec.6.1.be viewed as a DP-derived method(Sec.5.5).In traditional RL based on strong Markovian assumptions, DP-derived methods can help to greatly reduce problem depth(Sec.6.2).DP algorithms are also essen-tial for systems that combine concepts of NNs and graphical models,such as Hidden Markov Models (HMMs)(Stratonovich,1960;Baum and Petrie,1966)and Expectation Maximization(EM)(Dempster et al.,1977),e.g.,(Bottou,1991;Bengio,1991;Bourlard and Morgan,1994;Baldi and Chauvin,1996; Jordan and Sejnowski,2001;Bishop,2006;Poon and Domingos,2011;Dahl et al.,2012;Hinton et al., 2012a).4.2Unsupervised Learning(UL)Facilitating Supervised Learning(SL)and RL Another recurring theme is how UL can facilitate both SL(Sec.5)and RL(Sec.6).UL(Sec.5.6.4) is normally used to encode raw incoming data such as video or speech streams in a form that is more convenient for subsequent goal-directed learning.In particular,codes that describe the original data in a less redundant or more compact way can be fed into SL(Sec.5.10,5.15)or RL machines(Sec.6.4),whose search spaces may thus become smaller(and whose CAPs shallower)than those necessary for dealing with the raw data.UL is closely connected to the topics of regularization and compression(Sec.4.3,5.6.3). 4.3Occam’s Razor:Compression and Minimum Description Length(MDL) Occam’s razor favors simple solutions over complex ones.Given some programming language,the prin-ciple of Minimum Description Length(MDL)can be used to measure the complexity of a solution candi-date by the length of the shortest program that computes it(e.g.,Solomonoff,1964;Kolmogorov,1965b; Chaitin,1966;Wallace and Boulton,1968;Levin,1973a;Rissanen,1986;Blumer et al.,1987;Li and Vit´a nyi,1997;Gr¨u nwald et al.,2005).Some methods explicitly take into account program runtime(Al-lender,1992;Watanabe,1992;Schmidhuber,2002,1995);many consider only programs with constant runtime,written in non-universal programming languages(e.g.,Rissanen,1986;Hinton and van Camp, 1993).In the NN case,the MDL principle suggests that low NN weight complexity corresponds to high NN probability in the Bayesian view(e.g.,MacKay,1992;Buntine and Weigend,1991;De Freitas,2003), and to high generalization performance(e.g.,Baum and Haussler,1989),without overﬁtting the training data.Many methods have been proposed for regularizing NNs,that is,searching for solution-computing, low-complexity SL NNs(Sec.5.6.3)and RL NNs(Sec.6.7).This is closely related to certain UL methods (Sec.4.2,5.6.4).4.4Learning Hierarchical Representations Through Deep SL,UL,RLMany methods of Good Old-Fashioned Artiﬁcial Intelligence(GOFAI)(Nilsson,1980)as well as more recent approaches to AI(Russell et al.,1995)and Machine Learning(Mitchell,1997)learn hierarchies of more and more abstract data representations.For example,certain methods of syntactic pattern recog-nition(Fu,1977)such as grammar induction discover hierarchies of formal rules to model observations. The partially(un)supervised Automated Mathematician/EURISKO(Lenat,1983;Lenat and Brown,1984) continually learns concepts by combining previously learnt concepts.Such hierarchical representation learning(Ring,1994;Bengio et al.,2013;Deng and Yu,2014)is also a recurring theme of DL NNs for SL (Sec.5),UL-aided SL(Sec.5.7,5.10,5.15),and hierarchical RL(Sec.6.5).Often,abstract hierarchical representations are natural by-products of data compression(Sec.4.3),e.g.,Sec.5.10.4.5Fast Graphics Processing Units(GPUs)for DL in NNsWhile the previous millennium saw several attempts at creating fast NN-speciﬁc hardware(e.g.,Jackel et al.,1990;Faggin,1992;Ramacher et al.,1993;Widrow et al.,1994;Heemskerk,1995;Korkin et al., 1997;Urlbe,1999),and at exploiting standard hardware(e.g.,Anguita et al.,1994;Muller et al.,1995; Anguita and Gomes,1996),the new millennium brought a DL breakthrough in form of cheap,multi-processor graphics cards or GPUs.GPUs are widely used for video games,a huge and competitive market that has driven down hardware prices.GPUs excel at fast matrix and vector multiplications required not only for convincing virtual realities but also for NN training,where they can speed up learning by a factorof50and more.Some of the GPU-based FNN implementations(Sec.5.16-5.19)have greatly contributed to recent successes in contests for pattern recognition(Sec.5.19-5.22),image segmentation(Sec.5.21), and object detection(Sec.5.21-5.22).5Supervised NNs,Some Helped by Unsupervised NNsThe main focus of current practical applications is on Supervised Learning(SL),which has dominated re-cent pattern recognition contests(Sec.5.17-5.22).Several methods,however,use additional Unsupervised Learning(UL)to facilitate SL(Sec.5.7,5.10,5.15).It does make sense to treat SL and UL in the same section:often gradient-based methods,such as BP(Sec.5.5.1),are used to optimize objective functions of both UL and SL,and the boundary between SL and UL may blur,for example,when it comes to time series prediction and sequence classiﬁcation,e.g.,Sec.5.10,5.12.A historical timeline format will help to arrange subsections on important inspirations and techni-cal contributions(although such a subsection may span a time interval of many years).Sec.5.1brieﬂy mentions early,shallow NN models since the1940s,Sec.5.2additional early neurobiological inspiration relevant for modern Deep Learning(DL).Sec.5.3is about GMDH networks(since1965),perhaps theﬁrst (feedforward)DL systems.Sec.5.4is about the relatively deep Neocognitron NN(1979)which is similar to certain modern deep FNN architectures,as it combines convolutional NNs(CNNs),weight pattern repli-cation,and winner-take-all(WTA)mechanisms.Sec.5.5uses the notation of Sec.2to compactly describe a central algorithm of DL,namely,backpropagation(BP)for supervised weight-sharing FNNs and RNNs. It also summarizes the history of BP1960-1981and beyond.Sec.5.6describes problems encountered in the late1980s with BP for deep NNs,and mentions several ideas from the previous millennium to overcome them.Sec.5.7discusses aﬁrst hierarchical stack of coupled UL-based Autoencoders(AEs)—this concept resurfaced in the new millennium(Sec.5.15).Sec.5.8is about applying BP to CNNs,which is important for today’s DL applications.Sec.5.9explains BP’s Fundamental DL Problem(of vanishing/exploding gradients)discovered in1991.Sec.5.10explains how a deep RNN stack of1991(the History Compressor) pre-trained by UL helped to solve previously unlearnable DL benchmarks requiring Credit Assignment Paths(CAPs,Sec.3)of depth1000and more.Sec.5.11discusses a particular WTA method called Max-Pooling(MP)important in today’s DL FNNs.Sec.5.12mentions aﬁrst important contest won by SL NNs in1994.Sec.5.13describes a purely supervised DL RNN(Long Short-Term Memory,LSTM)for problems of depth1000and more.Sec.5.14mentions an early contest of2003won by an ensemble of shallow NNs, as well as good pattern recognition results with CNNs and LSTM RNNs(2003).Sec.5.15is mostly about Deep Belief Networks(DBNs,2006)and related stacks of Autoencoders(AEs,Sec.5.7)pre-trained by UL to facilitate BP-based SL.Sec.5.16mentions theﬁrst BP-trained MPCNNs(2007)and GPU-CNNs(2006). Sec.5.17-5.22focus on ofﬁcial competitions with secret test sets won by(mostly purely supervised)DL NNs since2009,in sequence recognition,image classiﬁcation,image segmentation,and object detection. Many RNN results depended on LSTM(Sec.5.13);many FNN results depended on GPU-based FNN code developed since2004(Sec.5.16,5.17,5.18,5.19),in particular,GPU-MPCNNs(Sec.5.19).5.11940s and EarlierNN research started in the1940s(e.g.,McCulloch and Pitts,1943;Hebb,1949);compare also later work on learning NNs(Rosenblatt,1958,1962;Widrow and Hoff,1962;Grossberg,1969;Kohonen,1972; von der Malsburg,1973;Narendra and Thathatchar,1974;Willshaw and von der Malsburg,1976;Palm, 1980;Hopﬁeld,1982).In a sense NNs have been around even longer,since early supervised NNs were essentially variants of linear regression methods going back at least to the early1800s(e.g.,Legendre, 1805;Gauss,1809,1821).Early NNs had a maximal CAP depth of1(Sec.3).5.2Around1960:More Neurobiological Inspiration for DLSimple cells and complex cells were found in the cat’s visual cortex(e.g.,Hubel and Wiesel,1962;Wiesel and Hubel,1959).These cellsﬁre in response to certain properties of visual sensory inputs,such as theorientation of plex cells exhibit more spatial invariance than simple cells.This inspired later deep NN architectures(Sec.5.4)used in certain modern award-winning Deep Learners(Sec.5.19-5.22).5.31965:Deep Networks Based on the Group Method of Data Handling(GMDH) Networks trained by the Group Method of Data Handling(GMDH)(Ivakhnenko and Lapa,1965; Ivakhnenko et al.,1967;Ivakhnenko,1968,1971)were perhaps theﬁrst DL systems of the Feedforward Multilayer Perceptron type.The units of GMDH nets may have polynomial activation functions imple-menting Kolmogorov-Gabor polynomials(more general than traditional NN activation functions).Given a training set,layers are incrementally grown and trained by regression analysis,then pruned with the help of a separate validation set(using today’s terminology),where Decision Regularisation is used to weed out superﬂuous units.The numbers of layers and units per layer can be learned in problem-dependent fashion. This is a good example of hierarchical representation learning(Sec.4.4).There have been numerous ap-plications of GMDH-style networks,e.g.(Ikeda et al.,1976;Farlow,1984;Madala and Ivakhnenko,1994; Ivakhnenko,1995;Kondo,1998;Kord´ık et al.,2003;Witczak et al.,2006;Kondo and Ueno,2008).5.41979:Convolution+Weight Replication+Winner-Take-All(WTA)Apart from deep GMDH networks(Sec.5.3),the Neocognitron(Fukushima,1979,1980,2013a)was per-haps theﬁrst artiﬁcial NN that deserved the attribute deep,and theﬁrst to incorporate the neurophysiolog-ical insights of Sec.5.2.It introduced convolutional NNs(today often called CNNs or convnets),where the(typically rectangular)receptiveﬁeld of a convolutional unit with given weight vector is shifted step by step across a2-dimensional array of input values,such as the pixels of an image.The resulting2D array of subsequent activation events of this unit can then provide inputs to higher-level units,and so on.Due to massive weight replication(Sec.2),relatively few parameters may be necessary to describe the behavior of such a convolutional layer.Competition layers have WTA subsets whose maximally active units are the only ones to adopt non-zero activation values.They essentially“down-sample”the competition layer’s input.This helps to create units whose responses are insensitive to small image shifts(compare Sec.5.2).The Neocognitron is very similar to the architecture of modern,contest-winning,purely super-vised,feedforward,gradient-based Deep Learners with alternating convolutional and competition lay-ers(e.g.,Sec.5.19-5.22).Fukushima,however,did not set the weights by supervised backpropagation (Sec.5.5,5.8),but by local un supervised learning rules(e.g.,Fukushima,2013b),or by pre-wiring.In that sense he did not care for the DL problem(Sec.5.9),although his architecture was comparatively deep indeed.He also used Spatial Averaging(Fukushima,1980,2011)instead of Max-Pooling(MP,Sec.5.11), currently a particularly convenient and popular WTA mechanism.Today’s CNN-based DL machines proﬁta lot from later CNN work(e.g.,LeCun et al.,1989;Ranzato et al.,2007)(Sec.5.8,5.16,5.19).5.51960-1981and Beyond:Development of Backpropagation(BP)for NNsThe minimisation of errors through gradient descent(Hadamard,1908)in the parameter space of com-plex,nonlinear,differentiable,multi-stage,NN-related systems has been discussed at least since the early 1960s(e.g.,Kelley,1960;Bryson,1961;Bryson and Denham,1961;Pontryagin et al.,1961;Dreyfus,1962; Wilkinson,1965;Amari,1967;Bryson and Ho,1969;Director and Rohrer,1969;Griewank,2012),ini-tially within the framework of Euler-LaGrange equations in the Calculus of Variations(e.g.,Euler,1744). Steepest descent in such systems can be performed(Bryson,1961;Kelley,1960;Bryson and Ho,1969)by iterating the ancient chain rule(Leibniz,1676;L’Hˆo pital,1696)in Dynamic Programming(DP)style(Bell-man,1957).A simpliﬁed derivation of the method uses the chain rule only(Dreyfus,1962).The methods of the1960s were already efﬁcient in the DP sense.However,they backpropagated derivative information through standard Jacobian matrix calculations from one“layer”to the previous one, explicitly addressing neither direct links across several layers nor potential additional efﬁciency gains due to network sparsity(but perhaps such enhancements seemed obvious to the authors).。

结构稀疏模型刘建伟

——————————————— 刘建伟，男，1966年生，博士，中国石油大学(北京)地球物理信息工程学院自动化系副研究员，主要研究领域为机器学习，智能信息处理，复杂系统的分析、预测与控制，E-mail：liujw@. 崔立鹏，男，1990年生，中国石油大学(北京)地球物理信息工程学院自动化系硕士研究生，主要研究领域为机器学习，E-mail：cuilipengpeng@. 罗雄麟，男，1963年生，博士，中国石油大学(北京)地球物理信息工程学院自动化系教授，主要研究领域为智能控制、复杂系统分析、预测与控制，E-mail：luoxl@.
结构稀疏化模型是当前稀疏学习领域的研究方向近几年来涌现出很多研究成果文中对主流的结构稀疏模型如组结构稀疏模型结构稀疏字典学习双层结构稀疏模型树结构稀疏模型和图结构稀疏模型进行了总结对结构稀疏模型目标函数中包含非可微非凸和不可分离变量的结构稀疏模型目标函数近似转换为可微凸和可分离变量的近似目标函数的技术如控制一受控不等式majorityminoritymmnesterov双目标函数近似方法一阶泰勒展开和二阶泰勒展开技术对求解结构稀疏化模型近似目标函数的优化算法如最小角回归算法组最小角回归算法groupleastangleregressiongrouplars块坐标下降算法blockcoordinatedescentalgorithm分块坐标梯度下降算法blockcoordinategradientdescentalgorithm局部坐标下降算法1ocalcoordinatedescentalgorithm谱投影梯度法spectralprojectedgradientalgorithm主动集算法activesetalgrithm和交替方向乘子算法alternatingdirectionmethodofmultipliersadmm进行了比较分析并且对结构稀疏模型未来的研究方向进行了探讨

山东大学学报(工学版)总目次

第6期第50卷总目次山东大学学报(工学版）第50卷2020 年总目次机器学习与数据挖掘基于域对抗网络和B E R T 的跨领域文本情感分析...............基于V i B e 算法运动特征的关键帧提取算法......................自适应属性选择的实体对齐方法.............................基于门控循环单元与主动学习的协同过滤推荐算法...........基于异质集成学习的虚假评论检测..........................一种使用并行交错采样进行超分辨的方法....................基于校正神经网络的视频追踪算法...........................基于改进Y O L O v 3的复杂场景车辆分类与跟踪..................基于混合决策的改进鸟群算法..............................一种基于深度神经网络的句法要素识别方法..................基于多维相似度和情感词扩充的相同产品特征识别...........符号序列的L D A 主题特征表示方法 .........................基于元图归一化相似性度量的实体推荐.......................基于Laplacian 支持向量机和序列信息的m i c r o R N A -结合残基预测基于三维剪切波变换和B M 4D 的图像去噪方法................................蔡国永，林强，任凯琪（1-1)……李秋玲，邵宝民，赵磊，王振，姜雪（1-8)……苏佳林，王元卓，靳小龙，程学旗（1-14)......陈德蕾，王成，陈建伟，吴以茵（1-21)…张大鹏，刘雅军，张伟，沈芬，杨建盛（2-1)........................朱安，徐初（2-10)...........陈宁宁，赵建伟，周正华（2-17).............宋士奇，朴燕，蒋泽新（2-27)闫威，张达敏，张绘娟，辛梓芸，陈忠云（2-34)......陈艳平，冯丽，秦永彬，黄瑞章（2-44)...................胡龙茂，胡学钢（2-50).............冯超，徐鲲鹏，陈黎飞（2-60).............张文凯，禹可，吴晓非（2-66).....................马昕，王雪（2-76)......张胜男，王雷，常春红，郝本利（2-83)基于预测数据特征的空气质量预测方法...................................................................................................高铭壑，张莹，张蓉蓉,黄子豪，黄琳焱，李繁菀，张昕，王彦浩（2-91)基于轻型卷积神经网络的火焰检测方法..........................严云洋，杜晨锡，刘以安，高尚兵（2-100)基于深度学习的洗衣机异常音检测..........................李春阳，李楠，冯涛，王朱贺，马靖凯（2-108)语义分析及向量化大数据跨站脚本攻击智检.....................................张海军，陈映辉（2-118)自然语言问答中的语义关系识别.....................一种Chirplet 神经网络自动目标识别算法..............基于G a b o r 特征的乳腺肿瘤M R 图像分类识别模型......基于U A R T 串口的多机通讯.............................基于多模态子空间学习的语义标签生成方法.........基于背景复杂度自适应距离阈值修正的S u B S E N S E 算法基于双重启发式信息求解影响最大化问题的蚁群算法…联合检测的自适应融合目标跟踪.....................基于核极限学习机自编码器的标记分布学习.........基于集成学习〇,的质量浓度预测模型................基于空间注意力和卷积神经网络的视觉情感分析..............................段江丽，胡新（3-1)......................李怡霏，郭尊华（3-8).........袁高腾，刘毅慧，黄伟，胡兵（3-15).............................马金平（3-24)田楓，李欣，刘芳，李闯，孙小强，杜睿山（3-31)...............成科扬，孙爽，詹永照（3-38)•…覃俊，李蔚栋，易金莉，刘晶，马懋德（3-45)...............刘保成，朴燕，宋雪梅（3-51).......王一宾，李田力，程玉胜，钱坤（3-58)..................彭岩，冯婷婷，王洁（4-1)............蔡国永，贺歆灏，储阳阳（4-8)• 2 ■山东大学学报（工学版)第50卷一种基于多目标的容器云任务调度算法...............基于卷积神经网络的深度线段分类算法................基于类激活映射-注意力机制的图像描述方法...........基于Bi -LSTM 的脑电情绪识别.........................带特征指标约束描述的设计模式分类挖掘..............基于NRC 和多模态残差神经网络的肺部肿瘤良恶性分类中文对话理解中基于预训练的意图分类和槽填充联合模型融合残差块注意力机制和生成对抗网络的海马体分割••…........................谢晓兰，王琦（4-14)..............赵宁宁，唐雪嵩，赵鸣博（4-22).....廖南星，周世斌，张国鹏，程德强（4-28)..................刘帅，王磊，丁旭涛（4-35).....肖卓宇，何锫，陈果,徐运标，郭杰（6-48)■•…霍兵强，周涛，陆惠玲，董雅丽，刘珊（6-59)........................马常霞，张晨（6-68)张月芳，邓红霞，呼春香，钱冠宇，李海芳（6-76)控制科学与工程基于空间隐患分布与运动意图解析的危险评估方法........一类非仿射非线性大系统的结构在线扩展.................GPRS 监管的多协议异构现场总线控制系统................基于新型趋近律的参数未知分数阶Rucklidge 系统的滑模同步分数阶Brussel 系统混沌同步的三种控制方案...............一类非线性混沌系统的自适应滑模同步...................含对数项分数阶T 混沌系统的滑模同步...................赵越男，陈桂友，孙琛，卢宁，譽立伟（1 -28)............曹小洁，李小华，刘辉（1-35)……侯鹏飞，孙竹梅，王琦，白建云（1-49).........王春彦，邸金红，毛北行（4-40).........................程春蕊（4-46)..................程春蕊，毛北行（5-1)..................孟晓玲，毛北行（5-7)土木工程含层状节理岩体力学性质数值模拟研究.......................................徐子瑶，虞松，付强（3-66)水泥土搅拌桩沿海软基处理..............................................吕国仁，葛建东，肖海涛（3-73)高地应力下砂岩力学参数和波速变化规律试验研究..............................宫嘉辰，陈士海（3-82)饱和地基中单排孔近场隔振的现场试验与数值分析智慧公路关键技术发展综述...................双节理岩体T B M 滚刀破岩过程数值模拟......基于熵值法的水利施工企业绩效考核K P I 设计方法偏压大跨小净距公路隧道施工力学行为..........基于B P 神经网络算法的结构振动模态模糊控制••砂土介质中颗粒浆液扩散距离变化规律........预应力中空棒构件设计与力学特性..............隐伏溶洞对隧道围岩稳定性影响规律及处治技术硬岩隧道纯钢纤维混凝土管片应用..............喷扩锥台压灌桩最优构造.......................松散地层隧道进洞段管棚注浆加固效应分析……孙连勇，时刚，崔新壮，周明祥，王永军，纪方，闫小东（3-88)................................吴建清，宋修广（4-52)施雪松，管清正，王文扬，许振浩，林鹏，王孝特，刘洁（4-70).........................................程森(4-80)........................................王春国（4-85)...........................王志伟，葛楠，李春伟（5-13)........................冯啸，夏冲，王凤刚，张兵（5-20).............................林超，张程林，王勇（5-26).....................陈禹成，王朝阳，郭明，林鹏（5-33)..............徐振，李德明，王彬，詹谷益，张世杰（5-44)...........李连祥，邢宏侠，李金良，黄亨利，王雷（6-82)...................余俊，翁贤杰，樊文胜，张连震（6-92)机械与能动工程柔性Rushton 桨的振动特性.........................................................刘欣，杨锋茶（5-50)湿法脱硫塔一维传热传质性能模型理论与试验.....................陈保奎，孙奉仲，高明，史月涛（5-56)波浪能发电装置浮体形状参数对俘能性能影响............刘延俊，王伟，陈志，王冬海,王登帅，薛钢（6-1)深拖地震线列阵的动力学建模与位置预报...................朱向前,魏峥嵘，裴彦良，于凯本，宗乐（6-9)淹没深度对三自由度波能浮子获能的影响........................黄淑亭，翟晓宇，刘延俊，史宏达（6-17)尾缘襟翼振荡水翼的水动力特性.................................孙光，王勇，谢玉东，陈晨，张玉兵（6-23)深海带电插拔连接器力学特性分析…韩家桢，王勇，谢玉东，王启先，张新标，高文彬，李荣兰，张传军（6-30) 振荡翼改进运动模型的能量捕获性能分析............................乔凯，王启先，王勇，谢玉东（6-40)第6期第50卷总目次电气工程能源消费发展及预测方法综述..............................杨明，杜萍静，刘凤全，郝旭鹏，孛一凡（1-56)基于物理不可克隆函数的电网NB-IoT端到端安全加密方案............................................................................................刘冬兰，刘新，陈剑飞，王文婷，张昊，马雷，李冬（丨-63)中央空调紧急控制应对受端电网直流闭锁故障研究.................................................................................................刘萌，程定一，张文，张恒旭，李宽，张国辉，苏建军U-72)风电爬坡事件的非精确条件概率预测..........................王勃，汪步惟，杨明，赵元春，朱文立（丨-82)考虑同步调相机无功特性的多馈入直流同时换相失败风险评估方法............................................................................................麻常辉，王亮，谭邵卿，卢奕，马欢，赵康（3-98)考虑路灯充电桩接入的城市配电网电压控制方法............宋士瞻，陈浩宇，张健，王坤，郝庆水（3-104)基于分时电价的含光伏的智慧家庭能量调度方法…潘志远，刘超男，李宏伟，王婧，王威，刘静，郑鑫（3-111)基于弹性梯度下降算法的B P神经网络降雨径流预报模型..........金保明，卢光毅，王伟，杜伦阅（3-117)基于学习理论的含光储联合系统的输电网双层规划……孙东磊，赵龙，秦敬涛，韩学山，杨明，王明强（4-90) 考虑内部动态约束的MMC功率运行区间的确定及控制方法……张锋,杨桂兴，岳晨晶，郝全睿，李东（4 - 9 8)虾米腰弯管内置导流板优化...................................祁金胜，曹洪振，石岩，杜文静，王湛（5-64)基于B P神经网络的短期光伏集群功率区间预测........孙东磊，王艳，于一潇，韩学山，杨明，闰芳晴（5-70)偏心方圆节扩散管数值模拟.................................曹洪振,祁金胜，袁宝强，杜文静，王湛（5-77)烟气成分对湿式电除尘器电晕放电特性的影响.................王磊，张玉磊，李兆东，张金峰，王翔（5-83)含电极式电锅炉的地区电网电源侧综合效益分析......葛维春，李昭，赵东，李振宇，叶青，傅予，于娜（5-90)基于特征频带相电流提取的故障选相和选线方法........................张贺军，王鹏，徐凯，石访（5-99)电动汽车虚拟储能可用容量建模.......................................李蓓，赵松，谢志佳，牛萌（6-101)基于RTDS的配电网一二次融合仿真技术...............李志，余绍峰，苏毅方，王蔚，蒋宏图，张伟（6-112)芒刺参数对电晕放电及细颗粒物脱除特性的影响............................王磊，李明臻，王翔（6-118)含不凝气蒸汽在锯齿形表面的凝结传热特性............................闫吉庆，王效嘉，田茂诚（6-129)化学与环境济南城区大气PM2.5、PM,。

人工智能术语中英文对照

binary genes 二进制编码基因
boundary mutation 边界变异
building block hypothesis 基因块假设,积木块假设
cell 细胞
character genes 符号编码基因
chromosome 染色体
classifier system,CS 分类器系统
reproduction 复制
ribonucleic acid,RNA 核糖核酸
robustness 稳健性
roulette wheel selection 赌盘选择
scaling with sigma truncation O~截断尺度变换
schema 模式
schema defining length 模式定义长度
function optimization 函数最优化
GA deceptive problem 遗传算法欺骗问题
Gaussian mutation 高斯变异
gene 基因
generation gap 代沟
genetic algorithms,GAs 遗传算法
genetic operators 遗传算子
population size 群体大小
power law scaling 乘幂尺度变换
premature convergence 早熟现象,早期收敛
preselection 预选择
probabilistic algorithms 概率算法
probabilistic operator 概率算子
random walks 随机游走
rank-based model 排序选择模型

人工智能深度学习技术练习(习题卷18)

人工智能深度学习技术练习(习题卷18)说明：答案和解析在试卷最后第1部分：单项选择题，共47题，每题只有一个正确答案,多选或少选均不得分。

1.[单选题]一般我们建议将卷积生成对抗网络(convolutional generative adversarial nets)中生成部分的池化层替换成什么?CA)跨距卷积层(Strided convolutional layer)B)ReLU层C)部跨距卷积层(Fractional stridedconvolutional layer)D)仿射层(Affine layer)2.[单选题]梯度下降算法的正确步骤是什么?A计算预测值和真实值之间的误差B重复迭代，直至得到网络权重的最佳值C把输入传入网络，得到输出值D用随机值初始化权重和偏差E对每一个产生误差的神经元，调整相应的(权重)值以减小误差A)AbcdeB)EdcbaC)CbaedD)Dcaeb3.[单选题]自顶向下设计主要由下列哪个语法元素实现？（）。

A)面向对象B)函数C)循环结构D)过程4.[单选题]自定义循环训练模型模式，无需（）过程A)编译B)求导C)计算梯度D)计算得分5.[单选题]pytorch中可视化参数分布：A)writer.add_graphB)writer.add_scalarC)writer.add_histogramD)writer.add_figure6.[单选题]Sigmoid激活函数的导数范围是( )A)(0,0.1]B)(0,0.2]C)(0,0.25]D)(0,0.5]7.[单选题]绘制直方图的命令是（）。

A)plt.bar()B)plt.histC)plt.pieD)plt.scatter8.[单选题]哪个算法同一层使用不同的卷积核尺寸进行计算A)LeNetB)AlexNetC)ResNetD)Inception9.[单选题]要建立数学模型,其详细过程是必须经过数据测量、数据比较、( )过程,才能达到最优。

算法社会中的三大法则

e
rma
.c
ct
c
7
r
nuk
5.
p
参见 Fu
e
lo
ft
h
eFu
t
u
r
e Da
t
ai
sG
i
v
i
ngR
i
s
et
oaNew Ec
onomy Ec
onomi
s
t Ma
t
t
s www
.e
c
onomi
s
t
.c
om
y6 2017 h
p
(
“
数据最终会将是外部性:
n
ewsb
r
i
e
fmg21721634
howi
t
s
ha
代。我说的算法社会是什么意思? 我指的是一个由算法、机器人和人工智能体围绕社会和经济决策组成
的社会,这些代理人不仅作出决定,而且在某些情况下,还执行这些决定。因此,机器人和人工智能的使
用只是算法社会的一个特例。
大数据也只是算法社会的一个特征。事实上,
大数据只是以算法决策为中心的社会的另一面。大数据
是运行算法社会的燃料,
算法社会中的三大法则
[美]杰克·巴尔金著
刘
颖
陈瑶瑶译 *
内容摘要我们正在从互联网时代飞速走向算法社会,这是一个由算法、机器人和人工智能体围绕
社会和经济决策组成的社会。阿西莫夫三大定律具有一定的启发性,但仅适用于科幻小说。人类与机器
人的关系,犹如布拉格魔像传说中的“拉比和魔像”。在算法社会中,规制的核心问题不是算法,而是使用
on
-mu

“宏大叙事”与“切身利益”：政策接受度的多层次比较分析

论文“宏大叙事”与“切身利益”：政策接受度的多层次比较分析郭　跃　邓仪正　付雪聪【摘要】政策接受度是影响公共政策有效实施的关键因素之一。

理解公众政策接受度的认知规律，对于从理论上揭示公共政策有效性的底层逻辑、从实践上提升政策执行的效率并增加公众对政府的信任都具有重要意义。

学者们对政策接受度的概念和影响因素进行了广泛研究，并着重探讨了政策接受度的影响因素与形成机制。

然而，现有研究较少区分政策接受度的不同层次，也尚未解释不同层次政策接受度形成机制的差异。

基于此，论文以中国的社会信用体系建设为政策情境，对公众政策接受度的不同层次———基于“宏大叙事”的抽象接受度与基于“切身利益”的具象接受度加以区分，并对不同层次政策接受度的影响因素与认知特点展开比较分析。

研究基于问卷调查数据，在社会信用体系政策场景中选择个人与社会两个层次的风险感知与收益感知作为自变量，通过ＯＬＳ回归等分析方法，探究其对抽象与具象政策接受度的影响。

研究发现，公众对社会信用体系政策的抽象接受度总体显著高于具象接受度；抽象接受度同时受个人与社会两个层面的风险感知与收益感知的影响，而具象接受度只受个人层面风险感知与收益感知的影响。

在社会信用体系建设中，公众基于“宏大叙事”对政策的抽象认知受到理性与道德的共同作用，而涉及“切身利益”的具象接受度则仅受理性影响。

作为微观个体，公众既是期盼社会诚信水平得以提升的“社会人”，也是担心隐私泄露的“经济人”。

抽象接受度体现出微观个体同时衡量个体与社会的风险收益的“社会人”特征，而具象接受度则体现出微观个体受访者基于纯粹理性的成本收益分析的理性人。

研究结果揭示了在·７６·郭跃，北京师范大学政府管理学院副教授；邓仪正，北京师范大学政府管理学院硕士研究生；通讯作者：付雪聪，中山大学政治与公共事务管理学院硕士研究生。

感谢匿名评审专家的评审意见。

基金项目：国家自然科学基金面上项目：复杂动机嵌入下政策工具组合的公众个体行为效应与设计优化机制研究（７２２７４０２１），国家自然科学基金青年项目：行为公共政策视角下公众参与的政策反馈机制研究（７１８０４０１５），国家自然科学基金面上项目：地方政府公共政策执行行为选择机制的演化博弈分析（７２１７４１５５）。

智能优化算法的设计与应用

智能优化算法的设计与应用## Design and Application of Intelligent Optimization Algorithms ##。

Intelligent optimization algorithms are metaheuristic algorithms that aim to find optimal or near-optimal solutions to complex optimization problems. They are inspired by natural phenomena or other real-world processes and typically involve iterative searches and learning mechanisms. Intelligent optimization algorithms have been widely applied in various fields, including engineering, finance, healthcare, and operations research.### Types of Intelligent Optimization Algorithms.There are various types of intelligent optimization algorithms, some of the most commonly used are:1. Genetic algorithms (GAs): GAs mimic the process of natural selection and evolution to find optimal solutions.They represent solutions as chromosomes and use genetic operators such as crossover and mutation to create new solutions.2. Particle swarm optimization (PSO): PSO is inspiredby the swarming behavior of birds and fish. It represents solutions as particles that move through the search space, guided by their own best position and the best position found by the swarm.3. Ant colony optimization (ACO): ACO is based on the behavior of ants that use pheromones to find the shortest path to food sources. It represents solutions as paths and uses pheromones to guide the search towards promising regions.4. Differential evolution (DE): DE is a relatively new algorithm that uses the concept of differential vectors to generate new solutions. It randomly selects three solutions and uses their differences to create a candidate solution.### Applications of Intelligent Optimization Algorithms.Intelligent optimization algorithms have been applied to solve a wide range of problems in various fields, including:1. Engineering design: Designing and optimizing complex engineering systems such as aircraft, bridges, and engines.2. Finance: Optimizing portfolios, managing risk, and forecasting financial markets.3. Healthcare: Optimizing treatment plans, predicting disease progression, and drug discovery.4. Operations research: Scheduling, logistics, and resource allocation.### Advantages of Intelligent Optimization Algorithms.Intelligent optimization algorithms offer several advantages, including:1. Robustness: They can handle complex, nonlinear, and noisy problems with multiple constraints.2. Flexibility: They can be adapted to different types of problems and objective functions.3. Global search: They have the ability to explore the entire search space to find global optima, unlike local search algorithms.### Challenges in Designing Intelligent Optimization Algorithms.Designing intelligent optimization algorithms involves several challenges:1. Tuning: Choosing appropriate parameter settings to ensure efficient and effective optimization.2. Convergence: Ensuring that the algorithm converges to a satisfactory solution in a reasonable amount of time.3. Scalability: Developing algorithms that can handle large and complex problems with high dimensionality.## 智能优化算法的设计与应用 ##。

贝叶斯网络结构学习总结

贝叶斯⽹络结构学习总结完备数据集下的贝叶斯⽹络结构学习：基于依赖统计分析的⽅法—— 通常利⽤统计或是信息论的⽅法分析变量之间的依赖关系，从⽽获得最优的⽹络结构对于基于依赖统计分析⽅法的研究可分为三种：基于分解的⽅法（V结构的存在）Decomposition of search for v-structures in DAGsDecomposition of structural learning about directed acylic graphsStructural learning of chain graphs via decomposition基于Markov blanket的⽅法Using Markov blankets for causal structure learningLearning Bayesian network strcture using Markov blanket decomposition基于结构空间限制的⽅法Bayesian network learning algorithms using structural restrictions(将这些约束与pc算法相结合提出了⼀种改进算法，提⾼了结构学习效率)（约束由Campos指出包括1、⼀定存在⼀条⽆向边或是有向边 2、⼀定不存在⼀条⽆向边或有向边 3、部分节点的顺序）常⽤的算法：SGS——利⽤节点间的条件独⽴性来确定⽹络结构的⽅法PC——利⽤稀疏⽹络中节点不需要⾼阶独⽴性检验的特点，提出了⼀种削减策略:依次由0阶独⽴性检验开始到⾼阶独⽴性检验，对初始⽹络中节点之间的连接进⾏削减。

此种策略有效地从稀疏模型中建⽴贝叶斯⽹络，解决了SGS算法随着⽹络中节点数的增长复杂度呈指数倍增长的问题。

TPDA——把结构学习过程分三个阶段进⾏:a)起草(drafting)⽹络结构，利⽤节点之间的互信息得到⼀个初始的⽹络结构;b)增厚(thickening)⽹络结构，在步骤a)⽹络结构的基础上计算⽹络中不存在连接节点间的条件互信息，对满⾜条件的两节点之间添加边;。

Cover page Title Decentralized H ∞ Controller Design for Large-scale Wireless Structural S

Cover pageTitle: Decentralized H∞ Controller Design for Large-scale Wireless Structural Sensing and Control SystemsAuthors: Yang WangJerome P. LynchKincho H. Law(FIRST PAGE OF ARTICLE)ABSTRACTWireless structural sensing has attracted much research interest in recent years. With actuation functionality incorporated in the design of the wireless sensing nodes, a wireless feedback structural control system can be constructed. For wireless structural control systems that face challenges in communication range, latency, and reliability, decentralized system architectures can provide promising solutions. This paper examines the use of a decentralized controller that minimizes the H∞ norm of a closed-loop system. Decentralized control solutions are developed for both continuous-time and discrete-time formulations. To evaluate the performance of the decentralized H∞ controller design, numerical simulations of the wireless control system are conducted using a 20-story benchmark structure with different decentralized system architectures.INTRODUCTIONReal-time feedback control has been a topic of great interest to the structural engineering community in the last few decades [1]. A feedback structural control system includes a network of sensors, controllers, and actuators. In traditional feedback control systems, large amount of coaxial wiring is needed to connect the sensors, controllers, and actuators into an integrated system that features a feedback control loop. The cost for installing wires grows significantly as the size of the structure and the number of sensors and control devices increase. Furthermore, once a wired control system is installed in a structure, it could be quite costly to change the system architecture and to reroute the wires. To eliminate the cost and inconvenience of tethered installations, wireless communication and embedded computing technologies can be a viable alternative in structural control systems. In a prototype wireless control system [2], wireless communication replaces wired communication for the exchange of data between sensors and controllers. The distributed network of Yang Wang, and Kincho H. Law, Dept. of Civil and Environmental Engineering, Stanford University, Stanford, CA 94305. Jerome P. Lynch, Dept. of Civil and Environmental Engineering, University of Michigan, Ann Arbor, MI 48109.wireless sensors can be collocated with individual structural actuators for the calculation and execution of control forces.In a centralized control system, regardless of the communication medium (wireless or wired), the central controller has to collect data from all the sensors in the structure. Requirements on communication range and data transmission rate increase with the size of the structure and the number of sensors being deployed. For a wireless control system, these communication issues could potentially present difficulty for large-scale implementations. Furthermore, the centralized control server represents a point of potential bottleneck failure for the whole system. Decentralized control strategies could be deployed to overcome some of the inherent problems of a centralized control system [3, 4]. In decentralized control systems, multiple controllers are distributed throughout the structure. Requiring data only from neighboring sensors, each controller commands actuators in its vicinity. As a result, shorter communication range and lower data transmission rate are required.To ensure a suitable level of performance of the decentralized wireless control system, decentralized controller design based on the linear quadratic regulator (LQR) optimization criteria has been studied [4]. This paper explores a different control methodology, namely the H ∞ control theory that can offer excellent control performance when “worst-case” external disturbances are encountered. H ∞ controller design is also convenient when model uncertainties exist (as is typical in most civil structures). Centralized H ∞ design in the continuous-time domain for civil structural control has been studied by many researchers [5, 6]. One important feature of H ∞ control is that the controller design can be formulated using linear matrix inequalities (LMI) [8]. For an optimization problem with LMI constraints, sparsity patterns can be easily applied to the matrix variables. This property can be utilized to design decentralized controllers where the gain matrices with certain sparsity patterns represent decentralized information feedback. This paper explores the feasibility of designing decentralized H ∞ controllers that may be potentially employed in decentralized wireless structural control systems. Decentralized H ∞ controller design in both the continuous-time and discrete-time domains are investigated. Numerical simulations using a 20-story benchmark structure are conducted to illustrate the efficacy of the decentralized H ∞ controller design employed in different wireless structural control architectures.CONTINUOUS-TIME DECENTRALIZED ∞H CONTROLFor a lumped-mass structural model with n degrees-of-freedom (DOF) and m actuators, the state-space representation can be formulated as [1]:()()()()I I I I I t t t t =++xA xB u E w (1)where ()();I t t =⎡⎤⎣⎦x q q is the state vector; q (t ) is the n × 1 displacement vectorrelative to the ground; u (t ) and w (t ) are the m × 1 control force and r × 1 external excitation vectors, respectively; A I , B I , and E I are the 2n × 2n system, 2n × m actuatorlocation, and 2n × r excitation location matrices, respectively. In this study, it is assumed that inter-story drifts and velocities are observable. The displacement and velocity variables in I x , which are relative to the ground, are first transformed into drifts and velocities between neighboring floors. That is, the inter-story drift andvelocity at each floor are grouped together as x = [q 1; 1q; 21q q −; 21q q − ; …; 1n n q q −−; 1n n qq −− ]. A linear transformation matrix Γ can be defined such that I =x Γx . Substituting 1I −=x Γx into Eq. (1) and left-multiplying the equation withΓ, the state space representation with the transformed state vector becomes:()()()()t t t t =++xAx Bu Ew (2)where 1I −=A ΓA Γ, I =B ΓB , I =E ΓE . The system output z (t ) is defined as the sumof linear transformations to the state vector x (t ) and the control vector u (t ):()()()t t t =+z z z C x D u(3)where C z and D z are the output matrices for the state and control force vectors, respectively. Assuming static state feedback, the control force u (t ) is decided by u (t ) = Gx (t ), where G is termed the control gain matrix. Substituting Gx (t) for u (t ) in Eq. (2) and Eq. (3), the state-space equations of the closed-loop system can be written as:()()()CL t t t =+xA x Ew , and ()()CL t t =z C x (4)where , and CL CL =+=+z z A A BG C C D G . In the frequency-domain, the systemdynamics can be represented by the transfer function H zw (s ) from disturbance w (t ) to output z (t ) as [7]:()()1CL CL s s −=−zw H C I A E(5)where s is the complex Laplacian variable. The objective of ∞H control is to minimize the ∞H -norm of the transfer function with s on the imaginary axis:()()()()()222,0sup j sup t t t ωσω∞≠==⎡⎤⎣⎦zwzw w w H H z w(6)where ω represents the frequency, []σi denotes the maximum singular value of a matrix, and “sup” denotes the supremum of a set of real numbers. By minimizing thepeak of the maximum singular value of the transfer function over the entire frequency span, the system output can be greatly reduced when worst-case disturbances are applied to the system. Note that the ∞zw H norm has an equivalent interpretation in the time domain, as the supremum of the 2-norm amplification from the disturbance to the output, where the 2-norm of a signal f (t) is defined as ()2t =f. Following the Bounded Real Lemma, the following twostatements are equivalent for a γ-suboptimal ∞H controller design [8]:(i) γ∞<zwH , and A CL is stable in continuous-time sense (i.e. the real parts ofall the eigenvalues of A CL are negative);(ii) There exists a symmetric matrix 0>Θ s.t. following inequalities hold:20*T T T CL CL CL ⎡⎤++<⎢⎥−⎣⎦A ΘΘA EE ΘC I (7)where * denotes the symmetric entry. Using the closed-loop matrix definitions in Eq. (4), Eq. (7) becomes:20*T T T T T T T γ⎡⎤+++++<⎢⎥−⎣⎦z z A ΘΘA BG ΘΘG B EE ΘC ΘG D I(8)The above nonlinear matrix inequalities can be converted into a set of linear matrix inequalities (LMI) by introducing a new variable =Y G Θ:20*T T T T T T T γ⎡⎤+++++<⎢⎥−⎣⎦z z A ΘΘA BY Y B EE ΘC Y D I(9)In summary, the continuous-time γ-suboptimal ∞H control problem is nowtransformed into a convex optimization problem: the decision variables are Y , Θ, and γ ; the objective is to minimize γ ; and the constraints are 0>Θ and the LMI expressed in Eq. (9). Numerical solutions to this optimization problem can be computed, for example, using the Matlab LMI Toolbox [9]. After the optimization problem is solved, the γ-suboptimal control gain matrix is computed as:1−=G Y Θ (10)In general, the algorithm finds a gain matrix without any sparsity constraints, which represents a control scheme with centralized state feedback. Gain matrices for decentralized state feedback control can be found by applying appropriate sparsity constraints to the optimization variables Y and Θ.DISCRETE-TIME DECENTRALIZED ∞H CONTROLFor implementation in the microcontrollers of the wireless sensing and control units, a discrete-time decentralized ∞H controller design is developed. The continuous-time system in Eq. (4) can be equivalently formulated as a discrete-time system [1]:[][][]1CL k k k +=+d d d d d x A x E w , and [][]CL k k =d d d z C x(11)where the subscript “d” indicates that the variables are expressed in discrete-time domain, and the closed-loop system matrices A d CL and C d CL are defined accordingly. For linear state feedback, the control force []k d u is decided as [][]k k =d d d u G x . Following the Bounded Real Lemma, the following two statements are equivalent for discrete-time systems [7]:(i) The ∞H -norm of the closed-loop system in Eq.(11) is less than γ, and A d CL isstable in the discrete-time sense (i.e. all of the eigenvalues of A d CL fall in the unit circle on the complex plane);(ii) There exists a symmetric matrix 0>d Θ s.t. the following inequalities hold:0T T CL CL CL T CL γγ⎡⎤⎡⎤⎡⎤⎡⎤−<⎢⎥⎢⎥⎢⎥⎢⎥⎣⎦⎣⎦⎣⎦⎣⎦d d d d ddd dA E A C Θ0Θ0C 0E 00I 0I (12)Replacing d Θ with 2γdΘ and using Schur complements [8], the above matrix inequalities can be shown as equivalent to:2*0*****T T CL CL T γ⎡⎤⎢⎥⎢⎥>⎢⎥⎢⎥⎢⎥⎣⎦d d dd d d dΘ0A ΘC I E Θ0Θ0I (13)Left-multiplying and right-multiplying the above matrix with a positive definitematrix diag(11,,,−−d dΘI ΘI ), and letting 1−=d d ΘΘ , the following matrix inequalities are obtained:2*0*****TT CL CL T γ⎡⎤⎢⎥⎢⎥>⎢⎥⎢⎥⎣⎦d d d d d d d Θ0ΘA ΘC IE 0Θ0I (14)Similar to the continuous-time system, by replacing the closed-loop matrices A d CL andC d CL in Eq. (14), and letting =d d d Y G Θ, the discrete-time γ-suboptimal ∞H control problem can be converted to a convex optimization problem with LMI constraints. Furthermore, sparsity patterns of the gain matrix can be achieved by applying appropriate sparsity patterns to the LMI variables Y d and d Θ.NUMERICAL EXAMPLESince the discrete-time formulation is particularly suitable for implementation in digital controllers, such as the wireless sensing and actuation units developed [4], numerical simulation results are presented using the discrete-time ∞H controllers. A 20-story benchmark structure designed for the SAC project is selected [10]. To simplify the analysis, the building is modeled as an in-plane lumped-mass structure with one actuator allocated between each two neighboring floors. Fig. 1(a) shows the mass, stiffness, and damping parameters of the structure. In the numerical simulations, it is assumed that both the inter-story drifts and inter-story velocities between every two neighboring floors are observable. As shown in Eq. (2), the state-space equations are formulated such that the state-space vector contains inter-story drifts and velocities. For the simulations presented here, matrices C z and D z are defined as:7402072020204010, 10×−××⎡⎤⎡⎤==⎢⎥⎢⎥⎣⎦⎣⎦z1z z 0C C D I 0(15) where C is a 40 × 40 diagonal matrix, whose diagonal entries are ,1,1. Simulations are conducted for different decentralization schemes as shown in Fig. 1(b). The degrees of centralization (DC) reflect the different wireless network architectures, with each wireless channel representing one subnet of the global system. The actuators covered by a subnet are allowed to access the wireless sensor data within that subnet. For example, the case where DC = 1 implies each wireless channel covers only five stories and a total of four wireless channels (subnets) are utilized; the case where DC = 2 implies each wireless channel covers ten stories and a total of three wireless channels are utilized. The gain matrices for these two decentralized information structures have the following sparsity patterns:2040when DC 1×⎡⎤⎢⎥==⎢⎥⎢⎥⎣⎦d G ; 2040when DC 2×⎡⎤⎢⎥==⎢⎥⎢⎥⎣⎦d G(16)Each entry in the above matrices represents a 5 × 10 block submatrix. To achieve thesparsity patterns, the matrix Y d is defined to have the same sparsity pattern as G d , and d Θ is defined to be block-diagonal. For the cases where DC = 3 and DC = 4, the number of stories covered by each wireless subnet increases accordingly.To investigate the effectiveness of the proposed decentralized controller design, we assume the 20-story structure is instrumented with ideal actuators that produce any desired horizontal force between every two neighboring floors. Simulations are performed for different centralization degrees (DC = 1,…,4) and sampling periods (ranging from 0.01s to 0.06s at a resolution of 0.01s). Additionally, three ground motion records all scaled to a peak ground acceleration (PGA) of 1m/s 2 are used for the simulation: the 1940 El Centro NS record (Imperial Valley Irrigation District Station), the 1995 Kobe NS record (JMA Station), and the 1999 Chi-Chi NS record (TCU-076 Station). Two representative performance indexes are adopted:[][]{}{}1222Earthquakes,,Earthquakesˆˆmaxmax max , and maxiik ik iPI d k d k PI==d d zz(17)Here 1PI and 2PI are the performance indexes corresponding to inter-story drifts and the output vector z d , respectively. In Eq. (17), []i d k represents the inter-story drift between floor i (i = 1, …, n ) and its lower floor at time step k , and [],max i k id k is themaximum inter-story drift over the entire time history and among all floors. The maximum inter-story drift is normalized by its counterpart [],ˆmax ik id k , which is the maximum response of the uncontrolled structure. The largest normalized ratio amongthe simulations for the three different earthquake records is defined as the performance index 1PI . Similarly, the performance index 2PI is defined based on the 2-norm of the output vector z d , i.e.[][]21KT k k k ==∑dd d z z z , where K denotes the lasttime step for the duration of the simulation. When computing the two indexes, auniform time step of 0.001s is used to collect the structural response data points for []i d k and []k d z , regardless of the sampling period of the control scheme.Fig. 2 shows the control performance indexes for different degrees of centralization and sampling periods. Generally speaking, control performance is better for higher degrees of centralization and shorter sampling periods. The plots show that except for the case where DC = 1, other control schemes with information overlapping achieve comparable performance. To better review the simulation results,the performance indexes for the four different control schemes are re-plotted as a function of sampling period in Fig. 2(c) and 2(d). It is observed that if shorter sampling periods are achieved in partially decentralized control systems, the system performance can be better than centralized systems with longer sampling periods.SUMMARY AND CONCLUSIONThis paper discusses decentralized structural control design that minimizes the systemH norm with potential applications to wireless structural sensing and control∞systems. Solutions are developed for both continuous-time and discrete-time formulations. Numerical simulation results using a 20-story benchmark structure illustrate the performance of the decentralized controller design. Future research in decentralizedH controller design may utilize system measurement feedback and∞consider time delay effects in the design. Comparative study will be conducted between the decentralizedH controller design and the previously proposed∞decentralized controller design based on LQR criteria [4].ACKNOWLEDGEMENTThis research is partially funded by the National Science Foundation under grant CMS-0528867 and the Office of Naval Research Young Investigator Program awarded to Prof. Jerome P. Lynch at the University of Michigan. The first author was supported by the Office of Technology Licensing Stanford Graduate Fellowship.REFERENCES1. Soong, T.T. 1990. Active Structural Control: Theory and Practice. Wiley, Harlow, Essex, England.2. Wang, Y., R.A. Swartz, J.P. Lynch, K.H. Law, K.-C. Lu, and C.-H. Loh. 2006. "Wireless feedbackstructural control with embedded computing," Proceedings of the SPIE 11th International Symposium on Nondestructive Evaluation for Health Monitoring and Diagnostics, February 26 - March 2, 2006.3. Sandell, N., Jr., P. Varaiya, M. Athans, and M. Safonov. 1978. "Survey of decentralized controlmethods for large scale systems," IEEE T. Automat. Contr., 23(2):108-128.4. Wang, Y., R.A. Swartz, J.P. Lynch, K.H. Law, K.-C. Lu, and C.-H. Loh. 2007. "Decentralized civilstructural control using real-time wireless sensing and embedded computing," Smart Struct. Syst.:in press.5. Yang, J.N., S. Lin, and F. Jabbari. 2004. "H∞-based control strategies for civil engineeringstructures," Struct. Control Hlth., 11(3):223-237.6. Balandin, D.V. and M.M. Kogan. 2005. "LMI-based optimal attenuation of multi-storey buildingoscillations under seismic excitations," Struct. Control Hlth., 12(2):213-224.7. Zhou, K., J.C. Doyle, and K. Glover. 1996. Robust and Optimal Control. Prentice Hall, EnglewoodCliffs, NJ.8. Boyd, S.P., L. El Ghaoui, E. Feron, and V. Balakrishnan. 1994. Linear Matrix Inequalities inSystem and Control Theory. SIAM, Philadelphia, PA.9. Gahinet, P. 1995. LMI Control Toolbox for Use with MATLAB. MathWorks Inc., Natick, MA.10. Spencer, B.F., Jr., R.E. Christenson, and S.J. Dyke. 1998. "Next generation benchmark controlproblem for seismically excited buildings," Proceedings of the 2nd World Conference on Structural Control, June 29 -July 2, 1998.。

consistency regularization 出处 -回复

consistency regularization 出处-回复Consistency Regularization [出处]Consistency regularization is a technique used in machine learning to improve generalization performance and reduce overfitting. It was first introduced in the paper titled "Consistency Regularization for Unsupervised Domain Adaptation" by Sajjadi et al. in 2016. Since then, consistency regularization has been widely adopted and expanded upon in various domains, including image classification, natural language processing, and reinforcement learning.To understand consistency regularization, let's start by defining the problem it aims to solve. In supervised learning, we train models to predict labels given input data and corresponding ground truth labels. However, this approach assumes that the training data and test data are drawn from the same distribution. In real-world scenarios, this assumption often does not hold true. The distribution of the test data may differ from the distribution of the training data, leading to poor performance on unseen data.Unsupervised domain adaptation addresses this issue by leveraging labeled data from a source domain to improve performance on the target domain, where labeled data is scarce or unavailable. Consistency regularization is a method used inunsupervised domain adaptation to encourage the model to produce consistent predictions across different views of the same input.Consistency regularization works by leveraging the concept of perturbation. Given an input sample, we can create a perturbed version of it by applying a random transformation, such as adding Gaussian noise or flipping the image horizontally. By perturbing the input, we create multiple views of the same sample.With these different views, we can train our model to produce consistent predictions across them. The idea is that if the model is robust and generalizable, small perturbations should not drastically change its predictions. Therefore, by minimizing the discrepancy between the predictions on original and perturbed inputs, we encourage the model to make consistent predictions.To implement consistency regularization, we typically use a deep neural network as our model. We pass both the original and perturbed inputs through the network and calculate the predictions. We then compare the predictions using a loss function, such as mean squared error or cross-entropy loss. The goal is to minimize this loss, effectively encouraging consistency.In addition to the loss function used for consistency regularization, we also need to consider the trade-off betweenconsistency and accuracy. If we emphasize consistency too much, the model may become overly cautious and produce overly confident but predictable predictions. On the other hand, if we focus too much on accuracy, the model may ignore the perturbations and fail to generalize well to unseen data. Striking the right balance is crucial for successful consistency regularization.Consistency regularization has shown promising results in various domains. In image classification tasks, it has been used to reduce overfitting and improve performance on out-of-distribution data. For natural language processing tasks, consistency regularization has been applied to improve language models' robustness and reduce sensitivity to input variations. In reinforcement learning, it has been used to stabilize training and improve exploration-exploitation trade-offs.In conclusion, consistency regularization is a powerful technique in machine learning that helps improve generalization performance and reduce overfitting. It encourages models to produce consistent predictions across different views of the same input, making them more robust and better able to handle slight variations in the data. With its wide applicability and promising results, consistency regularization has become an important tool invarious domains of machine learning and continues to be an active area of research and development.。

1、下载文档前请自行甄别文档内容的完整性，平台不提供额外的编辑、内容补充、找答案等附加服务。
2、"仅部分预览"的文档,不可在线预览部分如存在完整性等问题,可反馈申请退款(可完整预览的文档不适用该条件!)。
3、如文档侵犯您的权益，请联系客服反馈,我们会尽快为您处理(人工客服工作时间：9:00-18:30)。

February 19, 2012
DRAFT
2
Most existing joint sparse recovery algorithms (e.g., [2]–[5]) are centralized. Consider joint sparse recovery in a multi-agent network. With centralized computation, all the agents must send their data
q
iterations, where q can
be 1 or 2. We numerically compare the proposed decentralized algorithms with existing centralized and decentralized algorithms. Simulation results demonstrate that the proposed decentralized approaches have strong recovery performance and converge reasonably fast.
{y(i) }L i=1 to a fusion center, where a joint sparse recovery algorithm then tries to obtain the solutions.
With decentralized computation, an algorithm runs at every agent in the network, and each agent only needs to exchange information with its one-hop neighbors, namely, those agents within its communication range. While decentralized algorithms are more difﬁcult to develop, they simplify trafﬁc routing, balance communication load, and provide better network scalability. A. Applications Decentralized joint sparse recovery is suitable for a group of related agents to quickly obtain solutions and reach decisions by themselves, instead of spending time and bandwidth on sending all the data to a distant fusion center for processing. Let us brieﬂy describe some examples: Distributed human action recognition: A set of L wireless motion sensors are placed on a person to recognize N actions such as sitting, running, walking, and climbing upstairs/downstairs [6]. Associated with each sensor i is an N × 1 vector x(i) sparsely representing the current action, or a combination of the current actions. In the training stage, the network collects M measurements for each of the N actions; sensor i has the training data A(i) ∈ RM ×N . In the testing stage, sensor i collects y(i) ∈ RM , i =
y(i) = A(i) x(i) . For the same reason, {x(i) }L i=1 are subject to jointly sparse recovery.
Decentralized event detection: In [9], footstep data were collected by two sets of nine sensors, each set consisting of four acoustic, three seismic, one passive infrared and one ultrasonic sensors. Joint sparse
1
Decentralized Jointly Sparse Optimization by Reweighted
q
Minimization
Qing Ling, Zaiwen Wen, and Wotao Yin
Abstract A set of vectors (or signals) are jointly sparse if their nonzero entries are commonly supported on a small subset of locations. Consider a network of agents which collaborative recover a set of joint sparse vectors. This paper proposes novel decentralized algorithms to recover these vectors in a way that every agent runs a recovery algorithm while neighbor agents exchange only their estimates of the joint support but not their data. The agents will obtain their solutions by taking advantages of the joint sparse structure while keeping their data private. As such, the proposed approach ﬁnds applications in distributed (compressive) sensing, decentralized event detection, as well as collaborative data mining. We use a non-convex minimization model and propose algorithms that alternate between support estimate consensus and signal estimate update. The latter step is based on reweighted
1 1, . . . , L, from which the action vectors {x(i) }L i=1 are recovered based on the joint sparsity assumption .
Cooperative spectrum sensing: Suppose that L cognitive radios are sensing sparse wideband spectra2 , where the ith cognitive radioபைடு நூலகம்takes the time-domain measurement y(i) . Each y(i) can be sparsely represented under the Fourier basis, namely, y(i) = F−1 x(i) where F is the Fourier transform matrix and
Qing Ling is with Department of Automation, University of Science and Technology of China, Hefei, Anhui, China. Email: qingling@. His work is supported in part by NSFC 61004137. Zaiwen Wen is with Department of Mathematics, Shanghai Jiaotong University, Shanghai, China. Email: zw2109@. Wotao Yin is with Department of Computational and Applied Mathematics, Rice University, Houston, Texas, USA. Email: wotao.yin@. His work is supported in part by NSF DMS-07-48839 and ECCS-1028782, ONR N00014-08-1-1101, U.S. ARL and ARO W911NF-09-1-0383.
I. I NTRODUCTION In this paper, we derive decentralized algorithms for recovering a set of jointly sparse vectors (or signals) from distributed data. Suppose that there are L agents constituting a bidirectionally connected network, and each agent i in the network wants to recover a signal x(i) ∈ RN from its data y(i) . Our goal is to recover {x(i) }L i=1 under the assumption that they are jointly sparse, namely, their nonzero entries appear on just a few rows of X = [x(1) , x(2) , . . . , x(L) ] [1].

2019年托福高频词汇表：sparse什么意思(附翻译及例句).doc

页数:3
Sparse Feature Learning for Deep Belief Networks

页数:8
Sparse Representations in Unions of Bases

页数:6
稀疏编码 Optimization with sparse inducingnorms

页数:22
稀疏表示在信号处理中的应用

页数:15
第十二讲：Matlab稀疏矩阵介绍

页数:14
sparse recovery 稀疏求解

页数:26
SPARSE(稀松矩阵求解器)

页数:4
稠密匹配和稀疏匹配

页数:3
《Kernel Sparse Representation》中文翻译

页数:8