Learning in Multiagent Systems: An Introduction from a Game-Theoretic Perspective
An Overview of Multi-Agent System Theory

Abstract: Agents have become a focal topic in AI (Artificial Intelligence) research, and agent technology provides a new paradigm for computation and problem solving.
This article briefly discusses agents and multi-agent systems.
Keywords: multi-agent systems; overview

1 Overview of Agents

1.1 Basic concepts
The concept of an agent first appeared in artificial intelligence in the 1970s; since the late 1980s the term has been rendered in Chinese as 代理, 智能体, or 智能主体.
The concept has been adopted in many fields, and different research areas have given it a variety of definitions that do not entirely agree.
There is as yet no single accepted definition of an agent, but most researchers accept the definition proposed by Wooldridge and Jennings: an agent is a computer software or hardware system characterized by autonomy, social ability, reactivity, and pro-activeness.

1.2 Properties of agents
According to Wooldridge's definition, an agent should have the following characteristics:
1. Autonomy: an agent generally has its own resources and a control mechanism local to itself; without direct external intervention it can decide and control its own behavior based on its internal state and the environmental information it perceives.
2. Social ability: agents are not isolated from one another.
Like people, agents can communicate: through some agent communication language they can interact with other agents in many ways and cooperate effectively with agents of all kinds at different levels of coordination.
3. Reactivity: an agent can perceive changes in its external environment in a timely manner and respond appropriately to particular events.
4. Pro-activeness: an agent can take the initiative in accordance with its commitments and exhibit goal-oriented behavior.
This requires the agent to maintain relatively stable goals on which its actions are based, giving rise to what is called goal-directed behavior.
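To make these four properties concrete, here is a minimal sketch of an agent interface with a perceive-decide-act loop. It is an illustration only: the class and method names (SimpleAgent, perceive, receive, decide, act) and the goal and environment objects they rely on are assumed for this example and do not come from the article.

```python
class SimpleAgent:
    """A minimal illustration of autonomy, social ability, reactivity, and pro-activeness."""

    def __init__(self, goals):
        self.goals = goals   # pro-activeness: relatively stable, goal-directed aims
        self.state = {}      # autonomy: private internal state
        self.inbox = []      # social ability: messages received from other agents

    def perceive(self, environment):
        # Reactivity: sense changes in the external environment in a timely way.
        # (environment.observe is an assumed helper.)
        self.state["last_observation"] = environment.observe(self)

    def receive(self, message):
        # Social ability: accept communication from other agents.
        self.inbox.append(message)

    def decide(self):
        # Autonomy: choose an action from internal state and goals alone,
        # without direct external control. (goal.satisfied / goal.next_action
        # are assumed helpers.)
        observation = self.state.get("last_observation")
        for goal in self.goals:
            if not goal.satisfied(observation):
                return goal.next_action(observation)
        return None  # no pending goals: stay idle

    def act(self, environment):
        action = self.decide()
        if action is not None:
            environment.apply(self, action)
```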
1.3 Classification of agents
Agents can be classified from several perspectives:
1. By form of existence: tangible (embodied) agents and intangible (software) agents.
Swarm Intelligence and Multi-Agent System Approaches in Artificial Intelligence

Artificial Intelligence (AI) is the discipline that studies how to make machines think, learn, and make judgments as humans do.
In recent years, with increasing computing power and the explosive growth of data, AI has been widely applied across many fields.
In the development of AI, swarm intelligence (collective intelligence) and multi-agent systems are regarded as particularly important methods and ideas.
This article focuses on these two approaches and discusses their significance and value in practical applications.
First, we need to clarify what swarm intelligence and multi-agent systems are.
Swarm intelligence refers to an intelligent system in which a large number of individuals, cooperating or competing, achieve optimization through the exchange and sharing of information.
A multi-agent system is a system composed of multiple independent agents that can communicate, cooperate, and compete with one another to achieve particular goals.
These approaches often draw on the collective behavior of animal groups in biology, such as ant colony algorithms and bird-flocking (particle swarm) algorithms, to design methods and models for solving complex problems.
Swarm intelligence and multi-agent system methods have many important applications in AI.
First, they can be used to solve large-scale optimization problems.
Many real-world problems involve complex relationships among multiple variables and multiple objectives.
Traditional optimization algorithms often perform poorly on such problems, whereas swarm and multi-agent methods can search for globally good solutions through communication and cooperation among individuals.
For example, ant colony optimization can be applied to the traveling salesman problem: by simulating the pheromone-laying behavior of ants foraging for food, it iteratively improves candidate tours and converges toward short routes.
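As a concrete illustration, the sketch below implements a simplified ant colony optimization loop for a random symmetric travelling salesman instance. It is a didactic sketch rather than a reference implementation; the function name ant_colony_tsp and all parameter values (number of ants, evaporation rate rho, and so on) are assumptions chosen for readability.

```python
import math
import random

def tour_length(tour, dist):
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def ant_colony_tsp(coords, n_ants=20, n_iters=100, alpha=1.0, beta=3.0, rho=0.5, q=1.0):
    n = len(coords)
    dist = [[math.dist(a, b) or 1e-9 for b in coords] for a in coords]
    pheromone = [[1.0] * n for _ in range(n)]
    best_tour, best_len = None, float("inf")

    for _ in range(n_iters):
        tours = []
        for _ in range(n_ants):
            start = random.randrange(n)
            tour, unvisited = [start], set(range(n)) - {start}
            while unvisited:
                cur = tour[-1]
                # Probability of moving to city j ~ pheromone^alpha * (1/distance)^beta
                weights = [(j, pheromone[cur][j] ** alpha * (1.0 / dist[cur][j]) ** beta)
                           for j in unvisited]
                total = sum(w for _, w in weights)
                r, acc = random.uniform(0, total), 0.0
                for j, w in weights:
                    acc += w
                    if acc >= r:
                        tour.append(j)
                        unvisited.remove(j)
                        break
            tours.append(tour)

        # Evaporate pheromone, then deposit new pheromone in proportion to tour quality.
        for i in range(n):
            for j in range(n):
                pheromone[i][j] *= (1.0 - rho)
        for tour in tours:
            length = tour_length(tour, dist)
            if length < best_len:
                best_tour, best_len = tour, length
            for i in range(n):
                a, b = tour[i], tour[(i + 1) % n]
                pheromone[a][b] += q / length
                pheromone[b][a] += q / length

    return best_tour, best_len

if __name__ == "__main__":
    cities = [(random.random(), random.random()) for _ in range(15)]
    tour, length = ant_colony_tsp(cities)
    print("best tour length:", round(length, 3))
```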
Second, swarm intelligence and multi-agent methods can be used to model and simulate complex ecosystems.
In ecology and environmental protection, researchers frequently need to study the interactions among different species and their effects on the environment.
Traditional approaches address these questions by building mathematical models and running numerical simulations, which can be overly complex and time-consuming.
Swarm and multi-agent methods instead simulate the interactions and behaviors of individual agents in order to reproduce the evolution of an entire ecosystem.
For example, wolf pack algorithms can be used to simulate the hunting behavior of wolf packs and their adaptation to the environment, which has important application value for protecting ecosystems and species diversity.
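As a toy example of this agent-based style of ecosystem simulation, the following sketch steps a population of prey and predator agents on a small toroidal grid. The rules (random movement, a fixed reproduction probability, a starvation threshold) are invented for illustration and are not taken from any particular ecological model or from the wolf pack algorithm itself.

```python
import random

GRID = 20          # toroidal grid size (assumed)
PREY_REPRO = 0.25  # per-step prey reproduction probability (assumed)
PRED_STARVE = 5    # steps a predator survives without eating (assumed)

def move(pos):
    x, y = pos
    dx, dy = random.choice([(-1, 0), (1, 0), (0, -1), (0, 1)])
    return ((x + dx) % GRID, (y + dy) % GRID)

def step(prey, predators):
    # Prey move randomly and occasionally reproduce.
    for p in list(prey):
        p["pos"] = move(p["pos"])
        if random.random() < PREY_REPRO:
            prey.append({"pos": p["pos"]})

    # Predators move, eat any prey sharing their cell, and starve otherwise.
    occupied = {}
    for p in prey:
        occupied.setdefault(p["pos"], []).append(p)
    for pred in list(predators):
        pred["pos"] = move(pred["pos"])
        pred["hunger"] += 1
        eaten = occupied.get(pred["pos"], [])
        if eaten:
            prey.remove(eaten.pop())
            pred["hunger"] = 0
        if pred["hunger"] > PRED_STARVE:
            predators.remove(pred)

if __name__ == "__main__":
    prey = [{"pos": (random.randrange(GRID), random.randrange(GRID))} for _ in range(60)]
    predators = [{"pos": (random.randrange(GRID), random.randrange(GRID)), "hunger": 0}
                 for _ in range(10)]
    for t in range(50):
        step(prey, predators)
        print(f"t={t:2d}  prey={len(prey):3d}  predators={len(predators):2d}")
```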
Chinese Journal of Intelligent Science and Technology, Vol. 4, No. 1, March 2022

A cooperative multi-agent reinforcement learning algorithm based on dynamic self-selection parameter sharing
WANG Han, YU Yang, JIANG Yuan (State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China)

Abstract: In multi-agent reinforcement learning, parameter sharing is a way of centralizing information during learning and can effectively alleviate the learning inefficiency caused by non-stationarity.
In practice, however, forcing all agents to use the same policy can be detrimental.
To address this over-sharing problem, a new method is proposed that gives each agent the ability to automatically identify the agents it is likely to benefit from sharing parameters with, and to select its sharing partners dynamically during learning.
Specifically, each agent encodes its past trajectories into a latent representation of its underlying intention and chooses its parameter-sharing partners by comparing this representation with those of the other agents.
Experiments show that the proposed method not only improves the efficiency of parameter sharing in multi-agent systems but also preserves the quality of policy learning.
Keywords: multi-agent systems; reinforcement learning; parameter sharing
doi: 10.11959/j.issn.2096-6652.202214

0 Introduction
Multi-agent reinforcement learning (MARL) aims to jointly train multiple agents in a shared environment to accomplish a given task [1-4].
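The mechanism described in the abstract (encode each agent's recent trajectory into a latent "intention", then share parameters only among agents whose intentions are similar) can be sketched roughly as follows. This is a speculative reconstruction for illustration, not the authors' code: the hand-crafted encoder, the cosine-similarity threshold, and parameter averaging as the sharing step are all assumptions standing in for the learned components of the actual method.

```python
import math
import random

def encode_intention(trajectory, dim=8):
    # Placeholder encoder: hash (observation, action) pairs into a fixed-size vector.
    # A real system would learn this encoder end to end.
    vec = [0.0] * dim
    for t, (obs, act) in enumerate(trajectory):
        vec[hash((obs, act)) % dim] += 1.0 / (t + 1)
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def cosine(u, v):
    return sum(a * b for a, b in zip(u, v))

def select_sharing_groups(intentions, threshold=0.8):
    # Greedily group agents whose latent intentions are sufficiently similar;
    # parameters are then shared only within a group.
    groups = []
    for i, z in enumerate(intentions):
        for group in groups:
            if all(cosine(z, intentions[j]) >= threshold for j in group):
                group.append(i)
                break
        else:
            groups.append([i])
    return groups

def share_parameters(params, groups):
    # Average parameters within each group (one simple way to "share" them).
    shared = list(params)
    for group in groups:
        mean = [sum(params[i][k] for i in group) / len(group)
                for k in range(len(params[group[0]]))]
        for i in group:
            shared[i] = list(mean)
    return shared

if __name__ == "__main__":
    trajectories = [[((random.randrange(3),), random.randrange(2)) for _ in range(10)]
                    for _ in range(4)]
    params = [[random.random() for _ in range(5)] for _ in range(4)]
    groups = select_sharing_groups([encode_intention(tr) for tr in trajectories])
    print("sharing groups:", groups)
    params = share_parameters(params, groups)
```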
Multiagent Systems: A Survey from a Machine Learning Perspective

Peter Stone (AT&T Labs—Research, 180 Park Ave., room A273, Florham Park, NJ 07932, pstone@, /~pstone)
Manuela Veloso (Computer Science Department, Carnegie Mellon University, Pittsburgh, PA 15213, veloso@, /~mmv)

In Autonomous Robotics, volume 8, number 3. July, 2000.

Abstract

Distributed Artificial Intelligence (DAI) has existed as a subfield of AI for less than two decades. DAI is concerned with systems that consist of multiple independent entities that interact in a domain. Traditionally, DAI has been divided into two sub-disciplines: Distributed Problem Solving (DPS) focuses on the information management aspects of systems with several components working together towards a common goal; Multiagent Systems (MAS) deals with behavior management in collections of several independent entities, or agents. This survey of MAS is intended to serve as an introduction to the field and as an organizational framework. A series of general multiagent scenarios are presented. For each scenario, the issues that arise are described along with a sampling of the techniques that exist to deal with them. The presented techniques are not exhaustive, but they highlight how multiagent systems can be and have been used to build complex systems. When options exist, the techniques presented are biased towards machine learning approaches. Additional opportunities for applying machine learning to MAS are highlighted and robotic soccer is presented as an appropriate test bed for MAS. This survey does not focus exclusively on robotic systems. However, we believe that much of the prior research in non-robotic MAS is relevant to robotic MAS, and we explicitly discuss several robotic MAS, including all of those presented in this issue.

1 Introduction

Extending the realm of the social world to include autonomous computer systems has always been an awesome, if not frightening, prospect. However it is now becoming both possible and necessary through advances in the field of Artificial Intelligence (AI). In the past several years, AI techniques have become more and more robust and complex. To mention just one of the many exciting successes, a car steered itself more than 95% of the way across the United States using the ALVINN system [Pormerleau, 1993]. By meeting this and other such daunting challenges, AI researchers have earned the right to start examining the implications of multiple autonomous "agents" interacting in the real world. In fact, they have rendered this examination indispensable. If there is one self-steering car, there will surely be more. And although each may be able to drive individually, if several autonomous vehicles meet on the highway, we must know how their behaviors interact.

Multiagent Systems (MAS) is the subfield of AI that aims to provide both principles for construction of complex systems involving multiple agents and mechanisms for coordination of independent agents' behaviors. While there is no generally accepted definition of "agent" in AI [Russell and Norvig, 1995], for the purposes of this article, we consider an agent to be an entity, such as a robot, with goals, actions, and domain knowledge, situated in an environment. The way it acts is called its "behavior." (This is not intended as a general theory of agency.) Although the ability to consider coordinating behaviors of autonomous agents is a new one, the field is advancing quickly by building upon pre-existing work in the field of Distributed Artificial Intelligence (DAI). DAI has existed as a subfield of AI for less than two decades. Traditionally, DAI is broken into two
sub-disciplines:Distributed Problem Solving(DPS)and MAS[Bond and Gasser,1988].The main topics considered in DPS are information management issues such as task decomposition and solution synthesis. For example,a constraint satisfaction problem can often be decomposed into several not entirely independent subproblems that can be solved on different processors.Then these solutions can be synthesized into a solution of the original problem.MAS allows the subproblems of a constraint satisfaction problem to be subcontracted to different prob-lem solving agents with their own interests and goals.Furthermore,domains with multiple agents of any type,including autonomous vehicles and even some human agents,are beginning to be studied.This survey of MAS is intended as an introduction to thefield.The reader should come away with an appreciation for the types of systems that are possible to build using MAS as well as a conceptual framework with which to organize the different types of possible systems.The article is organized as a series of general multiagent scenarios.For each scenario,the issues that arise are described along with a sampling of the techniques that exist to deal with them.The techniques presented are not exhaustive,but they highlight how multiagent systems can be and have been used to build complex systems.Because of the inherent complexity of MAS,there is much interest in using machine learning techniques to help deal with this complexity[Weißand Sen,1996;Sen,1996].When several different systems exist that could illustrate the same or similar MAS techniques,the systems presented here are biased towards those that use machine learning(ML)approaches.Furthermore,every effort is made to highlight additional opportunities for applying ML to MAS.This survey does not focus exclusively on robotic systems.However, we believe that much of the prior research in non-robotic MAS is relevant to robotic MAS,and we explicitly discuss several robotic MAS(referred to as multi-robot systems),including all of those presented in this issue.Although there are many possible ways to divide MAS,the survey is organized along two main di-mensions:agent heterogeneity and amount of communication among agents.Beginning with the simplest multiagent scenario,homogeneous non-communicating agents,the full range of possible multiagent sys-tems,through highly heterogeneous communicating agents,is considered.For each multiagent scenario presented,a single example domain is presented in an appropriate instan-tiation for the purpose of illustration.In this extensively-studied domain,the Predator/Prey or“Pursuit”domain[Benda et al.,1986],many MAS issues arise.Nevertheless,it is a“toy”domain.At the end of the article,a much more complex domain—robotic soccer—is presented in order to illustrate the full power of MAS.The article is organized as follows.Section2introduces thefield of MAS,listing several of its strong points and presenting a taxonomy.The body of the article,Sections3–7,presents the various multiagent scenarios,illustrates them using the pursuit domain,and describes existing work in thefield.A domain that facilitates the study of most multiagent issues,robotic soccer,is advocated as a test bed in Section8. 
Section9concludes.2Multiagent SystemsTwo obvious questions about any type of technology are:What advantages does it offer over the alternatives?In what circumstances is it useful?It would be foolish to claim that MAS should be used when designing all complex systems.Like any useful approach,there are some situations for which it is particularly appropriate,and others for which it is not. The goal of this section is to underscore the need for and usefulness of MAS while giving characteristics of typical domains that can benefit from it.For a more extensive discussion,see[Bond and Gasser,1988].Some domains require MAS.In particular,if there are different people or organizations with different (possibly conflicting)goals and proprietary information,then a multiagent system is needed to handle their interactions.Even if each organization wants to model its internal affairs with a single system,the organi-zations will not give authority to any single person to build a system that represents them all:the different organizations will need their own systems that reflect their capabilities and priorities.For example,consider a manufacturing scenario in which company X produces tires,but subcontracts the production of lug-nuts to company Y.In order to build a single system to automate(certain aspects of)the production process,the internals of both companies X and Y must be modeled.However,neither company is likely to want to relinquish information and/or control to a system designer representing the other company.Perhaps with just two companies involved,an agreement could be reached,but with several companies involved,MAS is necessary.The only feasible solution is to allow the various companies to create their own agents that accurately represent their goals and interests.They must then be combined into a multiagent system with the aid of some of the techniques described in this article.Another example of a domain that requires MAS is hospital scheduling as presented in[Decker,1996c]. 
This domain from an actual case study requires different agents to represent the interests of different people within the hospital.Hospital employees have different interests,from nurses who may want to minimize the patient’s time in the hospital,to x-ray operators who may want to maximize the throughput on their ma-chines.Since different people evaluate candidate schedules with different criteria,they must be represented by separate agents if their interests are to be justly considered.Even in domains that could conceivably use systems that are not distributed,there are several possible reasons to use MAS.Having multiple agents could speed up a system’s operation by providing a method for parallel computation.For instance,a domain that is easily broken into components—several independent tasks that can be handled by separate agents—could benefit from MAS.Furthermore,the parallelism of MAS can help deal with limitations imposed by time-bounded or space-bounded reasoning requirements.While parallelism is achieved by assigning different tasks or abilities to different agents,robustness is a benefit of multiagent systems that have redundant agents.If control and responsibilities are sufficiently shared among different agents,the system can tolerate failures by one or more of the agents.Domains that must degrade gracefully are in particular need of this feature of MAS:if a single entity—processor or agent—controls everything,then the entire system could crash if there is a single failure.Although a multiagent system need not be implemented on multiple processors,to provide full robustness against failure,its agents should be distributed across several machines.Another benefit of multiagent systems is their scalability.Since they are inherently modular,it should be easier to add new agents to a multiagent system than it is to add new capabilities to a monolithic system. Systems whose capabilities and parameters are likely to need to change over time or across agents can also benefit from this advantage of MAS.From a programmer’s perspective the modularity of multiagent systems can lead to simpler program-ming.Rather than tackling the whole task with a centralized agent,programmers can identify subtasks and assign control of those subtasks to different agents.The difficult problem of splitting a single agent’s time among different parts of a task solves itself.Thus,when the choice is between using a multiagent system or a single-agent system,MAS may be the simpler option.Of course there are some domains that are more naturally approached from an omniscient perspective—because a global view is given—or with central-ized control—because no parallel actions are possible and there is no action uncertainty[Decker,1996b]. 
Single-agent systems should be used in such cases.Multiagent systems can also be useful for their illucidation of fundamental problems in the social sci-ences and life sciences[Cao et al.,1997],including intelligence itself[Decker,1987],.As Weißput it:“In-telligence is deeply and inevitably coupled with interaction”[Weiß,1996].In fact,it has been proposed that the best way to develop intelligent machines at all might be to start by creating“social”machines[Daut-enhahn,1995].This theory is based on the socio-biological theory that primate intelligencefirst evolved because of the need to deal with social interactions[Minsky,1988].While all of the above reasons to use MAS apply generally,there are also some arguments in favor of multi-robot systems in particular.In tasks that require robots to be in particular places,such as robot scouting,a team of robots has an advantage over a single robot in that it can take advantage of geographic distribution.While a single robot could only sense the world from a single vantage point,a multi-robot system can observe and act from several locations simultaneously.Finally,as argued in[Jung and Zelinsky,2000],multi-robot systems can exhibit benefits over single-robot systems in terms of the“performance/cost ratio.”By using heterogeneous robots each with a subset of the capabilities necessary to accomplish a given task,one can use simpler robots that are presumably less expensive to engineer than a single monolithic robot with all of the capabilities bundled together.Reasons presented above to use MAS are summarized in Table1.2.1TaxonomySeveral taxonomies have been presented previously for the relatedfield of Distributed Artificial Intelligence (DAI).For example,Decker presents four dimensions of DAI[Decker,1987]:1.Agent granularity(coarse vs.fine);2.Heterogeneity of agent knowledge(redundant vs.specialized);3.Methods of distributing control(benevolent petitive,team vs.hierarchical,static vs.shifting roles);4.and Communication possibilities(blackboard vs.messages,low-level vs.high-level,content).Along dimensions1and4,multiagent systems have coarse agent granularity and high-level communication. 
Along the other dimensions,they can vary across the whole ranges.In fact,the remaining dimensions are very prominent in this article:degree of heterogeneity is a major MAS dimension and all the methods of distributing control appear here as major issues.More recently,Parunak[1996]has presented a taxonomy of MAS from an application perspective.From this perspective,the important characteristics of MAS are:System function;Agent architecture(degree of heterogeneity,reactive vs.deliberative);System architecture(communication,protocols,human involvement).A useful contribution is that the dimensions are divided into agent and system characteristics.Other overviews of DAI and/or MAS include[Lesser,1995;Durfee,1992;Durfee et al.,1989;Bond and Gasser, 1988].There are also some existing surveys that are specific to multi-robot systems.Dudek et al.[1996] presented a detailed taxonomy of multiagent robotics along seven dimensions,including robot size,various communication parameters,reconfigurability,and unit processing.Cao et al.[1997]presented a“taxonomy based on problems and solutions,”using the followingfive axes:group architecture,resource conflicts, origins of cooperation,learning,and geometric problems.It specifically does not consider competitive multi-robot scenarios.This article contributes a taxonomy that encompasses MAS along with a detailed chronicle of existing systems as theyfit in to this taxonomy.The taxonomy presented in this article is organized along what we believe to be the most important aspects of agents(as opposed to domains):degree of heterogeneity and degree of -munication is presented as an agent aspect because it is the degree to which the agents communicate(or whether they communicate),not the communication protocols that are available to them,that is considered. 
Other aspects of agents in MAS are touched upon within the heterogeneity/communication framework.For example,the degree to which different agents play different roles is certainly an important MAS issue,but here it is framed within the scenario of heterogeneous non-communicating agents(it arises in the other three scenarios as well).All four combinations of heterogeneity and communication—homogeneous non-communicating agents; heterogeneous non-communicating agents;homogeneous communicating agents;and heterogeneous com-municating agents—are considered in this article.Our approach throughout the article is to categorize the issues as they are reflected in the literature.Many of the issues could apply in earlier scenarios,but do not in the articles that we have come across.On the other hand,many of the issues that arise in the earlier scenarios also apply in the later scenarios.Nevertheless,they are only mentioned again in the later scenarios to the degree that they differ or become more complex.The primary purpose of this taxonomy is as a framework for considering and analyzing the challenges that arise in MAS.This survey is designed to be useful to researchers as a way of separating out the issues that arise as a result of their decisions to use homogeneous versus heterogeneous agents and communicating versus non-communicating agents.The multiagent scenarios along with the issues that arise therein and the techniques that currently exist to address these issues are described in detail in Sections4–7.Table2gives a preview of these scenarios and associated issues as presented in this article.2.2Single-Agent vs.Multiagent SystemsBefore studying and categorizing MAS,we mustfirst consider their most obvious alternative:centralized, single-agent systems.Centralized systems have a single agent which makes all the decisions,while the others act as remote slaves.For the purposes of this survey,a“single-agent system”should be thought of as a centralized system in a domain which also allows for a multiagent approach.A single-agent system might still have multiple entities—several actuators,or even several physically2.2.1Single-Agent SystemsIn general,the agent in a single-agent system models itself,the environment,and their interactions.Of course the agent is itself part of the environment,but for the purposes of this article,agents are considered to have extra-environmental components as well.They are independent entities with their own goals,actions, and knowledge.In a single-agent system,no other such entities are recognized by the agent.Thus,even if there are indeed other agents in the world,they are not modeled as having goals,etc.:they are just considered part of the environment.The point being emphasized is that although agents are also a part of the environment,they are explicitly modeled as having their own goals,actions,and domain knowledge(see Figure1).2.2.2Multiagent SystemsMultiagent systems differ from single-agent systems in that several agents exist which model each other’s goals and actions.In the fully general multiagent scenario,there may be direct interaction among agents (communication).Although this interaction could be viewed as environmental stimuli,we present inter-agent communication as being separate from the environment.From an individual agent’s perspective,multiagent systems differ from single-agent systems most sig-nificantly in that the environment’s dynamics can be affected by other agents.In addition to the uncertaintyFigure1:A general single-agent framework.The agent 
models itself,the environment,and their interac-tions.If other agents exist,they are considered part of the environment.that may be inherent in the domain,other agents intentionally affect the environment in unpredictable ways. Thus,all multiagent systems can be viewed as having dynamic environments.Figure2illustrates the view that each agent is both part of the environment and modeled as a separate entity.There may be any number of agents,with different degrees of heterogeneity and with or without the ability to communicate directly.From the fully general case depicted here,we begin by eliminating both the communication and the heterogeneity to present homogeneous non-communicating MAS(Section4). Then,the possibilities of agent heterogeneity and inter-agent communication are considered one at a time (Sections5and6).Finally,in Section7,we arrive back at the fully general case by considering agents that directly.can interact3Organization of Existing WorkThe following sections present many different MAS techniques that have been previously published.They present an extensive,but not exhaustive,list of work in thefield.Space does not permit exhaustive coverage. Instead,the work mentioned is intended to illustrate the techniques that exist to deal with the issues that arise in the various multiagent scenarios.When possible,ML approaches are emphasized.All four multiagent scenarios are considered in the following order:homogeneous non-communicating agents,heterogeneous non-communicating agents,homogeneous communicating agents,and heterogeneous communicating agents.For each of these scenarios,the research issues that arise,the techniques that deal with them,and additional ML opportunities are presented.The issues may appear across scenarios,but they are presented and discussed in thefirst scenario to which they apply.In addition to the existing learning approaches described in the sections entitled“Issues and Techniques”, there are several previously unexplored learning opportunities that apply in each of the multiagent scenarios. 
For each scenario, a few promising opportunities for ML researchers are presented. Many existing ML techniques can be directly applied in multiagent scenarios by delimiting a part of the domain that only involves a single agent. However multiagent learning is more concerned with learning issues that arise because of the multiagent aspect of a given domain. As described by Weiß, multiagent learning is "learning that is done by several agents and that becomes possible only because several agents are present" [Weiß, 1995]. This type of learning is emphasized in the sections entitled "Further Learning Opportunities."

For the purpose of illustration, each scenario is accompanied by a suitable instantiation of the Predator/Prey or "Pursuit" domain.

3.1 The Predator/Prey ("Pursuit") Domain

The Predator/Prey, or "Pursuit" domain (hereafter referred to as the "pursuit domain"), is an appropriate one for illustration of MAS because it has been studied using a wide variety of approaches and because it has many different instantiations that can be used to illustrate different multiagent scenarios. Since it involves agents moving around in a world, it is particularly appropriate as an abstraction of robotic MAS. The pursuit domain is not presented as a complex real-world domain, but rather as a toy domain that helps concretize many concepts. For discussion of a domain that has the full range of complexities characteristic of more real-world domains, see Section 8.

The pursuit domain was introduced by Benda et al. [1986]. Over the years, researchers have studied several variations of its original formulation. In this section, a single instantiation of the domain is presented. However, care is taken to point out the parameters that can be varied.

The pursuit domain is usually studied with four predators and one prey. Traditionally, the predators are blue and the prey is red (black and grey respectively in Figure 3). The domain can be varied by using different numbers of predators and prey.

[Figure 3: A particular instantiation of the pursuit domain. Predators are black and the prey is grey. The arrows on top of two of the predators indicate possible moves. Panel labels: predators see each other; prey stays put 10% of time; prey moves randomly; predators can communicate; simultaneous movements; orthogonal game in a toroidal world; capture.]

The goal of the predators is to "capture" the prey, or surround it so that it cannot move to an unoccupied position. A capture position is shown in Figure 3. If the world has boundaries, fewer than four predators can capture the prey by trapping it against an edge or in a corner. Another possible criterion for capture is that a predator occupies the same position as the prey. Typically, however, no two players are allowed to occupy the same position.

As depicted in Figure 3, the predators and prey move around in a discrete, grid-like world with square spaces. They can move to any adjacent square on a given turn. Possible variations include grids with other shapes as spaces (for instance hexagons) or continuous worlds. Within the square game, players may be allowed to move diagonally instead of just horizontally and vertically. The size of the world may also vary from an infinite plane to a small, finite board with edges. The world pictured in Figure 3 is a toroidal world: the predators and prey can move off one end of the board and come back on the other end. Other parameters of the game that must be specified are whether the players move simultaneously or in turns; how much of the world the predators can see; and whether and how the predators can communicate. Finally, in the original formulation of the domain, and in most subsequent studies, the prey moves randomly: on each turn it moves in a random direction, staying still with a certain probability in order to simulate being slower than the predators. However, it is also possible to allow the prey to actively try to escape capture. As is discussed in Section 5, there has been some research done to this effect, but there is still much room for improvement. The parameters that can be varied in the pursuit domain are summarized in Table 3.
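The parameters listed above are easy to pin down in code. Below is a minimal sketch of one instantiation of the pursuit domain (toroidal grid, orthogonal simultaneous moves, a prey that stays put 10% of the time); the PursuitDomain class and its API are invented for this illustration and are not part of the survey.

```python
import random

class PursuitDomain:
    """One instantiation of the pursuit domain: toroidal grid, orthogonal moves."""

    MOVES = [(0, 1), (0, -1), (1, 0), (-1, 0)]

    def __init__(self, size=10, n_predators=4, prey_stay_prob=0.1):
        self.size = size
        self.prey_stay_prob = prey_stay_prob
        cells = [(x, y) for x in range(size) for y in range(size)]
        positions = random.sample(cells, n_predators + 1)
        self.prey = positions[0]
        self.predators = positions[1:]

    def _wrap(self, pos, move):
        return ((pos[0] + move[0]) % self.size, (pos[1] + move[1]) % self.size)

    def step(self, predator_moves):
        # Simultaneous movement: all predators move, then the prey moves randomly,
        # staying put with the given probability to simulate being slower.
        self.predators = [self._wrap(p, m) for p, m in zip(self.predators, predator_moves)]
        if random.random() > self.prey_stay_prob:
            self.prey = self._wrap(self.prey, random.choice(self.MOVES))
        return self.captured()

    def captured(self):
        # Capture: every cell orthogonally adjacent to the prey is occupied by a predator.
        neighbours = {self._wrap(self.prey, m) for m in self.MOVES}
        return neighbours.issubset(set(self.predators))

if __name__ == "__main__":
    env = PursuitDomain()
    for _ in range(100):
        moves = [random.choice(PursuitDomain.MOVES) for _ in env.predators]
        if env.step(moves):
            print("prey captured")
            break
```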
The pursuit domain is a good one for the purposes of illustration because it is simple to understand and because it is flexible enough to illustrate a variety of scenarios. The possible actions of the predators and

[Figure 4: The pursuit domain with just a single agent. One agent controls all predators and the prey is considered part of the environment.]

For each of the multiagent scenarios presented below, a new instantiation of the pursuit domain is defined. Their purpose is to illustrate the different scenarios within a concrete framework.

3.2 Domain Issues

Throughout this survey, the focus is upon agent capabilities. However, from the point of view of the system designer, the characteristics of the domain are at least as important. Before moving on to the agent-based categorization of the field in Sections 4–7, a range of domain characteristics is considered.

Relevant domain characteristics include: the number of agents; the amount of time pressure for generating actions (is it a real-time domain?); whether or not new goals arrive dynamically; the cost of communication; the cost of failure; user involvement; and environmental uncertainty. The first four of these characteristics are self-explanatory and do not need further mention. With respect to cost of failure, an example of a domain with high cost of failure is air-traffic control [Rao and Georgeff, 1995]. On the other hand, the directed improvisation domain considered by Hayes-Roth et al. [1995] has a very low cost of failure. In this domain, entertainment agents accept all improvisation suggestions from each other. The idea is that the agents should not be afraid to make mistakes, but rather should "just let the words flow" [Hayes-Roth et al., 1995]. Several multiagent systems include humans as one or more of the agents. In this case, the issue of communication between the human and computer agents must be considered [Sanchez et al., 1995]. Another example of user involvement is user feedback in an information filtering domain [Ferguson and Karakoulas, 1996]. Decker [1995] distinguishes three different sources of uncertainty in a domain. The state transitions in the domain itself might be non-deterministic; agents might not know the actions of other agents; and agents might not know the outcomes of their own actions. This and the other domain characteristics are summarized in Table 4.

4 Homogeneous Non-Communicating Multiagent Systems

In homogeneous, non-communicating multiagent systems, all of the agents have the same internal structure including goals, domain knowledge, and possible actions. They also have the same procedure for selecting among their actions. The only differences among agents are their sensory inputs and the actual actions they take: they are situated differently in the world.

4.1 Homogeneous Non-Communicating Multiagent Pursuit

In the homogeneous non-communicating version of the pursuit domain, rather than having one agent controlling all four predators, there is one identical agent per predator. Although the agents have identical capabilities and decision procedures, they
Although the agents have identical capabilities and decision procedures, they have limited information about each other's internal state and sensory inputs. Thus they are not able to predict each other's actions. The pursuit domain with homogeneous agents is illustrated in Figure 5.

Within this framework, Stephens and Merx [1990] propose a simple heuristic behavior for each agent that is based on local information. They define capture positions as the four positions adjacent to the prey. They then propose a "local" strategy whereby each predator agent determines the capture position to which it should move.
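As an illustration only (Stephens and Merx's exact rules are not reproduced here), the following Python sketch implements one plausible reading of the setup described above: the prey stays still with some probability and otherwise moves to a random adjacent cell, while each predator greedily steps toward the capture position nearest to it. The grid size, the stay-still probability, the toroidal grid, and the move ordering are all assumptions made for the example.

```python
import random

GRID = 15            # assumed grid size; the surveyed work treats this as a parameter
STAY_PROB = 0.1      # assumed probability that the prey stays still (it is "slower")
MOVES = [(0, 1), (0, -1), (1, 0), (-1, 0)]

def wrap(p):
    """Keep positions on a toroidal grid (one common variant of the domain)."""
    return (p[0] % GRID, p[1] % GRID)

def dist(a, b):
    """Manhattan distance on the torus."""
    dx = min(abs(a[0] - b[0]), GRID - abs(a[0] - b[0]))
    dy = min(abs(a[1] - b[1]), GRID - abs(a[1] - b[1]))
    return dx + dy

def prey_step(prey):
    """Random prey: stay still with STAY_PROB, otherwise move in a random direction."""
    if random.random() < STAY_PROB:
        return prey
    dx, dy = random.choice(MOVES)
    return wrap((prey[0] + dx, prey[1] + dy))

def capture_positions(prey):
    """The four cells adjacent to the prey."""
    return [wrap((prey[0] + dx, prey[1] + dy)) for dx, dy in MOVES]

def predator_step(pred, prey):
    """Local heuristic: head for the nearest capture position (or stay if already there)."""
    target = min(capture_positions(prey), key=lambda c: dist(pred, c))
    step = min(MOVES + [(0, 0)],
               key=lambda m: dist(wrap((pred[0] + m[0], pred[1] + m[1])), target))
    return wrap((pred[0] + step[0], pred[1] + step[1]))

prey = (7, 7)
predators = [(0, 0), (0, 14), (14, 0), (14, 14)]
for t in range(500):
    if set(capture_positions(prey)) <= set(predators):
        print(f"prey captured at step {t}")
        break
    predators = [predator_step(p, prey) for p in predators]
    prey = prey_step(prey)
else:
    print("no capture within 500 steps")
```

Because each predator uses only its own position and the prey's position, two predators may head for the same capture position; that kind of coordination failure is one way purely local strategies can fall short.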
to extrapolate from the examples it has seen of a fixed set E; instead, its target concept keeps changing, leading to a moving target function problem [10]. In general, however, the target concept does not change randomly; it changes based on the learning dynamics of the other agents in the system. Since these agents also learn using machine learning algorithms, we are left with some hope that we might someday be able to understand the complex dynamics of these types of systems.

Learning agents are most often selfish utility maximizers. These agents often face each other in encounters where the simultaneous actions of a set of agents lead to different utility payoffs for all the participants. For example, in a market-based setting a set of agents might submit their bids to a first-price sealed-bid auction. The outcome of this auction will result in a utility gain or loss for all the agents. In a robotic setting, two agents headed on a collision course towards each other have to decide whether to stay the course or to swerve. Their combined actions directly determine the utilities the agents receive. We are solely concerned with learning agents that maximize their own utility. We believe that systems where agents share partial results or otherwise help each other can be considered extensions of traditional machine learning research.

2 Game Theory

Game theory provides us with the mathematical tools to understand the possible strategies that utility-maximizing agents might use when making a choice. It is mostly concerned with modeling the decision process of rational humans, a fact that should be kept in mind as we consider its applicability to multiagent systems.

The simplest type of game considered in game theory is the single-shot simultaneous-move game. In this game all agents must take one action. All actions are effectively simultaneous. Each agent receives a utility that is a function of the combined set of actions. In an extended-form game the players take turns and receive a payoff at the end of a series of actions. A single-shot game is a good model for the types of situations often faced by agents in a multiagent system, where the encounters mostly require coordination. Extended-form games are better suited to modeling more complex scenarios where each successive move places the agents in a different state. Many scenarios that at first appear to need an extended-form game can actually be described by a series of single-shot games; in fact, that is the approach taken by many multiagent systems researchers.

In the one-shot simultaneous-move game we say that each agent i chooses a strategy s_i ∈ S_i, where S_i is the set of all strategies for agent i. These strategies represent the actions the agent can take. When we say that i chooses strategy s_i we mean that it chooses to take action s_i. The set of all strategies chosen by all the agents is the strategy profile for that game, denoted by s ∈ S ≡ ×_{i∈I} S_i. Once all the agents make their choices and form the strategy profile s, each agent i receives a utility that is a function of s.

¹ Common knowledge about p means that everybody knows that everybody knows, and so on to infinity, about p.

It has been shown that every game has at least one Nash equilibrium, as long as mixed strategies are allowed. The Nash equilibrium has the advantage of being stable under single-agent desertions. That is, if the system is in a Nash equilibrium then no agent, working by itself, will be tempted to take a different action. However, it is possible for two or more agents to conspire together and find a set of actions that is better for them. This means that the Nash equilibrium is not stable if we allow the formation of coalitions.
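To make the solution concept concrete, the sketch below enumerates the pure-strategy Nash equilibria of a two-player game given as a payoff matrix. The payoff numbers are invented for illustration and are not taken from the paper's figures.

```python
from itertools import product

# payoff[i][j] = (row player's utility, column player's utility) when the row player
# picks strategy i and the column player picks strategy j; the numbers are invented.
payoff = [[(2, 1), (0, 0)],
          [(0, 0), (1, 2)]]

def pure_nash(payoff):
    """Return every pure strategy profile from which no single agent wants to deviate."""
    rows, cols = len(payoff), len(payoff[0])
    equilibria = []
    for i, j in product(range(rows), range(cols)):
        row_best = all(payoff[i][j][0] >= payoff[k][j][0] for k in range(rows))
        col_best = all(payoff[i][j][1] >= payoff[i][k][1] for k in range(cols))
        if row_best and col_best:
            equilibria.append((i, j))
    return equilibria

print(pure_nash(payoff))   # -> [(0, 0), (1, 1)]
```

Note that this invented game has two pure equilibria, which already hints at the equilibrium-selection problem discussed next.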
Another problem we face when using the Nash equilibrium is the fact that a game can have multiple Nash equilibria. In these cases we do not know which one will be chosen, if any. The Nash equilibrium could also be a mixed strategy for some agent, while in the real world the agent has only discrete actions available. In both of these cases the Nash equilibrium is not sufficient to identify a unique strategy profile that rational agents are expected to play. As such, further studies of the dynamics of the system must be carried out in order to refine the Nash equilibrium solution. The theory of learning in games—a branch of game theory—has studied how simple learning mechanisms lead to equilibrium strategies.

4 Learning in Games

The theory of learning in games studies the equilibrium concepts dictated by various simple learning mechanisms. That is, while the Nash equilibrium is based on the assumption of perfectly rational players, in learning in games the assumption is that the agents use some kind of learning algorithm. The theory determines the equilibrium strategy that will be arrived at by the various learning mechanisms and maps these equilibria to the standard solution concepts, if possible. Many learning mechanisms have been studied. The most common of them are explained in the next few subsections.

4.1 Fictitious Play

A widely studied model of learning in games is the process of fictitious play. In it, agents assume that their opponents are playing a fixed strategy. The agents use their past experiences to build a model of the opponent's strategy and use this model to choose their own action. Mathematicians have studied these types of games in order to determine when and whether the system converges to a stable strategy.

Fictitious play uses a simple form of learning where an agent remembers everything the other agents have done and uses this information to build a probability distribution for the other agents' expected strategy. Formally, for the two-agent (i and j) case, we say that i maintains a weight function k_i : S_j → R⁺.
The weight function changes over time as the agent learns. The weight function at time t is represented by k_i^t, which keeps a count of how many times each strategy has been played. When at time t−1 opponent j plays strategy s_j^{t−1}, i updates its weight function so that

k_i^t(s_j) = k_i^{t−1}(s_j) + 1 if s_j = s_j^{t−1}, and k_i^t(s_j) = k_i^{t−1}(s_j) otherwise.    (1)

With these counts, i assigns to j the probability of playing each s_j ∈ S_j at time t:

Pr_i^t[s_j] = k_i^t(s_j) / Σ_{s̃_j∈S_j} k_i^t(s̃_j).    (2)

Player i then determines the strategy that will give it the highest expected utility given that j will play each of its s_j ∈ S_j with probability Pr_i^t[s_j]. That is, i determines its best response to a probability distribution over j's possible strategies. This amounts to i assuming that j's strategy at each time is taken from some fixed but unknown probability distribution.

Several interesting results have been derived by researchers in this area. These results assume that all players are using fictitious play. In [3] it was shown that the following two propositions hold.

Proposition 1. If s is a strict Nash equilibrium and it is played at time t then it will be played at all times greater than t.

Intuitively, we can see that if the fictitious play algorithm leads all players to play the same strict Nash equilibrium then, afterward, they will increase the probability that all others are playing the equilibrium. Since, by definition, the best response of a player when everyone else is playing a strict Nash equilibrium is to play the same equilibrium, all players will play the same strategy at the next time. The same holds true for every time after that.

Proposition 2. If fictitious play converges to a pure strategy then that strategy must be a Nash equilibrium.

We can show this by contradiction. If fictitious play converges to a strategy that is not a Nash equilibrium, then the best response for at least one of the players is not the same as the convergent strategy. Therefore, that player will take that action at the next time, taking the system away from the strategy profile it was supposed to have converged to.

An obvious problem with the solutions provided by fictitious play can be seen in the existence of infinite cycles of behavior. An example is illustrated by the game matrix in Figure 3, an anti-coordination game in which matched actions (A,A) and (B,B) pay 0 to each player while mismatched actions pay 1 to each player. If the players start with initial weights of k_1^0(A)=1, k_1^0(B)=1.5, k_2^0(A)=1, and k_2^0(B)=1.5, they will both believe that the other will play B and will, therefore, play A. The weights will then be updated to k_1^1(A)=2, k_1^1(B)=1.5, k_2^1(A)=2, and k_2^1(B)=1.5. Next time, both agents will believe that the other will play A, so both will play B. The agents will engage in an endless cycle where they alternately play (A,A) and (B,B). The agents end up receiving the worst possible payoff.

This example illustrates the type of problems we encounter when adding learning to multiagent systems. While we would hope that the machine learning algorithm we use will be able to discern this simple pattern and exploit it, most learning algorithms can easily fall into cycles that are not much more complicated than this one. One common strategy for avoiding this problem is the use of randomness: agents will sometimes take a random action in an effort to exit possible loops and to explore the search space. It is interesting to note that, as in the example from Figure 3, the loops the agents fall into often reflect one of the mixed-strategy Nash equilibria of the game; that is, (.5, .5) is a Nash equilibrium for this game. Unfortunately, if the agents are synchronized, as in this case, the implementation of a mixed strategy could lead to a lower payoff.
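A minimal sketch of the cycling example: two fictitious-play agents on the anti-coordination game just described, using the counting update (1) and the normalized model (2). The payoff values and the initial weights follow the example in the text; variable names and tie-breaking are assumptions of the sketch.

```python
# Payoffs for the symmetric anti-coordination game reconstructed above:
# matching actions pay 0 to each agent, mismatching actions pay 1.
U = {('A', 'A'): 0, ('A', 'B'): 1, ('B', 'A'): 1, ('B', 'B'): 0}
ACTIONS = ['A', 'B']

# weights[i] holds agent i's counts of its opponent's past plays;
# the initial values follow the example in the text: k(A)=1, k(B)=1.5.
weights = [{'A': 1.0, 'B': 1.5}, {'A': 1.0, 'B': 1.5}]

def best_response(i):
    """Best response of agent i against the normalized-count model of its opponent."""
    model = weights[i]
    total = sum(model.values())
    prob = {a: model[a] / total for a in ACTIONS}                      # equation (2)
    expected = {a: sum(prob[b] * U[(a, b)] for b in ACTIONS) for a in ACTIONS}
    return max(ACTIONS, key=lambda a: expected[a])

for t in range(6):
    acts = [best_response(0), best_response(1)]
    for i in (0, 1):
        weights[i][acts[1 - i]] += 1                                   # equation (1)
    print(t, acts)
# The printed actions alternate (A, A), (B, B), (A, A), ...: the endless cycle
# described above, in which both agents always receive the worst possible payoff.
```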
Games with more than two players require that we decide whether the agent should learn individual models of each of the other agents independently or a joint probability distribution over their combined strategies. Individual models assume that each agent operates independently, while joint distributions capture the possibility that the other agents' strategies are correlated. Unfortunately, for any interesting system the set of all possible strategy profiles is too large to explore—it grows exponentially with the number of agents. Therefore, most learning systems assume that all agents operate independently, so they need to maintain only one model per agent.

4.2 Replicator Dynamics

Another widely studied model is replicator dynamics. This model assumes that the percentage of agents playing a particular strategy will grow in proportion to how well that strategy performs in the population. A homogeneous population of agents is assumed. The agents are randomly paired in order to play a symmetric game, that is, a game where both agents have the same set of possible strategies and receive the same payoffs for the same actions. The replicator dynamics model is meant to capture situations where agents reproduce in proportion to how well they are doing.

Formally, we let φ^t(s) be the number of agents using strategy s at time t. We can then define

θ^t(s) = φ^t(s) / Σ_{s′∈S} φ^t(s′)    (3)

to be the fraction of agents playing s at time t. The expected utility for an agent playing strategy s at time t is defined as

u^t(s) ≡ Σ_{s′∈S} θ^t(s′) u(s, s′),    (4)

where u(s, s′) is the utility that an agent playing s receives against an agent playing s′. Notice that this expected utility assumes that the agents face each other in pairs and choose their opponents randomly. In the replicator dynamics the reproduction rate for each agent is proportional to how well it did on the previous step, that is,

φ^{t+1}(s) = φ^t(s)(1 + u^t(s)).    (5)

Notice that the number of agents playing a particular strategy will continue to increase as long as the expected utility for that strategy is greater than zero; only strategies whose expected utility is negative will decrease in population. It is also true that under these dynamics the size of the population will constantly fluctuate. However, when studying replicator dynamics we ignore the absolute size of the population and focus on the fraction of the population playing a particular strategy, i.e., θ^t(s), as time goes on. We are also interested in determining whether the system's dynamics will converge to some strategy and, if so, which one.

In order to study these systems using the standard solution concepts we view the fraction of agents playing each strategy as a mixed strategy for the game.
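The sketch below iterates equations (3)–(5) for an invented symmetric two-strategy game of the hawk-dove variety. The payoff values and the initial population are assumptions chosen so that the dynamics stay well behaved.

```python
# Symmetric two-strategy game of the hawk-dove variety; payoff numbers are invented.
# u[s][s2] is the payoff to an agent playing s against an opponent playing s2.
u = {'H': {'H': -1.0, 'D': 2.0},
     'D': {'H':  0.0, 'D': 1.0}}
S = ['H', 'D']

# phi[s]: number of agents currently playing s (an assumed starting population).
phi = {'H': 200.0, 'D': 800.0}

for t in range(61):
    total = sum(phi.values())
    theta = {s: phi[s] / total for s in S}                              # equation (3)
    exp_u = {s: sum(theta[s2] * u[s][s2] for s2 in S) for s in S}       # equation (4)
    if t % 10 == 0:
        print(t, round(theta['H'], 3))
    phi = {s: phi[s] * (1 + exp_u[s]) for s in S}                       # equation (5)
```

For this game the fraction of agents playing H approaches 0.5, the game's mixed Nash equilibrium, illustrating the equilibrium's role as an attractor of the dynamics.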
Since the game is symmetric, we can use that strategy as the strategy for both players, so it becomes a strategy profile. We say that the system is in a Nash equilibrium if the fraction of players playing each strategy is the same as the probability with which that strategy would be played in a Nash equilibrium. In the case of a pure-strategy Nash equilibrium this means that all players are playing the same strategy.

An examination of these systems quickly leads to the conclusion that every Nash equilibrium is a steady state for the replicator dynamics. In the Nash equilibrium all the strategies have the same average payoff, since the fraction of other players playing each strategy matches the Nash equilibrium. This fact can be easily proven by contradiction. If an agent had a pure strategy that would return a higher utility than any other strategy, then this strategy would be a best response to the Nash equilibrium. If this strategy were different from the Nash equilibrium, then we would have a best response to the equilibrium which is not the equilibrium, so the system could not be at a Nash equilibrium.

It has also been shown [4] that a stable steady state of the replicator dynamics is a Nash equilibrium. A stable steady state is one that, after suffering a small perturbation, is pushed back to the same steady state by the system's dynamics. These states are necessarily Nash equilibria because, if they were not, there would exist some particular small perturbation which would take the system away from the steady state. This correspondence was further refined by Bomze [1], who showed that an asymptotically stable steady state corresponds to a Nash equilibrium that is trembling-hand perfect and isolated. That is, the stable steady states are a refinement of the Nash equilibria—only a few Nash equilibria can qualify. On the other hand, it is also possible that a replicator dynamics system will never converge; in fact, there are many examples of simple games with no asymptotically stable steady states.

While replicator dynamics reflect some of the most troublesome aspects of learning in multiagent systems, some differences are evident. These differences are mainly due to the replication assumption. Agents are not usually expected to replicate; instead they acquire the strategies of others. For example, in a real multiagent system all the agents might choose to play the strategy that performed best in the last round instead of choosing their next strategy in proportion to how well it did last time. As such, we cannot directly apply the results from replicator dynamics to multiagent systems. However, the convergence of the system's dynamics to a Nash equilibrium does illustrate the importance of this solution concept as an attractor of learning agents' dynamics.

4.3 Evolutionary Stable Strategies

An Evolutionary Stable Strategy (ESS) is an equilibrium concept applied to dynamic systems such as the replicator dynamics system of the previous section.
An ESS is an equilibrium strategy that can overcome the presence of a small number of invaders. That is, if the equilibrium strategy profile is ω and a small number ε of invaders start playing ω′, then ω is an ESS if the existing population gets a higher payoff against the new mixture (εω′ + (1−ε)ω) than the invaders do. It has been shown [9] that an ESS is an asymptotically stable steady state of the replicator dynamics. However, the converse need not be true—a stable state in the replicator dynamics need not be an ESS. This means that ESS is a further refinement of the solution concept provided by the replicator dynamics. ESS can be used when we need a very stable equilibrium concept.
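A numeric sketch of the ESS condition just stated: a candidate mixed strategy ω resists an invading strategy ω′ if, against the post-invasion mixture εω′ + (1−ε)ω, incumbents earn strictly more than invaders for small ε. The game and the candidate strategies are invented for illustration.

```python
# Payoffs for a symmetric hawk-dove style game; the numbers are invented.
u = {('H', 'H'): -1.0, ('H', 'D'): 2.0, ('D', 'H'): 0.0, ('D', 'D'): 1.0}
S = ['H', 'D']

def payoff(p, q):
    """Expected payoff of mixed strategy p against mixed strategy q."""
    return sum(p[a] * q[b] * u[(a, b)] for a in S for b in S)

def resists(omega, invader, eps=0.01):
    """ESS test: incumbents must outscore invaders against the post-invasion mixture."""
    mix = {s: eps * invader[s] + (1 - eps) * omega[s] for s in S}
    return payoff(omega, mix) > payoff(invader, mix)

omega = {'H': 0.5, 'D': 0.5}                     # candidate: the game's mixed equilibrium
for invader in ({'H': 1.0, 'D': 0.0}, {'H': 0.0, 'D': 1.0}):
    print(invader, resists(omega, invader))      # True for both pure-strategy invaders
```

For this hawk-dove style game the mixed strategy (0.5, 0.5) resists invasion by either pure strategy, which is what qualifies it as an ESS candidate.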
5 Learning Agents

The theory of learning in games provides the designer of multiagent systems with many useful tools for determining the possible equilibrium points of a system. Unfortunately, most multiagent systems with learning agents do not converge to an equilibrium. Designers use learning agents because they do not know, at design time, the specific circumstances that the agents will face at run time. If a designer knew the best strategy, that is, the Nash equilibrium strategy, for his agent, then he would simply implement this strategy and avoid the complexities of implementing a learning algorithm. Therefore, the only times we will see a multiagent system with learning agents is when the designer cannot predict that an equilibrium solution will emerge.

The two main reasons for this inability to predict the equilibrium solution of a system are the existence of unpredictable environmental changes that affect the agents' payoffs and the fact that in many systems an agent only has access to its own set of payoffs—it does not know the payoffs of other agents. These two reasons make it impossible for a designer to predict which equilibria, if any, the system will converge to. However, the agents in the system are still playing a game for which an equilibrium exists, even if the designer cannot predict it at design time. But since the actual payoffs keep changing, it is often the case that the agents are constantly changing their strategies in order to accommodate the new payoffs.

Learning agents in a multiagent system are faced with a moving target function problem [10]. That is, as the agents change their behavior in an effort to maximize their utility, their payoffs for those actions change, changing the expected utility of their behavior. The system will likely have non-stationary dynamics—always changing in order to match the new goal. While game theory tells us where the equilibrium points are, given that the payoffs stay fixed, multiagent systems often never get to those points. A system designer needs to know how changes in the design of the system and the learning algorithms will affect the time to convergence. This type of information can be determined by using CLRI theory.

5.1 CLRI Theory

The CLRI theory [12] provides a formal method for analyzing a system composed of learning agents and determining how an agent's learning is expected to affect the learning of other agents in the system. It assumes a system where each agent has a decision function that governs its behavior as well as a target function that describes the agent's best possible behavior. The target function is unknown to the agent. The goal of the agent's learning is to have its decision function be an exact duplicate of its target function. Of course, the target function keeps changing as a result of other agents' learning.

Formally, CLRI theory assumes that there are N agents in the system. The world has a set of discrete states w ∈ W which are presented to the agents with a probability dictated by the probability distribution D(W). Each agent i ∈ N has a set of possible actions A_i, where |A_i| ≥ 2. Time is discrete and indexed by a variable t. At each time t all agents are presented with a new w drawn from D(W), take a simultaneous action, and receive some payoff. The scenario is similar to the one assumed by fictitious play except for the addition of w.

Each agent i's behavior is defined by a decision function δ_i^t(w) : W → A_i. When i learns at time t that it is in state w, it will take action δ_i^t(w). At any time there is an optimal function for i given by its target function Δ_i^t(w). Agent i's learning algorithm will try to reduce the discrepancy between δ_i and Δ_i by using the payoffs it receives for each action as clues, since it does not have direct access to Δ_i. The probability that an agent will take a wrong action is given by its error e(δ_i^t) = Pr[δ_i^t(w) ≠ Δ_i^t(w) | w ∈ D(W)]. As other agents learn and change their decision functions, i's target function will also change, leading to the moving target function problem, as depicted in Figure 4. An agent's error is based on a fixed probability distribution over world states and a boolean matching between the decision and target functions.

Figure 4: The moving target function problem. Agent i learns a new decision function δ_i^{t+1} in order to reduce its error e(δ_i^t) with respect to Δ_i^t, while the other agents' learning moves the target to Δ_i^{t+1}.

Someone trying to build a multiagent system with learning agents would determine the appropriate values for c, l, r, and either v or I, and then use

E[e(δ_i^{t+1})] = 1 − r_i + v_i · (|A_i| r_i − 1) / (|A_i| − 1)    (6)

in order to determine the successive expected errors for a typical agent i. This equation relies on a definition of volatility in terms of impact given by

v_i^t = Pr[Δ_i^{t+1}(w) ≠ Δ_i^t(w)]  ∀ w ∈ W
     = 1 − Π_{j∈N−i} (1 − I_{ji} Pr[δ_j^{t+1}(w) ≠ δ_j^t(w)]),    (7)

which makes the simplifying assumption that changes in agents' decision functions will not cancel each other out when calculating their impact on other agents. The difference equation (6) cannot, under most circumstances, be collapsed into a function of t, so it must still be iterated over. On the other hand, a careful study of the function and the reasoning behind the choice of the CLRI parameters leads to an intuitive understanding of how changes in these parameters will be reflected in the function and, therefore, the system. A knowledgeable designer can simply use this added understanding to determine the expected behavior of his system under various assumptions. An example of this approach is shown in [2].

For example, it is easy to see that an agent's learning rate and the system's volatility together help to determine how fast, if ever, the agent will reach its target function. A large learning rate means that an agent will change its decision function to almost match the target function. Meanwhile, a low volatility means that the target function will not move much, so it will be easy for the agent to match it. Of course, this type of simple analysis ignores the common situation where the agent's high learning rate is coupled with a high impact on other agents' target functions, making their volatility much higher. These agents might then have to increase their learning rate and thereby increase the original agent's volatility. Equation (6) is most helpful in these types of feedback situations.
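A minimal sketch of how a designer might use equations (6) and (7). The impact values, change probabilities, retention rates, and action-set size are arbitrary choices for illustration, and the volatility is computed once rather than being re-estimated at every step.

```python
def expected_error(r, v, n_actions):
    """Equation (6): expected error of agent i's next decision function, given its
    retention rate r, the volatility v of its target function, and |A_i| = n_actions."""
    return 1 - r + v * (n_actions * r - 1) / (n_actions - 1)

def volatility(impacts, change_probs):
    """Equation (7): volatility of agent i's target function, given the impact I_ji of
    each other agent j on i and the probability that j's decision function changes
    (assuming the changes do not cancel each other out)."""
    prod = 1.0
    for impact, p_change in zip(impacts, change_probs):
        prod *= 1 - impact * p_change
    return 1 - prod

# Invented parameters: agent i has 4 actions and two neighbors with moderate impact
# on its target function, each changing its own decision function 30% of the time.
v = volatility(impacts=[0.2, 0.5], change_probs=[0.3, 0.3])
for r in (0.5, 0.7, 0.9, 0.99):
    print(f"r = {r:.2f}  v = {v:.2f}  E[error] = {expected_error(r, v, 4):.3f}")
# Higher retention drives the expected error down, while a more volatile target
# pushes it back up, matching the qualitative discussion above.
```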
5.2 N-Level Agents

Another issue that arises when building learning agents is the choice of a modeling level. A designer must decide whether his agent will learn to correlate actions with rewards, or will try to learn to predict the expected actions of others and use these predictions along with knowledge of the problem domain to determine its actions, or will try to learn how other agents build models of other agents, and so on. These choices are usually referred to as n-level modeling agents—an idea first presented in the recursive modeling method [5][6].

A 0-level agent is one that does not recognize the existence of other agents in the world. It learns which action to take in each possible state of the world because it receives a reward after its actions. The state is usually defined as a static snapshot of the observable aspects of the agent's environment. A 1-level agent recognizes that there are other agents in the world whose actions affect its payoff. It also has some knowledge that tells it the utility it will receive given any set of joint actions; this knowledge usually takes the form of a game matrix that only has utility values for the agent. The 1-level agent observes the other agents' actions and builds probabilistic models of the other agents. It then uses these models to predict their action probability distributions and uses these distributions to determine its best possible action. A 2-level agent believes that all other agents are 1-level agents. It, therefore, builds models of their models of other agents based on the actions it thinks they have seen others take. In essence, the 2-level agent applies the 1-level algorithm to all other agents in an effort to predict their action probability distributions and uses these distributions to determine its best possible actions. A 3-level agent believes that all other agents are 2-level, and so on. Using these guidelines, we can determine that fictitious play (Section 4.1) uses 1-level agents while replicator dynamics (Section 4.2) uses 0-level agents.

These categorizations help us to determine the relative computational costs of each approach and the machine-learning algorithms that are best suited for each learning problem. 0-level is usually the easiest to implement since it only requires the learning of one function and no additional knowledge. 1-level learning requires us to build a model of every agent and can only be implemented if the agent has the knowledge that tells it which action to take given the set of actions that others have taken. This knowledge must be integrated into the agent. However, recent studies in layered learning [8] have shown how some knowledge can be learned in a "training" situation and then fixed into the agent so that other knowledge that builds on it can be learned, either at runtime or in another training situation. In general, a change in the level at which an agent operates implies a change in the learning problem and the knowledge built into the agent.

Studies with n-level agents have shown [11] that an n-level agent will always perform better in a society full of (n−1)-level agents, and that the computational costs of increasing a level grow exponentially. Meanwhile, in an economic scenario, the utility gains to the agent grow smaller as the agents in the system increase their level. The reason is that an n-level agent is able to exploit the non-equilibrium dynamics of a system composed of (n−1)-level agents. However, as the agents increase their level the system reaches equilibrium faster, so the advantages of strategic thinking are reduced—it is best to play the equilibrium strategy and not worry about what others might do.
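To make the modeling levels concrete, here is a small sketch (the class names, the toy payoff matrix, and the fixed opponent are all invented) of a 0-level learner, which only correlates its own actions with received rewards, and a 1-level learner, which builds an empirical model of the opponent's action frequencies and best-responds to it using its own payoff matrix.

```python
import random
from collections import defaultdict

ACTIONS = ['A', 'B']
# The agent's own payoff for (its action, opponent's action); numbers are invented.
U = {('A', 'A'): 0, ('A', 'B'): 3, ('B', 'A'): 1, ('B', 'B'): 0}

class ZeroLevel:
    """0-level: ignores other agents; tracks the average reward of each of its actions."""
    def __init__(self):
        self.total = defaultdict(float)
        self.count = defaultdict(int)
    def act(self):
        if random.random() < 0.1:                     # occasional exploration
            return random.choice(ACTIONS)
        return max(ACTIONS,
                   key=lambda a: self.total[a] / self.count[a] if self.count[a] else 0.0)
    def observe(self, my_action, reward, opp_action):
        self.total[my_action] += reward               # the opponent's action is ignored
        self.count[my_action] += 1

class OneLevel:
    """1-level: models the opponent's action frequencies and best-responds using U."""
    def __init__(self):
        self.opp_counts = defaultdict(lambda: 1.0)    # Laplace-style prior
    def act(self):
        total = sum(self.opp_counts[a] for a in ACTIONS)
        prob = {a: self.opp_counts[a] / total for a in ACTIONS}
        return max(ACTIONS, key=lambda m: sum(prob[o] * U[(m, o)] for o in ACTIONS))
    def observe(self, my_action, reward, opp_action):
        self.opp_counts[opp_action] += 1

def opponent():
    """A fixed, biased opponent that plays 'B' 80% of the time (assumed for the demo)."""
    return 'B' if random.random() < 0.8 else 'A'

for agent in (ZeroLevel(), OneLevel()):
    earned = 0
    for _ in range(1000):
        mine, theirs = agent.act(), opponent()
        reward = U[(mine, theirs)]
        agent.observe(mine, reward, theirs)
        earned += reward
    print(type(agent).__name__, earned)
```

A 2-level agent would apply the OneLevel logic to model how the opponent models it, at a correspondingly higher computational cost.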
On the other hand, if all agents stopped learning, then it would be very easy for a new learning agent to take advantage of them. As such, the research concludes that some of the agents should do some learning some of the time in order to preserve the robustness of the system, even if this learning does not have any direct results.

6 Conclusion

We have seen how game theory and the theory of learning in games provide us with various equilibrium solution concepts and often tell us when some of them will be reached by simple learning models. On the other hand, we have argued that the reason learning is used in a multiagent system is often because there is no known equilibrium or because the equilibrium point keeps changing due to outside forces. We have also shown how the CLRI theory and n-level agents are attempts to characterize and predict, to a limited degree, the dynamics of a system given some basic learning parameters.

We conclude that the problems faced by the designer of a learning multiagent system cannot be solved solely with the tools of game theory. Game theory tells us about possible equilibrium points. However, learning agents are rarely at equilibrium, either because they are not sophisticated enough, because they lack information, or by design. There is a need to explore non-equilibrium systems and to develop more predictive theories which, like CLRI, can tell us how changing either the parameters of the agents' learning algorithms or the rules of the game will affect the expected emergent behavior.

References

1. Bomze, I.: Non-cooperative two-person games in biology: A classification. International Journal of Game Theory 15 (1986) 31–37
2. Brooks, C.H., Durfee, E.H.: Congregation formation in multiagent systems. Journal of Autonomous Agents and Multi-Agent Systems (2002) to appear
3. Fudenberg, D., Kreps, D.: Lectures on learning and equilibrium in strategic-form games. Technical report, CORE Lecture Series (1990)
4. Fudenberg, D., Levine, D.K.: The Theory of Learning in Games. MIT Press (1998)
5. Gmytrasiewicz, P.J., Durfee, E.H.: A rigorous, operational formalization of recursive modeling. In: Proceedings of the First International Conference on Multi-Agent Systems. (1995) 125–132
6. Gmytrasiewicz, P.J., Durfee, E.H.: Rational communication in multi-agent systems. Autonomous Agents and Multi-Agent Systems Journal 4 (2001) 233–272
7. Mitchell, T.M.: Machine Learning. McGraw Hill (1997)
8. Stone, P.: Layered Learning in Multiagent Systems. MIT Press (2000)
9. Taylor, P., Jonker, L.: Evolutionary stable strategies and game dynamics. Mathematical Biosciences 16 (1978) 76–83
10. Vidal, J.M., Durfee, E.H.: The moving target function problem in multi-agent learning. In: Proceedings of the Third International Conference on Multi-Agent Systems. (1998)
11. Vidal, J.M., Durfee, E.H.: Learning nested models in an information economy. Journal of Experimental and Theoretical Artificial Intelligence 10 (1998) 291–308
12. Vidal, J.M., Durfee, E.H.: Predicting the expected behavior of agents that learn about agents: the CLRI framework. Autonomous Agents and Multi-Agent Systems (2002)