Random networks created by biological evolution
- Format: PDF
- Size: 261.88 KB
- Pages: 9
A Summary of Clustering Algorithms in Complex Networks
Networks, known in mathematics as graphs, were first studied in 1736 through Euler's Königsberg seven-bridges problem, but research on graphs then progressed slowly; the first monograph on graph theory did not appear until 1936.
In the 1960s, the two Hungarian mathematicians Erdős and Rényi established random graph theory, which is generally regarded as the beginning of systematic mathematical research on complex networks.
For the following forty years, random graph theory served as the basic theory for the study of complex networks.
However, the vast majority of real networks are not completely random.
In 1998, the Nature paper "Collective Dynamics of Small-world Networks" by Watts and his advisor Strogatz revealed the small-world property of complex networks.
Shortly afterwards, in 1999, the Science paper "Emergence of Scaling in Random Networks" by Barabási and his doctoral student Albert revealed the scale-free property of complex networks (a power-law degree distribution), opening a new era of complex network research.
As research deepened, more and more properties of complex networks were uncovered. One important contribution is the 2002 PNAS paper "Community structure in social and biological networks" by Girvan and Newman, which pointed out that clustering is ubiquitous in complex networks, called each such cluster a community, and proposed an algorithm for detecting these communities.
Since then, community detection in complex networks has been studied extensively and a large number of algorithms have been produced. This article attempts to briefly survey clustering algorithms for complex networks, in the hope of helping readers who want a quick overview of the area.
The notion of a community used in this article is the same as the notion of a cluster in ordinary clustering algorithms.
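As a quick usage illustration of community detection in the spirit of the Girvan-Newman algorithm mentioned above, the snippet below (added here for orientation, not part of the original text) calls the girvan_newman routine from the networkx library on its built-in karate-club graph:

```python
import networkx as nx
from networkx.algorithms.community import girvan_newman

# Zachary's karate club: a small social network bundled with networkx.
G = nx.karate_club_graph()

# girvan_newman() repeatedly removes the edge with the highest betweenness;
# the first item of the iterator is the first split into communities.
first_split = next(girvan_newman(G))
print([sorted(c) for c in first_split])
```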
0. Preliminaries
For completeness of this article, we first give some basic concepts.
A graph is usually written as G = (V, E), where V is the set of vertices and E is the set of edges; we usually write n for the number of nodes of the graph and m for the number of edges.
In a graph, the number of edges incident to a vertex is called the degree of that vertex.
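To make these definitions concrete, here is a small sketch (with a made-up edge list, not data from any network discussed in this article) that builds an undirected graph and computes n, m, and the degree of every vertex:

```python
from collections import defaultdict

# A small, made-up undirected graph G = (V, E) given as an edge list.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]

adjacency = defaultdict(set)
for u, v in edges:
    adjacency[u].add(v)
    adjacency[v].add(u)

n = len(adjacency)          # number of nodes
m = len(edges)              # number of edges
degree = {v: len(nbrs) for v, nbrs in adjacency.items()}

print(f"n = {n}, m = {m}")
print("degrees:", degree)   # e.g. node 2 has degree 3
```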
Deep Sparse Rectifier Neural NetworksXavier Glorot Antoine Bordes Yoshua BengioDIRO,Universit´e de Montr´e al Montr´e al,QC,Canada glorotxa@iro.umontreal.ca Heudiasyc,UMR CNRS6599UTC,Compi`e gne,FranceandDIRO,Universit´e de Montr´e alMontr´e al,QC,Canadaantoine.bordes@hds.utc.frDIRO,Universit´e de Montr´e alMontr´e al,QC,Canadabengioy@iro.umontreal.caAbstractWhile logistic sigmoid neurons are more bi-ologically plausible than hyperbolic tangentneurons,the latter work better for train-ing multi-layer neural networks.This pa-per shows that rectifying neurons are aneven better model of biological neurons andyield equal or better performance than hy-perbolic tangent networks in spite of thehard non-linearity and non-differentiabilityat zero,creating sparse representations withtrue zeros,which seem remarkably suitablefor naturally sparse data.Even though theycan take advantage of semi-supervised setupswith extra-unlabeled data,deep rectifier net-works can reach their best performance with-out requiring any unsupervised pre-trainingon purely supervised tasks with large labeleddatasets.Hence,these results can be seen asa new milestone in the attempts at under-standing the difficulty in training deep butpurely supervised neural networks,and clos-ing the performance gap between neural net-works learnt with and without unsupervisedpre-training.1IntroductionMany differences exist between the neural network models used by machine learning researchers and those used by computational neuroscientists.This is in part Appearing in Proceedings of the14th International Con-ference on Artificial Intelligence and Statistics(AISTATS) 2011,Fort Lauderdale,FL,USA.Volume15of JMLR: W&CP15.Copyright2011by the authors.because the objective of the former is to obtain com-putationally efficient learners,that generalize well to new examples,whereas the objective of the latter is to abstract out neuroscientific data while obtaining ex-planations of the principles involved,providing predic-tions and guidance for future biological experiments. 
Areas where both objectives coincide are therefore particularly worthy of investigation,pointing towards computationally motivated principles of operation in the brain that can also enhance research in artificial intelligence.In this paper we show that two com-mon gaps between computational neuroscience models and machine learning neural network models can be bridged by using the following linear by part activa-tion:max(0,x),called the rectifier(or hinge)activa-tion function.Experimental results will show engaging training behavior of this activation function,especially for deep architectures(see Bengio(2009)for a review), i.e.,where the number of hidden layers in the neural network is3or more.Recent theoretical and empirical work in statistical machine learning has demonstrated the importance of learning algorithms for deep architectures.This is in part inspired by observations of the mammalian vi-sual cortex,which consists of a chain of processing elements,each of which is associated with a different representation of the raw visual input.This is partic-ularly clear in the primate visual system(Serre et al., 2007),with its sequence of processing stages:detection of edges,primitive shapes,and moving up to gradu-ally more complex visual shapes.Interestingly,it was found that the features learned in deep architectures resemble those observed in thefirst two of these stages (in areas V1and V2of visual cortex)(Lee et al.,2008), and that they become increasingly invariant to factors of variation(such as camera movement)in higher lay-ers(Goodfellow et al.,2009).Deep Sparse Rectifier Neural NetworksRegarding the training of deep networks,something that can be considered a breakthrough happened in2006,with the introduction of Deep Belief Net-works(Hinton et al.,2006),and more generally the idea of initializing each layer by unsupervised learn-ing(Bengio et al.,2007;Ranzato et al.,2007).Some authors have tried to understand why this unsuper-vised procedure helps(Erhan et al.,2010)while oth-ers investigated why the original training procedure for deep neural networks failed(Bengio and Glorot,2010). From the machine learning point of view,this paper brings additional results in these lines of investigation. 
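For orientation, the following small sketch (an illustration added here, not code from the paper) evaluates the rectifier max(0, x) alongside the logistic sigmoid and the hyperbolic tangent, showing that only the rectifier produces exact zeros:

```python
import numpy as np

def rectifier(x):
    # The "hinge" activation discussed in the paper: max(0, x).
    return np.maximum(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print("rectifier:", rectifier(x))   # exact zeros for all negative inputs
print("sigmoid:  ", sigmoid(x))     # saturates near 0 and 1, never exactly 0
print("tanh:     ", np.tanh(x))     # antisymmetric around 0
```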
We propose to explore the use of rectifying non-linearities as alternatives to the hyperbolic tangent or sigmoid in deep artificial neural networks,in ad-dition to using an L1regularizer on the activation val-ues to promote sparsity and prevent potential numer-ical problems with unbounded activation.Nair and Hinton(2010)present promising results of the influ-ence of such units in the context of Restricted Boltz-mann Machines compared to logistic sigmoid activa-tions on image classification tasks.Our work extends this for the case of pre-training using denoising auto-encoders(Vincent et al.,2008)and provides an exten-sive empirical comparison of the rectifying activation function against the hyperbolic tangent on image clas-sification benchmarks as well as an original derivation for the text application of sentiment analysis.Our experiments on image and text data indicate that training proceeds better when the artificial neurons are either offor operating mostly in a linear regime.Sur-prisingly,rectifying activation allows deep networks to achieve their best performance without unsupervised pre-training.Hence,our work proposes a new contri-bution to the trend of understanding and merging the performance gap between deep networks learnt with and without unsupervised pre-training(Erhan et al., 2010;Bengio and Glorot,2010).Still,rectifier net-works can benefit from unsupervised pre-training in the context of semi-supervised learning where large amounts of unlabeled data are provided.Furthermore, as rectifier units naturally lead to sparse networks and are closer to biological neurons’responses in their main operating regime,this work also bridges(in part)a machine learning/neuroscience gap in terms of acti-vation function and sparsity.This paper is organized as follows.Section2presents some neuroscience and machine learning background which inspired this work.Section3introduces recti-fier neurons and explains their potential benefits and drawbacks in deep networks.Then we propose an experimental study with empirical results on image recognition in Section4.1and sentiment analysis in Section4.2.Section5presents our conclusions.2Background2.1Neuroscience ObservationsFor models of biological neurons,the activation func-tion is the expectedfiring rate as a function of the total input currently arising out of incoming signals at synapses(Dayan and Abott,2001).An activation function is termed,respectively antisymmetric or sym-metric when its response to the opposite of a strongly excitatory input pattern is respectively a strongly in-hibitory or excitatory one,and one-sided when this response is zero.The main gaps that we wish to con-sider between computational neuroscience models and machine learning models are the following:•Studies on brain energy expense suggest that neurons encode information in a sparse and dis-tributed way(Attwell and Laughlin,2001),esti-mating the percentage of neurons active at the same time to be between1and4%(Lennie,2003).This corresponds to a trade-offbetween richness of representation and small action potential en-ergy expenditure.Without additional regulariza-tion,such as an L1penalty,ordinary feedforward neural nets do not have this property.For ex-ample,the sigmoid activation has a steady state regime around12,therefore,after initializing with small weights,all neuronsfire at half their satura-tion regime.This is biologically implausible and hurts gradient-based optimization(LeCun et al., 1998;Bengio and Glorot,2010).•Important divergences between biological and 
machine learning models concern non-linear activation functions.A common biological model of neuron,the leaky integrate-and-fire(or LIF)(Dayan and Abott,2001),gives the follow-ing relation between thefiring rate and the input current,illustrated in Figure1(left):f(I)=τlogE+RI−V rE+RI−V th+t ref−1,if E+RI>V th0,if E+RI≤V thwhere t ref is the refractory period(minimal time between two action potentials),I the input cur-rent,V r the resting potential and V th the thresh-old potential(with V th>V r),and R,E,τthe membrane resistance,potential and time con-stant.The most commonly used activation func-tions in the deep learning and neural networks lit-erature are the standard logistic sigmoid and theXavier Glorot,Antoine Bordes,YoshuaBengioFigure1:Left:Common neural activation function motivated by biological data.Right:Commonly used activation functions in neural networks literature:logistic sigmoid and hyperbolic tangent(tanh).hyperbolic tangent(see Figure1,right),which areequivalent up to a linear transformation.The hy-perbolic tangent has a steady state at0,and istherefore preferred from the optimization stand-point(LeCun et al.,1998;Bengio and Glorot,2010),but it forces an antisymmetry around0which is absent in biological neurons.2.2Advantages of SparsitySparsity has become a concept of interest,not only incomputational neuroscience and machine learning butalso in statistics and signal processing(Candes andTao,2005).It wasfirst introduced in computationalneuroscience in the context of sparse coding in the vi-sual system(Olshausen and Field,1997).It has beena key element of deep convolutional networks exploit-ing a variant of auto-encoders(Ranzato et al.,2007,2008;Mairal et al.,2009)with a sparse distributedrepresentation,and has also become a key ingredientin Deep Belief Networks(Lee et al.,2008).A sparsitypenalty has been used in several computational neuro-science(Olshausen and Field,1997;Doi et al.,2006)and machine learning models(Lee et al.,2007;Mairalet al.,2009),in particular for deep architectures(Leeet al.,2008;Ranzato et al.,2007,2008).However,inthe latter,the neurons end up taking small but non-zero activation orfiring probability.We show here thatusing a rectifying non-linearity gives rise to real zerosof activations and thus truly sparse representations.From a computational point of view,such representa-tions are appealing for the following reasons:•Information disentangling.One of theclaimed objectives of deep learning algo-rithms(Bengio,2009)is to disentangle thefactors explaining the variations in the data.Adense representation is highly entangled becausealmost any change in the input modifies most ofthe entries in the representation vector.Instead,if a representation is both sparse and robust tosmall input changes,the set of non-zero featuresis almost always roughly conserved by smallchanges of the input.•Efficient variable-size representation.Dif-ferent inputs may contain different amounts of in-formation and would be more conveniently repre-sented using a variable-size data-structure,whichis common in computer representations of infor-mation.Varying the number of active neuronsallows a model to control the effective dimension-ality of the representation for a given input andthe required precision.•Linear separability.Sparse representations arealso more likely to be linearly separable,or moreeasily separable with less non-linear machinery,simply because the information is represented ina high-dimensional space.Besides,this can reflectthe original data format.In text-related 
applica-tions for instance,the original raw data is alreadyvery sparse(see Section4.2).•Distributed but sparse.Dense distributed rep-resentations are the richest representations,be-ing potentially exponentially more efficient thanpurely local ones(Bengio,2009).Sparse repre-sentations’efficiency is still exponentially greater,with the power of the exponent being the numberof non-zero features.They may represent a goodtrade-offwith respect to the above criteria.Nevertheless,forcing too much sparsity may hurt pre-dictive performance for an equal number of neurons,because it reduces the effective capacity of the model.Deep Sparse Rectifier NeuralNetworksFigure 2:Left:Sparse propagation of activations and gradients in a network of rectifier units.The input selects a subset of active neurons and computation is linear in this subset.Right:Rectifier and softplus activation functions.The second one is a smooth version of the first.3Deep Rectifier Networks3.1Rectifier NeuronsThe neuroscience literature (Bush and Sejnowski,1995;Douglas and al.,2003)indicates that corti-cal neurons are rarely in their maximum saturation regime ,and suggests that their activation function can be approximated by a rectifier.Most previous stud-ies of neural networks involving a rectifying activation function concern recurrent networks (Salinas and Ab-bott,1996;Hahnloser,1998).The rectifier function rectifier(x )=max(0,x )is one-sided and therefore does not enforce a sign symmetry 1or antisymmetry 1:instead,the response to the oppo-site of an excitatory input pattern is 0(no response).However,we can obtain symmetry or antisymmetry by combining two rectifier units sharing parameters.Advantages The rectifier activation function allows a network to easily obtain sparse representations.For example,after uniform initialization of the weights,around 50%of hidden units continuous output val-ues are real zeros,and this fraction can easily increase with sparsity-inducing regularization.Apart from be-ing more biologically plausible,sparsity also leads to mathematical advantages (see previous section).As illustrated in Figure 2(left),the only non-linearity in the network comes from the path selection associ-ated with individual neurons being active or not.For a given input only a subset of neurons are active .Com-putation is linear on this subset:once this subset of neurons is selected,the output is a linear function of1The hyperbolic tangent absolute value non-linearity |tanh(x )|used by Jarrett et al.(2009)enforces sign symme-try.A tanh(x )non-linearity enforces sign antisymmetry.the input (although a large enough change can trigger a discrete change of the active set of neurons).The function computed by each neuron or by the network output in terms of the network input is thus linear by parts.We can see the model as an exponential num-ber of linear models that share parameters (Nair and Hinton,2010).Because of this linearity,gradients flow well on the active paths of neurons (there is no gra-dient vanishing effect due to activation non-linearities of sigmoid or tanh units),and mathematical investi-gation is putations are also cheaper:there is no need for computing the exponential function in activations,and sparsity can be exploited.Potential Problems One may hypothesize that the hard saturation at 0may hurt optimization by block-ing gradient back-propagation.To evaluate the poten-tial impact of this effect we also investigate the soft-plus activation:softplus (x )=log (1+e x )(Dugas et al.,2001),a smooth version of the 
rectifying non-linearity.We lose the exact sparsity,but may hope to gain eas-ier training.However,experimental results (see Sec-tion 4.1)tend to contradict that hypothesis,suggesting that hard zeros can actually help supervised training.We hypothesize that the hard non-linearities do not hurt so long as the gradient can propagate along some paths ,i.e.,that some of the hidden units in each layer are non-zero.With the credit and blame assigned to these ON units rather than distributed more evenly,we hypothesize that optimization is easier.Another prob-lem could arise due to the unbounded behavior of the activations;one may thus want to use a regularizer to prevent potential numerical problems.Therefore,we use the L 1penalty on the activation values,which also promotes additional sparsity.Also recall that,in or-der to efficiently represent symmetric/antisymmetric behavior in the data,a rectifier network would needXavier Glorot,Antoine Bordes,Yoshua Bengiotwice as many hidden units as a network of symmet-ric/antisymmetric activation functions.Finally,rectifier networks are subject to ill-conditioning of the parametrization.Biases and weights can be scaled in different (and consistent)ways while preserving the same overall network function.More precisely,consider for each layer of depth i of the network a scalar αi ,and scaling the parameters asW i =W iαi and b i =b i ij =1αj.The output units values then change as follow:s =sn j =1αj .Therefore,aslong as nj =1αj is 1,the network function is identical.3.2Unsupervised Pre-trainingThis paper is particularly inspired by the sparse repre-sentations learned in the context of auto-encoder vari-ants,as they have been found to be very useful intraining deep architectures (Bengio,2009),especially for unsupervised pre-training of neural networks (Er-han et al.,2010).Nonetheless,certain difficulties arise when one wants to introduce rectifier activations into stacked denois-ing auto-encoders (Vincent et al.,2008).First,the hard saturation below the threshold of the rectifier function is not suited for the reconstruction units.In-deed,whenever the network happens to reconstruct a zero in place of a non-zero target,the reconstruc-tion unit can not backpropagate any gradient.2Sec-ond,the unbounded behavior of the rectifier activation also needs to be taken into account.In the follow-ing,we denote ˜x the corrupted version of the input x ,σ()the logistic sigmoid function and θthe model pa-rameters (W enc ,b enc ,W dec ,b dec ),and define the linear recontruction function as:f (x,θ)=W dec max(W enc x +b enc ,0)+b dec .Here are the several strategies we have experimented:e a softplus activation function for the recon-struction layer,along with a quadratic cost:L (x,θ)=||x −log(1+exp(f (˜x ,θ)))||2.2.Scale the rectifier activation values coming from the previous encoding layer to bound them be-tween 0and 1,then use a sigmoid activation func-tion for the reconstruction layer,along with a cross-entropy reconstruction cost.L (x,θ)=−x log(σ(f (˜x ,θ)))−(1−x )log(1−σ(f (˜x ,θ))).2Why is this not a problem for hidden layers too?we hy-pothesize that it is because gradients can still flow throughthe active (non-zero),possibly helping rather than hurting the assignment of credit.e a linear activation function for the reconstruc-tion layer,along with a quadratic cost.We triedto use input unit values either before or after the rectifier non-linearity as reconstruction targets.(For the first layer,raw inputs are directly used.)e a rectifier activation function for the 
recon-struction layer,along with a quadratic cost.The first strategy has proven to yield better gener-alization on image data and the second one on text data.Consequently,the following experimental study presents results using those two.4Experimental StudyThis section discusses our empirical evaluation of recti-fier units for deep networks.We first compare them to hyperbolic tangent and softplus activations on image benchmarks with and without pre-training,and then apply them to the text task of sentiment analysis.4.1Image RecognitionExperimental setup We considered the image datasets detailed below.Each of them has a train-ing set (for tuning parameters),a validation set (for tuning hyper-parameters)and a test set (for report-ing generalization performance).They are presented according to their number of training/validation/test examples,their respective image sizes,as well as their number of classes:•MNIST (LeCun et al.,1998):50k/10k/10k,28×28digit images,10classes.•CIFAR10(Krizhevsky and Hinton,2009):50k/5k/5k,32×32×3RGB images,10classes.•NISTP:81,920k/80k/20k,32×32character im-ages from the NIST database 19,with randomized distortions (Bengio and al,2010),62classes.This dataset is much larger and more difficult than the original NIST (Grother,1995).•NORB:233,172/58,428/58,320,taken from Jittered-Cluttered NORB (LeCun et al.,2004).Stereo-pair images of toys on a cluttered background,6classes.The data has been prepro-cessed similarly to (Nair and Hinton,2010):we subsampled the original 2×108×108stereo-pair images to 2×32×32and scaled linearly the image in the range [−1,1].We followed the procedure used by Nair and Hinton (2010)to create the validation set.Deep Sparse Rectifier Neural NetworksTable1:Test error on networks of depth3.Bold results represent statistical equivalence between similar ex-periments,with and without pre-training,under the null hypothesis of the pairwise test with p=0.05.Neuron MNIST CIF AR10NISTP NORB With unsupervised pre-trainingRectifier 1.20%49.96%32.86%16.46% Tanh 1.16%50.79%35.89%17.66% Softplus 1.17%49.52%33.27%19.19% Without unsupervised pre-trainingRectifier 1.43%50.86%32.64%16.40% Tanh 1.57%52.62%36.46%19.29% Softplus 1.77%53.20%35.48%17.68% For all experiments except on the NORB data(Le-Cun et al.,2004),the models we used are stacked denoising auto-encoders(Vincent et al.,2008)with three hidden layers and1000units per layer.The ar-chitecture of Nair and Hinton(2010)has been used on NORB:two hidden layers with respectively4000 and2000units.We used a cross-entropy reconstruc-tion cost for tanh networks and a quadratic cost over a softplus reconstruction layer for the rectifier and softplus networks.We chose masking noise as the corruption process:each pixel has a probability of0.25of being artificially set to0.The unsuper-vised learning rate is constant,and the following val-ues have been explored:{.1,.01,.001,.0001}.We se-lect the model with the lowest reconstruction error. 
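As a rough sketch of the first reconstruction strategy described above (a rectified encoder, a softplus reconstruction layer, and a quadratic cost, with masking-noise corruption), one forward pass could be written as follows; the layer sizes, random inputs, and helper names are illustrative assumptions, not the paper's actual configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

def masking_noise(x, p_corrupt=0.25):
    # Each input component is set to 0 with probability p_corrupt.
    mask = rng.random(x.shape) >= p_corrupt
    return x * mask

def dae_quadratic_loss(x, W_enc, b_enc, W_dec, b_dec):
    x_tilde = masking_noise(x)
    h = np.maximum(0.0, x_tilde @ W_enc + b_enc)        # rectified encoder
    f = h @ W_dec + b_dec                               # linear pre-reconstruction
    recon = np.log1p(np.exp(f))                         # softplus reconstruction unit
    return np.mean(np.sum((x - recon) ** 2, axis=1))    # quadratic cost

# Toy dimensions: 8 visible units, 5 hidden units, batch of 4 examples.
x = rng.random((4, 8))
W_enc = 0.1 * rng.standard_normal((8, 5)); b_enc = np.zeros(5)
W_dec = 0.1 * rng.standard_normal((5, 8)); b_dec = np.zeros(8)
print("reconstruction loss:", dae_quadratic_loss(x, W_enc, b_enc, W_dec, b_dec))
```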
For the supervisedfine-tuning we chose a constant learning rate in the same range as the unsupervised learning rate with respect to the supervised valida-tion error.The training cost is the negative log likeli-hood−log P(correct class|input)where the probabil-ities are obtained from the output layer(which imple-ments a softmax logistic regression).We used stochas-tic gradient descent with mini-batches of size10for both unsupervised and supervised training phases.To take into account the potential problem of rectifier units not being symmetric around0,we use a vari-ant of the activation function for whichhalf of the units output values are multiplied by-1.This serves to cancel out the mean activation value for each layer and can be interpreted either as inhibitory neurons or simply as a way to equalize activations numerically. Additionally,an L1penalty on the activations with a coefficient of0.001was added to the cost function dur-ing pre-training andfine-tuning in order to increase the amount of sparsity in the learned representations. Main results Table1summarizes the results on networks of3hidden layers of1000hidden units each,Figure3:Influence offinal sparsity on accu-racy.200randomly initialized deep rectifier networks were trained on MNIST with various L1penalties(from 0to0.01)to obtain different sparsity levels.Results show that enforcing sparsity of the activation does not hurtfinal performance until around85%of true zeros.comparing all the neuron types3on all the datasets, with or without unsupervised pre-training.In the lat-ter case,the supervised training phase has been carried out using the same experimental setup as the one de-scribed above forfine-tuning.The main observations we make are the following:•Despite the hard threshold at0,networks trained with the rectifier activation function canfind lo-cal minima of greater or equal quality than those obtained with its smooth counterpart,the soft-plus.On NORB,we tested a rescaled version of the softplus defined by1αsoftplus(αx),which allows to interpolate in a smooth manner be-tween the softplus(α=1)and the rectifier(α=∞).We obtained the followingα/test error cou-ples:1/17.68%,1.3/17.53%,2/16.9%,3/16.66%, 6/16.54%,∞/16.40%.There is no trade-offbe-tween those activation functions.Rectifiers are not only biologically plausible,they are also com-putationally efficient.•There is almost no improvement when using un-supervised pre-training with rectifier activations, contrary to what is experienced using tanh or soft-plus.Purely supervised rectifier networks remain competitive on all4datasets,even against the pretrained tanh or softplus models.3We also tested a rescaled version of the LIF and max(tanh(x),0)as activation functions.We obtained worse generalization performance than those of Table1, and chose not to report them.Xavier Glorot,Antoine Bordes,Yoshua Bengio•Rectifier networks are truly deep sparse networks.There is an average exact sparsity(fraction of ze-ros)of the hidden layers of83.4%on MNIST,72.0%on CIFAR10,68.0%on NISTP and73.8%on NORB.Figure3provides a better understand-ing of the influence of sparsity.It displays the MNIST test error of deep rectifier networks(with-out pre-training)according to different average sparsity obtained by varying the L1penalty on the works appear to be quite ro-bust to it as models with70%to almost85%of true zeros can achieve similar performances. 
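The exact sparsity quoted in the results is simply the fraction of hidden activations that are true zeros; a minimal sketch of how it can be measured for one rectified layer, together with the L1 activation penalty (the coefficient 0.001 is the one stated above, everything else is illustrative), is:

```python
import numpy as np

def hidden_activations(x, W, b):
    return np.maximum(0.0, x @ W + b)   # rectifier units

def exact_sparsity(h):
    # Fraction of activations that are true zeros.
    return float(np.mean(h == 0.0))

def l1_activation_penalty(h, coeff=0.001):
    # Penalty added to the training cost to promote sparsity.
    return coeff * np.sum(np.abs(h))

rng = np.random.default_rng(1)
x = rng.standard_normal((16, 20))
W = 0.1 * rng.standard_normal((20, 50)); b = np.zeros(50)
h = hidden_activations(x, W, b)
print("exact sparsity:", exact_sparsity(h))       # roughly 0.5 after random init
print("L1 penalty    :", l1_activation_penalty(h))
```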
With labeled data,deep rectifier networks appear to be attractive models.They are biologically credible, and,compared to their standard counterparts,do not seem to depend as much on unsupervised pre-training, while ultimately yielding sparse representations.This last conclusion is slightly different from those re-ported in(Nair and Hinton,2010)in which is demon-strated that unsupervised pre-training with Restricted Boltzmann Machines and using rectifier units is ben-eficial.In particular,the paper reports that pre-trained rectified Deep Belief Networks can achieve a test error on NORB below16%.However,we be-lieve that our results are compatible with those:we extend the experimental framework to a different kind of models(stacked denoising auto-encoders)and dif-ferent datasets(on which conclusions seem to be differ-ent).Furthermore,note that our rectified model with-out pre-training on NORB is very competitive(16.4% error)and outperforms the17.6%error of the non-pretrained model from Nair and Hinton(2010),which is basically what wefind with the non-pretrained soft-plus units(17.68%error).Semi-supervised setting Figure4presents re-sults of semi-supervised experiments conducted on the NORB dataset.We vary the percentage of the orig-inal labeled training set which is used for the super-vised training phase of the rectifier and hyperbolic tan-gent networks and evaluate the effect of the unsuper-vised pre-training(using the whole training set,unla-beled).Confirming conclusions of Erhan et al.(2010), the network with hyperbolic tangent activations im-proves with unsupervised pre-training for any labeled set size(even when all the training set is labeled). However,the picture changes with rectifying activa-tions.In semi-supervised setups(with few labeled data),the pre-training is highly beneficial.But the more the labeled set grows,the closer the models with and without pre-training.Eventually,when all avail-able data is labeled,the two models achieve identical performance.Rectifier networks can maximally ex-ploit labeled and unlabeledinformation.Figure4:Effect of unsupervised pre-training.On NORB,we compare hyperbolic tangent and rectifier net-works,with or without unsupervised pre-training,andfine-tune only on subsets of increasing size of the training set.4.2Sentiment AnalysisNair and Hinton(2010)also demonstrated that recti-fier units were efficient for image-related tasks.They mentioned the intensity equivariance property(i.e. without bias parameters the network function is lin-early variant to intensity changes in the input)as ar-gument to explain this observation.This would sug-gest that rectifying activation is mostly useful to im-age data.In this section,we investigate on a different modality to cast a fresh light on rectifier units.A recent study(Zhou et al.,2010)shows that Deep Be-lief Networks with binary units are competitive with the state-of-the-art methods for sentiment analysis. 
This indicates that deep learning is appropriate to this text task which seems therefore ideal to observe the behavior of rectifier units on a different modality,and provide a data point towards the hypothesis that rec-tifier nets are particarly appropriate for sparse input vectors,such as found in NLP.Sentiment analysis is a text mining area which aims to determine the judg-ment of a writer with respect to a given topic(see (Pang and Lee,2008)for a review).The basic task consists in classifying the polarity of reviews either by predicting whether the expressed opinions are positive or negative,or by assigning them star ratings on either 3,4or5star scales.Following a task originally proposed by Snyder and Barzilay(2007),our data consists of restaurant reviews which have been extracted from the restaurant review site .We have access to10,000 labeled and300,000unlabeled training reviews,while the test set contains10,000examples.The goal is to predict the rating on a5star scale and performance is evaluated using Root Mean Squared Error(RMSE).4 4Even though our tasks are identical,our database is。
arXiv:cond-mat/0004407v2 [cond-mat.stat-mech] 9 Aug 2000

Random networks created by biological evolution

František Slanina and Miroslav Kotrla
Institute of Physics, Academy of Sciences of the Czech Republic, Na Slovance 2, CZ-18221 Praha 8, Czech Republic

We investigate a model of an evolving random network, introduced by us previously [Phys. Rev. Lett. 83, 5587 (1999)]. The model is a generalization of the Bak-Sneppen model of biological evolution, with the modification that the underlying network can evolve by adding and removing sites. The behavior and the averaged properties of the network depend on the parameter p, the probability to establish a link to the newly introduced site. For p = 1 the system is self-organized critical, with two distinct power-law regimes with forward-avalanche exponents τ = 1.98 ± 0.04 and τ′ = 1.65 ± 0.05. The average size of the network diverges as a power law when p → 1. We study various geometrical properties of the network: the probability distribution of sizes and connectivities, the size and number of disconnected clusters, and the dependence of the mean distance between two sites on the cluster size. The connection with models of growing networks with preferential attachment is discussed.

PACS numbers: 05.40.-a, 87.10.+e, 87.23.Kg

I. INTRODUCTION

Irregular networks or random graphs [1] composed of units of various kinds are very frequent both in nature and society (which is, however, nothing but a special segment of nature). Examples range from vulcanized polymers, silica glasses, force chains in granular materials [2], and mesoscopic quantum wires [3] to food webs [4], herding effects in economics [5], world-wide-web links [6,7], "small-world" networks of personal contacts between humans [8,9], and scientific collaboration networks [10]. Modeling of such networks is not easy and analytical results are relatively rare (examples, without any pretence of completeness, can be found in [1,5,11,12]). Numerical simulations are still one of the principal tools. However, even in the case when the properties of a given class of random networks are relatively well established, either analytically or numerically, as is the case for small-world networks, the serious question remains why these networks occur in nature. In other words, what are the dynamical processes which generate these networks? Indeed, one can study, for example, various networks of mutual dependence of species in a model of co-evolution [13–15], but it is difficult to infer from these studies alone which networks are closer to reality than the others.

In the context of biological evolution models, there have recently been a few attempts to let the networks evolve freely, in order to check which types of topologies might correspond to "attractors" of the process of natural evolution [16–21]. The model introduced by us in a preceding Letter [18] is based on extremal dynamics and basically follows the Bak-Sneppen model of biological evolution [14]. Extremal dynamics (ED) models [22] are used in a wide area of problems, ranging from growth in disordered media [23], dislocation movement [24], and friction [25] to biological evolution [14]. Among them, the Bak-Sneppen (BS) model plays the role of a testing ground for various analytical as well as numerical approaches (see for example [22,26–31]).

The idea of ED is the following. The dynamical system in question is composed of a large number of simple units, connected in a network. Each site of the network hosts one unit. The state of each unit is described by a single dynamical variable b, called a barrier. In each step, the unit
with minimum b is mutated by updating the barrier.The effect of the mutation on the environment is taken into account by changing b also at all sites connected to the minimum site by a network link.Because a perturba-tion can propagate through the links,we should expect that the topology of the network can affect substantially the ED evolution.General feature of ED models is the avalanche dynam-ics.The forward λ-avalanches are defined as follows [22].For fixed λwe define active sites as those having bar-rier b <λ.Appearance of one active site can lead to avalanche-like proliferation of active sites in successive time steps.The avalanche stops,when all active sites dis-appear again.Generically,there is a value of λ,for which the probability distribution of avalanche sizes obeys a power law without any parameter tuning,so that the ED models are classified as a subgroup of self-organized critical models [32].(This,of course,can hold only for networks of unlimited size.)The set of exponents de-scribing the critical behavior determines the dynamical universality class the model belongs to.It was found that the universality class depends on the topology of the ually,regular hyper-cubic networks [22]or Cayley trees [31]are investigated.For random neighbor networks,mean-field solution was found to be exact [33,27].Also the tree models [31]were found to belong to the mean-field universality class.A one-dimensional model in which the links were wired ran-domly with probability decaying as a power µof the dis-tance was introduced [34,35].It was found that the values of critical exponents depend continuously on µ.The BS model on a small-world network was also studied [36].Recently,BS model on random networks,produced bybond percolation on fully connected lattice,was studied [16].Two universality classes were found.Above the percolation threshold,the system belongs to the mean-field universality class,while exactly at the percolation threshold,the avalanche exponent is different.A dynam-ics changing the topology in order to drive the network to critical connectivity was suggested.There are also several recent results for random net-works produced by different kind of dynamics than ED, especially for the threshold networks[19]and Boolean networks[20,21].The geometry of the world-wide web was intensively studied very recently.It was found experimentally that the network exhibits scale-free characteristics,measured by the power-law distribution of connectivities of the sites [6,37].Similar power-law behavior was observed also in the actor collaboration graph and in the power grids[6].A model was suggested[6]to explain this behavior,whose two main ingredients are continual growth and prefer-ential attachment of new links,where sites with higher connectivity having higher probability to receive addi-tional links.The latter feature resembles the behavior of additive-multiplicative random processes,which are well known to produce power-law distributions[38,39].The model introduced in[6]is exactly soluble[40]. 
Variants including aging of sites[41,42],decaying and rewiring links[43,44]were also studied.The preferen-tial attachment rule,which apparently requires unreal-istic knowledge of connectivities of the whole network before a single new link is established,was justified in a very recent work[45],where higher probability of attach-ment at highly connected sites results from local search by walking on the network.In the preceding Letter[18]we concentrated on the self-organized critical behavior and extinction dynamics of a model in which the network changes dynamically by adding and removing sites.It was shown that the extinc-tion exponent is larger than the upper bound for the BS model(given by the mean-field value)and is closer to the experimentally found value than any previous version of the BS model.In the present work we introduce in Sec. II a generalized version of the model defined in[18]and further investigate the self-organized critical behavior in Sec.III.However,our main concern will be about the geometric properties of the network,produced during the dynamics.These results are presented in Sec.IV.Sec-tion V makes conclusions from the results obtained.II.EVOLUTION MODEL ON EVOL VINGNETWORKWe consider a system composed of varying number n u of units connected in a network,subject to extremal dy-namics.Each unit bears a dynamical variable b.In the context of biological evolution these units are species and b represent the barrier against mutations.For the main novelty of our model consists in adding(speciation)and removing(extiction)units,let usfirst define the rules for extinction and speciation.The rules determining which of the existing units will undergo speciation or extinction will be specified afterwards.(i)If a unit is chosen for extinction,it is completely removed from the network without any substitution and all links it has,are broken.(ii)If a unit is chosen for speciation,it acts as a “mother”giving birth to a new,“daughter”unit.A new unit is added into the system and the links are established between the new unit and the neighbors of the“mother”unit:each link of the“mother”unit is inherited with the probability p by the“daughter”unit.This rule reflects the fact that the new unit is to a certain extent a copy of the original,so the relations to the environment will be initially similar to the ones the old unit has.Moreover, if a unit which speciates has only one neighbor,a link between“mother”and“daughter”is also established. 
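A minimal sketch of the two structural moves just defined — extinction, which removes a unit together with all its links (rule (i)), and speciation, in which a daughter unit inherits each of the mother's links with probability p (rule (ii)) — might look as follows; the adjacency-set representation and function names are illustrative choices, not taken from the paper:

```python
import random

def extinction(adj, i):
    # Rule (i): remove unit i and break all of its links.
    for j in adj.pop(i):
        adj[j].discard(i)

def speciation(adj, mother, p, new_id):
    # Rule (ii): add a daughter unit; each link of the mother is
    # inherited with probability p.  If the mother has a single
    # neighbour, a mother-daughter link is also established.
    inherited = {j for j in adj[mother] if random.random() < p}
    if len(adj[mother]) == 1:
        inherited.add(mother)
    adj[new_id] = inherited
    for j in inherited:
        adj[j].add(new_id)

# Tiny example: a triangle 0-1-2, then unit 0 speciates with p = 0.9.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
speciation(adj, mother=0, p=0.9, new_id=3)
print(adj)
```

In the full model these moves are combined with the barrier updates and the speciation/extinction criterion of rules (iii)-(v) described next.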
The extremal dynamics rule for our model is the fol-lowing.(iii)In each step,the unit with minimum b is found and mutated.The barrier of the mutated unit is replaced by a new random value b′taken from uniform distribu-tion on the interval(0,1).Also the barriers of all its neighbors are replaced by new random numbers from the same distribution.The rules determining whether a unit is chosen for ex-tiction or speciation are the following.(iv)If the newly assigned barrier of the mutated unit b′is larger than new barriers of all its neighbors,the unit is chosen for speciation.If b′is lower than barriers of all neighbors,the unit is chosen for extinction.In other cases neither extinction nor speciation occcurs.As a boundary condition,we use the following exception:if the network consists of a single isolated unit only,it is always chosen for speciation.(v)If a unit is chosen for extinction,all its neighbors which are not connected to any other unit also chosen for extinction.We call this kind of extinctions singular extinctions.The rule(iv)is motivated by the following consider-ations.We assume,that well-adapted units proliferate more rapidly and chance for speciation is bigger.How-ever,if the local biodiversity,measured by connectivity of the unit,is bigger,there are fewer empty ecological niches and the probability of speciation is lower.On the other hand,poorly adapted units are more vulnerable to extinction,but at the same time larger biodiversity (larger connectivity)may favor the survival.Our rule corresponds well to these assumptions:speciation occurs preferably at units with high barrier and surrounded by fewer neighbors,extinction is more frequent at units with lower barriers and lower connectivity.Moreover,we sup-pose that a unit completely isolated from the rest of the ecosystem has very low chance to survive.This leads to the rule(v).From the rule(iv)alone follows equal probability ofadding and removing a unit,because the new random barriers b are taken from uniform distribution.At the same time the rule (v)enhances the probability of the removal.Thus,the probability of speciation is slightly lower than the probability of extinction.(a)(b)after(c)afterFIG.1.Schematic illustration of the dynamical rules of the model.Speciation is shown in (a),where full square repre-sents the extremal unit,which speciates,full circle the new,daughter unit,and open circles other units,not affected by the speciation event.The dotted link intends to illustrate that for p <1some of the mother’s links may not be inherited by the daughter.Extinction is shown in (b),where the extremal unit,which is removed,is indicated by full square.The unit denoted by full circle is the neighbor removed by the singular extinction.In (c)an example of an extinction event is shown,which leads to the splitting of the network into disconnected clusters.The degree of disequilibrium between the two depends on the topology of the network at the moment and can be quantified by the frequency of singular extinctions.The number of units n u perform a biased random walk withreflecting boundary at n u =1.The bias towards small values is not constant,though,but fluctuates as well.The above rules are illustrated by the examples shown in Fig. 
The above rules are illustrated by the examples shown in Fig. 1. The networks in (a) show the effect of speciation: a new site is created and some of the links to the mother's neighbors are established. In (b) an extinction is shown; one of the units is removed also due to a singular extinction (rule (v)). In (c) we illustrate the possibility that in an extinction event the network can be split into several disconnected clusters.

FIG. 1. Schematic illustration of the dynamical rules of the model. Speciation is shown in (a), where the full square represents the extremal unit, which speciates, the full circle the new, daughter unit, and open circles other units not affected by the speciation event. The dotted link illustrates that for p < 1 some of the mother's links may not be inherited by the daughter. Extinction is shown in (b), where the extremal unit, which is removed, is indicated by the full square. The unit denoted by the full circle is the neighbor removed by the singular extinction. In (c) an example of an extinction event is shown which leads to the splitting of the network into disconnected clusters.

III. SELF-ORGANIZED CRITICAL BEHAVIOR

A. Crossover scaling

The model investigated in the preceding Letter [18] corresponds to the value p = 1. We found that in this case the model is self-organized critical. We newly defined the mass extinctions as the number of units removed during an avalanche. The distribution of mass extinctions obeys a power law with the exponent τ_ext = 2.32 ± 0.05. In this section we present an improved analysis of the data for the self-organized critical behavior.

FIG. 2. Rescaled distribution of forward avalanches in the case p = 1, for the values λ = 0.03, 0.05, 0.1, 0.2, 0.4, and 0.6 (different symbols). The superscript > in P^>_fwd(s) indicates that we count all avalanches larger than s. In the inset we plot the dependence of the scaling parameters s_cross and f_cross on λ. The full line is the power law λ^{−σ′} with the exponent σ′ = 3.5. The number of time steps was 3·10^8 and the data are averaged over 12 independent runs.

We measured the distribution of forward λ-avalanches [22] and we observed that, contrary to the BS model, two power-law regimes with two different exponents occur. The crossover value s_cross which separates the two regimes depends on λ. We observed that the distributions for different λ collapse onto a single curve if plotted against the rescaled avalanche size s/s_cross, i.e.

P^>_fwd(s) · f_cross = g(s/s_cross)    (1)

where g(x) ∼ x^{−τ+1} for x ≪ 1 and g(x) ∼ x^{−τ′+1} for x ≫ 1. The data are plotted in Fig. 2. For the values of the exponents we found τ = 1.98 ± 0.04 and τ′ = 1.65 ± 0.05.

We investigated the dependence of the scaling parameters s_cross and f_cross on λ and found that both of them behave as a power law with approximately equal exponents, s_cross ∼ f_cross ∼ λ^{−σ′} with σ′ ≃ 3.5 (see the inset in Fig. 2). The role of the critical λ, at which the distribution of forward avalanches follows a power law, is assumed by the value λ = 0. This result is easy to understand. In fact, in models with fixed (or at least bounded) connectivity c, the critical λ is roughly 1/c. As will be shown in the next section, in our case the size of the system and the average connectivity grow without limits, and thus the critical λ tends to zero. Note that it is difficult to see this result without resorting to the data collapse (1). Indeed, for any finite time of the simulation the connectivity and the system size reach only a limited value, and the critical λ seen in the distribution of forward avalanches has an apparently non-zero value.

B. Comparison with the Bak-Sneppen model

If we compare the above findings with the BS model, we can deduce that in our model with p = 1 the exponent τ corresponds to the usual forward-avalanche exponent, while the exponent τ′ is new. The scaling (1) described above breaks down for p < 1 because the connectivity and the system size are limited (cf. the next section). The main difference from the usual BS model is the existence of the second power-law regime, for s ≫ s_cross. It can be particularly well observed for values of λ close to 1, where the crossover avalanche size s_cross is small.
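As a practical aside, the forward λ-avalanche statistics discussed in this section can be extracted from the recorded time series of minimal barriers b_min(t). The sketch below uses one common bookkeeping convention, in which successive returns of b_min above λ delimit an avalanche; the function names and this convention are our own and may differ in detail from the procedure of Ref. [22].

```python
import numpy as np

def forward_avalanche_sizes(b_min, lam):
    """Avalanche sizes s: numbers of steps between successive times at which
    the minimal barrier is at least lam (one common convention)."""
    punctuations = np.flatnonzero(np.asarray(b_min) >= lam)
    return np.diff(punctuations)

def cumulative_distribution(sizes):
    """P_fwd^>(s): fraction of avalanches larger than s, for s = 1, 2, ..."""
    sizes = np.asarray(sizes)
    s = np.arange(1, sizes.max() + 1)
    return s, np.array([(sizes > x).mean() for x in s])

# Data collapse of Eq. (1): with s_cross and f_cross both scaling as lambda**(-3.5),
# plotting f_cross * P_fwd^>(s) against s / s_cross should place the curves for
# different lambda on top of each other.
```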
We have seen that such avalanches start and end mostly when the number of units is close to its minimum value, equal to 1. Between these events the evolution of the number of units is essentially a random walk, because singular extinctions are rare [18]. This fact can explain why the exponent τ′ is not too far from the value 3/2 corresponding to the distribution of first returns to the origin for a random walk. The difference is probably due to the presence of singular extinctions.

We also measured the distribution of barriers P(b) and the distribution of barriers on the extremal site P_min(b_min). In Fig. 3 we can compare the results for p = 1 and p = 0.95. The sharp step observed in the BS model is absent here, because the connectivity is not uniform. (For comparison, we also measured the barrier distribution in the model of Ref. [16], where the network is static but the connectivity is not uniform. Also in that case the step was absent and the distribution was qualitatively very similar to the one shown in Fig. 3.) The large noise level for b close to 1 is due to the fact that units with larger b undergo mutations only rarely.

FIG. 3. Distribution of barriers b (full line) and minimum barriers b_min (dashed line) for p = 0.95 (upper plot) and p = 1 (lower plot). In both cases the number of time steps was 10^7.

IV. NETWORK GEOMETRY

In this section we analyze the geometrical properties of the network and their dependence on the parameter p.

A. Size of the network

The first important feature of the networks created by the dynamics of the model is their size, i.e. the number of units within the network. This is a strongly fluctuating quantity, but on average it grows initially and after some time it saturates and keeps fluctuating around some average value, which depends on p. Fig. 4 shows the probability distribution of the number of units n_u for several values of p. The average number of units ⟨n_u⟩ was computed from these distributions and its dependence on p is shown in the inset of Fig. 4. We can see that the average network size diverges for p → 1 as a power law, ⟨n_u⟩ ∝ (1−p)^{−α_n} with the exponent α_n ≃ 0.8.

We can see from Fig. 5 that the distribution of the number of units has an exponential tail. This corresponds to the fact that the time evolution of the network size is a random walk with a reflecting boundary at n_u = 1 and with a bias towards lower values caused by the singular extinctions (for the analysis of biased random walks repelled from zero see e.g. [38]). From the decrease of the average size with decreasing p we deduce that the bias due to the singular extinctions has a larger effect for smaller p, i.e. if the new unit created in a speciation event has fewer links to its neighbors.

FIG. 4. Distribution of the number of units for different values of p (0.85, 0.9, 0.95, 0.97, 0.98; different symbols). Data are averaged over 10^8 time steps. Inset: dependence of the average number of units on p. The solid line corresponds to the power law ⟨n_u⟩ ∝ (1−p)^{−0.8}.

FIG. 5. Distribution of the number of units (full line) and of the connectivity (dashed line), for p = 0.98, averaged over 10^8 time steps.

B. Connectivity

In Fig. 6 we show the probability distribution of the connectivity of network sites, P_all(c), and the distribution of the connectivity of the extremal unit, P_extremal(c). We can observe the tendency that the extremal unit has a larger connectivity than average. This is in accord with the findings of Ref. [16] obtained on static networks. It can also be easily understood intuitively. Indeed, in a mutation event the barriers of the neighbors of the mutated unit are changed, so the neighbors have an enhanced probability of being extremal in the next time step. Therefore, sites with a higher number of neighbors have a larger probability that a mutation occurs in their neighborhood and that they are then mutated in the subsequent step.
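Operationally, P_all(c) and P_extremal(c) can be accumulated as degree histograms over the run, one over all units and one over the current extremal unit only. A minimal sketch of this bookkeeping (the helper name and data structures are our own, reusing the adjacency representation of the earlier sketch):

```python
from collections import Counter

def record_connectivity(neighbors, barriers, hist_all, hist_extremal):
    """Add one snapshot of the network to the connectivity histograms.

    hist_all      -> P_all(c) after normalization (degree of every unit),
    hist_extremal -> P_extremal(c) (degree of the current extremal unit only).
    """
    extremal = min(barriers, key=barriers.get)
    for unit, nbrs in neighbors.items():
        hist_all[len(nbrs)] += 1
    hist_extremal[len(neighbors[extremal])] += 1

# usage: hist_all, hist_extremal = Counter(), Counter(); call once per time step.
```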
FIG. 6. Distribution of the connectivity for all sites (full line) and for extremal sites only (dashed line), in the stationary regime for p = 0.98, averaged over 10^8 time steps. Inset: dependence of the average connectivity on p. The solid line corresponds to the power law ⟨c⟩ ∝ (1−p)^{−0.75}.

The average connectivity ⟨c⟩ computed from the distributions P_all(c) is shown in the inset of Fig. 6. We can observe that, analogously to the system size, the average connectivity also diverges for p → 1 as a power law, but the value of the exponent is slightly different. We find ⟨c⟩ ∝ (1−p)^{−α_c} with the exponent α_c ≃ 0.75. From the data available we were not able to decide whether the exponents α_n and α_c are equal within the statistical noise.

In Fig. 5 we can see that the distribution of connectivity also has an exponential tail, similarly to the distribution of the network size. We also measured the joint probability density P(n_u, c) for the number of units and the connectivity. The result is shown as a contour plot in Fig. 7. We can see that even for large networks (large n_u) the most probable connectivity is small and nearly independent of n_u. This means that the overall look of the network created by the dynamics of our model is that of a few sites with large connectivity surrounded by many sites with low connectivity.

FIG. 7. Contour plot of the joint probability density P(n_u, c) for the number of units and the connectivity, for p = 0.8, averaged over 3·10^9 time steps. The contours correspond to the following values of the probability density (from inside to outside): 5·10^{−R}, 2·10^{−R}, 10^{−R}, with orders R = 3, 4, 5, 6, 7, 8.

FIG. 8. Distribution of connectivity for a fixed number of units, for p = 0.8 and sizes n_u = 40, 80, 120, 170 (different symbols). The straight line is a power law with exponent −2.3. The data are the same as those used in Fig. 7.

An interesting observation can be drawn from the results shown in Fig. 8. It depicts the joint probability density as a function of the connectivity at fixed system sizes. We can see that for smaller system sizes, closer to the average number of units, the distribution is exponential, while if we increase the system size a power-law dependence develops. For example, for the system size fixed at n_u = 170 we observe a power-law behavior P(n_u, c) ∼ c^{−η} nearly up to the geometric cutoff c < n_u. The value of the exponent was about η ≃ 2.3.
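The fixed-size curves in Fig. 8 are slices of the joint histogram at a given n_u. A rough sketch of how such a slice and an estimate of the exponent η could be obtained follows; the function names and the simple least-squares fit are our own choices, as the paper does not state how the exponent was extracted.

```python
import numpy as np

def degree_slice(joint_counts, n_u):
    """Conditional connectivity distribution P(c | n_u) from a joint histogram
    joint_counts[(n_u, c)] accumulated during the run."""
    counts = {c: k for (n, c), k in joint_counts.items() if n == n_u}
    total = sum(counts.values())
    return {c: k / total for c, k in counts.items()}

def powerlaw_exponent(distribution, c_min=2):
    """Rough estimate of eta: minus the slope of log P(c) versus log c."""
    cs = sorted(c for c in distribution if c >= c_min)
    logc = np.log(cs)
    logp = np.log([distribution[c] for c in cs])
    slope, _ = np.polyfit(logc, logp, 1)
    return -slope
```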
This finding may be in accord with the power-law distribution in growing networks [6,40]. Indeed, in our model the power-law behavior applies only to networks significantly larger than the average size. Such networks are created during occasional fluctuations leading to a temporary expansion of the network. So the power law is the trace of expansion periods in the network evolution, corresponding to the continuous growth in the model of [6]. The preferential attachment, which is the second key ingredient of [6], also has an analog in our model: highly connected units are more likely to be mutated, as was already mentioned in the discussion of Fig. 6. However, here the preference for highly connected sites is a dynamical phenomenon, resulting from the extremal dynamics rules of our model.

C. Clusters

As noted already in Sec. II, the network can be split into several disconnected clusters. The clusters cannot merge, but they may vanish due to extinctions. We observed qualitatively that after an initial growth the number of clusters exhibits stationary fluctuations around an average value, which increases when p approaches 1. We measured both the distribution of the number of clusters and the distribution of their sizes. In Fig. 9 we show the distribution of the number of clusters. The most probable situation is that there is only a single cluster. However, there is a broad tail, which means that even a large number of clusters can sometimes be created. The tail has a power-law part with an exponential cutoff. The value of the exponent in the power-law regime P(n_c) ∼ n_c^{−ρ} was about ρ ≃ 1.2. We have observed that the width of the power-law regime is larger for larger p. This leads us to the conjecture that in the limit p → 1 the number of clusters is power-law distributed.

On the other hand, the distribution of cluster sizes shown in Fig. 10 has a maximum at very small values. This is due to two effects. First, the distribution of network size itself has a maximum at small sizes, and second, if the network is split into many clusters, they have small sizes and remain unchanged for a long time. The reason why small clusters change very rarely (and therefore can neither grow nor disappear) can also be seen from Fig. 10, where the distribution of sizes of the clusters containing the extremal site is shown. The latter distribution is significantly different from the size distribution for all clusters and shows that the extremal site belongs mostly to large clusters. In fact, we also measured the fraction indicating how often the extremal unit is in the largest cluster, when there is more than one cluster. For the same run from which the data shown in Fig. 10 were collected, we found that this fraction is 0.97, i.e. very close to 1.
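The cluster statistics above amount to repeatedly decomposing the network into connected components and asking which component holds the extremal unit. A minimal sketch, with helper functions of our own that reuse the adjacency representation of the earlier sketches:

```python
def clusters(neighbors):
    """Connected components of the network, returned as a list of sets of units."""
    unvisited, components = set(neighbors), []
    while unvisited:
        stack = [unvisited.pop()]
        component = set(stack)
        while stack:                      # depth-first search over one component
            for v in neighbors[stack.pop()]:
                if v in unvisited:
                    unvisited.discard(v)
                    component.add(v)
                    stack.append(v)
        components.append(component)
    return components

def extremal_in_largest(neighbors, barriers):
    """True if the current extremal unit lies in the largest cluster."""
    components = clusters(neighbors)
    extremal = min(barriers, key=barriers.get)
    return extremal in max(components, key=len)
```

Recording the number and sizes of the components at each step gives estimates of the distributions shown in Figs. 9 and 10, and averaging extremal_in_largest over steps with more than one cluster gives the fraction quoted above.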
A similar "screening effect" was reported also in the Cayley tree models [31]: the small isolated portions of the network are very stable and nearly untouched by the evolution.

FIG. 9. Distribution of the number of clusters, for p = 0.98. The straight line is a power law with exponent −1.2. The data were averaged over 3 independent runs of 5·10^8, 5·10^8, and 10^8 time steps.

FIG. 10. Distribution of the cluster sizes s_c for p = 0.98, averaged over 10^8 time steps. Full line: all clusters; dashed line: clusters containing the extremal site. Inset: detail of the same distribution.

D. Mean distance

An important feature of a random network is also the mean distance d̄ between two sites, measured as the minimum number of links which must be passed in order to get from one site to the other. In D-dimensional lattices the mean distance depends on the number of sites N as d̄ ∼ N^{1/D}, while in completely random networks the dependence is d̄ ∼ log N. In small-world networks a crossover from the former to the latter behavior is observed [8,9].

FIG. 11. Dependence of the average distance between two sites within the same cluster on the cluster size, for p = 0.95 (full line) and p = 0.97 (dashed line), averaged over 10^7 time steps. In the inset we show the same data on a log-linear scale.

The dependence of the average distance within a cluster on the size of the cluster in our model is shown in Fig. 11. We observe a global tendency of d̄ to decrease with increasing p. This result is natural, because a larger p means more links from a randomly chosen site and thus a shorter distance to other sites. The functional form of the size dependence is not completely clear. However, for larger cluster sizes, greater than about 25, the dependence seems to be faster than logarithmic, as can be seen from the inset in Fig. 11. So the networks created in our model seem to be qualitatively different from the random networks studied previously, as far as we know.

V. CONCLUSIONS

We studied an extremal dynamics model motivated by biological evolution on a dynamically evolving random network. The properties of the model can be tuned by the parameter p, the probability that a link is inherited in the process of speciation. For p = 1 the model is self-organized critical and the average system size and connectivity grow without limits. Contrary to the usual BS model, we find two power-law regimes with different exponents in the statistics of forward λ-avalanches. The crossover avalanche size depends on λ and diverges for λ → 0 as a power law. The reason why the critical λ is zero in this model is connected with the fact that the time-averaged connectivity diverges for p = 1.

We investigated the geometrical properties of the random networks for different values of p. The average network size and the average connectivity diverge as a power of 1−p. The probability distribution of system sizes has an exponential tail, which suggests that the dynamics of the system size is essentially a biased random walk with a reflecting boundary. The value of the bias grows with decreasing p. The joint distribution of size and connectivity shows that even for large network sizes the most probable connectivity is low. Hence, there are a few highly connected sites linked to a majority of sites with small connectivity. Moreover, the situations where the system size is far above its mean value are characterized by a power-law distribution of connectivity, as in the models of growing networks with preferential attachment.

The network can consist of several mutually disconnected clusters.
Even though the most probable situation contains only a single cluster, the distribution of the number of clusters has a broad tail, which shows a power-law regime with an exponential cutoff. We observed a "screening effect", characterized by a very small probability that the extremal site is found in any cluster other than the largest one. So there is a central large cluster, where nearly everything happens, surrounded by some small peripheral clusters, frozen for the major part of the evolution time.

We also measured the mean distance along the links within one cluster. The distance grows very slowly with the cluster size; however, the increase seems to be faster than logarithmic.

Summarizing, we demonstrated that the extremal dynamics widely used in previous studies of macroevolution in fixed-size systems is also useful for creating random networks of variable size. It would be of interest to compare the properties of the networks created in our model with food webs and other networks found in nature. For example, studies of food webs in isolated ecologies [4] give, for network sizes of about 30, average connectivities in the range from 2.2 to 9, which is not in contradiction with the findings of our model. However, more precise comparisons are necessary for any reliable conclusions about real ecosystems.

ACKNOWLEDGMENTS

We wish to thank K. Sneppen, A. Markoš and A. Pękalski for useful discussions.