Cluster Stability Analysis using Sub-sampling


A MATLAB Toolbox and its Web based Variant for Fuzzy Cluster Analysis

Tamas Kenesei, Balazs Balasko, and Janos Abonyi

University of Pannonia, Department of Process Engineering, P.O. Box 158, H-8201 Veszprem, Hungary, abonyij@fmt.uni-pannon.hu
www.fmt.vein.hu/softcomp

Abstract. The volume of collected data grows every year, and with it the need for methods and algorithms that make these data easier to process. Data mining tools, among them clustering algorithms, were developed to address this need. This paper proposes a continuously extensible, standard tool that any MATLAB user can adapt to their own purposes. The toolbox contains crisp and fuzzy clustering algorithms, validity indexes, and linear and nonlinear visualization methods for high-dimensional data. A web-based prototype of the toolbox has already been developed. Users need neither MATLAB nor programming knowledge, only a web browser: they upload their own data to the web server and download the results, because the program code runs on Matlab Web Server behind a data mining portal available at www.pe.vein.hu/datamine. The portal is in its test phase, with limited services. The Fuzzy Clustering and Data Analysis Toolbox with User's Guide is available at /fileexchange.

1 Introduction

In this paper we propose a MATLAB toolbox for data analysis based on clustering, and its application via the Internet. Data analysis and data mining methods are increasingly important because large amounts of data have been collected and warehoused in recent years, and these data have the potential to provide information. Data mining is the extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) information or patterns from data in large databases. The tasks of data mining can be very different. We can group the data mining tools and processing algorithms into the following primary data mining
methods: Classification, Regression, Clustering, Summarization, Dependency Modeling, and Change and Deviation Detection.

Many MATLAB toolboxes have been developed in several research fields in recent years. A MATLAB toolbox for the Self-Organizing Map is presented in [1], another for Bayes nets in [2], and another for dimensional analysis in [3]. The so-called KERNEL toolbox can be used for knowledge extraction and refinement based on neural learning [4]. Robert Babuska developed a toolbox for fuzzy model identification [5]. These toolboxes are available on the Internet, but none of them can be used without MATLAB. The web-based variant of the proposed toolbox enables MATLAB-independent usage and places no demands on the client computers because it runs on the web server.

The proposed toolbox contains clustering methods and visualization techniques based on clustering. A cluster is a collection of data objects that are similar to one another within the same cluster and dissimilar to the objects in other clusters. Cluster analysis groups a set of data objects into clusters without any predefined classes, so clustering is unsupervised classification. Clustering algorithms can be partitioning, hierarchical, density-based, grid-based or model-based methods.

Objective function based fuzzy clustering algorithms have been used extensively for various problems such as pattern recognition [6], data analysis [7], image processing [8] and fuzzy modelling [9]. Fuzzy clustering algorithms partition the data set into (partially) overlapping groups in such a way that the clusters describe an underlying structure within the data. To obtain a good result, a number of issues are of importance. These concern the shape and the volume of the clusters, the initialization of the algorithm, the distribution of the data patterns and the number of clusters.

This toolbox contains objective function based partitioning algorithms: they construct various partitions and then evaluate them by some criterion to minimize an objective function
that is based on the distance between the cluster prototypes and the data points. The toolbox contains the k-means and k-medoid (crisp) and the fuzzy c-means, Gustafson-Kessel and Gath-Geva (fuzzy) clustering methods, along with other important tools such as methods for determining the number of clusters and for visualizing the clustering results.

The toolbox contains methods for the visualization of high-dimensional data. Visualization here means projecting data from a higher-dimensional space into a lower-dimensional one while trying to preserve the distances between all points. It can be very useful because it can (i) identify meaningful underlying dimensions that could explain similarities or dissimilarities in the data, (ii) detect underlying structure and (iii) reduce the data dimension and reveal relationships. Nonlinear mapping methods are often based on the results of a clustering algorithm, so the clustering and visualization algorithms are strongly connected.

So-called online data mining is of growing importance in our information society. For this purpose we developed a tool that makes data mining methods usable via the Internet: the application runs on the server side, not on the client computers, so the resources of the client computers remain free for other applications. The client needs only a web browser.

The paper is organized as follows. Section 2 presents the theoretical basis of the toolbox and Section 3 gives application examples that demonstrate its applicability. Section 4 presents an application through which the toolbox can be used via the Internet without downloading and installing it. Section 5 contains the conclusions.

2 Fuzzy Clustering and Data Analysis Toolbox

The objective of cluster analysis is the classification of objects according to the similarities among them, and the organization of data into groups. Clustering techniques are unsupervised methods: they do not use prior class identifiers. The main potential of clustering is to detect
the underlying structure in data, not only for classification and pattern recognition, but also for model reduction and optimization. Clustering techniques can be applied to data that are quantitative (numerical), qualitative (categorical), or a mixture of both. In this paper, the clustering of quantitative data is considered.

Since clusters can formally be seen as subsets of the data set, one possible classification of clustering methods is according to whether the subsets are fuzzy or crisp (hard). Hard clustering methods are based on classical set theory, and require that an object either does or does not belong to a cluster. Fuzzy clustering methods allow objects to belong to several clusters simultaneously, with different degrees of membership. In many real situations, fuzzy clustering is more natural than hard clustering, as objects on the boundaries between several classes are not forced to fully belong to one of the classes, but rather are assigned membership degrees between 0 and 1 indicating their partial memberships.

Clustering techniques can also be classified by their algorithmic approach. In this work we have developed a toolbox for partitioning methods, specifically hard and fuzzy partitioning methods.

In the remainder of this section we briefly discuss the applied, well-known clustering methods, validity indices and algorithms for the visualization of clusters. As generic notation, $c$ denotes the number of clusters, $N$ the number of data points and $n$ the dimension of each data point.

2.1 Clustering Algorithms

The k-means and k-medoid algorithms are hard partitioning methods. They are simple and popular, though their results are not always reliable and these algorithms have numerical problems as well. The k-means and k-medoid algorithms allocate each data point to one of $c$ clusters to minimize the within-cluster sum of squares:

\[ \sum_{i=1}^{c} \sum_{k \in A_i} \| x_k - v_i \|^2 \tag{1} \]

where $A_i$ is the set of objects (data points) in the $i$-th cluster and $v_i$ is the mean of the points
over cluster $i$. In k-means clustering the cluster prototype is a point. In k-medoid clustering the cluster centers are the data objects nearest to the mean of the data in each cluster.

The fuzzy c-means algorithm (FCM) can be seen as the fuzzified version of the k-means algorithm. It is based on the minimization of an objective function called the c-means functional:

\[ J(X; U, V) = \sum_{i=1}^{c} \sum_{k=1}^{N} (\mu_{ik})^m \, \| x_k - v_i \|_A^2 \tag{2} \]

where $V = [v_1, v_2, \ldots, v_c]$, $v_i \in \mathbb{R}^n$, is a vector of cluster prototypes (centers) that have to be determined, $D_{ikA}^2 = \| x_k - v_i \|_A^2 = (x_k - v_i)^T A (x_k - v_i)$ is a squared inner-product distance norm, and the $c \times N$ matrix $U = [\mu_{ik}]$ represents the fuzzy partition, where $\mu_{ik}$ denotes the degree to which the $k$-th data point belongs to the $i$-th cluster. The exponent $m > 1$ controls the fuzziness of the partition. The conditions on $U$ are given by:

\[ \mu_{ik} \in [0,1], \ \forall i,k, \qquad \sum_{i=1}^{c} \mu_{ik} = 1, \ \forall k, \qquad 0 < \sum_{k=1}^{N} \mu_{ik} < N, \ \forall i. \tag{3} \]

The FCM algorithm can find only clusters with the same shape and size, because the distance norm $A$ is not adaptive and is often the Euclidean norm (spherical clusters). The solution can be obtained by the Lagrange multiplier method.

The Gustafson-Kessel algorithm (GK) extends the standard fuzzy c-means algorithm by employing an adaptive distance norm, in order to detect clusters of different geometrical shapes in one data set. Each cluster has its own norm-inducing matrix $A_i$. The objective function cannot be directly minimized with respect to $A_i$, since it is linear in $A_i$. This means that $J$ can be made as small as desired by simply making $A_i$ less positive definite. To obtain a feasible solution, $A_i$ must be constrained in some way. The usual way of accomplishing this is to constrain the determinant of $A_i$. Allowing the matrix $A_i$ to vary with its determinant fixed corresponds to optimizing the cluster's shape while its volume remains constant, so the GK algorithm can find clusters with different shapes but with the same size [10].

The Gath-Geva algorithm (GG) is based on fuzzy maximum likelihood estimation and is able to detect clusters of varying shapes, sizes and densities. The cluster covariance
matrix is used in conjunction with an "exponential" distance, and the clusters are not constrained in volume. However, this algorithm is less robust in the sense that it needs a good initialization, since due to the exponential distance norm it converges to a nearby local optimum [11].

2.2 Validation

Cluster validity refers to the problem of whether a given fuzzy partition fits the data at all. The clustering algorithm always tries to find the best fit for a fixed number of clusters and the parameterized cluster shapes. However, this does not mean that even the best fit is meaningful. Either the number of clusters might be wrong or the cluster shapes might not correspond to the groups in the data, if the data can be grouped in a meaningful way at all. Two main approaches to determining the appropriate number of clusters in data can be distinguished:

• Starting with a sufficiently large number of clusters, and successively reducing this number by merging clusters that are similar (compatible) with respect to some predefined criteria. This approach is called compatible cluster merging [12].
• Clustering data for different values of $c$, and using validity measures to assess the goodness of the obtained partitions.

Different scalar validity measures have been proposed in the literature. None of them is perfect by itself, so we used several indexes in our Toolbox. Detailed descriptions of the applied indexes can be found in the literature, so we only list them here: Partition Coefficient (PC), Classification Entropy (CE), Partition Index (SC), Separation Index (S), Xie and Beni's Index (XB), Dunn's Index (DI) and Alternative Dunn Index (ADI). Note that the only difference between SC, S and XB is how they measure the separation of the clusters. In the case of overlapping clusters the values of DI and ADI are not really reliable, because the results are re-partitioned with the hard partition method.

2.3 Visualization

Clustering-based data mining tools are becoming popular, since they are able to "learn" the
mapping of functions and systems or explore structures and classes in the data. Data are often high-dimensional in practice, and it is useful to be able to see the results of the clustering (e.g. to check the results or to find the underlying structure of the data). Several methods can be used for this purpose.

Principal Component Analysis (PCA) maps the data points into a lower-dimensional space, which is useful in the analysis and visualization of correlated high-dimensional data. This mapping is based on the eigenvalue-eigenvector decomposition of the covariance matrix $F$ and uses only the first few nonzero eigenvalues and the corresponding eigenvectors [1].

The Sammon mapping method, which preserves interpattern distances, can be used for the visualization of the clustering results. This mapping method finds $N$ points in a $q$-dimensional data space, where the original data are from a higher $n$-dimensional space. The interpoint distances in the $q$-dimensional space approximate the corresponding interpoint distances measured in the $n$-dimensional space. This is achieved by minimizing an error criterion called Sammon's stress using e.g.
a gradient-descent method [13].

To avoid the high computational cost of Sammon mapping, a modified Sammon mapping algorithm is used in this work. The fuzzy Sammon mapping method uses the basic property of fuzzy clustering algorithms that only the distances between the data points and the cluster centers are considered to be important [9]. The modified algorithm takes into account only $N \times c$ distances, where $c$ is the number of clusters, weighted by the membership values. This means that in the projected two-dimensional space every cluster is represented by a single point, independently of the form of the original cluster prototype.

3 Application of the Toolbox

3.1 Comparing the Clustering Results

Using the validity measures mentioned in Section 2.2, the partitioning methods can be easily compared. For illustration, a synthetic data set was used, shown in Fig. 1 and Fig. 2, so that the index values are more clearly distinguished for each type of clustering. These validity measures are collected in Table 1.

First of all it must be mentioned that all these algorithms use random initialization, so different runs yield different partitions, i.e. different values of the validation measures. On the other hand, the results depend heavily on the structure of the data, and no validity index is perfect by itself for a clustering problem. Several experiments and evaluations are needed, which are beyond the scope of this work.

            PC      CE      SC      S       XB       DI      ADI
K-means     1       NaN     0.095   0.0001  3987.4   0.0139  0.0004
K-medoid    1       NaN     0.2434  0.0003  Inf      0.0037  0.0036
FCM         0.8282  0.3470  0.9221  0.0008  19.6663  0.0175  0.0119
GK          0.8315  0.3275  0.8697  0.0009  32.1243  0.0081  0.0104
GG          0.9834  0.0285  2.2451  0.0020  2.5983   0.0160  0.0084

Table 1. The numerical values of validity measures

Fig. 1. Result of k-means and k-medoid algorithms by the synthetic overlapping data with normalization (axes: x1, x2).

Fig. 1 shows that hard clustering methods can also find a good solution for the clustering problem, when compared with the figures of the fuzzy clustering algorithms. On the other hand, in Fig. 1 one can see a typical
example of the initialization problem of hard clustering. This caused the differences between the validity index values in Table 1, e.g. the Xie and Beni's index is infinity (in the "normal case" k-medoid returns almost the same results as k-means).

Fig. 2. Result of FCM, GK and GG algorithms by the synthetic overlapping data with normalization (axes: x1, x2).

The only difference between the results of FCM and GK in Fig. 2 lies in the shape of the clusters: the Gustafson-Kessel algorithm can find the elongated clusters better. Fig. 2 shows that the Gath-Geva algorithm returned a result with three subspaces.

As one can see in Table 1, PC and CE are not applicable to k-means and k-medoid, since they are hard clustering methods. But that is also the reason for their best results in S, DI (and ADI), which are useful for validating crisp and well-separated clusters.

On the basis of the two "most popular and used" indexes for fuzzy clustering (Partition Coefficient and Xie and Beni's Index), the Gath-Geva clustering has the best results for this data set.

3.2 Visualization Results

In order to examine the performance of the proposed clustering methods, a well-known multidimensional classification benchmark problem is presented in this section: the wine data. This data set comes from the UCI Repository of Machine Learning Databases. Because of the large number of data points it is not practical to show the partition matrices in tables, so the results of the $n$-dimensional clustering were projected into two dimensions and the 2-D results were plotted. Since projected figures are only approximations of the real partitioning results, the difference between the original and the projected partition matrix is also reported; comparing these values also shows the difference between PCA, Sammon's mapping and the Modified Sammon Mapping. These projection methods are based on the results of a clustering algorithm. Using the proposed toolbox, the best clustering algorithm
can be chosen easily for this purpose. In the case of the wine data set the fuzzy c-means clustering has the best and most stable results with respect to the misclassified objects, so its resulting figures are shown in the following.

The Wine data contains the chemical analysis of 178 wines grown in the same region in Italy but derived from three different cultivars (marked with '.', 'x' and '+'). The problem is to distinguish the three different types based on 13 continuous attributes derived from chemical analysis.

Fig. 3. Result of PCA, Sammon's Mapping and Fuzzy Sammon Mapping projection by the Wine data set (axes: y1, y2).

          $P = \|U - U^*\|$   $\sum_{k=1}^{N} \mu_k^2$   $\sum_{k=1}^{N} \mu_k^{*2}$   $E$
PCA       0.1295              0.5033                     0.7424                        0.1301
Sammon    0.0874              0.5033                     0.6574                        0.0576
FuzSam    0.0365              0.5033                     0.5170                        0.0991

Table 2. Relation-indexes on Wine data set.

As Table 2 shows, Fuzzy Sammon Mapping has much better projection results than Principal Component Analysis by the value of $P$, which measures the difference between the original and projected membership matrices, and it is computationally cheaper than the original Sammon Mapping. So during the evaluation of the partition the figures created with this projection method were considered. We calculated the original Sammon's stress for all the three techniques to be able to compare them.
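
The comparison above relies on Sammon's stress. As a rough illustration only (it is not the toolbox's MATLAB code), the following Python sketch computes the stress in its standard form, $E = \frac{1}{\sum_{i<j} d_{ij}} \sum_{i<j} \frac{(d_{ij} - d^*_{ij})^2}{d_{ij}}$, where $d_{ij}$ are distances between the original points and $d^*_{ij}$ distances between their projections. Euclidean distances are assumed, and the function names `pairwise_distances` and `sammon_stress` as well as the three toy points are hypothetical, chosen for this example.

```python
import math
from itertools import combinations


def pairwise_distances(points):
    """Euclidean distances for all index pairs i < j, in a fixed order."""
    return [math.dist(p, q) for p, q in combinations(points, 2)]


def sammon_stress(original, projected):
    """Sammon's stress between original points and their projections.

    Assumes both sequences list the same points in the same order and
    that no two original points coincide (every d_ij must be > 0).
    """
    d_orig = pairwise_distances(original)
    d_proj = pairwise_distances(projected)
    if len(d_orig) != len(d_proj):
        raise ValueError("original and projected need the same number of points")
    if any(d == 0.0 for d in d_orig):
        raise ValueError("coincident original points make the stress undefined")
    scale = sum(d_orig)
    return sum((d - dp) ** 2 / d for d, dp in zip(d_orig, d_proj)) / scale


# Hypothetical toy data: three 3-D points and two candidate 2-D projections.
original = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (0.0, 0.0, 5.0)]
isometric = [(0.0, 0.0), (5.0, 0.0), (0.0, 5.0)]     # keeps all three distances
dropped_axis = [(0.0, 0.0), (3.0, 4.0), (0.0, 0.0)]  # simply discards the third axis

print("stress, distance-preserving projection:", sammon_stress(original, isometric))
print("stress, axis-dropping projection:      ", sammon_stress(original, dropped_axis))
```

A projection that preserves every pairwise distance gives a stress of essentially zero, while discarding a coordinate collapses two of the toy points onto each other and is penalized. Dividing each squared error by the original distance is what makes the criterion emphasize small distances, which is why it suits the comparison of local cluster structure.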
4 Web-based Version of the Toolbox

This section presents a solution for using the proposed toolbox without any downloading and installation. It is a user-friendly way via the Internet, because the users do not need to have MATLAB and do not need to be competent in programming languages.

The final goal of data mining is the extraction of useful information and knowledge from data. Knowledge is the ability of people to learn from information and react faster and better than their competitors. Devices and methods for acquiring, managing and analyzing data and forwarding it to the right places can be of prime importance in today's intense market competition.

Corporations have to build large databases, data warehouses and external sources to store knowledge. In the following we show the methodology for integrating the data sources and the soft computing tools. We store data in databases or in data warehouses (DW), where programs with well-designed GUIs support fast and flexible working. If we want to work with the stored data, we need these front-end applications to connect to the DW, so client computers need these applications installed.

The basic idea is to use a web browser to analyze the data of a complex system. Clients need only a web browser; the workflow application supplies the processing methods and the connection to the stored data.

So the work of client programs can be taken over by web-based workflow applications, which can be maintained easily by system administrators and provide the simplicity of installed client programs.

Our main goal is to provide a web-based, user-friendly interface to our toolbox.
Since Matlab Web Server is available in our Department, it is natural to use the advantages of this technology. To develop dynamic, user-friendly sites we use PHP (it is free and allows fast development), and to store user data we use a MySQL server. To summarize the technologies: PHP creates the dynamic sites, the database stores user settings and Matlab Web Server deals with data processing, so the following components are needed to create a web-based workflow system with MATLAB:

• PHP interpreter (in CGI or in server module format)
• Web server (either Apache or IIS)
• Database manager (MySQL)
• Matlab Web Server

Fig. 4. MATLAB Web Server Components

Matlab Web Server is a complex system for online data processing. Figure 4 shows how MATLAB operates over the Web. This structure is combined with the additional PHP applications and the database storage.

With this structure there is no need to compile MATLAB algorithms with the MATLAB-C compiler, which is a very handy feature, because the C compiler was not able to deal with structures, so it could not use the full resources of the MATLAB programming language. A further benefit of this architecture is that there is no need for deep restructuring of the implemented algorithms: only the inputs and outputs have to be well defined, and the core processing method is in a basic M-file. Matlab Web Server does the rest, if it is invoked by a special HTML form. After processing the data, Matlab Web Server can return the results using a template file.

5 Conclusions

To meet the growing demand for systematizing newly emerging data, flexible, powerful tools are needed. The Fuzzy Clustering and Data Analysis Toolbox provides several approaches to cluster, classify and evaluate industrial or experimental data sets. The software for these operations has been developed in MATLAB, which is very powerful for matrix-based calculations. The Toolbox provides five different types of clustering algorithms, which can be validated by seven validity measures.
High-dimensional data sets can also be visualized with a 2-dimensional projection, as the toolbox contains three different methods for visualization. The web-based version of this tool does not require MATLAB, and the users' computers remain free for other applications because the program runs on the web server and the results can be downloaded from the server.

Acknowledgement

The authors would like to acknowledge the support of the Cooperative Research Centre (VIKKK) (project 2004-I) and the Hungarian Research Fund (OTKA T049534). Janos Abonyi is grateful for the support of the Bolyai Research Fellowship of the Hungarian Academy of Sciences.

References

1. Vesanto, J., Himberg, J., Alhoniemi, E. and Parhankangas, J., Self-organizing map in Matlab: the SOM toolbox, Proceedings of the MATLAB DSP Conference, Espoo, Finland (1999) pp. 35–40.
2. Murphy, K., The Bayes net toolbox for Matlab, Computing Science and Statistics.
3. Brückner, S., The Dimensional Analysis Toolbox for Matlab, in: User's Manual, Stuttgart, 2002.
4. Castellano, G., Castiello, C. and Fanelli, A.M., KERNEL: A Matlab toolbox for Knowledge Extraction and Refinement by NEural Learning, 2000.
5. Babuska, R., Fuzzy modeling and identification toolbox for Matlab, Delft University of Technology, Faculty of Information Technology and Systems.
6. Pedrycz, W., Fuzzy clustering with a knowledge-based guidance, Pattern Recognition Letters 25 (2004) pp. 469–480.
7. Szeto, L.K., Liew, A.W.C., Yan, H. and Tang, S.S., Gene expression data clustering and visualization based on a binary hierarchical clustering framework, Journal of Visual Languages and Computing 14(4) (2003) pp. 341–362.
8. Barni, M. and Gualtieri, R., A new possibilistic clustering algorithm for line detection in real world imagery, Pattern Recognition 32(11) (1999) pp. 1897–1909.
9. Abonyi, J., Babuska, R. and Szeifert, F., Modified Gath-Geva fuzzy clustering for identification of Takagi-Sugeno fuzzy models, IEEE Transactions on Systems, Man and Cybernetics, Part B: Cybernetics 32(5) (2002) pp. 612–621.
10. Gustafson, D. and Kessel, W., Fuzzy clustering with fuzzy covariance matrix, Proceedings of the IEEE CDC, San Diego (1979) pp. 761–766.
11. Gath, I. and Geva, A., Unsupervised optimal fuzzy clustering, IEEE Transactions on Pattern Analysis and Machine Intelligence 7 (1989) pp. 773–781.
12. Setnes, M., Supervised fuzzy clustering for rule extraction, Proceedings of FUZZ-IEEE'99, Seoul, Korea (1999) pp. 1270–1274.
13. Sammon, J.W., A nonlinear mapping for data structure analysis, IEEE Transactions on Computers 18 (1969) pp. 401–409.

国际自动化与计算杂志.英文版.

国际自动化与计算杂志.英文版.1.Improved Exponential Stability Criteria for Uncertain Neutral System with Nonlinear Parameter PerturbationsFang Qiu，Ban-Tong Cui2.Robust Active Suspension Design Subject to Vehicle Inertial Parameter VariationsHai-Ping Du，Nong Zhang3.Delay-dependent Non-fragile H∞ Filtering for Uncertain Fuzzy Systems Based on Switching Fuzzy Model and Piecewise Lyapunov FunctionZhi-Le Xia，Jun-Min Li，Jiang-Rong Li4.Observer-based Adaptive Iterative Learning Control for Nonlinear Systems with Time-varying DelaysWei-Sheng Chen，Rui-Hong Li，Jing Li5.H∞ Output Feedback Control for Stochastic Systems with Mode-dependent Time-varying Delays and Markovian Jump ParametersXu-Dong Zhao，Qing-Shuang Zeng6.Delay and Its Time-derivative Dependent Robust Stability of Uncertain Neutral Systems with Saturating ActuatorsFatima El Haoussi，El Houssaine Tissir7.Parallel Fuzzy P+Fuzzy I+Fuzzy D Controller:Design and Performance EvaluationVineet Kumar，A.P.Mittal8.Observers for Descriptor Systems with Slope-restricted NonlinearitiesLin-Na Zhou，Chun-Yu Yang，Qing-Ling Zhang9.Parameterized Solution to a Class of Sylvester MatrixEquationsYu-Peng Qiao，Hong-Sheng Qi，Dai-Zhan Cheng10.Indirect Adaptive Fuzzy and Impulsive Control of Nonlinear SystemsHai-Bo Jiang11.Robust Fuzzy Tracking Control for Nonlinear Networked Control Systems with Integral Quadratic ConstraintsZhi-Sheng Chen，Yong He，Min Wu12.A Power-and Coverage-aware Clustering Scheme for Wireless Sensor NetworksLiang Xue，Xin-Ping Guan，Zhi-Xin Liu，Qing-Chao Zheng13.Guaranteed Cost Active Fault-tolerant Control of Networked Control System with Packet Dropout and Transmission DelayXiao-Yuan Luo，Mei-Jie Shang，Cai-Lian Chen，Xin-Ping Guanparison of Two Novel MRAS Based Strategies for Identifying Parameters in Permanent Magnet Synchronous MotorsKan Liu，Qiao Zhang，Zi-Qiang Zhu，Jing Zhang，An-Wen Shen，Paul Stewart15.Modeling and Analysis of Scheduling for Distributed Real-time Embedded SystemsHai-Tao Zhang，Gui-Fang Wu16.Passive Steganalysis Based 
on Higher Order Image Statistics of Curvelet TransformS.Geetha，Siva S.Sivatha Sindhu，N.Kamaraj17.Movement Invariants-based Algorithm for Medical Image Tilt CorrectionMei-Sen Pan，Jing-Tian Tang，Xiao-Li Yang18.Target Tracking and Obstacle Avoidance for Multi-agent SystemsJing Yan，Xin-Ping Guan，Fu-Xiao Tan19.Automatic Generation of Optimally Rigid Formations Using Decentralized MethodsRui Ren，Yu-Yan Zhang，Xiao-Yuan Luo，Shao-Bao Li20.Semi-blind Adaptive Beamforming for High-throughput Quadrature Amplitude Modulation SystemsSheng Chen，Wang Yao，Lajos Hanzo21.Throughput Analysis of IEEE 802.11 Multirate WLANs with Collision Aware Rate Adaptation AlgorithmDhanasekaran Senthilkumar，A. Krishnan22.Innovative Product Design Based on Customer Requirement Weight Calculation ModelChen-Guang Guo，Yong-Xian Liu，Shou-Ming Hou，Wei Wang23.A Service Composition Approach Based on Sequence Mining for Migrating E-learning Legacy System to SOAZhuo Zhang，Dong-Dai Zhou，Hong-Ji Yang，Shao-Chun Zhong24.Modeling of Agile Intelligent Manufacturing-oriented Production Scheduling SystemZhong-Qi Sheng，Chang-Ping Tang，Ci-Xing Lv25.Estimation of Reliability and Cost Relationship for Architecture-based SoftwareHui Guan，Wei-Ru Chen，Ning Huang，Hong-Ji Yang1.A Computer-aided Design System for Framed-mould in Autoclave ProcessingTian-Guo Jin，Feng-Yang Bi2.Wear State Recognition of Drills Based on K-means Cluster and Radial Basis Function Neural NetworkXu Yang3.The Knee Joint Design and Control of Above-knee Intelligent Bionic Leg Based on Magneto-rheological DamperHua-Long Xie，Ze-Zhong Liang，Fei Li，Li-Xin Guo4.Modeling of Pneumatic Muscle with Shape Memory Alloy and Braided SleeveBin-Rui Wang，Ying-Lian Jin，Dong Wei5.Extended Object Model for Product Configuration DesignZhi-Wei Xu，Ze-Zhong Liang，Zhong-Qi Sheng6.Analysis of Sheet Metal Extrusion Process Using Finite Element MethodXin-Cun Zhuang，Hua Xiang，Zhen Zhao7.Implementation of Enterprises' Interoperation Based on OntologyXiao-Feng Di，Yu-Shun Fan8.Path 
Planning Approach in Unknown EnvironmentTing-Kai Wang，Quan Dang，Pei-Yuan Pan9.Sliding Mode Variable Structure Control for Visual Servoing SystemFei Li，Hua-Long Xie10.Correlation of Direct Piezoelectric Effect on EAPap under Ambient FactorsLi-Jie Zhao，Chang-Ping Tang，Peng Gong11.XML-based Data Processing in Network Supported Collaborative DesignQi Wang，Zhong-Wei Ren，Zhong-Feng Guo12.Production Management Modelling Based on MASLi He，Zheng-Hao Wang，Ke-Long Zhang13.Experimental Tests of Autonomous Ground Vehicles with PreviewCunjia Liu，Wen-Hua Chen，John Andrews14.Modelling and Remote Control of an ExcavatorYang Liu，Mohammad Shahidul Hasan，Hong-Nian Yu15.TOPSIS with Belief Structure for Group Belief Multiple Criteria Decision MakingJiang Jiang，Ying-Wu Chen，Da-Wei Tang，Yu-Wang Chen16.Video Analysis Based on Volumetric Event DetectionJing Wang，Zhi-Jie Xu17.Improving Decision Tree Performance by Exception HandlingAppavu Alias Balamurugan Subramanian，S.Pramala，B.Rajalakshmi，Ramasamy Rajaram18.Robustness Analysis of Discrete-time Indirect Model Reference Adaptive Control with Normalized Adaptive LawsQing-Zheng Gao，Xue-Jun Xie19.A Novel Lifecycle Model for Web-based Application Development in Small and Medium EnterprisesWei Huang，Ru Li，Carsten Maple，Hong-Ji Yang，David Foskett，Vince Cleaver20.Design of a Two-dimensional Recursive Filter Using the Bees AlgorithmD. T. Pham，Ebubekir Ko(c)21.Designing Genetic Regulatory Networks Using Fuzzy Petri Nets ApproachRaed I. Hamed，Syed I. 
Ahson，Rafat Parveen1.State of the Art and Emerging Trends in Operations and Maintenance of Offshore Oil and Gas Production Facilities: Some Experiences and ObservationsJayantha P.Liyanage2.Statistical Safety Analysis of Maintenance Management Process of Excavator UnitsLjubisa Papic，Milorad Pantelic，Joseph Aronov，Ajit Kumar Verma3.Improving Energy and Power Efficiency Using NComputing and Approaches for Predicting Reliability of Complex Computing SystemsHoang Pham，Hoang Pham Jr.4.Running Temperature and Mechanical Stability of Grease as Maintenance Parameters of Railway BearingsJan Lundberg，Aditya Parida，Peter S(o)derholm5.Subsea Maintenance Service Delivery: Mapping Factors Influencing Scheduled Service DurationEfosa Emmanuel Uyiomendo，Tore Markeset6.A Systemic Approach to Integrated E-maintenance of Large Engineering PlantsAjit Kumar Verma，A.Srividya，P.G.Ramesh7.Authentication and Access Control in RFID Based Logistics-customs Clearance Service PlatformHui-Fang Deng，Wen Deng，Han Li，Hong-Ji Yang8.Evolutionary Trajectory Planning for an Industrial RobotR.Saravanan，S.Ramabalan，C.Balamurugan，A.Subash9.Improved Exponential Stability Criteria for Recurrent Neural Networks with Time-varying Discrete and Distributed DelaysYuan-Yuan Wu，Tao Li，Yu-Qiang Wu10.An Improved Approach to Delay-dependent Robust Stabilization for Uncertain Singular Time-delay SystemsXin Sun，Qing-Ling Zhang，Chun-Yu Yang，Zhan Su，Yong-Yun Shao11.Robust Stability of Nonlinear Plants with a Non-symmetric Prandtl-Ishlinskii Hysteresis ModelChang-An Jiang，Ming-Cong Deng，Akira Inoue12.Stability Analysis of Discrete-time Systems with Additive Time-varying DelaysXian-Ming Tang，Jin-Shou Yu13.Delay-dependent Stability Analysis for Markovian Jump Systems with Interval Time-varying-delaysXu-Dong Zhao，Qing-Shuang Zeng14.H∞ Synchronization of Chaotic Systems via Delayed Feedback ControlLi Sheng，Hui-Zhong Yang15.Adaptive Fuzzy Observer Backstepping Control for a Class of Uncertain Nonlinear Systems with Unknown 
Time-delay
   Shao-Cheng Tong, Ning Sheng
16. Simulation-based Optimal Design of α-β-γ-δ Filter
   Chun-Mu Wu, Paul P.Lin, Zhen-Yu Han, Shu-Rong Li
17. Independent Cycle Time Assignment for Min-max Systems
   Wen-De Chen, Yue-Gang Tao, Hong-Nian Yu

1. An Assessment Tool for Land Reuse with Artificial Intelligence Method
   Dieter D. Genske, Dongbin Huang, Ariane Ruff
2. Interpolation of Images Using Discrete Wavelet Transform to Simulate Image Resizing as in Human Vision
   Rohini S. Asamwar, Kishor M. Bhurchandi, Abhay S. Gandhi
3. Watermarking of Digital Images in Frequency Domain
   Sami E. I. Baba, Lala Z. Krikor, Thawar Arif, Zyad Shaaban
4. An Effective Image Retrieval Mechanism Using Family-based Spatial Consistency Filtration with Object Region
   Jing Sun, Ying-Jie Xing
5. Robust Object Tracking under Appearance Change Conditions
   Qi-Cong Wang, Yuan-Hao Gong, Chen-Hui Yang, Cui-Hua Li
6. A Visual Attention Model for Robot Object Tracking
   Jin-Kui Chu, Rong-Hua Li, Qing-Ying Li, Hong-Qing Wang
7. SVM-based Identification and Un-calibrated Visual Servoing for Micro-manipulation
   Xin-Han Huang, Xiang-Jin Zeng, Min Wang
8. Action Control of Soccer Robots Based on Simulated Human Intelligence
   Tie-Jun Li, Gui-Qiang Chen, Gui-Fang Shao
9. Emotional Gait Generation for a Humanoid Robot
   Lun Xie, Zhi-Liang Wang, Wei Wang, Guo-Chen Yu
10. Cultural Algorithm for Minimization of Binary Decision Diagram and Its Application in Crosstalk Fault Detection
   Zhong-Liang Pan, Ling Chen, Guang-Zhao Zhang
11. A Novel Fuzzy Direct Torque Control System for Three-level Inverter-fed Induction Machine
   Shu-Xi Liu, Ming-Yu Wang, Yu-Guang Chen, Shan Li
12. Statistic Learning-based Defect Detection for Twill Fabrics
   Li-Wei Han, De Xu
13. Nonsaturation Throughput Enhancement of IEEE 802.11b Distributed Coordination Function for Heterogeneous Traffic under Noisy Environment
   Dhanasekaran Senthilkumar, A.
Krishnan
14. Structure and Dynamics of Artificial Regulatory Networks Evolved by Segmental Duplication and Divergence Model
   Xiang-Hong Lin, Tian-Wen Zhang
15. Random Fuzzy Chance-constrained Programming Based on Adaptive Chaos Quantum Honey Bee Algorithm and Robustness Analysis
   Han Xue, Xun Li, Hong-Xu Ma
16. A Bit-level Text Compression Scheme Based on the ACW Algorithm
   Hussein Al-Bahadili, Shakir M. Hussain
17. A Note on an Economic Lot-sizing Problem with Perishable Inventory and Economies of Scale Costs: Approximation Solutions and Worst Case Analysis
   Qing-Guo Bai, Yu-Zhong Zhang, Guang-Long Dong

1. Virtual Reality: A State-of-the-Art Survey
   Ning-Ning Zhou, Yu-Long Deng
2. Real-time Virtual Environment Signal Extraction and Denoising Using Programmable Graphics Hardware
   Yang Su, Zhi-Jie Xu, Xiang-Qian Jiang
3. Effective Virtual Reality Based Building Navigation Using Dynamic Loading and Path Optimization
   Qing-Jin Peng, Xiu-Mei Kang, Ting-Ting Zhao
4. The Skin Deformation of a 3D Virtual Human
   Xiao-Jing Zhou, Zheng-Xu Zhao
5. Technology for Simulating Crowd Evacuation Behaviors
   Wen-Hu Qin, Guo-Hui Su, Xiao-Na Li
6. Research on Modelling Digital Paper-cut Preservation
   Xiao-Fen Wang, Ying-Rui Liu, Wen-Sheng Zhang
7. On Problems of Multicomponent System Maintenance Modelling
   Tomasz Nowakowski, Sylwia Werbinka
8. Soft Sensing Modelling Based on Optimal Selection of Secondary Variables and Its Application
   Qi Li, Cheng Shao
9. Adaptive Fuzzy Dynamic Surface Control for Uncertain Nonlinear Systems
   Xiao-Yuan Luo, Zhi-Hao Zhu, Xin-Ping Guan
10. Output Feedback for Stochastic Nonlinear Systems with Unmeasurable Inverse Dynamics
   Xin Yu, Na Duan
11. Kalman Filtering with Partial Markovian Packet Losses
   Bao-Feng Wang, Ge Guo
12. A Modified Projection Method for Linear Feasibility Problems
   Yi-Ju Wang, Hong-Yu Zhang
13. A Neuro-genetic Based Short-term Forecasting Framework for Network Intrusion Prediction System
   Siva S. Sivatha Sindhu, S. Geetha, M. Marikannan, A.
Kannan
14. New Delay-dependent Global Asymptotic Stability Condition for Hopfield Neural Networks with Time-varying Delays
   Guang-Deng Zong, Jia Liu
15. Crosscumulants Based Approaches for the Structure Identification of Volterra Models
   Houda Mathlouthi, Kamel Abederrahim, Faouzi Msahli, Gerard Favier

1. Coalition Formation in Weighted Simple-majority Games under Proportional Payoff Allocation Rules
   Zhi-Gang Cao, Xiao-Guang Yang
2. Stability Analysis for Recurrent Neural Networks with Time-varying Delay
   Yuan-Yuan Wu, Yu-Qiang Wu
3. A New Type of Solution Method for the Generalized Linear Complementarity Problem over a Polyhedral Cone
   Hong-Chun Sun, Yan-Liang Dong
4. An Improved Control Algorithm for High-order Nonlinear Systems with Unmodelled Dynamics
   Na Duan, Fu-Nian Hu, Xin Yu
5. Controller Design of High Order Nonholonomic System with Nonlinear Drifts
   Xiu-Yun Zheng, Yu-Qiang Wu
6. Directional Filter for SAR Images Based on Nonsubsampled Contourlet Transform and Immune Clonal Selection
   Xiao-Hui Yang, Li-Cheng Jiao, Deng-Feng Li
7. Text Extraction and Enhancement of Binary Images Using Cellular Automata
   G. Sahoo, Tapas Kumar, B.L. Rains, C.M. Bhatia
8. GH2 Control for Uncertain Discrete-time-delay Fuzzy Systems Based on a Switching Fuzzy Model and Piecewise Lyapunov Function
   Zhi-Le Xia, Jun-Min Li
9. A New Energy Optimal Control Scheme for a Separately Excited DC Motor Based Incremental Motion Drive
   Milan A.Sheta, Vivek Agarwal, Paluri S.V.Nataraj
10. Nonlinear Backstepping Ship Course Controller
   Anna Witkowska, Roman Smierzchalski
11. A New Method of Embedded Fourth Order with Four Stages to Study Raster CNN Simulation
   R. Ponalagusamy, S.
Senthilkumar
12. A Minimum-energy Path-preserving Topology Control Algorithm for Wireless Sensor Networks
   Jin-Zhao Lin, Xian Zhou, Yun Li
13. Synchronization and Exponential Estimates of Complex Networks with Mixed Time-varying Coupling Delays
   Yang Dai, YunZe Cai, Xiao-Ming Xu
14. Step-coordination Algorithm of Traffic Control Based on Multi-agent System
   Hai-Tao Zhang, Fang Yu, Wen Li
15. A Research of the Employment Problem on Common Job-seekers and Graduates
   Bai-Da Qu

SPSS Terminology: English-Chinese Glossary

SPSS术语中英文对照【常用软件】SPSS术语中英文对照Absolute deviation, 绝对离差Absolute number, 绝对数Absolute residuals, 绝对残差Acceleration array, 加速度立体阵Acceleration in an arbitrary direction, 任意方向上的加速度Acceleration normal, 法向加速度Acceleration space dimension, 加速度空间的维数Acceleration tangential, 切向加速度Acceleration vector, 加速度向量Acceptable hypothesis, 可接受假设Accumulation, 累积Accuracy, 准确度Actual frequency, 实际频数Adaptive estimator, 自适应估计量Addition, 相加Addition theorem, 加法定理Additivity, 可加性Adjusted rate, 调整率Adjusted value, 校正值Admissible error, 容许误差Aggregation, 聚集性Alternative hypothesis, 备择假设Among groups, 组间Amounts, 总量Analysis of correlation, 相关分析Analysis of covariance, 协方差分析Analysis of regression, 回归分析Analysis of time series, 时间序列分析Analysis of variance, 方差分析Angular transformation, 角转换ANOVA （analysis of variance）, 方差分析ANOVA Models, 方差分析模型Arcing, 弧/弧旋Arcsine transformation, 反正弦变换Area under the curve, 曲线面积AREG , 评估从一个时间点到下一个时间点回归相关时的误差ARIMA, 季节和非季节性单变量模型的极大似然估计Arithmetic grid paper, 算术格纸Arithmetic mean, 算术平均数Arrhenius relation, 艾恩尼斯关系Assessing fit, 拟合的评估Associative laws, 结合律Asymmetric distribution, 非对称分布Asymptotic bias, 渐近偏倚Asymptotic efficiency, 渐近效率Asymptotic variance, 渐近方差Attributable risk, 归因危险度Attribute data, 属性资料Attribution, 属性Autocorrelation, 自相关Autocorrelation of residuals, 残差的自相关Average, 平均数Average confidence interval length, 平均置信区间长度Average growth rate, 平均增长率Bar chart, 条形图Bar graph, 条形图Base period, 基期Bayes' theorem , Bayes定理Bell-shaped curve, 钟形曲线Bernoulli distribution, 伯努力分布Best-trim estimator, 最好切尾估计量Bias, 偏性Binary logistic regression, 二元逻辑斯蒂回归Binomial distribution, 二项分布Bisquare, 双平方Bivariate Correlate, 二变量相关Bivariate normal distribution, 双变量正态分布Bivariate normal population, 双变量正态总体Biweight interval, 双权区间Biweight M-estimator, 双权M估计量Block, 区组/配伍组BMDP(Biomedical computer programs), BMDP统计软件包Boxplots, 箱线图/箱尾图Breakdown bound, 崩溃界/崩溃点Canonical correlation, 典型相关Caption, 纵标目Case-control study, 病例对照研究Categorical variable, 分类变量Catenary, 悬链线Cauchy distribution, 柯西分布Cause-and-effect relationship, 因果关系Cell, 
单元Censoring, 终检Center of symmetry, 对称中心Centering and scaling, 中心化和定标Central tendency, 集中趋势Central value, 中心值CHAID -χ2 Automatic Interac tion Detector, 卡方自动交互检测Chance, 机遇Chance error, 随机误差Chance variable, 随机变量Characteristic equation, 特征方程Characteristic root, 特征根Characteristic vector, 特征向量Chebshev criterion of fit, 拟合的切比雪夫准则Chernoff faces, 切尔诺夫脸谱图Chi-square test, 卡方检验/χ2检验Choleskey decomposition, 乔洛斯基分解Circle chart, 圆图Class interval, 组距Class mid-value, 组中值Class upper limit, 组上限Classified variable, 分类变量Cluster analysis, 聚类分析Cluster sampling, 整群抽样Code, 代码Coded data, 编码数据Coding, 编码Coefficient of contingency, 列联系数Coefficient of determination, 决定系数Coefficient of multiple correlation, 多重相关系数Coefficient of partial correlation, 偏相关系数Coefficient of production-moment correlation, 积差相关系数Coefficient of rank correlation, 等级相关系数Coefficient of regression, 回归系数Coefficient of skewness, 偏度系数Coefficient of variation, 变异系数Cohort study, 队列研究Column, 列Column effect, 列效应Column factor, 列因素Combination pool, 合并Combinative table, 组合表Common factor, 共性因子Common regression coefficient, 公共回归系数Common value, 共同值Common variance, 公共方差Common variation, 公共变异Communality variance, 共性方差Comparability, 可比性Comparison of bathes, 批比较Comparison value, 比较值Compartment model, 分部模型Compassion, 伸缩Complement of an event, 补事件Complete association, 完全正相关Complete dissociation, 完全不相关Complete statistics, 完备统计量Completely randomized design, 完全随机化设计Composite event, 联合事件Composite events, 复合事件Concavity, 凹性Conditional expectation, 条件期望Conditional likelihood, 条件似然Conditional probability, 条件概率Conditionally linear, 依条件线性Confidence interval, 置信区间Confidence limit, 置信限Confidence lower limit, 置信下限Confidence upper limit, 置信上限Confirmatory Factor Analysis , 验证性因子分析Confirmatory research, 证实性实验研究Confounding factor, 混杂因素Conjoint, 联合分析Consistency, 相合性Consistency check, 一致性检验Consistent asymptotically normal estimate, 相合渐近正态估计Consistent estimate, 相合估计Constrained nonlinear regression, 受约束非线性回归Constraint, 约束Contaminated distribution, 污染分布Contaminated 
Gausssian, 污染高斯分布Contaminated normal distribution, 污染正态分布Contamination, 污染Contamination model, 污染模型Contingency table, 列联表Contour, 边界线Contribution rate, 贡献率Control, 对照Controlled experiments, 对照实验Conventional depth, 常规深度Convolution, 卷积Corrected factor, 校正因子Corrected mean, 校正均值Correction coefficient, 校正系数Correctness, 正确性Correlation coefficient, 相关系数Correlation index, 相关指数Correspondence, 对应Counting, 计数Counts, 计数/频数Covariance, 协方差Covariant, 共变Cox Regression, Cox回归Criteria for fitting, 拟合准则Criteria of least squares, 最小二乘准则Critical ratio, 临界比Critical region, 拒绝域Critical value, 临界值Cross-over design, 交叉设计Cross-section analysis, 横断面分析Cross-section survey, 横断面调查Crosstabs , 交叉表Cross-tabulation table, 复合表Cube root, 立方根Cumulative distribution function, 分布函数Cumulative probability, 累计概率Curvature, 曲率/弯曲Curvature, 曲率Curve fit , 曲线拟和Curve fitting, 曲线拟合Curvilinear regression, 曲线回归Curvilinear relation, 曲线关系Cut-and-try method, 尝试法Cycle, 周期Cyclist, 周期性D test, D检验Data acquisition, 资料收集Data bank, 数据库Data capacity, 数据容量Data deficiencies, 数据缺乏Data handling, 数据处理Data manipulation, 数据处理Data processing, 数据处理Data reduction, 数据缩减Data set, 数据集Data sources, 数据来源Data transformation, 数据变换Data validity, 数据有效性Data-in, 数据输入Data-out, 数据输出Dead time, 停滞期Degree of freedom, 自由度Degree of precision, 精密度Degree of reliability, 可靠性程度Degression, 递减Density function, 密度函数Density of data points, 数据点的密度Dependent variable, 应变量/依变量/因变量Dependent variable, 因变量Depth, 深度Derivative matrix, 导数矩阵Derivative-free methods, 无导数方法Design, 设计Determinacy, 确定性Determinant, 行列式Determinant, 决定因素Deviation, 离差Deviation from average, 离均差Diagnostic plot, 诊断图Dichotomous variable, 二分变量Differential equation, 微分方程Direct standardization, 直接标准化法Discrete variable, 离散型变量DISCRIMINANT, 判断Discriminant analysis, 判别分析Discriminant coefficient, 判别系数Discriminant function, 判别值Dispersion, 散布/分散度Disproportional, 不成比例的Disproportionate sub-class numbers, 不成比例次级组含量Distribution free, 分布无关性/免分布Distribution shape, 分布形状Distribution-free method, 任意分布法Distributive laws, 
分配律Disturbance, 随机扰动项Dose response curve, 剂量反应曲线Double blind method, 双盲法Double blind trial, 双盲试验Double exponential distribution, 双指数分布Double logarithmic, 双对数Downward rank, 降秩Dual-space plot, 对偶空间图DUD, 无导数方法Duncan's new multiple range method, 新复极差法/Duncan新法Effect, 实验效应Eigenvalue, 特征值Eigenvector, 特征向量Ellipse, 椭圆Empirical distribution, 经验分布Empirical probability, 经验概率单位Enumeration data, 计数资料Equal sun-class number, 相等次级组含量Equally likely, 等可能Equivariance, 同变性Error, 误差/错误Error of estimate, 估计误差Error type I, 第一类错误Error type II, 第二类错误Estimand, 被估量Estimated error mean squares, 估计误差均方Estimated error sum of squares, 估计误差平方和Euclidean distance, 欧式距离Event, 事件Event, 事件Exceptional data point, 异常数据点Expectation plane, 期望平面Expectation surface, 期望曲面Expected values, 期望值Experiment, 实验Experimental sampling, 试验抽样Experimental unit, 试验单位Explanatory variable, 说明变量Exploratory data analysis, 探索性数据分析Explore Summarize, 探索-摘要Exponential curve, 指数曲线Exponential growth, 指数式增长EXSMOOTH, 指数平滑方法Extended fit, 扩充拟合Extra parameter, 附加参数Extrapolation, 外推法Extreme observation, 末端观测值Extremes, 极端值/极值F distribution, F分布F test, F检验Factor, 因素/因子Factor analysis, 因子分析Factor Analysis, 因子分析Factor score, 因子得分Factorial, 阶乘Factorial design, 析因试验设计False negative, 假阴性False negative error, 假阴性错误Family of distributions, 分布族Family of estimators, 估计量族Fanning, 扇面Fatality rate, 病死率Field investigation, 现场调查Field survey, 现场调查Finite population, 有限总体Finite-sample, 有限样本First derivative, 一阶导数First principal component, 第一主成分First quartile, 第一四分位数Fisher information, 费雪信息量Fitted value, 拟合值Fitting a curve, 曲线拟合Fixed base, 定基Fluctuation, 随机起伏Forecast, 预测Four fold table, 四格表Fourth, 四分点Fraction blow, 左侧比率Fractional error, 相对误差Frequency, 频率Frequency polygon, 频数多边图Frontier point, 界限点Function relationship, 泛函关系Gamma distribution, 伽玛分布Gauss increment, 高斯增量Gaussian distribution, 高斯分布/正态分布Gauss-Newton increment, 高斯-牛顿增量General census, 全面普查GENLOG (Generalized liner models), 广义线性模型Geometric mean, 几何平均数Gini's mean difference, 基尼均差GLM (General liner 
models), 一般线性模型Goodness of fit, 拟和优度/配合度Gradient of determinant, 行列式的梯度Graeco-Latin square, 希腊拉丁方Grand mean, 总均值Gross errors, 重大错误Gross-error sensitivity, 大错敏感度Group averages, 分组平均Grouped data, 分组资料Guessed mean, 假定平均数Half-life, 半衰期Hampel M-estimators, 汉佩尔M估计量Happenstance, 偶然事件Harmonic mean, 调和均数Hazard function, 风险均数Hazard rate, 风险率Heading, 标目Heavy-tailed distribution, 重尾分布Hessian array, 海森立体阵Heterogeneity, 不同质Heterogeneity of variance, 方差不齐Hierarchical classification, 组内分组Hierarchical clustering method, 系统聚类法High-leverage point, 高杠杆率点HILOGLINEAR, 多维列联表的层次对数线性模型Hinge, 折叶点Histogram, 直方图Historical cohort study, 历史性队列研究Holes, 空洞HOMALS, 多重响应分析Homogeneity of variance, 方差齐性Homogeneity test, 齐性检验Huber M-estimators, 休伯M估计量Hyperbola, 双曲线Hypothesis testing, 假设检验Hypothetical universe, 假设总体Impossible event, 不可能事件Independence, 独立性Independent variable, 自变量Index, 指标/指数Indirect standardization, 间接标准化法Individual, 个体Inference band, 推断带Infinite population, 无限总体Infinitely great, 无穷大Infinitely small, 无穷小Influence curve, 影响曲线Information capacity, 信息容量Initial condition, 初始条件Initial estimate, 初始估计值Initial level, 最初水平Interaction, 交互作用Interaction terms, 交互作用项Intercept, 截距Interpolation, 内插法Interquartile range, 四分位距Interval estimation, 区间估计Intervals of equal probability, 等概率区间Intrinsic curvature, 固有曲率Invariance, 不变性Inverse matrix, 逆矩阵Inverse probability, 逆概率Inverse sine transformation, 反正弦变换Iteration, 迭代Jacobian determinant, 雅可比行列式Joint distribution function, 分布函数Joint probability, 联合概率Joint probability distribution, 联合概率分布K means method, 逐步聚类法Kaplan-Meier, 评估事件的时间长度Kaplan-Merier chart, Kaplan-Merier图Kendall's rank correlation, Kendall等级相关Kinetic, 动力学Kolmogorov-Smirnove test, 柯尔莫哥洛夫-斯米尔诺夫检验Kruskal and Wallis test, Kruskal及Wallis检验/多样本的秩和检验/H检验Kurtosis, 峰度Lack of fit, 失拟Ladder of powers, 幂阶梯Lag, 滞后Large sample, 大样本Large sample test, 大样本检验Latin square, 拉丁方Latin square design, 拉丁方设计Leakage, 泄漏Least favorable configuration, 最不利构形Least favorable distribution, 最不利分布Least significant difference, 
最小显著差法Least square method, 最小二乘法Least-absolute-residuals estimates, 最小绝对残差估计Least-absolute-residuals fit, 最小绝对残差拟合Least-absolute-residuals line, 最小绝对残差线Legend, 图例L-estimator, L估计量L-estimator of location, 位置L估计量L-estimator of scale, 尺度L估计量Level, 水平Life expectance, 预期期望寿命Life table, 寿命表Life table method, 生命表法Light-tailed distribution, 轻尾分布Likelihood function, 似然函数Likelihood ratio, 似然比line graph, 线图Linear correlation, 直线相关Linear equation, 线性方程Linear programming, 线性规划Linear regression, 直线回归Linear Regression, 线性回归Linear trend, 线性趋势Loading, 载荷Location and scale equivariance, 位置尺度同变性Location equivariance, 位置同变性Location invariance, 位置不变性Location scale family, 位置尺度族Log rank test, 时序检验Logarithmic curve, 对数曲线Logarithmic normal distribution, 对数正态分布Logarithmic scale, 对数尺度Logarithmic transformation, 对数变换Logic check, 逻辑检查Logistic distribution, 逻辑斯特分布Logit transformation, Logit转换LOGLINEAR, 多维列联表通用模型Lognormal distribution, 对数正态分布Lost function, 损失函数Low correlation, 低度相关Lower limit, 下限Lowest-attained variance, 最小可达方差LSD, 最小显著差法的简称Lurking variable, 潜在变量Main effect, 主效应Major heading, 主辞标目Marginal density function, 边缘密度函数Marginal probability, 边缘概率Marginal probability distribution, 边缘概率分布Matched data, 配对资料Matched distribution, 匹配过分布Matching of distribution, 分布的匹配Matching of transformation, 变换的匹配Mathematical expectation, 数学期望Mathematical model, 数学模型Maximum L-estimator, 极大极小L 估计量Maximum likelihood method, 最大似然法Mean, 均数Mean squares between groups, 组间均方Mean squares within group, 组内均方Means (Compare means), 均值-均值比较Median, 中位数Median effective dose, 半数效量Median lethal dose, 半数致死量Median polish, 中位数平滑Median test, 中位数检验Minimal sufficient statistic, 最小充分统计量Minimum distance estimation, 最小距离估计Minimum effective dose, 最小有效量Minimum lethal dose, 最小致死量Minimum variance estimator, 最小方差估计量MINITAB, 统计软件包Minor heading, 宾词标目Missing data, 缺失值Model specification, 模型的确定Modeling Statistics , 模型统计Models for outliers, 离群值模型Modifying the model, 模型的修正Modulus of continuity, 连续性模Morbidity, 发病率Most favorable configuration, 
最有利构形Multidimensional Scaling (ASCAL), 多维尺度/多维标度Multinomial Logistic Regression , 多项逻辑斯蒂回归Multiple comparison, 多重比较Multiple correlation , 复相关Multiple covariance, 多元协方差Multiple linear regression, 多元线性回归Multiple response , 多重选项Multiple solutions, 多解Multiplication theorem, 乘法定理Multiresponse, 多元响应Multi-stage sampling, 多阶段抽样Multivariate T distribution, 多元T分布Mutual exclusive, 互不相容Mutual independence, 互相独立Natural boundary, 自然边界Natural dead, 自然死亡Natural zero, 自然零Negative correlation, 负相关Negative linear correlation, 负线性相关Negatively skewed, 负偏Newman-Keuls method, q检验NK method, q检验No statistical significance, 无统计意义Nominal variable, 名义变量Nonconstancy of variability, 变异的非定常性Nonlinear regression, 非线性相关Nonparametric statistics, 非参数统计Nonparametric test, 非参数检验Nonparametric tests, 非参数检验Normal deviate, 正态离差Normal distribution, 正态分布Normal equation, 正规方程组Normal ranges, 正常范围Normal value, 正常值Nuisance parameter, 多余参数/讨厌参数Null hypothesis, 无效假设Numerical variable, 数值变量Objective function, 目标函数Observation unit, 观察单位Observed value, 观察值One sided test, 单侧检验One-way analysis of variance, 单因素方差分析Oneway ANOVA , 单因素方差分析Open sequential trial, 开放型序贯设计Optrim, 优切尾Optrim efficiency, 优切尾效率Order statistics, 顺序统计量Ordered categories, 有序分类Ordinal logistic regression , 序数逻辑斯蒂回归Ordinal variable, 有序变量Orthogonal basis, 正交基Orthogonal design, 正交试验设计Orthogonality conditions, 正交条件ORTHOPLAN, 正交设计Outlier cutoffs, 离群值截断点Outliers, 极端值OVERALS , 多组变量的非线性正规相关Overshoot, 迭代过度Paired design, 配对设计Paired sample, 配对样本Pairwise slopes, 成对斜率Parabola, 抛物线Parallel tests, 平行试验Parameter, 参数Parametric statistics, 参数统计Parametric test, 参数检验Partial correlation, 偏相关Partial regression, 偏回归Partial sorting, 偏排序Partials residuals, 偏残差Pattern, 模式Pearson curves, 皮尔逊曲线Peeling, 退层Percent bar graph, 百分条形图Percentage, 百分比Percentile, 百分位数Percentile curves, 百分位曲线Periodicity, 周期性Permutation, 排列P-estimator, P估计量Pie graph, 饼图Pitman estimator, 皮特曼估计量Pivot, 枢轴量Planar, 平坦Planar assumption, 平面的假设PLANCARDS, 生成试验的计划卡Point estimation, 点估计Poisson distribution, 
泊松分布Polishing, 平滑Polled standard deviation, 合并标准差Polled variance, 合并方差Polygon, 多边图Polynomial, 多项式Polynomial curve, 多项式曲线Population, 总体Population attributable risk, 人群归因危险度Positive correlation, 正相关Positively skewed, 正偏Posterior distribution, 后验分布Power of a test, 检验效能Precision, 精密度Predicted value, 预测值Preliminary analysis, 预备性分析Principal component analysis, 主成分分析Prior distribution, 先验分布Prior probability, 先验概率Probabilistic model, 概率模型probability, 概率Probability density, 概率密度Product moment, 乘积矩/协方差Profile trace, 截面迹图Proportion, 比/构成比Proportion allocation in stratified random sampling, 按比例分层随机抽样Proportionate, 成比例Proportionate sub-class numbers, 成比例次级组含量Prospective study, 前瞻性调查Proximities, 亲近性Pseudo F test, 近似F检验Pseudo model, 近似模型Pseudosigma, 伪标准差Purposive sampling, 有目的抽样QR decomposition, QR分解Quadratic approximation, 二次近似Qualitative classification, 属性分类Qualitative method, 定性方法Quantile-quantile plot, 分位数-分位数图/Q-Q图Quantitative analysis, 定量分析Quartile, 四分位数Quick Cluster, 快速聚类Radix sort, 基数排序Random allocation, 随机化分组Random blocks design, 随机区组设计Random event, 随机事件Randomization, 随机化Range, 极差/全距Rank correlation, 等级相关Rank sum test, 秩和检验Rank test, 秩检验Ranked data, 等级资料Rate, 比率Ratio, 比例Raw data, 原始资料Raw residual, 原始残差Rayleigh's test, 雷氏检验Rayleigh's Z, 雷氏Z值Reciprocal, 倒数Reciprocal transformation, 倒数变换Recording, 记录Redescending estimators, 回降估计量Reducing dimensions, 降维Re-expression, 重新表达Reference set, 标准组Region of acceptance, 接受域Regression coefficient, 回归系数Regression sum of square, 回归平方和Rejection point, 拒绝点Relative dispersion, 相对离散度Relative number, 相对数Reliability, 可靠性Reparametrization, 重新设置参数Replication, 重复Report Summaries, 报告摘要Residual sum of square, 剩余平方和Resistance, 耐抗性Resistant line, 耐抗线Resistant technique, 耐抗技术R-estimator of location, 位置R估计量R-estimator of scale, 尺度R估计量Retrospective study, 回顾性调查Ridge trace, 岭迹Ridit analysis, Ridit分析Rotation, 旋转Rounding, 舍入Row, 行Row effects, 行效应Row factor, 行因素RXC table, RXC表Sample, 样本Sample regression coefficient, 样本回归系数Sample size, 样本量Sample standard 
deviation, 样本标准差Sampling error, 抽样误差SAS(Statistical analysis system ), SAS统计软件包Scale, 尺度/量表Scatter diagram, 散点图Schematic plot, 示意图/简图Score test, 计分检验Screening, 筛检SEASON, 季节分析Second derivative, 二阶导数Second principal component, 第二主成分SEM (Structural equation modeling), 结构化方程模型Semi-logarithmic graph, 半对数图Semi-logarithmic paper, 半对数格纸Sensitivity curve, 敏感度曲线Sequential analysis, 贯序分析Sequential data set, 顺序数据集Sequential design, 贯序设计Sequential method, 贯序法Sequential test, 贯序检验法Serial tests, 系列试验Short-cut method, 简捷法Sigmoid curve, S形曲线Sign function, 正负号函数Sign test, 符号检验Signed rank, 符号秩Significance test, 显著性检验Significant figure, 有效数字Simple cluster sampling, 简单整群抽样Simple correlation, 简单相关Simple random sampling, 简单随机抽样Simple regression, 简单回归simple table, 简单表Sine estimator, 正弦估计量Single-valued estimate, 单值估计Singular matrix, 奇异矩阵Skewed distribution, 偏斜分布Skewness, 偏度Slash distribution, 斜线分布Slope, 斜率Smirnov test, 斯米尔诺夫检验Source of variation, 变异来源Spearman rank correlation, 斯皮尔曼等级相关Specific factor, 特殊因子Specific factor variance, 特殊因子方差Spectra , 频谱Spherical distribution, 球型正态分布Spread, 展布SPSS(Statistical package for the social science), SPSS统计软件包Spurious correlation, 假性相关Square root transformation, 平方根变换Stabilizing variance, 稳定方差Standard deviation, 标准差Standard error, 标准误Standard error of difference, 差别的标准误Standard error of estimate, 标准估计误差Standard error of rate, 率的标准误Standard normal distribution, 标准正态分布Standardization, 标准化Starting value, 起始值Statistic, 统计量Statistical control, 统计控制Statistical graph, 统计图Statistical inference, 统计推断Statistical table, 统计表Steepest descent, 最速下降法Stem and leaf display, 茎叶图Step factor, 步长因子Stepwise regression, 逐步回归Storage, 存Strata, 层（复数）Stratified sampling, 分层抽样Stratified sampling, 分层抽样Strength, 强度Stringency, 严密性Structural relationship, 结构关系Studentized residual, 学生化残差/t化残差Sub-class numbers, 次级组含量Subdividing, 分割Sufficient statistic, 充分统计量Sum of products, 积和Sum of squares, 离差平方和Sum of squares about regression, 回归平方和Sum of squares between groups, 组间平方和Sum of squares of 
partial regression, 偏回归平方和Sure event, 必然事件Survey, 调查Survival, 生存分析Survival rate, 生存率Suspended root gram, 悬吊根图Symmetry, 对称Systematic error, 系统误差Systematic sampling, 系统抽样Tags, 标签Tail area, 尾部面积Tail length, 尾长Tail weight, 尾重Tangent line, 切线Target distribution, 目标分布Taylor series, 泰勒级数Tendency of dispersion, 离散趋势Testing of hypotheses, 假设检验Theoretical frequency, 理论频数Time series, 时间序列Tolerance interval, 容忍区间Tolerance lower limit, 容忍下限Tolerance upper limit, 容忍上限Torsion, 扰率Total sum of square, 总平方和Total variation, 总变异Transformation, 转换Treatment, 处理Trend, 趋势Trend of percentage, 百分比趋势Trial, 试验Trial and error method, 试错法Tuning constant, 细调常数Two sided test, 双向检验Two-stage least squares, 二阶最小平方Two-stage sampling, 二阶段抽样Two-tailed test, 双侧检验Two-way analysis of variance, 双因素方差分析Two-way table, 双向表Type I error, 一类错误/α错误Type II error, 二类错误/β错误UMVU, 方差一致最小无偏估计简称Unbiased estimate, 无偏估计Unconstrained nonlinear regression , 无约束非线性回归Unequal subclass number, 不等次级组含量Ungrouped data, 不分组资料Uniform coordinate, 均匀坐标Uniform distribution, 均匀分布Uniformly minimum variance unbiased estimate, 方差一致最小无偏估计Unit, 单元Unordered categories, 无序分类Upper limit, 上限Upward rank, 升秩Vague concept, 模糊概念Validity, 有效性VARCOMP (Variance component estimation), 方差元素估计Variability, 变异性Variable, 变量Variance, 方差Variation, 变异Varimax orthogonal rotation, 方差最大正交旋转Volume of distribution, 容积W test, W检验Weibull distribution, 威布尔分布Weight, 权数Weighted Chi-square test, 加权卡方检验/Cochran检验Weighted linear regression method, 加权直线回归Weighted mean, 加权平均数Weighted mean square, 加权平均方差Weighted sum of square, 加权平方和Weighting coefficient, 权重系数Weighting method, 加权法W-estimation, W估计量W-estimation of location, 位置W估计量Width, 宽度Wilcoxon paired test, 威斯康星配对法/配对符号秩和检验Wild point, 野点/狂点Wild value, 野值/狂值Winsorized mean, 缩尾均值Withdraw, 失访Youden's index, 尤登指数Z test, Z检验Zero correlation, 零相关Z-transformation, Z变换。

Construction and Application of Fingerprint Map for DUS Test Standard Varieties of Foxtail Millet

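Journal of Shanxi Agricultural Sciences 2024, 52(1): 10-18
Construction and Application of Fingerprint Map for DUS Test Standard Varieties of Foxtail Millet
SHI Shenkui 1,2,3, QI Dongmei 1,2,3, WANG Chunfang 1,2,3, WANG Yufang 1,2,3, CAI Shuang 1
(1. College of Biology and Food Science, Hebei Normal University for Nationalities, Chengde 067000, Hebei, China; 2. Key Laboratory of Botany, National Ethnic Affairs Commission, Hebei Normal University for Nationalities, Chengde 067000, Hebei, China; 3. Industry Institute of Clean Energy (Carbon Emission Peak and Carbon Neutrality), Chengde 067000, Hebei, China)
Abstract: Research on the diversity of foxtail millet genetic resources is of great significance for basic research and breeding practice in foxtail millet.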

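Selecting high-quality SSR markers for the genetic diversity analysis of DUS (Distinctness, Uniformity, and Stability) test standard varieties not only resolves the genetic information of the standard varieties but also helps analyze the genetic diversity of newly bred lines at the molecular level.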

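In this study, 20 high-quality SSR markers were used to analyze the genetic diversity of 30 foxtail millet DUS test standard varieties from different geographic origins and to construct their fingerprint map. The SSR markers of this fingerprint map were also used to analyze the genetic diversity of 22 regional test varieties from the spring millet region and 30 from the summer millet region, with the aim of laying a theoretical foundation for the utilization and genetic improvement of foxtail millet variety resources.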

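The results showed that the 20 SSR primer pairs detected an average of 9.15 alleles per primer pair in the DUS test varieties, and the average polymorphism information content (PIC) of the 20 loci was 0.77.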
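The source does not say which PIC formula was used. As a rough illustration only, the sketch below assumes the widely used form PIC = 1 - Σ p_i² - Σ_{i<j} 2 p_i² p_j², where p_i is the frequency of allele i at one locus. The function names and the example allele frequencies are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def pic(allele_freqs):
    """PIC for one locus, assuming PIC = 1 - sum(p_i^2) - sum_{i<j} 2 p_i^2 p_j^2."""
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    # Sum of squared allele frequencies.
    sum_sq = sum(p * p for p in allele_freqs)
    # Pairwise correction term taken over every unordered pair of alleles.
    cross = sum(2 * (p * p) * (q * q) for p, q in combinations(allele_freqs, 2))
    return 1.0 - sum_sq - cross


def mean_pic(loci):
    """Average PIC over several loci (one list of allele frequencies per locus)."""
    return sum(pic(freqs) for freqs in loci) / len(loci)


# Hypothetical allele frequencies for two loci; NOT data from the study.
example_loci = [
    [0.5, 0.5],
    [0.25, 0.25, 0.25, 0.25],
]

if __name__ == "__main__":
    for freqs in example_loci:
        print(len(freqs), "alleles, PIC =", pic(freqs))
    print("mean PIC =", mean_pic(example_loci))
```

Under this assumed formula, loci with more alleles at more even frequencies score higher, which is consistent with a marker set averaging 9.15 alleles per primer pair reaching a mean PIC of 0.77.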

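Using these SSR markers, molecular identification was carried out on a total of 52 foxtail millet regional test varieties from the spring and summer millet regions. Cluster analysis divided the 22 spring-region varieties into three main groups and the 30 summer-region varieties into four groups, but the spring-region and summer-region varieties were difficult to distinguish from each other.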

The results clarified the genetic similarity between the foxtail millet DUS test standard varieties and the regional test lines.

Keywords: foxtail millet; fingerprint map; genetic diversity; DUS test; regional test varieties
CLC number: S515  Document code: A  Article ID: 1002‒2481(2024)01‒0010‒09

Construction and Application of Fingerprint Map for DUS Test Standard Varieties of Foxtail Millet
SHI Shenkui 1,2,3, QI Dongmei 1,2,3, WANG Chunfang 1,2,3, WANG Yufang 1,2,3, CAI Shuang 1
(1. College of Biology and Food Science, Hebei Normal University for Nationalities, Chengde 067000, China; 2. Key Laboratory of Botany, National Ethnic Affairs Commission, Hebei Normal University for Nationalities, Chengde 067000, China; 3. Industry Institute of Clean Energy (Carbon Emission Peak and Carbon Neutrality), Chengde 067000, China)
Abstract: Research on the diversity of genetic resources is of great significance for fundamental research and breeding practice in foxtail millet. Screening high-quality SSR markers for the genetic diversity analysis of DUS (Distinctness, Uniformity, and Stability) test standard varieties not only resolves the genetic information of the standard varieties but also helps analyze the genetic diversity of new lines at the molecular level. In this study, 20 high-quality SSR markers were used to analyze the genetic diversity of 30 foxtail millet DUS test standard varieties from different geographic origins and to construct their fingerprint map. At the same time, the SSR markers in the fingerprint map were used to analyze the genetic diversity of 22 regional test varieties from the spring foxtail millet region and 30 regional test varieties from the summer foxtail millet region, with the aim of laying a theoretical basis for germplasm utilization and genetic improvement of foxtail millet. The results showed that the 20 SSR markers detected an average of 9.15 alleles per primer pair in the DUS test standard varieties, and the average polymorphism information content (PIC) of the 20 loci was 0.77.
Using the above SSR markers, molecular identification was conducted on a total of 52 foxtail millet regional test varieties from the spring and summer foxtail millet regions. Cluster analysis divided the 22 spring-region varieties into three groups and the 30 summer-region varieties into four groups, but it was difficult to distinguish the spring-region varieties from the summer-region varieties. The results of this study elucidated the genetic similarity between the DUS test standard varieties and the regional test lines of foxtail millet.
Key words: foxtail millet; fingerprint; genetic diversity; DUS test; regional test lines
doi: 10.3969/j.issn.1002-2481.2024.01.02
Received: 2023-09-04
Funding: Third Batch of Science and Technology Innovation Projects of the Huizhi Lingchuang Space, Chengde High-tech Zone (HZLC2024009); Youth Fund Project of Hebei Normal University for Nationalities (QN201601); Project of the Chengde Industry Institute of Clean Energy (Carbon Emission Peak and Carbon Neutrality) (202205B090)
About the author: SHI Shenkui (born 1984), male, from Hengshui, Hebei; associate professor, Ph.D.; mainly engaged in research on germplasm resources of minor cereals.

(Complete Edition) English Vocabulary for Automatic Control

（完整版）自动控制专业英语词汇自动控制专业英语词汇（一）acceleration transducer 加速度传感器acceptance testing 验收测试accessibility 可及性accumulated error 累积误差AC-DC-AC frequency converter 交-直-交变频器AC (alternating current) electric drive 交流电子传动active attitude stabilization 主动姿态稳定actuator 驱动器，执行机构adaline 线性适应元adaptation layer 适应层adaptive telemeter system 适应遥测系统adjoint operator 伴随算子admissible error 容许误差aggregation matrix 集结矩阵AHP (analytic hierarchy process) 层次分析法amplifying element 放大环节analog-digital conversion 模数转换annunciator 信号器antenna pointing control 天线指向控制anti-integral windup 抗积分饱卷aperiodic decomposition 非周期分解a posteriori estimate 后验估计approximate reasoning 近似推理a priori estimate 先验估计articulated robot 关节型机器人assignment problem 配置问题，分配问题associative memory model 联想记忆模型associatron 联想机asymptotic stability 渐进稳定性attained pose drift 实际位姿漂移attitude acquisition 姿态捕获AOCS (attritude and orbit control system) 姿态轨道控制系统attitude angular velocity 姿态角速度attitude disturbance 姿态扰动attitude maneuver 姿态机动attractor 吸引子augment ability 可扩充性augmented system 增广系统automatic manual station 自动-手动操作器automaton 自动机autonomous system 自治系统backlash characteristics 间隙特性base coordinate system 基座坐标系Bayes classifier 贝叶斯分类器bearing alignment 方位对准bellows pressure gauge 波纹管压力表benefit-cost analysis 收益成本分析bilinear system 双线性系统biocybernetics 生物控制论biological feedback system 生物反馈系统black box testing approach 黑箱测试法blind search 盲目搜索block diagonalization 块对角化Boltzman machine 玻耳兹曼机bottom-up development 自下而上开发boundary value analysis 边界值分析brainstorming method 头脑风暴法breadth-first search 广度优先搜索butterfly valve 蝶阀CAE (computer aided engineering) 计算机辅助工程CAM (computer aided manufacturing) 计算机辅助制造Camflex valve 偏心旋转阀canonical state variable 规范化状态变量capacitive displacement transducer 电容式位移传感器capsule pressure gauge 膜盒压力表CARD 计算机辅助研究开发Cartesian robot 直角坐标型机器人cascade compensation 串联补偿catastrophe theory 突变论centrality 集中性chained aggregation 链式集结chaos 混沌characteristic locus 特征轨迹chemical propulsion 化学推进calrity 清晰性classical information pattern 经典信息模式classifier 分类器clinical control 
system 临床控制系统closed loop pole 闭环极点closed loop transfer function 闭环传递函数cluster analysis 聚类分析coarse-fine control 粗-精控制cobweb model 蛛网模型coefficient matrix 系数矩阵cognitive science 认知科学cognitron 认知机coherent system 单调关联系统combination decision 组合决策combinatorial explosion 组合爆炸combined pressure and vacuum gauge 压力真空表command pose 指令位姿companion matrix 相伴矩阵compartmental model 房室模型compatibility 相容性，兼容性compensating network 补偿网络compensation 补偿，矫正compliance 柔顺，顺应composite control 组合控制computable general equilibrium model 可计算一般均衡模型conditionally instability 条件不稳定性configuration 组态connectionism 连接机制connectivity 连接性conservative system 守恒系统consistency 一致性constraint condition 约束条件consumption function 消费函数context-free grammar 上下文无关语法continuous discrete event hybrid system simulation 连续离散事件混合系统仿真continuous duty 连续工作制control accuracy 控制精度control cabinet 控制柜controllability index 可控指数controllable canonical form 可控规范型[control] plant 控制对象,被控对象controlling instrument 控制仪表control moment gyro 控制力矩陀螺control panel 控制屏，控制盘control synchro 控制[式]自整角机control system synthesis 控制系统综合control time horizon 控制时程cooperative game 合作对策coordinability condition 可协调条件coordination strategy 协调策略coordinator 协调器corner frequency 转折频率costate variable 共态变量cost-effectiveness analysis 费用效益分析coupling of orbit and attitude 轨道和姿态耦合critical damping 临界阻尼critical stability 临界稳定性cross-over frequency 穿越频率，交越频率current source inverter 电流[源]型逆变器cut-off frequency 截止频率cybernetics 控制论cyclic remote control 循环遥控cylindrical robot 圆柱坐标型机器人damped oscillation 阻尼振荡damper 阻尼器damping ratio 阻尼比data acquisition 数据采集data encryption 数据加密data preprocessing 数据预处理data processor 数据处理器DC generator-motor set drive 直流发电机-电动机组传动D controller 微分控制器decentrality 分散性decentralized stochastic control 分散随机控制decision space 决策空间decision support system 决策支持系统decomposition-aggregation approach 分解集结法decoupling parameter 解耦参数deductive-inductive hybrid modeling method 演绎与归纳混合建模法delayed telemetry 延时遥测derivation tree 导出树derivative feedback 微分反馈describing function 描述函数desired value 
希望值despinner 消旋体destination 目的站detector 检出器deterministic automaton 确定性自动机deviation 偏差deviation alarm 偏差报警器DFD 数据流图diagnostic model 诊断模型diagonally dominant matrix 对角主导矩阵diaphragm pressure gauge 膜片压力表difference equation model 差分方程模型differential dynamical system 微分动力学系统differential game 微分对策differential pressure level meter 差压液位计differential pressure transmitter 差压变送器differential transformer displacement transducer 差动变压器式位移传感器differentiation element 微分环节digital filer 数字滤波器digital signal processing 数字信号处理digitization 数字化digitizer 数字化仪dimension transducer 尺度传感器direct coordination 直接协调disaggregation 解裂discoordination 失协调discrete event dynamic system 离散事件动态系统discrete system simulation language 离散系统仿真语言discriminant function 判别函数displacement vibration amplitude transducer 位移振幅传感器dissipative structure 耗散结构distributed parameter control system 分布参数控制系统distrubance 扰动disturbance compensation 扰动补偿diversity 多样性divisibility 可分性domain knowledge 领域知识dominant pole 主导极点dose-response model 剂量反应模型dual modulation telemetering system 双重调制遥测系统dual principle 对偶原理dual spin stabilization 双自旋稳定duty ratio 负载比dynamic braking 能耗制动dynamic characteristics 动态特性dynamic deviation 动态偏差dynamic error coefficient 动态误差系数dynamic exactness 动它吻合性dynamic input-output model 动态投入产出模型econometric model 计量经济模型economic cybernetics 经济控制论economic effectiveness 经济效益economic evaluation 经济评价economic index 经济指数economic indicator 经济指标eddy current thickness meter 电涡流厚度计effectiveness 有效性effectiveness theory 效益理论elasticity of demand 需求弹性electric actuator 电动执行机构electric conductance levelmeter 电导液位计electric drive control gear 电动传动控制设备electric hydraulic converter 电-液转换器electric pneumatic converter 电-气转换器electrohydraulic servo vale 电液伺服阀electromagnetic flow transducer 电磁流量传感器electronic batching scale 电子配料秤electronic belt conveyor scale 电子皮带秤electronic hopper scale 电子料斗秤elevation 仰角emergency stop 异常停止empirical distribution 经验分布endogenous variable 内生变量equilibrium growth 均衡增长equilibrium point 平衡点equivalence partitioning 
等价类划分ergonomics 工效学error 误差error-correction parsing 纠错剖析estimate 估计量estimation theory 估计理论evaluation technique 评价技术event chain 事件链evolutionary system 进化系统exogenous variable 外生变量expected characteristics 希望特性external disturbance 外扰fact base 事实failure diagnosis 故障诊断fast mode 快变模态feasibility study 可行性研究feasible coordination 可行协调feasible region 可行域feature detection 特征检测feature extraction 特征抽取feedback compensation 反馈补偿feedforward path 前馈通路field bus 现场总线finite automaton 有限自动机FIP (factory information protocol) 工厂信息协议first order predicate logic 一阶谓词逻辑fixed sequence manipulator 固定顺序机械手fixed set point control 定值控制FMS (flexible manufacturing system) 柔性制造系统flow sensor/transducer 流量传感器flow transmitter 流量变送器fluctuation 涨落forced oscillation 强迫振荡formal language theory 形式语言理论formal neuron 形式神经元forward path 正向通路forward reasoning 正向推理fractal 分形体，分维体frequency converter 变频器frequency domain model reduction method 频域模型降阶法frequency response 频域响应full order observer 全阶观测器functional decomposition 功能分解FES (functional electrical stimulation) 功能电刺激functional simularity 功能相似fuzzy logic 模糊逻辑game tree 对策树gate valve 闸阀general equilibrium theory 一般均衡理论generalized least squares estimation 广义最小二乘估计generation function 生成函数geomagnetic torque 地磁力矩geometric similarity 几何相似gimbaled wheel 框架轮global asymptotic stability 全局渐进稳定性global optimum 全局最优globe valve 球形阀goal coordination method 目标协调法grammatical inference 文法推断graphic search 图搜索gravity gradient torque 重力梯度力矩group technology 成组技术guidance system 制导系统gyro drift rate 陀螺漂移率gyrostat 陀螺体Hall displacement transducer 霍尔式位移传感器hardware-in-the-loop simulation 半实物仿真harmonious deviation 和谐偏差harmonious strategy 和谐策略heuristic inference 启发式推理hidden oscillation 隐蔽振荡hierarchical chart 层次结构图hierarchical planning 递阶规划hierarchical control 递阶控制homeostasis 内稳态homomorphic model 同态系统horizontal decomposition 横向分解hormonal control 内分泌控制hydraulic step motor 液压步进马达hypercycle theory 超循环理论I controller 积分控制器identifiability 可辨识性IDSS (intelligent decision support system) 智能决策支持系统image 
recognition 图像识别impulse 冲量impulse function 冲击函数，脉冲函数inching 点动incompatibility principle 不相容原理incremental motion control 增量运动控制index of merit 品质因数inductive force transducer 电感式位移传感器inductive modeling method 归纳建模法industrial automation 工业自动化inertial attitude sensor 惯性姿态敏感器inertial coordinate system 惯性坐标系inertial wheel 惯性轮inference engine 推理机infinite dimensional system 无穷维系统information acquisition 信息采集infrared gas analyzer 红外线气体分析器inherent nonlinearity 固有非线性inherent regulation 固有调节initial deviation 初始偏差initiator 发起站injection attitude 入轨姿势input-output model 投入产出模型instability 不稳定性instruction level language 指令级语言integral of absolute value of error criterion 绝对误差积分准则integral of squared error criterion 平方误差积分准则integral performance criterion 积分性能准则integration instrument 积算仪器integrity 整体性intelligent terminal 智能终端interacted system 互联系统，关联系统interactive prediction approach 互联预估法，关联预估法interconnection 互联intermittent duty 断续工作制internal disturbance 内扰ISM (interpretive structure modeling) 解释结构建模法invariant embedding principle 不变嵌入原理inventory theory 库伦论inverse Nyquist diagram 逆奈奎斯特图inverter 逆变器investment decision 投资决策isomorphic model 同构模型iterative coordination 迭代协调jet propulsion 喷气推进job-lot control 分批控制joint 关节Kalman-Bucy filer 卡尔曼-布西滤波器knowledge accomodation 知识顺应knowledge acquisition 知识获取knowledge assimilation 知识同化KBMS (knowledge base management system) 知识库管理系统knowledge representation 知识表达ladder diagram 梯形图lag-lead compensation 滞后超前补偿Lagrange duality 拉格朗日对偶性Laplace transform 拉普拉斯变换large scale system 大系统lateral inhibition network 侧抑制网络least cost input 最小成本投入least squares criterion 最小二乘准则level switch 物位开关libration damping 天平动阻尼limit cycle 极限环linearization technique 线性化方法linear motion electric drive 直线运动电气传动linear motion valve 直行程阀linear programming 线性规划LQR (linear quadratic regulator problem) 线性二次调节器问题load cell 称重传感器local asymptotic stability 局部渐近稳定性local optimum 局部最优log magnitude-phase diagram 对数幅相图long term memory 长期记忆lumped parameter model 集总参数模型Lyapunov theorem of asymptotic 
stability 李雅普诺夫渐近稳定性定理自动控制专业英语词汇（二）macro-economic system 宏观经济系统magnetic dumping 磁卸载magnetoelastic weighing cell 磁致弹性称重传感器magnitude-frequency characteristic 幅频特性magnitude margin 幅值裕度magnitude scale factor 幅值比例尺manipulator 机械手man-machine coordination 人机协调manual station 手动操作器MAP (manufacturing automation protocol) 制造自动化协议marginal effectiveness 边际效益Mason's gain formula 梅森增益公式master station 主站matching criterion 匹配准则maximum likelihood estimation 最大似然估计maximum overshoot 最大超调量maximum principle 极大值原理mean-square error criterion 均方误差准则mechanism model 机理模型meta-knowledge 元知识metallurgical automation 冶金自动化minimal realization 最小实现minimum phase system 最小相位系统minimum variance estimation 最小方差估计minor loop 副回路missile-target relative movement simulator 弹体-目标相对运动仿真器modal aggregation 模态集结modal transformation 模态变换MB (model base) 模型库model confidence 模型置信度model fidelity 模型逼真度model reference adaptive control system 模型参考适应控制系统model verification 模型验证modularization 模块化MEC (most economic control) 最经济控制motion space 可动空间MTBF (mean time between failures) 平均故障间隔时间MTTF (mean time to failures) 平均无故障时间multi-attributive utility function 多属性效用函数multicriteria 多重判据multilevel hierarchical structure 多级递阶结构multiloop control 多回路控制multi-objective decision 多目标决策multistate logic 多态逻辑multistratum hierarchical control 多段递阶控制multivariable control system 多变量控制系统myoelectric control 肌电控制Nash optimality 纳什最优性natural language generation 自然语言生成nearest-neighbor 最近邻necessity measure 必然性侧度negative feedback 负反馈neural assembly 神经集合neural network computer 神经网络计算机Nichols chart 尼科尔斯图noetic science 思维科学noncoherent system 非单调关联系统noncooperative game 非合作博弈nonequilibrium state 非平衡态nonlinear element 非线性环节nonmonotonic logic 非单调逻辑nonparametric training 非参数训练nonreversible electric drive 不可逆电气传动nonsingular perturbation 非奇异摄动non-stationary random process 非平稳随机过程nuclear radiation levelmeter 核辐射物位计nutation sensor 章动敏感器Nyquist stability criterion 奈奎斯特稳定判据objective function 目标函数observability index 可观测指数observable canonical form 可观测规范型on-line 
assistance 在线帮助on-off control 通断控制open loop pole 开环极点operational research model 运筹学模型optic fiber tachometer 光纤式转速表optimal trajectory 最优轨迹optimization technique 最优化技术orbital rendezvous 轨道交会orbit gyrocompass 轨道陀螺罗盘orbit perturbation 轨道摄动order parameter 序参数orientation control 定向控制originator 始发站oscillating period 振荡周期output prediction method 输出预估法oval wheel flowmeter 椭圆齿轮流量计overall design 总体设计overdamping 过阻尼overlapping decomposition 交叠分解Pade approximation 帕德近似Pareto optimality 帕雷托最优性passive attitude stabilization 被动姿态稳定path repeatability 路径可重复性pattern primitive 模式基元PR (pattern recognition) 模式识别P control 比例控制器peak time 峰值时间penalty function method 罚函数法perceptron 感知器periodic duty 周期工作制perturbation theory 摄动理论pessimistic value 悲观值phase locus 相轨迹phase trajectory 相轨迹phase lead 相位超前photoelectric tachometric transducer 光电式转速传感器phrase-structure grammar 短句结构文法physical symbol system 物理符号系统piezoelectric force transducer 压电式力传感器playback robot 示教再现式机器人PLC (programmable logic controller) 可编程序逻辑控制器plug braking 反接制动plug valve 旋塞阀pneumatic actuator 气动执行机构point-to-point control 点位控制polar robot 极坐标型机器人pole assignment 极点配置pole-zero cancellation 零极点相消polynomial input 多项式输入portfolio theory 投资搭配理论pose overshoot 位姿过调量position measuring instrument 位置测量仪posentiometric displacement transducer 电位器式位移传感器positive feedback 正反馈power system automation 电力系统自动化predicate logic 谓词逻辑pressure gauge with electric contact 电接点压力表pressure transmitter 压力变送器price coordination 价格协调primal coordination 主协调primary frequency zone 主频区PCA (principal component analysis) 主成分分析法principle of turnpike 大道原理priority 优先级process-oriented simulation 面向过程的仿真production budget 生产预算production rule 产生式规则profit forecast 利润预测PERT (program evaluation and review technique) 计划评审技术program set station 程序设定操作器proportional control 比例控制proportional plus derivative controller 比例微分控制器protocol engineering 协议工程prototype 原型pseudo random sequence 伪随机序列pseudo-rate-increment control 伪速率增量控制pulse duration 脉冲持续时间pulse frequency modulation control system 
脉冲调频控制系统pulse width modulation control system 脉冲调宽控制系统PWM inverter 脉宽调制逆变器pushdown automaton 下推自动机QC (quality control) 质量管理quadratic performance index 二次型性能指标qualitative physical model 定性物理模型quantized noise 量化噪声quasilinear characteristics 准线性特性queuing theory 排队论radio frequency sensor 射频敏感器ramp function 斜坡函数random disturbance 随机扰动random process 随机过程rate integrating gyro 速率积分陀螺ratio station 比值操作器reachability 可达性reaction wheel control 反作用轮控制realizability 可实现性,能实现性real time telemetry 实时遥测receptive field 感受野rectangular robot 直角坐标型机器人rectifier 整流器recursive estimation 递推估计reduced order observer 降阶观测器redundant information 冗余信息reentry control 再入控制regenerative braking 回馈制动，再生制动regional planning model 区域规划模型regulating device 调节装载regulation 调节relational algebra 关系代数relay characteristic 继电器特性remote manipulator 遥控操作器remote regulating 遥调remote set point adjuster 远程设定点调整器rendezvous and docking 交会和对接reproducibility 再现性resistance thermometer sensor 热电阻resolution principle 归结原理resource allocation 资源分配response curve 响应曲线return difference matrix 回差矩阵return ratio matrix 回比矩阵reverberation 回响reversible electric drive 可逆电气传动revolute robot 关节型机器人revolution speed transducer 转速传感器rewriting rule 重写规则rigid spacecraft dynamics 刚性航天动力学risk decision 风险分析robotics 机器人学robot programming language 机器人编程语言robust control 鲁棒控制robustness 鲁棒性roll gap measuring instrument 辊缝测量仪root locus 根轨迹roots flowmeter 腰轮流量计rotameter 浮子流量计，转子流量计rotary eccentric plug valve 偏心旋转阀rotary motion valve 角行程阀rotating transformer 旋转变压器Routh approximation method 劳思近似判据routing problem 路径问题sampled-data control system 采样控制系统sampling control system 采样控制系统saturation characteristics 饱和特性scalar Lyapunov function 标量李雅普诺夫函数SCARA (selective compliance assembly robot arm) 平面关节型机器人scenario analysis method 情景分析法scene analysis 物景分析s-domain s域self-operated controller 自力式控制器self-organizing system 自组织系统self-reproducing system 自繁殖系统self-tuning control 自校正控制semantic network 语义网络semi-physical simulation 半实物仿真sensing element 敏感元件sensitivity analysis 
灵敏度分析sensory control 感觉控制sequential decomposition 顺序分解sequential least squares estimation 序贯最小二乘估计servo control 伺服控制，随动控制servomotor 伺服马达settling time 过渡时间sextant 六分仪short term planning 短期计划short time horizon coordination 短时程协调signal detection and estimation 信号检测和估计signal reconstruction 信号重构similarity 相似性simulated interrupt 仿真中断simulation block diagram 仿真框图simulation experiment 仿真实验simulation velocity 仿真速度simulator 仿真器single axle table 单轴转台single degree of freedom gyro 单自由度陀螺single level process 单级过程single value nonlinearity 单值非线性singular attractor 奇异吸引子singular perturbation 奇异摄动sink 汇点slaved system 受役系统slower-than-real-time simulation 欠实时仿真slow subsystem 慢变子系统socio-cybernetics 社会控制论socioeconomic system 社会经济系统software psychology 软件心理学solar array pointing control 太阳帆板指向控制solenoid valve 电磁阀source 源点specific impulse 比冲speed control system 调速系统spin axis 自旋轴spinner 自旋体stability criterion 稳定性判据stability limit 稳定极限stabilization 镇定，稳定Stackelberg decision theory 施塔克尔贝格决策理论state equation model 状态方程模型state space description 状态空间描述static characteristics curve 静态特性曲线station accuracy 定点精度stationary random process 平稳随机过程statistical analysis 统计分析statistic pattern recognition 统计模式识别steady state deviation 稳态偏差steady state error coefficient 稳态误差系数step-by-step control 步进控制step function 阶跃函数stepwise refinement 逐步精化stochastic finite automaton 随机有限自动机strain gauge load cell 应变式称重传感器strategic function 策略函数strongly coupled system 强耦合系统subjective probability 主观频率suboptimality 次优性supervised training 监督学习supervisory computer control system 计算机监控系统sustained oscillation 自持振荡swirlmeter 旋进流量计switching point 切换点symbolic processing 符号处理synaptic plasticity 突触可塑性synergetics 协同学syntactic analysis 句法分析system assessment 系统评价systematology 系统学system homomorphism 系统同态system isomorphism 系统同构system engineering 系统工程tachometer 转速表target flow transmitter 靶式流量变送器task cycle 作业周期teaching programming 示教编程telemechanics 远动学。

Chapter 19: Clustering Analysis - Central South University

$$
r_{ij} = \frac{\sum (X_i - \bar{X}_i)(X_j - \bar{X}_j)}{\sqrt{\sum (X_i - \bar{X}_i)^2 \sum (X_j - \bar{X}_j)^2}} \tag{19-1}
$$
The larger the absolute value of the coefficient, the more similar the two variables are. Similarly, the Spearman rank correlation coefficient can be used to define the similarity coefficient for non-normal variables. When all the variables are qualitative, however, it is best to use the contingency coefficient.
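The similarity coefficient in (19-1) is the Pearson correlation between two variables. As a minimal sketch, assuming each variable is given as a plain list of observations of equal length (the function name `similarity_coefficient` and the sample values are illustrative and not from the source), it could be computed as follows:

```python
import math

def similarity_coefficient(xi, xj):
    """Pearson correlation between two equally long lists of observations."""
    mean_i = sum(xi) / len(xi)
    mean_j = sum(xj) / len(xj)
    # Numerator of (19-1): sum of products of deviations from the means.
    cross = sum((a - mean_i) * (b - mean_j) for a, b in zip(xi, xj))
    # Denominator of (19-1): square root of the product of the two sums of squares.
    ss_i = sum((a - mean_i) ** 2 for a in xi)
    ss_j = sum((b - mean_j) ** 2 for b in xj)
    return cross / math.sqrt(ss_i * ss_j)

# Hypothetical data: the second variable is an exact multiple of the first.
r_same = similarity_coefficient([1, 2, 3], [2, 4, 6])
# Hypothetical data: the second variable decreases as the first increases.
r_opposite = similarity_coefficient([1, 2, 3], [6, 4, 2])
```

Both hypothetical pairs have an absolute coefficient of 1, so under this measure they count as equally similar even though the sign differs.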
For example, if m is the number of variables (i.e. indexes) and n is the number of cases (i.e. samples), you can proceed as follows: (1) R-type clustering, also called index clustering: a method of sorting the m indexes, aiming at lowering the
individuals to the correct population.
Clustering analysis: a statistical method for grouping arbitrary objects into categories. It is used when there are no a priori hypotheses and the aim is to find the most appropriate grouping by means of mathematical statistics and the information collected. It has become a method of first choice for exploring large volumes of genetic information.

Drilling Mud Terminology

重晶石barite般土bentonite灰罐active tank烧碱caustic soda振动筛mud screen泥浆槽mud ditch泥浆枪mud gun加料漏斗hopper计量罐trip tank包，袋sack桶barrel井架derrick井架底座derrick substructure钻机drawworks传动轴drive shaft猫头轴cat shaft刹车brake刹车带brake belt泥浆泵mud pump饮用水potable water起钻pull out of hole下钻run in hole灌泥浆fill the hole抽吸swab单根single一立柱stand井底bottom hole泵压pump pressure泵压过高overhigh pumping pressure ，井涌kick压井kick well调整泥浆conditioning mud淀粉starch ,钻头bit ，白沥青white asphalt ，白油mineral oil ，白云母white mica包被絮凝剂flocculant ，包被envelop包被抑制性encapsulating ability层流layer flow紊流turbulence分散dispersion ,分散剂dispersant ,分析analysis ,粉尘dust ,粉末powder ,改性淀粉modified starch ,改性沥青modified asphalt钙calcium高分子聚合物macromoleclar polymer ,高分子絮凝剂polymer flocculant高岭土kaolinite ,高密度钻井液high density drilling fluid,高温泥浆high-temperature mud ,高温高压流变仪HTHP rheometer ,高效润滑剂super lubricant静切力(结构力)gel strength/static shear force聚合醇polyalcohol ,聚合物不分散泥浆non dispersed polymer mud聚合物降滤失水剂polymer filtration control agent 聚合物钻井液polymer drilling fluid，聚磺钻井液sulphonated polymer mud录井log裸眼open well裸眼井段barefoot interval滤饼filter cake滤液filtrate滤失量filtration滤液侵入filtrate invasion氯化钙calcium chlorideKCl溶液potassium chloride solution，毛细管压力capillary pressure煤层coal bed镁magnesium临界环空流速critical annular fluid velocity临界流量critical flow velocity硫化氢hydrogen sulfide硫酸sulfate流变参数reheological parameter流变模式reheology model流变性rheology behavior流动阻力flow resistance流态flow pattern流型fluid type漏斗粘度funnel viscosity漏失lost circulation漏失层位location of the thief zone蒙脱石smectite密度density木质素磺酸盐lignosulfonate目mesh钠sodium，泥包bit balling泥饼mud-cake，泥浆处理mud treatment泥浆配方mud formula泥岩mudstone , conglomerate泥页岩shale粘度viscosity粘土膨胀clay swelling，粘土稳定性clay stability润滑剂lubricant润滑仪lubricity tester纤维素cellulose羟丙基淀粉hydroxypropul starch羟乙基纤维素hydroxyethyl cellulose安全地层safe formation ，安全钻井safe drilling ，坳陷down warping region凹陷地层subsidence formation ，奥陶系Ordovician system ，多靶点multiple target point ，饱和度saturation ，饱和盐水saturated salt water ，背斜anticlinal比表面积specific surface area ，比重瓶法density bottle method 
，边界摩擦boundary friction ，标准化standardization ，标准粘度测量standard visicosity measure ，表面活性剂surfactant /surface active agent ，表面粘度surface viscosity ，表面张力surface tension ，表皮系数(S) skin coefficient ，憋钻bit bouncing ，宾汉方程bingham equation ，薄而韧的泥饼thin,plastic and compacted mud-cake ，薄片flake ，薄弱地层weak formation ，剥离peel off ，不分散泥浆nondispersed mud ，部分水解聚丙烯酰胺(PHPA) partially hydrolyzed polyacrylamide ，参数优选parametric optimization ，残酸reacted acid ，残渣gel residue / solid residue ，测量measure ，侧钻水平井sidetrack horizontal well ，层间interlayer ，层理bedding ，柴油diesel oil ，操作方法operation method超深井high deep well ，超声波ultrasonography ，超细碳酸钙super-fine calcium carbonate ，产层production/pay zone产层亏空reservoir voidage ，产量production ,output ，沉淀precipitation沉降subside ，沉降速度settling rate ，沉砂sand setting ，程序program ，成分ingredient ，成胶剂gelatinizing agent ，成膜树脂film-forming resin ，成岩性差poor diagenetic grade ，承压bearing pressure ，承压低lower pressure resistance ，承压能力loading capacity尺寸dimension ，除硫剂sulfur elimination除砂器desander ，触变性thixotropy ，垂直井vertical well ，充气钻井液aerated drilling fluid冲砂sand removal ，冲蚀flush冲刷washing out冲洗clean ，冲洗效率cleaning efficiency冲洗液washing fluid ，丛式井cluster well ，稠油区viscous oil area ，稠油藏high oil reservoir ，初步分析preliminary analysis初始稠度initial consistency ，初始粘度initial viscosity ，处理剂additive /treating-agent粗分散泥浆coarse dispersed mud ，醋酸acetate ，窜流fluid channeling ，脆性brittle/crisp ,fragility大段水层thick aqueous formation大井斜角high deviation angle ，大块岩样big rock sample ，大块钻屑massive drilling cuttings大理石marble ，大砾石层large gravel bed ，大量分析quantitative analysis大排量洗井high flow rate washover ，大排量循环high flow rate circulation ，大位移定向井extended-reach directional well大井眼large hole ，代表性岩心representive core sample单宁酸tannate ，氮nitrogen ，淡水fresh water单向压力暂堵剂unidirectional pressure temporary plugging additive ，导向螺杆钻具stearable assemly ，导向器guider低毒油基low toxicity oil based ，低返速low return-velocity低固相泥浆low solid drilling fluid ，低粘土相泥浆low clay content drilling fluid ，滴定titration ，地层formation ，地层破碎straturn breaking ，地层倾角大higher formation 
clination ，地层水formation water ，地层损害formation damage ，地下水groundwater ,/subsurface water ,地应力ground stress ,地质geology地质构造geologic structure ,电测electronic logging电解质electrolyte顶替过程displacing operation定向井direction well ,动态滤失dynamic filtration ,动切力yield value动塑比yield value to plastic viscosity ,堵漏plugging ,堵水water shutoff ,毒性大high toxicity ,毒性污染环境toxicity ruins the environment短纤维brief fiber,断层发育mature fault,断裂带faulted zone多分支侧钻井multi-lateral sidetracking well ,多功能添加剂multifunction additive ,惰性材料inert material ,惰性润滑剂inert lubricant ,二叠系Permian system ,二开second section ,二氧化碳carbon dioxide反排解堵plug removal by reverse flow ,范氏粘度计fann viscosimeter ,防窜水泥anti-fluid-channeling cement防腐anti-corrosion ,防卡pipe-sticking prevention /anti-sticking ,防漏失lost circulation prevention防塌机理mechanism of anti-caving ，放空不返loss of bit load with loss return ,非离子nonionic非牛顿流体non-newtonian fluid ,废泥浆mud disposal ,分段固井技术stage cementing technology,粉砂质aleuritic texture ,封堵formation sealing封堵剂formation sealant ,封固段interval isolation ,扶正器centralizer复杂地层complex formation, troublesome region ,trick formation 复杂情况down-hole troublesome condition ,腐蚀corrosion ,腐蚀电位corrosion potential腐蚀速率corrosion rate ,腐殖酸humate ,humic acid ,负压钻井underbalanced drilling ,附加密度addition mud density ,改善泥饼质量improvement of mud cake高压盐水层high pressured slatwater layer ,膏岩层gypsolyte ,工程engineering狗腿dogleg ,构造裂缝structural fracture ,固井技术cementing technology ,固相solid phase ,固相含量solid concentration ,固相颗粒solid particles ,固相颗粒侵入solid invasion ,固相控制技术solid control technology固相损害damage of particles硅粉silica powder ,海上off shore ,海水泥浆sea water mud海湾bay ,海洋生物marine animal ,含量content核桃壳粉walnut shell flour ,核磁共振（NMR）nuclear magnetic resonance ,合成synthesis合成基钻井液synthetic base drilling fluid ,褐煤lignite ,花岗岩granite ,划眼作业reaming operation ,环境保护environment protection ,环空当量密度annular equivalent density环空返速velocity in annular ,环空压耗annular pressure lost ,缓蚀剂corrosion inhibitor ,磺化酚醛树脂sulfomethal phenolaldehy resin ,磺化沥青sulfonated gilsonite ,磺甲基酚醛树脂sulfonated 
methypheuo formald-ehyde 灰岩limestone ,回收率recovery percent ,火成岩igneous rock混合盐水mixed salt ,活动套管moving casing ,活度water activity基液base fluid ,机械钻速(ROP) rate of penetrate及时反出timely return ,激光粒度仪laser particle analyzer技术措施technical measure ,技术套管intermediate casing ,钾potassium ,甲酸盐formate加量dosage ,加重剂heavy weight additive ,加重泥浆weighted mud监督supervision ,碱alkali ,简化泥浆处理simplify mud treatment减阻剂anti-friction agent , drag reducer ,剪切应力shear stress健康,安全与环境(HSE) health 、safety and environment , 降粘剂thinner,visbreaker降失水剂fluid loss agent/additive, filtration reducer；胶凝gelatify结构强度structural strength ，解卡剂pipe free agent ，近平衡钻井near-balanced drilling ，井壁稳定hole stability ,stable borehole ，井底downhole ，井底静止温度低(BHST) low borehole static temperature井段interval/section ，井径well/hole gauge ，井径规则regular and consistent borehole gauge井径扩大率hole diameter enlargement rate ，井口wellhead ，井漏lost circulation井身结构wellbore configuration ，井下安全downhole safety ，井下复杂情况down hole problem井斜inclination ，井眼well bore ,borehole ，井眼轨迹well track井眼净化hole cleaning井眼缩径hole shrinkage井眼稳定hole stability浸泡时间soak time卡钻pipe-sticking勘探与开发exploration and development开发井development well开钻泥浆spud mud孔喉pore throat孔隙pore孔隙度测井porosity log孔隙压力pore pressure矿化度mineral salt concentration ,雷诺数Renault number离心机固控技术centrifugal solid control粒度分布particles/size distribution粒度分析particles size analysis粒子particle砾石充填gravel pack裂隙地层fractured formation幂律模式power law method陆上on shore幂律模式power law method纳米技术nano-tech内泥饼internal filter cake粘性流体viscous fluid柠檬酸citric acid凝固点freezing point凝析油condensate oil扭矩torque浓度concentration浓硫酸@strong sulfuric剖面图profile map泡沫剂foaming agent泡沫钻井液foam drilling fluid配浆时间drilling fluid preparing time盆地basin喷blowout喷射钻井jet drilling喷嘴粘度nozzle viscosity膨润土bentonite膨润土含量bentonite content膨胀swell膨胀性堵漏材料expandable plugging additives平衡压力钻井balanced drilling评价标准evaluation criterion评价井appraisal well平板型层流plate laminar flow平均井深average well depth屏蔽环shielding zone屏蔽暂堵技术temporary shielding method ,barrier-building temporary seal incores 
破胶剂gel breaker起泡剂frothing agent起下钻阻卡blockage during tripping前置液prepad fluid潜山buried hill浅海shallow-water浅井shallow well欠平衡钻井underbanlanced drilling桥堵剂bridge additive切力shearing force侵入深度invasion depth侵蚀erosion亲水性hydrophilcity亲油性lipophilic氢氧化钙calcium hydroxide清洗剂cleaning agent倾角dip angle区块block屈服值yielding point取代度substituted ratio取芯core,coring operation取芯进尺coring footage取芯收获率coring recovery rate溶洞cave溶液solution乳化剂emulsifier软化点沥青softening point asphalt软泥岩soft mudstone塞流顶替plug-flow displacement射孔perforation射孔液perforation fluid砂泥岩sand shale砂岩sand ,sandstone杀菌剂bacteriostat筛管screen pipe深度depth渗漏leakage渗透率fluid permeability渗透率恢复值return permeability声波测井sonic logging生物处理biological treatment生物降解biological degradation生物聚合物biological polymer ,xanthan 石灰lime石蜡alpha , paraffin wax石炭系carboniferous system石英quartz塑料小球plastic beads瞬时滤失instantaneous filtration , spurt loss 水泥环cement sheath水泥浆cement slurry水基泥浆water-base drilling fluid水敏性water sensitivity水锁water lock塑性粘度plastic viscosity速敏speed-sensitivity酸碱滴定法acid-base titration酸敏acid sensitivity酸溶性acid soluble随钻堵漏plugging while drilling缩径hole shrinkage坍塌slough坍塌压力collapse pressure坍塌页岩sloughing shale碳酸钙calcium carbonate套管casing调整井adjustment well铁矿粉hematite通井drafting process完井液completion fluid温度temperature无毒non-toxicity污染contamination稀释剂thinner下套管running casing消泡剂defoamer小井眼slim hole修井液workover fluid溴盐bromine一开surface section抑制性inhibitive ability抑制性差poor inhibity抑制剂inhibitor荧光florescence荧光分析仪fluorescence analyzer荧光级别高high class of florescence荧光录井fluorologging油层损害formation damage油气层hydrocarbon reservoir油气层保护formation damage control ,reservoir protection 原油crude oil云母mica暂堵剂temporary plugging additive造壁性well building property造斜点kick-off增粘剂thickening agent , viscosifier浊点cloud point自喷井fountain well侏罗系Jurassic组成constituent组分component钻井过程drilling process，钻井压差drilling pressure differential钻井液drilling fluid ，钻井液结构drilling fluid structure ，钻井液密度drilling fluid density钻井液排放drilling fluid discharge ，钻井液配方drilling fluid composition ，钻井周期drilling 
term钻具drilling tads /drill stem ，钻具腐蚀corrosion of drilling pipe ，钻速penetrate/drilling rate钻屑drilling cuttings钻屑回收率cuttings recovery钻屑污染cuttings contamination ，最大井斜maximum inclination，当班on duty下班off duty值班on shift加班overtime夜班night duty白班day duty钻井经理drilling manager钻井工程师drilling engineer钻井领班tool pusher司钻driller副司钻assistant driller井架工derrick man钻工floor man , roughneck材料员material man报务员radio operator水手长roustabout pusher吊车工crane operator轮机长chief engineer焊工welder电工electrician管事chief steward救生艇life boat救生筏life raft救生衣life jacket急救箱first aid kit稀释剂diluent分散剂dispersant双层振动筛double deck shaker泥浆补充管线fill in line泥浆输出管线Mud delivery manifold line 泥浆上水管线mud suction manifold line 泥浆上水池mud suction tank储存罐storage tank。

Quantitative Research and Statistical Analysis: Cluster Analysis

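Height vs. weight (standardized)
JCR
Can the 54 journals in Information Science & Library Science be classified by their indexes?
Impact factor vs. immediacy index
• Highly ranked and popular (journals that are both well regarded and widely read)
Impact factor vs. articles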
5 4 2 3 6
0 0 0 1 0
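The next stage is 5, so look at stage 5. Cluster 1 holds sample 1 and cluster 2 holds sample 6. The "stage cluster first appears" entry for cluster 1 is 4, meaning sample 1 goes into the same cluster as the result of stage 4; the entry for cluster 2 is 0, meaning sample 6 in cluster 2 forms a cluster by itself. Cluster 1 (per stage 4, cluster 1 currently holds 3 and 5), cluster 2 holds 1. Group II: 1, 4; Group III: 6; Group I: 3, 5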
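The clustering proceeds in 5 stages. Stage 1 first merges the two closest samples, 3 and 5, forming G1. Its next stage (rightmost column) is 4, so we continue at stage 4, where G1 and sample 3 form a compound cluster; under "stage cluster first appears" the clusters are therefore 3 and 1. Stage 2 merges samples 1 and 4, forming G2. Because its next stage is 3, in stage 3 G2 and sample 1 form a compound cluster, and under "stage cluster first appears" cluster 1 = 2, and so on. The coefficient grows as clustering proceeds, slowly at first and quickly later, which indicates that the differences between clusters are small at the start of clustering and large at the end.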
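The next stage is 3. In stage 3, cluster 1 holds sample 1 and cluster 2 holds sample 2. The "stage cluster first appears" entry for cluster 1 is 2, meaning samples 1 and 2 go into the same group. The entry for cluster 2 is 0, so the trace ends here.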
Group II: 1, 4, 2
Group III: 6; Group I: 3, 5
Using the between-groups average linkage method, the 6 samples clearly fall into three clusters:
I: 3, 5; II: 1, 2, 4; III: 6
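The stage-by-stage merging described above can be sketched in code. The source does not give the underlying measurements, so the six one-dimensional values below are hypothetical, chosen only so that the merge order matches the schedule discussed (3 with 5 first, then 1 with 4, then sample 2 joining that pair). The function name `average_linkage` is likewise an assumption, not part of any statistics package.

```python
from itertools import combinations

def average_linkage(points, n_clusters):
    """Agglomerative clustering with between-groups average linkage.

    points: dict mapping a sample label to a 1-D value (hypothetical data).
    Returns the final clusters and the merge schedule as
    (stage, cluster_a, cluster_b, coefficient) tuples.
    """
    clusters = [[label] for label in sorted(points)]
    schedule = []
    while len(clusters) > n_clusters:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Distance between clusters = mean of all pairwise sample distances.
            dists = [abs(points[p] - points[q])
                     for p in clusters[a] for q in clusters[b]]
            d = sum(dists) / len(dists)
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        schedule.append((len(schedule) + 1, clusters[a], clusters[b], round(d, 3)))
        merged = sorted(clusters[a] + clusters[b])
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)] + [merged]
    return sorted(clusters), schedule

# Hypothetical measurements for samples 1..6 (not taken from the source).
samples = {1: 1.0, 2: 2.0, 3: 5.0, 4: 1.2, 5: 5.1, 6: 9.0}
final_clusters, schedule = average_linkage(samples, n_clusters=3)
```

Stopping at three clusters leaves groups {1, 2, 4}, {3, 5} and {6} for this made-up data. The coefficients in `schedule` grow from stage to stage, which is the pattern the agglomeration schedule is read for.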

English Vocabulary for Electronics: A

后验估计 a posteriori estimate先验估计 a priori estimate交流电子传动 ac (alternating current) electric drive验收测试 acceptance testing可及性 accessibility累积误差 accumulated error交-直-交变频器 ac-dc-ac frequency converter主动姿态稳定 active attitude stabilization驱动器，执行机构 actuator线性适应元 adaline适应层 adaptation layer适应遥测系统 adaptive telemeter system伴随算子 adjoint operator容许误差 admissible error集结矩阵 aggregation matrix层次分析法 ahp (analytic hierarchy process)放大环节 amplifying element模数转换 analog-digital conversion信号器 annunciator天线指向控制 antenna pointing control抗积分饱卷 anti-integral windup姿态轨道控制系统 aocs (attritude and orbit control system)非周期分解 aperiodic decomposition近似推理 approximate reasoning关节型机器人 articulated robot 配置问题，分配问题 assignment problem联想记忆模型 associative memory model 联想机 associatron渐进稳定性 asymptotic stability实际位姿漂移 attained pose drift姿态捕获 attitude acquisition姿态角速度 attitude angular velocity 姿态扰动 attitude disturbance姿态机动 attitude maneuver吸引子 attractor 可扩充性 augment ability增广系统 augmented system自动-手动操作器 automatic manual station自动机 automaton自治系统 autonomous system间隙特性 backlash characteristics基座坐标系 base coordinate system贝叶斯分类器 bayes classifier方位对准 bearing alignment波纹管压力表 bellows pressure gauge收益成本分析 benefit-cost analysis双线性系统 bilinear system 生物控制论 biocybernetics生物反馈系统 biological feedback system黑箱测试法 black box testing approach盲目搜索 blind search块对角化 block diagonalization玻耳兹曼机 boltzman machine自下而上开发 bottom-up development边界值分析 boundary value analysis头脑风暴法 brainstorming method广度优先搜索 breadth-first search蝶阀 butterfly valve计算机辅助工程 cae (computer aided engineering)清晰性 calrity计算机辅助制造 cam (computer aided manufacturing)偏心旋转阀 camflex valve规范化状态变量 canonical state variable电容式位移传感器 capacitive displacement transducer膜盒压力表 capsule pressure gauge计算机辅助研究开发 card直角坐标型机器人 cartesian robot串联补偿 cascade compensation突变论 catastrophe theory集中性 centrality链式集结 chained aggregation混沌 chaos特征轨迹 characteristic locus化学推进 chemical propulsion经典信息模式 classical information pattern分类器 classifier临床控制系统 clinical control system闭环极点 closed loop pole闭环传递函数 closed loop 
transfer function聚类分析 cluster analysis粗-精控制 coarse-fine control蛛网模型 cobweb model系数矩阵 coefficient matrix认知科学 cognitive science认知机 cognitron单调关联系统 coherent system组合决策 combination decision组合爆炸 combinatorial explosion压力真空表 combined pressure and vacuum gauge指令位姿 command pose相伴矩阵 companion matrix房室模型 compartmental model相容性，兼容性 compatibility补偿网络 compensating network补偿，矫正 compensation柔顺，顺应 compliance组合控制 composite control可计算一般均衡模型 computable general equilibrium model条件不稳定性 conditionallyinstability组态 configuration[nextpage} 连接机制 connectionism连接性 connectivity守恒系统 conservative system一致性 consistency约束条件 constraint condition消费函数 consumption function上下文无关语法 context-free grammar连续离散事件混合系统仿真 continuous discrete event hybrid system simulation连续工作制 continuous duty控制精度 control accuracy 控制柜 control cabinet控制力矩陀螺 control moment gyro控制屏，控制盘 control panel控制[式}自整角机 control synchro控制系统综合 control system synthesis控制时程 control time horizon可控指数 controllability index可控规范型 controllable canonical form控制仪表 controlling instrument合作对策 cooperative game可协调条件 coordinability condition协调策略 coordination strategy协调器 coordinator转折频率 corner frequency共态变量 costate variable费用效益分析 cost-effectiveness analysis轨道和姿态耦合 coupling of orbit and attitude临界阻尼 critical damping临界稳定性 critical stability穿越频率，交越频率 cross-over frequency电流[源}型逆变器 current source inverter截止频率 cut-off frequency控制论 cybernetics循环遥控 cyclic remote control 圆柱坐标型机器人 cylindrical robot微分控制器 d controller阻尼振荡 damped oscillation阻尼器 damper阻尼比 damping ratio数据采集 data acquisition数据加密 data encryption数据预处理 data preprocessing数据处理器 data processor直流发电机-电动机组传动 dc generator-motor set drive分散性 decentrality分散随机控制 decentralized stochastic control决策空间 decision space决策支持系统 decision support system分解集结法 decomposition-aggregation approach解耦参数 decoupling parameter演绎与归纳混合建模法 deductive-inductive hybrid modeling method延时遥测 delayed telemetry导出树 derivation tree微分反馈 derivative feedback描述函数 describing function希望值 desired value消旋体 despinner目的站 destination检出器 detector确定性自动机 deterministic 
automaton偏差 deviation偏差报警器 deviation alarm数据流图 dfd诊断模型 diagnostic model对角主导矩阵 diagonally dominant matrix膜片压力表 diaphragm pressure gauge差分方程模型 difference equation model微分动力学系统 differential dynamical system 微分对策 differential game差压液位计 differential pressure level meter差压变送器 differential pressure transmitter差动变压器式位移传感器 differential transformer displacement transducer微分环节 differentiation element数字滤波器 digital filer数字信号处理 digital signal processing数字化 digitization数字化仪 digitizer[nextpage} 尺度传感器 dimension transducer直接协调 direct coordination解裂 disaggregation 失协调 discoordination离散事件动态系统 discrete event dynamic system离散系统仿真语言 discrete system simulation language判别函数 discriminant function位移振幅传感器 displacement vibration amplitude transducer耗散结构 dissipative structure分布参数控制系统 distributed parameter control system扰动 distrubance扰动补偿 disturbance compensation多样性 diversity可分性 divisibility领域知识 domain knowledge主导极点 dominant pole剂量反应模型 dose-response model双重调制遥测系统 dual modulation telemetering system对偶原理 dual principle双自旋稳定 dual spin stabilization负载比 duty ratio能耗制动 dynamic braking动态特性 dynamic characteristics动态偏差 dynamic deviation动态误差系数 dynamic error coefficient动它吻合性 dynamic exactness动态投入产出模型 dynamicinput-output model计量经济模型 econometric model经济控制论 economic cybernetics经济效益 economic effectiveness经济评价 economic evaluation经济指数 economic index经济指标 economic indicator电涡流厚度计 eddy current thickness meter 有效性 effectiveness效益理论 effectiveness theory需求弹性 elasticity of demand电动执行机构 electric actuator电导液位计 electric conductance levelmeter电动传动控制设备 electric drive control gear电-液转换器 electric hydraulic converter电-气转换器 electric pneumatic converter电液伺服阀 electrohydraulic servo vale电磁流量传感器 electromagnetic flow transducer电子配料秤 electronic batching scale电子皮带秤 electronic belt conveyor scale电子料斗秤 electronic hopper scale仰角 elevation异常停止 emergency stop经验分布 empirical distribution内生变量 endogenous variable均衡增长 equilibrium growth平衡点 equilibrium point等价类划分 equivalence partitioning工效学 ergonomics误差 error纠错剖析 error-correction parsing估计量 
estimate估计理论 estimation theory评价技术 evaluation technique事件链 event chain进化系统 evolutionary system外生变量 exogenous variable希望特性 expected characteristics外扰 external disturbance事实 fact base故障诊断 failure diagnosis快变模态 fast mode可行性研究 feasibility study可行协调 feasible coordination可行域 feasible region特征检测 feature detection特征抽取 feature extraction反馈补偿 feedback compensation前馈通路 feedforward path功能电刺激 fes (functional electrical stimulation)现场总线 field bus有限自动机 finite automaton工厂信息协议 fip (factory information protocol)一阶谓词逻辑 first order predicate logic固定顺序机械手 fixed sequence manipulator定值控制 fixed set point control流量传感器 flowsensor/transducer流量变送器 flow transmitter涨落 fluctuation柔性制造系统 fms (flexible manufacturing system)强迫振荡 forced oscillation形式语言理论 formal language theory形式神经元 formal neuron正向通路 forward path正向推理 forward reasoning分形体，分维体 fractal变频器 frequency converter频域模型降阶法 frequency domain model reduction method频域响应 frequency response全阶观测器 full order observer功能分解 functional decomposition功能相似 functional simularity模糊逻辑 fuzzy logic对策树 game tree闸阀 gate valve一般均衡理论 general equilibrium theory 广义最小二乘估计 generalized least squares estimation生成函数 generation function地磁力矩 geomagnetic torque几何相似 geometric similarity框架轮 gimbaled wheel全局渐进稳定性 global asymptotic stability全局最优 global optimum球形阀 globe valve目标协调法 goal coordination method文法推断 grammatical inference图搜索 graphic search重力梯度力矩 gravity gradient torque成组技术 group technology制导系统 guidance system陀螺漂移率 gyro drift rate陀螺体 gyrostat霍尔式位移传感器 hall displacement transducer半实物仿真 hardware-in-the-loop simulation和谐偏差 harmonious deviation和谐策略 harmonious strategy启发式推理 heuristic inference隐蔽振荡 hidden oscillation层次结构图 hierarchical chart递阶控制 hierarchical control递阶规划 hierarchical planning内稳态 homeostasis同态系统 homomorphic model 横向分解 horizontal decomposition内分泌控制 hormonal control液压步进马达 hydraulic step motor超循环理论 hypercycle theory积分控制器 i controller可辨识性 identifiability智能决策支持系统 idss (intelligent decision support system)图像识别 image recognition冲量 impulse 冲击函数，脉冲函数 impulse function点动 
inching不相容原理 incompatibility principle增量运动控制 incremental motion control品质因数 index of merit电感式位移传感器 inductive force transducer归纳建模法 inductive modeling method工业自动化 industrial automation惯性姿态敏感器 inertial attitude sensor惯性坐标系 inertial coordinate system惯性轮 inertial wheel推理机 inference engine无穷维系统 infinite dimensional system信息采集 information acquisition红外线气体分析器 infrared gas analyzer固有非线性 inherent nonlinearity固有调节 inherent regulation初始偏差 initial deviation发起站 initiator入轨姿势 injection attitude投入产出模型 input-output model不稳定性 instability指令级语言 instruction level language绝对误差积分准则 integral of absolute value of error criterion平方误差积分准则 integral of squared error criterion积分性能准则 integral performance criterion积算仪器 integration instrument整体性 integrity智能终端 intelligent terminal互联系统，关联系统 interacted system互联预估法，关联预估法 interactive prediction approach互联 interconnection断续工作制 intermittent duty内扰 internal disturbance不变嵌入原理 invariant embedding principle库伦论 inventory theory逆奈奎斯特图 inverse nyquist diagram[nextpage} 逆变器 inverter投资决策 investment decision解释结构建模法 ism (interpretive structure modeling)同构模型 isomorphic model迭代协调 iterative coordination喷气推进 jet propulsion分批控制 job-lot control关节 joint卡尔曼-布西滤波器 kalman-bucy filer知识库管理系统 kbms (knowledge base management system)知识顺应 knowledge accomodation知识获取 knowledge acquisition知识同化 knowledge assimilation知识表达 knowledge representation梯形图 ladder diagram滞后超前补偿 lag-lead compensation拉格朗日对偶性 lagrange duality拉普拉斯变换 laplace transform大系统 large scale system侧抑制网络 lateral inhibition network最小成本投入 least cost input 最小二乘准则 least squares criterion物位开关 level switch天平动阻尼 libration damping极限环 limit cycle直线运动电气传动 linear motion electric drive直行程阀 linear motion valve线性规划 linear programming线性化方法 linearization technique称重传感器 load cell局部渐近稳定性 local asymptotic stability局部最优 local optimum对数幅相图 log magnitude-phase diagram长期记忆 long term memory线性二次调节器问题 lqr (linear quadratic regulator problem)集总参数模型 lumped parameter model李雅普诺夫渐近稳定性定理 lyapunov theorem of asymptotic stability宏观经济系统 macro-economic 
system磁卸载 magnetic dumping磁致弹性称重传感器 magnetoelastic weighing cell幅值裕度 magnitude margin幅值比例尺 magnitude scale factor幅频特性 magnitude-frequency characteristic机械手 manipulator人机协调 man-machine coordination手动操作器 manual station制造自动化协议 map (manufacturing automation protocol)边际效益 marginal effectiveness梅森增益公式 mason‘s gain formula主站 master station匹配准则 matching criterion最大似然估计 maximum likelihood estimation最大超调量 maximum overshoot极大值原理 maximum principle模型库 mb (model base)均方误差准则 mean-square error criterion最经济控制 mec (most economic control)机理模型 mechanism model元知识 meta-knowledge冶金自动化 metallurgical automation最小实现 minimal realization最小相位系统 minimum phase system最小方差估计 minimum variance estimation副回路 minor loop弹体-目标相对运动仿真器 missile-target relative movement simulator模态集结 modal aggregation模态变换 modal transformation模型置信度 model confidence模型逼真度 model fidelity模型参考适应控制系统 model reference adaptive control system模型验证 model verification模块化 modularization可动空间 motion space平均故障间隔时间 mtbf (mean time between failures)平均无故障时间 mttf (mean time to failures)多属性效用函数 multi-attributive utility function多重判据 multicriteria多级递阶结构 multilevel hierarchical structure多回路控制 multiloop control[nextpage} 多目标决策 multi-objective decision多态逻辑 multistate logic多段递阶控制 multistratum hierarchical control多变量控制系统 multivariable control system肌电控制 myoelectric control纳什最优性 nash optimality自然语言生成 natural language generation最近邻 nearest-neighbor必然性侧度 necessity measure负反馈 negative feedback 神经集合 neural assembly神经网络计算机 neural network computer尼科尔斯图 nichols chart思维科学 noetic science非单调关联系统 noncoherent system非合作博弈 noncooperative game非平衡态 nonequilibrium state非线性环节 nonlinear element非单调逻辑 nonmonotonic logic非参数训练 nonparametric training不可逆电气传动 nonreversible electric drive非奇异摄动 nonsingular perturbation非平稳随机过程 non-stationary random process核辐射物位计 nuclear radiation levelmeter章动敏感器 nutation sensor 奈奎斯特稳定判据 nyquist stability criterion目标函数 objective function可观测指数 observability index可观测规范型 observable canonical form在线帮助 on-line assistance通断控制 on-off control开环极点 open loop 
pole运筹学模型 operational research model光纤式转速表 optic fiber tachometer最优轨迹 optimal trajectory最优化技术 optimization technique轨道陀螺罗盘 orbit gyrocompass轨道摄动 orbit perturbation轨道交会 orbital rendezvous序参数 order parameter定向控制 orientation control始发站 originator振荡周期 oscillating period输出预估法 output prediction method椭圆齿轮流量计 oval wheel flowmeter总体设计 overall design 过阻尼 overdamping交叠分解 overlapping decomposition比例控制器 p control帕德近似 pade approximation帕雷托最优性 pareto optimality被动姿态稳定 passive attitude stabilization路径可重复性 path repeatability模式基元 pattern primitive主成分分析法 pca (principal component analysis)峰值时间 peak time罚函数法 penalty function method感知器 perceptron周期工作制 periodic duty计划评审技术 pert (program evaluation and review technique)摄动理论 perturbation theory悲观值 pessimistic value相位超前 phase lead 相轨迹 phase locus相轨迹 phase trajectory光电式转速传感器 photoelectrictachometric transducer短句结构文法 phrase-structure grammar物理符号系统 physical symbol system压电式力传感器 piezoelectric force transducer示教再现式机器人 playback robot可编程序逻辑控制器 plc (programmable logic controller)反接制动 plug braking旋塞阀 plug valve气动执行机构 pneumatic actuator点位控制 point-to-point control极坐标型机器人 polar robot极点配置 pole assignment零极点相消 pole-zero cancellation多项式输入 polynomial input投资搭配理论 portfolio theory位姿过调量 pose overshoot电位器式位移传感器 posentiometric displacement transducer位置测量仪 position measuring instrument正反馈 positive feedback电力系统自动化 power system automation模式识别 pr (pattern recognition)谓词逻辑 predicate logic电接点压力表 pressure gauge with electric contact压力变送器 pressure transmitter价格协调 price coordination[nextpage} 主协调 primal coordination主频区 primary frequency zone大道原理 principle of turnpike优先级 priority面向过程的仿真 process-oriented simulation生产预算 production budget产生式规则 production rule利润预测 profit forecast 程序设定操作器 program set station比例控制 proportional control比例微分控制器 proportional plus derivative controller协议工程 protocol engineering原型 prototype伪随机序列 pseudo random sequence伪速率增量控制 pseudo-rate-increment control脉冲持续时间 pulse duration脉冲调频控制系统 pulse frequency modulation control system脉冲调宽控制系统 pulse width 
modulation control system下推自动机 pushdown automaton脉宽调制逆变器 pwm inverter质量管理 qc (quality control)二次型性能指标 quadratic performance index定性物理模型 qualitative physical model量化噪声 quantized noise准线性特性 quasilinear characteristics排队论 queuing theory射频敏感器 radio frequency sensor斜坡函数 ramp function随机扰动 random disturbance随机过程 random process速率积分陀螺 rate integrating gyro比值操作器 ratio station可达性 reachability反作用轮控制 reaction wheel control实时遥测 real time telemetry可实现性,能实现性 realizability感受野 receptive field直角坐标型机器人 rectangular robot整流器 rectifier递推估计 recursive estimation降阶观测器 reduced order observer冗余信息 redundant information再入控制 reentry control回馈制动，再生制动 regenerative braking区域规划模型 regional planning model调节装载 regulating device调节 regulation关系代数 relational algebra继电器特性 relay characteristic遥控操作器 remote manipulator遥调 remote regulating远程设定点调整器 remote set point adjuster交会和对接 rendezvous and docking再现性 reproducibility热电阻 resistance thermometer sensor归结原理 resolution principle资源分配 resource allocation响应曲线 response curve回差矩阵 return difference matrix回比矩阵 return ratio matrix回响 reverberation可逆电气传动 reversible electric drive关节型机器人 revolute robot转速传感器 revolution speed transducer重写规则 rewriting rule刚性航天动力学 rigid spacecraft dynamics风险分析 risk decision机器人编程语言 robot programming language[nextpage} 机器人学 robotics鲁棒控制 robust control鲁棒性 robustness辊缝测量仪 roll gap measuring instrument根轨迹 root locus腰轮流量计 roots flowmeter浮子流量计，转子流量计 rotameter偏心旋转阀 rotary eccentric plug valve角行程阀 rotary motion valve旋转变压器 rotating transformer劳思近似判据 routh approximation method路径问题 routing problem采样控制系统 sampled-data control system采样控制系统 sampling control system饱和特性 saturation characteristics标量李雅普诺夫函数 scalar lyapunov function平面关节型机器人 scara (selective compliance assembly robot arm)情景分析法 scenario analysis method物景分析 scene analysis自力式控制器 self-operated controller自组织系统 self-organizing system自繁殖系统 self-reproducing system自校正控制 self-tuning control语义网络 semantic network半实物仿真 semi-physical simulation敏感元件 sensing element灵敏度分析 sensitivity analysis感觉控制 sensory control顺序分解 
sequential decomposition序贯最小二乘估计 sequential least squares estimation伺服控制，随动控制 servo control伺服马达 servomotor过渡时间 settling time六分仪 sextant短期计划 short term planning短时程协调 short time horizon coordination信号检测和估计 signal detection and estimation信号重构 signal reconstruction相似性 similarity仿真中断 simulated interrupt仿真框图 simulation block diagram仿真实验 simulation experiment仿真速度 simulation velocity仿真器 simulator单轴转台 single axle table单自由度陀螺 single degree of freedom gyro单级过程 single level process单值非线性 single value nonlinearity奇异吸引子 singular attractor奇异摄动 singular perturbation汇点 sink受役系统 slaved system慢变子系统 slow subsystem欠实时仿真 slower-than-real-time simulation[nextpage} 社会控制论 socio-cybernetics社会经济系统 socioeconomic system软件心理学 software psychology太阳帆板指向控制 solar array pointing control电磁阀 solenoid valve源点 source比冲 specific impulse调速系统 speed control system自旋轴 spin axis自旋体 spinner稳定性判据 stability criterion稳定极限 stability limit镇定，稳定 stabilization施塔克尔贝格决策理论 stackelberg decision theory 状态方程模型 state equation model状态空间描述 state space description静态特性曲线 static characteristics curve定点精度 station accuracy平稳随机过程 stationary random process统计模式识别 statistic pattern recognition统计分析 statistical analysis稳态偏差 steady state deviation稳态误差系数 steady state error coefficient阶跃函数 step function步进控制 step-by-step control逐步精化 stepwise refinement随机有限自动机 stochastic finite automaton应变式称重传感器 strain gauge load cell策略函数 strategic function强耦合系统 strongly coupled system主观频率 subjective probability次优性 suboptimality监督学习 supervised training计算机监控系统 supervisory computer control system自持振荡 sustained oscillation旋进流量计 swirlmeter切换点 switching point符号处理 symbolic processing突触可塑性 synaptic plasticity协同学 synergetics句法分析 syntactic analysis系统评价 systemassessment[nextpage} 系统工程 system engineering系统同态 system homomorphism系统同构 system isomorphism系统学 systematologys-domain s 域转速表 tachometer靶式流量变送器 target flow transmitter作业周期 task cycle示教编程 teaching programming远动学 telemechanics频分遥测系统 telemetering system of frequency division type遥测 telemetry目的系统 teleological system目的论 
teleology温度传感器 temperature transducer 模版库 template base张力计 tensiometer纹理 texture定理证明 theorem proving治疗模型 therapy model热电偶 thermocouple温度计 thermometer厚度计 thickness meter三位控制器 three state controller三轴姿态稳定 three-axis attitude stabilization推力矢量控制系统 thrust vector control system推力器 thruster时间常数 time constant时序控制器 time schedule controller定常系统，非时变系统 time-invariant system分时控制 time-sharing control时变参数 time-varying parameter自上而下测试 top-down testing拓扑结构 topological structure全面质量管理 tqc (total quality control)跟踪误差 tracking error权衡分析 trade-off analysis传递函数矩阵 transfer function matrix转换文法 transformation grammar瞬态偏差 transient deviation 过渡过程 transient process转移图 transition diagram电远传压力表 transmissible pressure gauge变送器 transmitter趋势分析 trend analysis三重调制遥测系统 triple modulation telemetering system涡轮流量计 turbine flowmeter[nextpage} 图灵机 turing machine双时标系统 two-time scale system超声物位计 ultrasonic levelmeter非调速电气传动 unadjustable speed electric drive无偏估计 unbiased estimation欠阻尼 underdamping一致渐近稳定性 uniformly asymptotic stability不间断工作制，长期工作制 uninterrupted duty单位圆 unit circle单元测试 unit testing非监督学习 unsupervised learing上级问题 upper level problem城市规划 urban planning效用函数 utility function价值工程 value engineering可变增益，可变放大系数 variable gain变结构控制 variable structure control system向量李雅普诺夫函数 vector lyapunovfunction速度误差系数 velocity error coefficient速度传感器 velocity transducer纵向分解 vertical decomposition振弦式力传感器 vibrating wire force transducer振动计 vibrometer粘性阻尼 viscous damping电压源型逆变器 voltage source inverter旋进流量计 vortex precession flowmeter涡街流量计 vortex shedding flowmeter方法库 wb (way base)称重传感器 weighing cell权因子 weighting factor加权法 weighting method惠特克-香农采样定理 whittaker-shannon sampling theorem维纳滤波 wiener filtering计算机辅助设计工作站 work station for computer aided designw-plane w平面零和对策模型 zero sum game model零基预算 zero-based budget零输入响应 zero-input response零状态响应 zero-state responsez-transform z变换自动化专业英语词汇A－Zacceleration transducer 加速度传感器acceptance testing 验收测试accessibility 可及性accumulated error 累积误差AC-DC-AC frequency converter 交-直-交变频器AC 
(alternating current) electric drive 交流电子传动active attitude stabilization 主动姿态稳定actuator 驱动器，执行机构adaline 线性适应元adaptation layer 适应层adaptive telemeter system 适应遥测系统adjoint operator 伴随算子admissible error 容许误差aggregation matrix 集结矩阵AHP (analytic hierarchy process) 层次分析法amplifying element 放大环节analog-digital conversion 模数转换annunciator 信号器antenna pointing control 天线指向控制anti-integral windup 抗积分饱卷aperiodic decomposition 非周期分解a posteriori estimate 后验估计approximate reasoning 近似推理a priori estimate 先验估计articulated robot 关节型机器人assignment problem 配置问题，分配问题associative memory model 联想记忆模型associatron 联想机asymptotic stability 渐进稳定性attained pose drift 实际位姿漂移attitude acquisition 姿态捕获AOCS (attritude and orbit control system) 姿态轨道控制系统attitude angular velocity 姿态角速度attitude disturbance 姿态扰动attitude maneuver 姿态机动attractor 吸引子augment ability 可扩充性augmented system 增广系统automatic manual station 自动-手动操作器automaton 自动机backlash characteristics 间隙特性base coordinate system 基座坐标系Bayes classifier 贝叶斯分类器bearing alignment 方位对准bellows pressure gauge 波纹管压力表benefit-cost analysis 收益成本分析bilinear system 双线性系统biocybernetics 生物控制论biological feedback system 生物反馈系统black box testing approach 黑箱测试法blind search 盲目搜索block diagonalization 块对角化Boltzman machine 玻耳兹曼机bottom-up development 自下而上开发boundary value analysis 边界值分析brainstorming method 头脑风暴法breadth-first search 广度优先搜索butterfly valve 蝶阀CAE (computer aided engineering) 计算机辅助工程CAM (computer aided manufacturing) 计算机辅助制造Camflex valve 偏心旋转阀canonical state variable 规范化状态变量capacitive displacement transducer 电容式位移传感器capsule pressure gauge 膜盒压力表CARD 计算机辅助研究开发Cartesian robot 直角坐标型机器人cascade compensation 串联补偿catastrophe theory 突变论centrality 集中性chained aggregation 链式集结chaos 混沌characteristic locus 特征轨迹chemical propulsion 化学推进calrity 清晰性classical information pattern 经典信息模式classifier 分类器clinical control system 临床控制系统closed loop pole 闭环极点closed loop transfer function 闭环传递函数cluster analysis 聚类分析coarse-fine control 粗-精控制cobweb model 蛛网模型coefficient matrix 系数矩阵cognitive science 
认知科学cognitron 认知机coherent system 单调关联系统combination decision 组合决策combinatorial explosion 组合爆炸combined pressure and vacuum gauge 压力真空表command pose 指令位姿companion matrix 相伴矩阵compartmental model 房室模型compatibility 相容性，兼容性compensating network 补偿网络compensation 补偿，矫正compliance 柔顺，顺应composite control 组合控制computable general equilibrium model 可计算一般均衡模型conditionally instability 条件不稳定性configuration 组态connectionism 连接机制connectivity 连接性conservative system 守恒系统consistency 一致性constraint condition 约束条件consumption function 消费函数context-free grammar 上下文无关语法continuous discrete event hybrid system simulation 连续离散事件混合系统仿真continuous duty 连续工作制control accuracy 控制精度control cabinet 控制柜controllability index 可控指数controllable canonical form 可控规范型[control] plant 控制对象,被控对象controlling instrument 控制仪表control moment gyro 控制力矩陀螺control panel 控制屏，控制盘control synchro 控制[式]自整角机control system synthesis 控制系统综合control time horizon 控制时程cooperative game 合作对策coordinability condition 可协调条件coordination strategy 协调策略coordinator 协调器corner frequency 转折频率costate variable 共态变量cost-effectiveness analysis 费用效益分析coupling of orbit and attitude 轨道和姿态耦合critical damping 临界阻尼critical stability 临界稳定性cross-over frequency 穿越频率，交越频率current source inverter 电流[源]型逆变器cut-off frequency 截止频率cybernetics 控制论cyclic remote control 循环遥控cylindrical robot 圆柱坐标型机器人damped oscillation 阻尼振荡damper 阻尼器damping ratio 阻尼比data acquisition 数据采集data encryption 数据加密data preprocessing 数据预处理data processor 数据处理器DC generator-motor set drive 直流发电机-电动机组传动D controller 微分控制器decentrality 分散性decentralized stochastic control 分散随机控制decision space 决策空间decision support system 决策支持系统decomposition-aggregation approach 分解集结法decoupling parameter 解耦参数deductive-inductive hybrid modeling method 演绎与归纳混合建模法delayed telemetry 延时遥测derivation tree 导出树derivative feedback 微分反馈describing function 描述函数desired value 希望值despinner 消旋体destination 目的站detector 检出器deterministic automaton 确定性自动机deviation 偏差舱deviation alarm 偏差报警器DFD 数据流图diagnostic model 诊断模型diagonally dominant matrix 
对角主导矩阵diaphragm pressure gauge 膜片压力表difference equation model 差分方程模型differential dynamical system 微分动力学系统differential game 微分对策differential pressure level meter 差压液位计differential pressure transmitter 差压变送器differential transformer displacement transducer 差动变压器式位移传感器differentiation element 微分环节digital filer 数字滤波器digital signal processing 数字信号处理digitization 数字化digitizer 数字化仪dimension transducer 尺度传感器direct coordination 直接协调disaggregation 解裂discoordination 失协调discrete event dynamic system 离散事件动态系统discrete system simulation language 离散系统仿真语言discriminant function 判别函数displacement vibration amplitude transducer 位移振幅传感器dissipative structure 耗散结构distributed parameter control system 分布参数控制系统distrubance 扰动disturbance compensation 扰动补偿diversity 多样性divisibility 可分性domain knowledge 领域知识dominant pole 主导极点dose-response model 剂量反应模型dual modulation telemetering system 双重调制遥测系统dual principle 对偶原理dual spin stabilization 双自旋稳定duty ratio 负载比dynamic braking 能耗制动dynamic characteristics 动态特性dynamic deviation 动态偏差dynamic error coefficient 动态误差系数dynamic exactness 动它吻合性dynamic input-output model 动态投入产出模型econometric model 计量经济模型economic cybernetics 经济控制论economic effectiveness 经济效益economic evaluation 经济评价economic index 经济指数economic indicator 经济指标eddy current thickness meter 电涡流厚度计effectiveness 有效性effectiveness theory 效益理论elasticity of demand 需求弹性electric actuator 电动执行机构electric conductance levelmeter 电导液位计electric drive control gear 电动传动控制设备electric hydraulic converter 电-液转换器electric pneumatic converter 电-气转换器electrohydraulic servo vale 电液伺服阀electromagnetic flow transducer 电磁流量传感器electronic batching scale 电子配料秤electronic belt conveyor scale 电子皮带秤electronic hopper scale 电子料斗秤elevation 仰角emergency stop 异常停止empirical distribution 经验分布endogenous variable 内生变量equilibrium growth 均衡增长equilibrium point 平衡点equivalence partitioning 等价类划分ergonomics 工效学error 误差error-correction parsing 纠错剖析estimate 估计量estimation theory 估计理论evaluation technique 评价技术event chain 事件链evolutionary system 进化系统exogenous variable 
外生变量expected characteristics 希望特性external disturbance 外扰fact base 事实failure diagnosis 故障诊断fast mode 快变模态feasibility study 可行性研究feasible coordination 可行协调feasible region 可行域feature detection 特征检测feature extraction 特征抽取feedback compensation 反馈补偿feedforward path 前馈通路field bus 现场总线finite automaton 有限自动机FIP (factory information protocol) 工厂信息协议first order predicate logic 一阶谓词逻辑fixed sequence manipulator 固定顺序机械手fixed set point control 定值控制FMS (flexible manufacturing system) 柔性制造系统flow sensor/transducer 流量传感器flow transmitter 流量变送器fluctuation 涨落forced oscillation 强迫振荡formal language theory 形式语言理论formal neuron 形式神经元forward path 正向通路forward reasoning 正向推理fractal 分形体，分维体frequency converter 变频器frequency domain model reduction method 频域模型降阶法frequency response 频域响应full order observer 全阶观测器functional decomposition 功能分解FES (functional electrical stimulation) 功能电刺激functional simularity 功能相似fuzzy logic 模糊逻辑game tree 对策树gate valve 闸阀general equilibrium theory 一般均衡理论generalized least squares estimation 广义最小二乘估计generation function 生成函数geomagnetic torque 地磁力矩。

Cluster analysis

8 Cluster Analysis: Basic Concepts and Algorithms

Cluster analysis divides data into groups (clusters) that are meaningful, useful, or both. If meaningful groups are the goal, then the clusters should capture the natural structure of the data. In some cases, however, cluster analysis is only a useful starting point for other purposes, such as data summarization. Whether for understanding or utility, cluster analysis has long played an important role in a wide variety of fields: psychology and other social sciences, biology, statistics, pattern recognition, information retrieval, machine learning, and data mining.

There have been many applications of cluster analysis to practical problems. We provide some specific examples, organized by whether the purpose of the clustering is understanding or utility.

Clustering for Understanding. Classes, or conceptually meaningful groups of objects that share common characteristics, play an important role in how people analyze and describe the world. Indeed, human beings are skilled at dividing objects into groups (clustering) and assigning particular objects to these groups (classification). For example, even relatively young children can quickly label the objects in a photograph as buildings, vehicles, people, animals, plants, etc. In the context of understanding data, clusters are potential classes and cluster analysis is the study of techniques for automatically finding classes. The following are some examples:

• Biology. Biologists have spent many years creating a taxonomy (hierarchical classification) of all living things: kingdom, phylum, class, order, family, genus, and species. Thus, it is perhaps not surprising that much of the early work in cluster analysis sought to create a discipline of mathematical taxonomy that could automatically find such classification structures. More recently, biologists have applied clustering to analyze the large amounts of genetic information that are now available. For example, clustering has
been used to find groups of genes that have similar functions.

• Information Retrieval. The World Wide Web consists of billions of Web pages, and the results of a query to a search engine can return thousands of pages. Clustering can be used to group these search results into a small number of clusters, each of which captures a particular aspect of the query. For instance, a query of “movie” might return Web pages grouped into categories such as reviews, trailers, stars, and theaters. Each category (cluster) can be broken into subcategories (subclusters), producing a hierarchical structure that further assists a user’s exploration of the query results.

• Climate. Understanding the Earth’s climate requires finding patterns in the atmosphere and ocean. To that end, cluster analysis has been applied to find patterns in the atmospheric pressure of polar regions and areas of the ocean that have a significant impact on land climate.

• Psychology and Medicine. An illness or condition frequently has a number of variations, and cluster analysis can be used to identify these different subcategories. For example, clustering has been used to identify different types of depression. Cluster analysis can also be used to detect patterns in the spatial or temporal distribution of a disease.

• Business. Businesses collect large amounts of information on current and potential customers. Clustering can be used to segment customers into a small number of groups for additional analysis and marketing activities.

Clustering for Utility. Cluster analysis provides an abstraction from individual data objects to the clusters in which those data objects reside. Additionally, some clustering techniques characterize each cluster in terms of a cluster prototype; i.e., a data object that is representative of the other objects in the cluster. These cluster prototypes can be used as the basis for a number of data analysis or data processing techniques. Therefore, in the context of utility, cluster analysis is the study of techniques for finding
the most representative cluster prototypes.

• Summarization. Many data analysis techniques, such as regression or PCA, have a time or space complexity of $O(m^2)$ or higher (where $m$ is the number of objects), and thus, are not practical for large data sets. However, instead of applying the algorithm to the entire data set, it can be applied to a reduced data set consisting only of cluster prototypes. Depending on the type of analysis, the number of prototypes, and the accuracy with which the prototypes represent the data, the results can be comparable to those that would have been obtained if all the data could have been used.

• Compression. Cluster prototypes can also be used for data compression. In particular, a table is created that consists of the prototypes for each cluster; i.e., each prototype is assigned an integer value that is its position (index) in the table. Each object is represented by the index of the prototype associated with its cluster. This type of compression is known as vector quantization and is often applied to image, sound, and video data, where (1) many of the data objects are highly similar to one another, (2) some loss of information is acceptable, and (3) a substantial reduction in the data size is desired.

• Efficiently Finding Nearest Neighbors. Finding nearest neighbors can require computing the pairwise distance between all points. Often clusters and their cluster prototypes can be found much more efficiently. If objects are relatively close to the prototype of their cluster, then we can use the prototypes to reduce the number of distance computations that are necessary to find the nearest neighbors of an object. Intuitively, if two cluster prototypes are far apart, then the objects in the corresponding clusters cannot be nearest neighbors of each other. Consequently, to find an object’s nearest neighbors it is only necessary to compute the distance to objects in nearby clusters, where the nearness of two clusters is measured by the distance between their prototypes. This idea is made
more precise in Exercise 25 on page 94.

This chapter provides an introduction to cluster analysis. We begin with a high-level overview of clustering, including a discussion of the various approaches to dividing objects into sets of clusters and the different types of clusters. We then describe three specific clustering techniques that represent broad categories of algorithms and illustrate a variety of concepts: K-means, agglomerative hierarchical clustering, and DBSCAN. The final section of this chapter is devoted to cluster validity: methods for evaluating the goodness of the clusters produced by a clustering algorithm. More advanced clustering concepts and algorithms will be discussed in Chapter 9. Whenever possible, we discuss the strengths and weaknesses of different schemes. In addition, the bibliographic notes provide references to relevant books and papers that explore cluster analysis in greater depth.

8.1 Overview

Before discussing specific clustering techniques, we provide some necessary background. First, we further define cluster analysis, illustrating why it is difficult and explaining its relationship to other techniques that group data.
Then we explore two important topics: (1) different ways to group a set of objects into a set of clusters, and (2) types of clusters.

8.1.1 What Is Cluster Analysis?

Cluster analysis groups data objects based only on information found in the data that describes the objects and their relationships. The goal is that the objects within a group be similar (or related) to one another and different from (or unrelated to) the objects in other groups. The greater the similarity (or homogeneity) within a group and the greater the difference between groups, the better or more distinct the clustering.

In many applications, the notion of a cluster is not well defined. To better understand the difficulty of deciding what constitutes a cluster, consider Figure 8.1, which shows twenty points and three different ways of dividing them into clusters. The shapes of the markers indicate cluster membership. Figures 8.1(b) and 8.1(d) divide the data into two and six parts, respectively. However, the apparent division of each of the two larger clusters into three subclusters may simply be an artifact of the human visual system. Also, it may not be unreasonable to say that the points form four clusters, as shown in Figure 8.1(c). This figure illustrates that the definition of a cluster is imprecise and that the best definition depends on the nature of data and the desired results.

Cluster analysis is related to other techniques that are used to divide data objects into groups. For instance, clustering can be regarded as a form of classification in that it creates a labeling of objects with class (cluster) labels.
However, it derives these labels only from the data. In contrast, classification in the sense of Chapter 4 is supervised classification; i.e., new, unlabeled objects are assigned a class label using a model developed from objects with known class labels. For this reason, cluster analysis is sometimes referred to as unsupervised classification. When the term classification is used without any qualification within data mining, it typically refers to supervised classification.

Figure 8.1. Different ways of clustering the same set of points: (a) original points, (b) two clusters, (c) four clusters, (d) six clusters.

Also, while the terms segmentation and partitioning are sometimes used as synonyms for clustering, these terms are frequently used for approaches outside the traditional bounds of cluster analysis. For example, the term partitioning is often used in connection with techniques that divide graphs into subgraphs and that are not strongly connected to clustering. Segmentation often refers to the division of data into groups using simple techniques; e.g., an image can be split into segments based only on pixel intensity and color, or people can be divided into groups based on their income. Nonetheless, some work in graph partitioning and in image and market segmentation is related to cluster analysis.

8.1.2 Different Types of Clusterings

An entire collection of clusters is commonly referred to as a clustering, and in this section, we distinguish various types of clusterings: hierarchical (nested) versus partitional (unnested), exclusive versus overlapping versus fuzzy, and complete versus partial.

Hierarchical versus Partitional. The most commonly discussed distinction among different types of clusterings is whether the set of clusters is nested or unnested, or in more traditional terminology, hierarchical or partitional. A partitional clustering is simply a division of the set of data objects into non-overlapping subsets (clusters) such that each
data object is in exactly one subset. Taken individually, each collection of clusters in Figures 8.1(b–d) is a partitional clustering.

If we permit clusters to have subclusters, then we obtain a hierarchical clustering, which is a set of nested clusters that are organized as a tree. Each node (cluster) in the tree (except for the leaf nodes) is the union of its children (subclusters), and the root of the tree is the cluster containing all the objects. Often, but not always, the leaves of the tree are singleton clusters of individual data objects. If we allow clusters to be nested, then one interpretation of Figure 8.1(a) is that it has two subclusters (Figure 8.1(b)), each of which, in turn, has three subclusters (Figure 8.1(d)). The clusters shown in Figures 8.1(a–d), when taken in that order, also form a hierarchical (nested) clustering with, respectively, 1, 2, 4, and 6 clusters on each level. Finally, note that a hierarchical clustering can be viewed as a sequence of partitional clusterings and a partitional clustering can be obtained by taking any member of that sequence; i.e., by cutting the hierarchical tree at a particular level.

Exclusive versus Overlapping versus Fuzzy. The clusterings shown in Figure 8.1 are all exclusive, as they assign each object to a single cluster. There are many situations in which a point could reasonably be placed in more than one cluster, and these situations are better addressed by non-exclusive clustering. In the most general sense, an overlapping or non-exclusive clustering is used to reflect the fact that an object can simultaneously belong to more than one group (class). For instance, a person at a university can be both an enrolled student and an employee of the university. A non-exclusive clustering is also often used when, for example, an object is “between” two or more clusters and could reasonably be assigned to any of these clusters.
Imagine a point halfway between two of the clusters of Figure 8.1. Rather than make a somewhat arbitrary assignment of the object to a single cluster, it is placed in all of the "equally good" clusters.

In a fuzzy clustering, every object belongs to every cluster with a membership weight that is between 0 (absolutely doesn't belong) and 1 (absolutely belongs). In other words, clusters are treated as fuzzy sets. (Mathematically, a fuzzy set is one in which an object belongs to any set with a weight that is between 0 and 1. In fuzzy clustering, we often impose the additional constraint that the sum of the weights for each object must equal 1.) Similarly, probabilistic clustering techniques compute the probability with which each point belongs to each cluster, and these probabilities must also sum to 1. Because the membership weights or probabilities for any object sum to 1, a fuzzy or probabilistic clustering does not address true multiclass situations, such as the case of a student employee, where an object belongs to multiple classes.
Instead, these approaches are most appropriate for avoiding the arbitrariness of assigning an object to only one cluster when it may be close to several. In practice, a fuzzy or probabilistic clustering is often converted to an exclusive clustering by assigning each object to the cluster in which its membership weight or probability is highest.

Complete versus Partial. A complete clustering assigns every object to a cluster, whereas a partial clustering does not. The motivation for a partial clustering is that some objects in a data set may not belong to well-defined groups. Many times objects in the data set may represent noise, outliers, or "uninteresting background." For example, some newspaper stories may share a common theme, such as global warming, while other stories are more generic or one-of-a-kind. Thus, to find the important topics in last month's stories, we may want to search only for clusters of documents that are tightly related by a common theme. In other cases, a complete clustering of the objects is desired. For example, an application that uses clustering to organize documents for browsing needs to guarantee that all documents can be browsed.

8.1.3 Different Types of Clusters

Clustering aims to find useful groups of objects (clusters), where usefulness is defined by the goals of the data analysis. Not surprisingly, there are several different notions of a cluster that prove useful in practice. In order to visually illustrate the differences among these types of clusters, we use two-dimensional points, as shown in Figure 8.2, as our data objects. We stress, however, that the types of clusters described here are equally valid for other kinds of data.
Well-Separated. A cluster is a set of objects in which each object is closer (or more similar) to every other object in the cluster than to any object not in the cluster. Sometimes a threshold is used to specify that all the objects in a cluster must be sufficiently close (or similar) to one another. This idealistic definition of a cluster is satisfied only when the data contains natural clusters that are quite far from each other. Figure 8.2(a) gives an example of well-separated clusters that consists of two groups of points in a two-dimensional space. The distance between any two points in different groups is larger than the distance between any two points within a group. Well-separated clusters do not need to be globular, but can have any shape.

Prototype-Based. A cluster is a set of objects in which each object is closer (more similar) to the prototype that defines the cluster than to the prototype of any other cluster. For data with continuous attributes, the prototype of a cluster is often a centroid, i.e., the average (mean) of all the points in the cluster. When a centroid is not meaningful, such as when the data has categorical attributes, the prototype is often a medoid, i.e., the most representative point of a cluster. For many types of data, the prototype can be regarded as the most central point, and in such instances, we commonly refer to prototype-based clusters as center-based clusters. Not surprisingly, such clusters tend to be globular. Figure 8.2(b) shows an example of center-based clusters.
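The difference between the two kinds of prototype can be illustrated with a minimal sketch. It assumes Euclidean distance and takes "most representative point" to mean the data point with the smallest total distance to the other points of the cluster, which is one common reading rather than the only possible one; the function names `centroid` and `medoid` are illustrative.

```python
import math

def centroid(points):
    """Coordinate-wise mean; usually not one of the data points."""
    n = len(points)
    dim = len(points[0])
    return tuple(sum(p[d] for p in points) / n for d in range(dim))

def medoid(points, dist=math.dist):
    """Data point with the smallest total distance to all points (assumed definition)."""
    return min(points, key=lambda c: sum(dist(c, p) for p in points))

# One cluster with a single far-away point.
cluster = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (14.0, 0.0)]
print(centroid(cluster))  # (4.0, 0.0), pulled toward the distant point
print(medoid(cluster))    # (2.0, 0.0), always an actual data point
```

Because `medoid` only needs a pairwise distance function, the same sketch would work for non-numeric data if a suitable `dist` were supplied.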
Graph-Based. If the data is represented as a graph, where the nodes are objects and the links represent connections among objects (see Section 2.1.2), then a cluster can be defined as a connected component; i.e., a group of objects that are connected to one another, but that have no connection to objects outside the group. An important example of graph-based clusters are contiguity-based clusters, where two objects are connected only if they are within a specified distance of each other. This implies that each object in a contiguity-based cluster is closer to some other object in the cluster than to any point in a different cluster. Figure 8.2(c) shows an example of such clusters for two-dimensional points. This definition of a cluster is useful when clusters are irregular or intertwined, but can have trouble when noise is present since, as illustrated by the two spherical clusters of Figure 8.2(c), a small bridge of points can merge two distinct clusters.

Other types of graph-based clusters are also possible. One such approach (Section 8.3.2) defines a cluster as a clique; i.e., a set of nodes in a graph that are completely connected to each other. Specifically, if we add connections between objects in the order of their distance from one another, a cluster is formed when a set of objects forms a clique. Like prototype-based clusters, such clusters tend to be globular.

Density-Based. A cluster is a dense region of objects that is surrounded by a region of low density. Figure 8.2(d) shows some density-based clusters for data created by adding noise to the data of Figure 8.2(c). The two circular clusters are not merged, as in Figure 8.2(c), because the bridge between them fades into the noise. Likewise, the curve that is present in Figure 8.2(c) also fades into the noise and does not form a cluster in Figure 8.2(d). A density-based definition of a cluster is often employed when the clusters are irregular or intertwined, and when noise and outliers are present. By contrast, a contiguity-based definition of
a cluster would not work well for the data of Figure 8.2(d) since the noise would tend to form bridges between clusters.

Shared-Property (Conceptual Clusters). More generally, we can define a cluster as a set of objects that share some property. This definition encompasses all the previous definitions of a cluster; e.g., objects in a center-based cluster share the property that they are all closest to the same centroid or medoid. However, the shared-property approach also includes new types of clusters. Consider the clusters shown in Figure 8.2(e). A triangular area (cluster) is adjacent to a rectangular one, and there are two intertwined circles (clusters). In both cases, a clustering algorithm would need a very specific concept of a cluster to successfully detect these clusters. The process of finding such clusters is called conceptual clustering. However, too sophisticated a notion of a cluster would take us into the area of pattern recognition, and thus, we only consider simpler types of clusters in this book.

Road Map

In this chapter, we use the following three simple, but important techniques to introduce many of the concepts involved in cluster analysis.

• K-means. This is a prototype-based, partitional clustering technique that attempts to find a user-specified number of clusters (K), which are represented by their centroids.

• Agglomerative Hierarchical Clustering. This clustering approach refers to a collection of closely related clustering techniques that produce a hierarchical clustering by starting with each point as a singleton cluster and then repeatedly merging the two closest clusters until a single, all-encompassing cluster remains. Some of these techniques have a natural interpretation in terms of graph-based clustering, while others have an interpretation in terms of a prototype-based approach.

• DBSCAN. This is a density-based clustering algorithm that produces a partitional clustering, in which the number of clusters is automatically determined by the algorithm. Points in low-density regions are
classified as noise and omitted; thus, DBSCAN does not produce a complete clustering.

Figure 8.2. Different types of clusters as illustrated by sets of two-dimensional points.
(a) Well-separated clusters. Each point is closer to all of the points in its cluster than to any point in another cluster.
(b) Center-based clusters. Each point is closer to the center of its cluster than to the center of any other cluster.
(c) Contiguity-based clusters. Each point is closer to at least one point in its cluster than to any point in another cluster.
(d) Density-based clusters. Clusters are regions of high density separated by regions of low density.
(e) Conceptual clusters. Points in a cluster share some general property that derives from the entire set of points. (Points in the intersection of the circles belong to both.)

8.2 K-means

Prototype-based clustering techniques create a one-level partitioning of the data objects. There are a number of such techniques, but two of the most prominent are K-means and K-medoid. K-means defines a prototype in terms of a centroid, which is usually the mean of a group of points, and is typically applied to objects in a continuous n-dimensional space. K-medoid defines a prototype in terms of a medoid, which is the most representative point for a group of points, and can be applied to a wide range of data since it requires only a proximity measure for a pair of objects. While a centroid almost never corresponds to an actual data point, a medoid, by its definition, must be an actual data point. In this section, we will focus solely on K-means, which is one of the oldest and most widely used clustering algorithms.

8.2.1 The Basic K-means Algorithm

The K-means clustering technique is simple, and we begin with a description of the basic algorithm. We first choose K initial centroids, where K is a user-specified parameter, namely, the number of clusters desired. Each point is then assigned to the closest centroid, and each collection of
points assigned to a centroid is a cluster. The centroid of each cluster is then updated based on the points assigned to the cluster. We repeat the assignment and update steps until no point changes clusters, or equivalently, until the centroids remain the same.

K-means is formally described by Algorithm 8.1. The operation of K-means is illustrated in Figure 8.3, which shows how, starting from three centroids, the final clusters are found in four assignment-update steps. In these and other figures displaying K-means clustering, each subfigure shows (1) the centroids at the start of the iteration and (2) the assignment of the points to those centroids. The centroids are indicated by the "+" symbol; all points belonging to the same cluster have the same marker shape.

Algorithm 8.1
1: Select K points as initial centroids.
2: repeat
3:   Form K clusters by assigning each point to its closest centroid.
4:   Recompute the centroid of each cluster.
5: until Centroids do not change.

In the first step, shown in Figure 8.3(a), points are assigned to the initial centroids, which are all in the larger group of points. For this example, we use the mean as the centroid. After points are assigned to a centroid, the centroid is then updated. Again, the figure for each step shows the centroid at the beginning of the step and the assignment of points to those centroids. In the second step, points are assigned to the updated centroids, and the centroids are updated again. In steps 2, 3, and 4, which are shown in Figures 8.3 (b), (c), and (d), respectively, two of the centroids move to the two small groups of points at the bottom of the figures. When the K-means algorithm terminates in Figure 8.3(d), because no more changes occur, the centroids have identified the natural groupings of points.

Figure 8.3. Using the K-means algorithm to find three clusters in sample data: (a) Iteration 1. (b) Iteration 2. (c) Iteration 3. (d) Iteration 4.

For some combinations of proximity functions and types of centroids, K-means always converges to a
solution; i.e., K-means reaches a state in which no points are shifting from one cluster to another, and hence, the centroids don't change. Because most of the convergence occurs in the early steps, however, the condition on line 5 of Algorithm 8.1 is often replaced by a weaker condition, e.g., repeat until only 1% of the points change clusters.

We consider each of the steps in the basic K-means algorithm in more detail and then provide an analysis of the algorithm's space and time complexity.

Assigning Points to the Closest Centroid

To assign a point to the closest centroid, we need a proximity measure that quantifies the notion of "closest" for the specific data under consideration. Euclidean (L2) distance is often used for data points in Euclidean space, while cosine similarity is more appropriate for documents. However, there may be several types of proximity measures that are appropriate for a given type of data. For example, Manhattan (L1) distance can be used for Euclidean data, while the Jaccard measure is often employed for documents.

Usually, the similarity measures used for K-means are relatively simple since the algorithm repeatedly calculates the similarity of each point to each centroid. In some cases, however, such as when the data is in low-dimensional Euclidean space, it is possible to avoid computing many of the similarities, thus significantly speeding up the K-means algorithm. Bisecting K-means (described in Section 8.2.3) is another approach that speeds up K-means by reducing the number of similarities computed.

Table 8.1. Table of notation.
Symbol   Description
x        An object.
C_i      The i-th cluster.
c_i      The centroid of cluster C_i.
c        The centroid of all points.
m_i      The number of objects in the i-th cluster.
m        The number of objects in the data set.
K        The number of clusters.

Centroids and Objective Functions

Step 4 of the K-means algorithm was stated rather generally as "recompute the centroid of each cluster," since the centroid can vary, depending on the proximity measure for the data and the
goal of the clustering. The goal of the clustering is typically expressed by an objective function that depends on the proximities of the points to one another or to the cluster centroids; e.g., minimize the squared distance of each point to its closest centroid. We illustrate this with two examples. However, the key point is this: once we have specified a proximity measure and an objective function, the centroid that we should choose can often be determined mathematically. We provide mathematical details in Section 8.2.6, and provide a non-mathematical discussion of this observation here.

Data in Euclidean Space. Consider data whose proximity measure is Euclidean distance. For our objective function, which measures the quality of a clustering, we use the sum of the squared error (SSE), which is also known as scatter. In other words, we calculate the error of each data point, i.e., its Euclidean distance to the closest centroid, and then compute the total sum of the squared errors. Given two different sets of clusters that are produced by two different runs of K-means, we prefer the one with the smallest squared error since this means that the prototypes (centroids) of this clustering are a better representation of the points in their cluster. Using the notation in Table 8.1, the SSE is formally defined as follows:

\[ \mathrm{SSE} = \sum_{i=1}^{K} \sum_{x \in C_i} \mathrm{dist}(c_i, x)^2 \]

where dist is the Euclidean (L2) distance between two objects.
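As a rough illustration of how the SSE can be used to compare two clusterings of the same points, here is a minimal sketch. It assumes each cluster is given as a list of coordinate tuples and that the centroid is the coordinate-wise mean; the function name `sse` and the sample data are illustrative only.

```python
def sse(clusters):
    """Sum over clusters of squared Euclidean distances to the cluster mean."""
    total = 0.0
    for points in clusters:
        dim = len(points[0])
        # Centroid c_i: coordinate-wise mean of the points in cluster C_i.
        c = [sum(p[d] for p in points) / len(points) for d in range(dim)]
        # Squared Euclidean distance of every point x in C_i to c_i.
        total += sum(sum((p[d] - c[d]) ** 2 for d in range(dim)) for p in points)
    return total

# The same four points grouped in two different ways.
run_a = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 0.0), (10.0, 2.0)]]
run_b = [[(0.0, 0.0), (10.0, 0.0)], [(0.0, 2.0), (10.0, 2.0)]]
print(sse(run_a))  # 4.0
print(sse(run_b))  # 100.0
```

Under this criterion `run_a` would be preferred, since its centroids lie much closer to the points they represent.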


Cluster Stability Analysis using Sub-sampling

Reda Alhajj, Osman Abul
Department of Computer Science, University of Calgary, Calgary, Alberta, Canada.
{abul,alhajj}@cpsc.ucalgary.ca

Faruk Polat
Department of Computer Engineering, Middle East Technical University, Ankara, Turkey
polat@.tr

Abstract

Cluster stability research is concerned with the validity of clusters generated by a clustering algorithm. In other words, it answers whether the generated clusters are true clusters or are due to chance. Estimating the true number of clusters is related to this problem, since cluster validity is often based on this estimate. In the literature, a number of methods are available for both purposes. In most cases, assessing validity turns out to be determining the best parameter of the clustering algorithm. Confidence estimation is addressed in relatively few research papers. In those, confidence is given in terms of the proportion of cases clustering together. Our motivation is to estimate confidence in the clusters themselves, i.e. not specifically in individual cases. Here we propose three meta-methods from this perspective for the cluster stability problem. To the best of our knowledge, these methods are novel. The methods are all based on sub-sampling of the dataset. They are general and can be used to evaluate clusterings generated by a wide range of available clustering algorithms.

The first method first produces a clustering using the given clustering algorithm and cluster count.
Next, it randomly samples from the labelled clusters, builds a supervised classifier on the selected subset, and uses the induced classifier to evaluate the non-selected portion. The random sub-sampling and evaluation steps are repeated many times; finally, the overall accuracy gives the stability of the clustering. To find the most stable clustering for the given algorithm, the overall steps are repeated for all possible numbers of clusters and the most stable clustering is chosen for confidence estimation. Instead of random sub-sampling, 10-fold cross-validation is also employed.

The second method is based on subset selection of the original clusters. First, the given clustering algorithm finds clusters. For each subset of these clusters, an algorithm that estimates the true number of clusters is used. The argument here is that, if the initial clustering is stable, then for each subset of it we expect the estimated number of clusters to equal the cardinality of the selected subset. The above single step assesses the reliability of the clustering itself. If the reliability of a randomized algorithm like k-means is the concern, the overall steps are repeated for averaging. The confidence is computed as the ratio of correct estimations. It may be the case that the clustering has produced a large number of clusters (e.g. 20 clusters). In this case, trying all subsets becomes computationally intractable, so we resort to subset sampling instead.

The third method uses the idea that if a cluster is stable, further clustering the cases in the cluster will reveal one cluster. For each of the clusters, an estimator algorithm is run and is expected to report that there is one cluster. The whole step is repeated many times with sub-sampling of the dataset, i.e. a bootstrapping approach is employed for confidence estimation. Confidence is computed as in the second method. The second and third methods can also be used for selecting the best number of clusters, in the sense of the number that gives the highest confidence.

1 Introduction

The
word "clustering" (a.k.a. unsupervised classification) refers to methods of grouping objects based on some similarity measure between them. Clustering algorithms can be classified into four classes: partitional, hierarchical, density-based and grid-based [Halkidi01]. Each class has subclasses and different approaches, e.g. conceptual, fuzzy, self-organizing maps, etc. The clustering task consists of all the steps of the clustering problem and can be divided into five steps (the last two are optional) [Jain99]:

1. Pattern representation
2. Pattern proximity measure definition
3. Clustering
4. Data abstraction
5. Cluster validity analysis

In the present paper we consider only the last step, which is related to the other steps. Given a dataset, every applicable clustering algorithm produces a clustering that depends on its parameters. Usually, different algorithms, or even the same algorithm with distinct parameters, generate different clusterings. Cluster validity analysis refers to assessing the confidence in the resulting clusters. For low-dimensional datasets, the clustering result can be visualized and the clusters can be validated by human experts. For high dimensions, however, this becomes nearly impossible, so automatic methods are needed. Compactness (i.e. members of each cluster should be close to each other) and separation (i.e. the clusters should be widely spaced) are the main criteria for evaluating clustering results [Halkidi01]. Based on these criteria, a number of indices have been proposed for evaluating clusters and selecting the best number of clusters.

We address the cluster validity problem and propose three algorithms. In the first method, the initial clustering results are validated by supervised classifiers. The dataset is divided into training and test sets, and the accuracy of the classifier is evaluated on the test set. Since the test set is generated by the same distribution, we expect high accuracy if the initial clustering is a good one.
This method computes confidence in the generalization capability of the initial clustering. The second method uses the fact that if the initial clustering is a good one, then each of its subsets should also be a good one. This can be considered confidence estimation of the initial clustering. The third method is similar to the second and takes the dual approach: each generated cluster should not tend to break apart under perturbations. In other words, each cluster is expected to be stable itself. By repeating this process a number of times on subsamples, we get a confidence estimate.

The paper is organized as follows. Section 2 gives background and recent work on cluster validity. Section 3 presents our three methods for cluster validity analysis. Experimental results are presented in Section 4. Finally, we conclude in Section 5.

2 Cluster Validity and Stability

There are basically three methods of validity assessment: internal, external and relative [Jain99], [Halkidi01], [Fridlyand01]. Internal indices measure how well the clustering result reflects the structure inherent in the dataset. Only the inherent features of the dataset are used for measurement, i.e. no external information is consulted. The inherent features used are usually the between-cluster and within-cluster sum-of-squares matrices. A number of indices are available, including silhouette, gap and gapPC [Fridlyand01]. These indices also define how to select the best number of clusters.

In external assessment of validity there is a known a priori structure, and an external index is computed from this structure and the generated structure. These indices measure the degree of match between the two structures. They are usually defined on the contingency table of the two partitions. Entry $n_{ij}$ in row i and column j of this table is the number of patterns that belong to cluster i in the a priori partition and cluster j in the generated partition. These indices include Jaccard, Rand and FM. The FM measure is used in the Clest algorithm and is given
below [Fridlyand01]:

\[ FM = \frac{\tfrac{1}{2}(Z - n)}{\sqrt{\sum_{i=1}^{R} \binom{n_{i.}}{2} \, \sum_{j=1}^{C} \binom{n_{.j}}{2}}} \qquad (1) \]

where $n = \sum_{i=1}^{R} \sum_{j=1}^{C} n_{ij}$, $Z = \sum_{i=1}^{R} \sum_{j=1}^{C} n_{ij}^2$, $n_{i.} = \sum_{j=1}^{C} n_{ij}$ and $n_{.j} = \sum_{i=1}^{R} n_{ij}$. R and C are the numbers of clusters in the a priori and generated partitions, respectively.

Relative assessment compares two structures and measures their relative merit. The idea is to run the clustering algorithm for a range of parameter values (e.g. for each possible number of clusters) and identify the clustering scheme that best fits the dataset. Recent work on cluster validity has concentrated on a kind of relative index called cluster stability [Roth02], [Ben-Hur02], [Ben-Hur03], [Kerr00], [Levine01], [Fridlyand01], [Zhang00]. Cluster stability exploits the fact that when multiple data sources are sampled from the same distribution, the clustering algorithm should behave in the same way and produce similar structures.

In [Roth02], a supervised predictor is built on each clustered resampling of the original dataset, and its match with the labelling of the original clustering is used as a measure of stability or degree of match. They show that the choice of supervised predictor makes a difference, but the measured validity still holds for other choices. They define an instability measure by taking a game-theoretic approach. The number of clusters minimizing this instability measure is used as the best cluster number.

[Roth02] presents an algorithm for estimating the true number of clusters. For each cluster count, the dataset is resampled twice and each resample is clustered using the same generic clustering algorithm. The similarity between these two clusterings is measured using either the Jaccard coefficient or the matching coefficient.
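Both the FM index of Equation (1) and a Jaccard coefficient between two labellings can be computed from contingency counts. The following is a minimal sketch, assuming the two labellings are given as equal-length lists over the same patterns and that the Jaccard coefficient is the usual pair-counting variant (pairs together in both labellings divided by pairs together in at least one). The function names are illustrative and not taken from the cited papers.

```python
from collections import Counter
from math import comb, sqrt

def pair_counts(labels_a, labels_b):
    """Count pattern pairs placed together in both, in A, and in B."""
    n_ij = Counter(zip(labels_a, labels_b))            # contingency table entries
    both = sum(comb(v, 2) for v in n_ij.values())      # equals (Z - n) / 2
    in_a = sum(comb(v, 2) for v in Counter(labels_a).values())  # row sums n_i.
    in_b = sum(comb(v, 2) for v in Counter(labels_b).values())  # column sums n_.j
    return both, in_a, in_b

def fowlkes_mallows(labels_a, labels_b):
    both, in_a, in_b = pair_counts(labels_a, labels_b)
    return both / sqrt(in_a * in_b)   # undefined if either labelling is all singletons

def jaccard(labels_a, labels_b):
    both, in_a, in_b = pair_counts(labels_a, labels_b)
    return both / (in_a + in_b - both)

a = [0, 0, 0, 1, 1, 1]
b = [0, 0, 1, 1, 2, 2]
print(fowlkes_mallows(a, b))  # 2 / sqrt(18), about 0.471
print(jaccard(a, b))          # 2 / 7, about 0.286
```

Both measures depend only on which patterns are grouped together, so renaming the cluster labels leaves them unchanged.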
The resampling and similarity computations are repeated many times for each number of clusters for confidence estimation. The averaged values are used as measures of the stability of the clustering generated by the given clustering algorithm. Histograms and cumulative distributions are generated and plotted for selecting the best cluster number. The smallest stable cluster count is estimated as the correct number of clusters. The selection is obvious in the cumulative distribution diagram, and the authors also give a measure for automating this process. The algorithm has the nice property that if there is no gap between similarities across all cluster counts, the dataset is said to contain no clusters, i.e. the cluster count is 1.

Another resampling-based method is given in [Levine01]. In their setting, the full dataset is clustered first; then a number of subsamples are drawn from the dataset and each of them is clustered independently using the same clustering algorithm. Between the original clustering and each of the subsampled clusterings, a figure of merit (the degree of match in the connectivity matrix) is defined. The figure of merit is computed for each possible parameter set. The plot of the figure of merit against parameter values is used to select the best parameters.

A Gaussian finite mixture based method for estimating the true number of clusters is described in [Smyth96]. The algorithm first divides the dataset into training and test sets. For each cluster count k, a model is fitted to the training set using the Expectation Maximization (EM) algorithm. The resulting parameter set is evaluated on the test set. These steps are repeated many times and the averages are used as estimates.

3 Methods

We denote the input dataset by T; it has n patterns, each of dimension p, so T is effectively an n×p matrix. The algorithms can be used with different cluster counts and with different clusterings, whether generated by different clustering algorithms or constructed by hand. In the case of comparing different clustering
algorithms, we collect their confidence measures for all possible numbers of clusters. These data can be used for relative confidence estimation of clustering algorithms on the given dataset. Any clustering algorithm that operates on numeric values (e.g. k-means, ORCLUS, PAM, CLARA) and takes the cluster count as a parameter can be used for confidence estimation. For randomized algorithms like k-means, confidences should be averaged over several runs. Our methods enable one to compare a number of clustering algorithms on a given dataset based on their confidences in the stable clusterings.

The ORCLUS algorithm was proposed for high-dimensional datasets. The idea behind the algorithm is finding (potentially) different arbitrarily projected subspaces for each of the clusters. It is an iterative algorithm that starts with an initial partition and the original axis system. In each iteration, patterns are first assigned to a cluster based on their projected distance to the seeds of the current clustering. Then the cluster centroids (seeds) are recomputed and new projected subspaces are computed for each cluster. Following this, close seeds are merged to obtain fewer clusters. Iteration continues until the user-specified number of clusters is reached and the projected subspace dimensionality of each cluster has been reduced to the user-specified minimum. Contrary to feature selection methods, which select the dimensions with the larger eigenvalues, the algorithm selects the subspaces with the smaller eigenvalues. The reason is to reduce the variability in the projected subspace, i.e. to reduce the within-cluster distance. The algorithm can detect outliers and scales to very large databases; for details see [Aggarwal02].

3.1 Validity estimation using supervised learning

The method validates the result of clustering with supervised classifiers. The idea behind this method is that if the labels generated by the clustering algorithm are valid (i.e. clusters are well-separated), the classifier using this labelling will classify them
with high accuracy. To test the validity of the clustering result, we train a classifier on a perturbed version of the labelled patterns and test it on the patterns not selected for training. To estimate confidence, the subsampling is repeated many times. The average accuracy is used as a measure of confidence in the validity of the clustering.

Input: T = dataset, K = number of clusters, B = number of subsamplings
1. f = 0.7
2. L = Cluster(T, K)
3. For b = 1 to B do
4.   L_b = subsample(L, f)
5.   C_b = BuildClassifier(L_b)
6.   A_b = ComputeAccuracy(C_b, L − L_b)
7. end do
8. A = (1/B) Σ_{b=1}^{B} A_b
Figure 1: Validity estimation using supervised learning algorithm

The algorithm is sketched in Figure 1. In step 2, any clustering algorithm that partitions the patterns can be used. In step 5 we use the Diagonal Linear Discriminant Analysis (DLDA) algorithm [Dudoit01]. Its authors experimented with several algorithms and found DLDA to be one of the best in their settings and datasets. It is also employed in the Clest algorithm, a cluster estimation/validation method using a discriminant analysis approach [Fridlyand01].

DLDA is based on the Maximum Likelihood (ML) approach. Classifier C classifies an instance x by using the class conditional probabilities, i.e.

\[ C(x) = \arg\max_{k} P(x \mid y = k) \qquad (2) \]

For multivariate normal class densities, i.e. $P(x \mid y = k) \sim N(\mu_k, \Sigma_k)$, the classifier becomes

\[ C(x) = \arg\min_{k} \left\{ (x - \mu_k)\, \Sigma_k^{-1}\, (x - \mu_k)^{\top} + \log |\Sigma_k| \right\} \qquad (3) \]

The special case is obtained when the class densities have the same diagonal covariance matrix.
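The procedure of Figure 1 can be sketched as follows. This is a minimal illustration under several assumptions, not the authors' implementation: the cluster labels are taken as given (from any clustering algorithm), the classifier is a simple DLDA-style rule that assigns a pattern to the class with the smallest variance-scaled squared distance to the class mean, using one pooled variance per feature, and all function names are hypothetical.

```python
import random

def fit_dlda(points, labels):
    """Class means plus pooled per-feature variances (shared diagonal covariance)."""
    dim = len(points[0])
    groups = {}
    for p, y in zip(points, labels):
        groups.setdefault(y, []).append(p)
    means = {y: [sum(p[d] for p in g) / len(g) for d in range(dim)]
             for y, g in groups.items()}
    dof = max(len(points) - len(groups), 1)
    var = []
    for d in range(dim):
        ss = sum((p[d] - means[y][d]) ** 2 for p, y in zip(points, labels))
        var.append(max(ss / dof, 1e-12))   # floor avoids division by zero
    return means, var

def predict_dlda(model, x):
    means, var = model
    return min(means, key=lambda y: sum((xj - mj) ** 2 / vj
                                        for xj, mj, vj in zip(x, means[y], var)))

def clustering_confidence(points, labels, B=50, f=0.7, seed=0):
    """Average test accuracy over B random train/test subsamples (steps 3 to 8)."""
    rng = random.Random(seed)
    idx = list(range(len(points)))
    n_train = int(f * len(points))
    accuracies = []
    for _ in range(B):
        rng.shuffle(idx)
        train, test = idx[:n_train], idx[n_train:]
        model = fit_dlda([points[i] for i in train], [labels[i] for i in train])
        hits = sum(predict_dlda(model, points[i]) == labels[i] for i in test)
        accuracies.append(hits / len(test))
    return sum(accuracies) / B

# Two well-separated groups, labelled as a clustering algorithm would label them.
good = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
        (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
good_labels = [0] * 5 + [1] * 5
print(clustering_confidence(good, good_labels))  # 1.0
```

A labelling that cuts across the natural groups would be harder to predict on the held-out patterns and should therefore receive a lower confidence.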
In this case, the classification formula, known as the DLDA discrimination rule, is obtained as follows:

\[ C(x) = \arg\min_{k} \sum_{j=1}^{p} \frac{(x_j - \mu_{kj})^2}{\sigma_j^2} \qquad (4) \]

3.2 Validity using subset of clusters

The method is designed for testing the stability of each subset of a clustering. If the initial clustering is valid, then every subset of it is expected to be valid. To test the validity of subsets, subsampling-based cluster count estimation algorithms can be used. In this way, the confidence in validity is computed based on the stability of subsets of the original clustering.

The algorithm is given in Figure 2. In each iteration, a subset of the labelling is selected randomly in step 3. For K clusters, cluster i, 1 ≤ i ≤ K, is selected with probability α, i.e. uniform selection. We set α = 0.5 so that the expected number of selected cluster labels is K/2. In step 4 of the algorithm, a prediction-based resampling algorithm, Clest [Fridlyand01], is used. In fact, Clest is a method with several parameters, and different instantiations of these parameters result in different algorithms. For example, the actual clustering and classifier algorithms are generic. The algorithm is given in Figure 3.

Input: T = dataset, K = number of clusters, B = number of subset subsamplings, kmax = maximum number of clusters
1. L = Cluster(T, K)
2. For b = 1 to min(B, 2^|K| − 1) do
3.   L_b = patterns belonging to the b-th subset of K
4.   K_b = EstimateClusterCount(L_b, kmax)
5.   A_b = 1(K_b == NumberOfClusters(L_b))
6. end do
7. A = ( Σ_{b=1}^{min(B, 2^|K| − 1)} A_b ) / min(B, 2^|K| − 1)
Figure 2: Validity using subsets of clusters algorithm

Since the clustering and classifier algorithms are generic, one should select concrete algorithms. In the original algorithm, the authors selected the Partitioning Around Medoids (PAM) algorithm for clustering and DLDA for classification.

3.3 Validity using cluster tendency

In this method, every cluster generated by the clustering algorithm under subsampling is evaluated against the null hypothesis that there is only one cluster. The motivation for this method is that if a clustering algorithm produces reasonable structures for every
subsample, then every cluster is expected to be a tight structure, i.e. a structure that has no further tendency to split into sub-clusters. The algorithm is presented in Figure 4. In step 6, we use the Clest algorithm for cluster count estimation.

4 Experiments and Results

For all of the methods, we analyze the clusterings generated by the well-known k-means and ORCLUS algorithms. The analysis is done for cluster counts from 2 to 10 for all the datasets. The parameter B of all three methods is set to 50. In all of the experiments, the parameters of the Clest algorithm given in Figure 3 are set as follows: pmax = 0.05, dmin = 0.05, size of learning set = 2n/3, B = 20, B_0 = 20 and kmax = 10.

5 Conclusion

References

[Jain99] Jain, A.K., Murty, M.N., Flynn, P.J. Data Clustering: A Review. ACM Computing Surveys, Vol. 31, No. 3. 1999.
[Halkidi01] Halkidi, M., Batistakis, Y., Vazirgiannis, M. On Clustering Validation Techniques. Journal of Intelligent Information Systems, Vol. 17:2-3. 2001.
[Ben-Hur02] Ben-Hur, A., Elisseeff, A., Guyon, I. A stability based method for discovering structure in clustered data. Pacific Symposium on Biocomputing. 2002.
[Ben-Hur03] Ben-Hur, A., Guyon, I. Detecting stable clusters using principal component analysis. In Methods in Molecular Biology, M.J. Brownstein and A. Khodursky (eds.), Humana Press, pp. 159-182. 2003.
[Kerr00] Kerr, M.K., Churchill, G.A. Bootstrapping Cluster Analysis: Assessing the Reliability of Conclusions from Microarray Experiments. Proceedings of the National Academy of Sciences. 2000.
[Levine01] Levine, E., Domany, E. Resampling Method for Unsupervised Estimation of Cluster Validity. Neural Computation. 2001.
[Fridlyand01] Fridlyand, J., Dudoit, S. Applications of resampling methods to estimate the number of clusters and to improve the accuracy of a clustering method. University of California, Statistics Department Technical Report No: 600. 2001.
[Dudoit01] Dudoit, S., Fridlyand, J., Speed, T.P. Comparison of Discrimination Methods for the Classification of Tumors Using Gene Expression Data. Journal of the American Statistical Association. 2001.
[Roth02] Roth, V., et al. Stability-Based Model Order Selection in Clustering with Applications to Gene Expression Data. ICANN. 2002.
[Keller00] Keller, A., et al. Bayesian Classification of DNA Array Expression Data. University of Washington, Computer Science Department Technical Report No: UW-CSE-2000-08-01. 2000.
[Aggarwal02] Aggarwal, C., Yu, P.S. Redefining Clustering for High-Dimensional Applications. IEEE Transactions on Knowledge and Data Engineering, Vol. 14. 2002.
[Smyth96] Smyth, P. Clustering using Monte Carlo Cross-Validation. KDD'96. 1996.
[Buhmann02] Buhmann, J.M. Learning and Data Clustering. Handbook of Brain Theory and Neural Networks, MIT Press. 2002.
[Zhang00] Zhang, K., Zhao, H. Assessing Reliability of Gene Clusters from Gene Expression Data. Functional Genomics. 2000.

Input: T = dataset, K = number of clusters, B = number of runs, B_0 = number of resamplings, kmax = maximum number of clusters
1. T_0 = T
2. For k = 2 to kmax do
3.     For i = 0 to B_0 do
4.         For b = 1 to B do
5.             Randomly split T_i into non-overlapping learning and test sets
6.             Apply clustering algorithm P to the learning set
7.             Build a classifier using the labelled learning set
8.             Apply the resulting classifier to the test set
9.             Apply the clustering algorithm to the test set
10.            s_{k,i,b} = FM external index comparing the two sets of labels
11.        end do
12.        t_{k,i} = median(s_{k,i,1}, ..., s_{k,i,B})
13.        T_{i+1} = randomly generated uniform dataset in the range of T
14.    end do
15. end do
16. t^0_k = (1/B_0) * sum_{i=1..B_0} t_{k,i}
17. p_k = #{i | t_{k,i} ≥ t_{k,0}, i = 1, ..., B_0} / B_0
18. d_k = t_{k,0} - t^0_k
19. K = {k | 2 ≤ k ≤ kmax, p_k ≤ pmax, d_k ≥ dmin}
20. K̂ = 1 if K is empty, arg max_{k ∈ K} d_k otherwise

Figure 3: Clest algorithm

Input: T = dataset, K = number of clusters, B = number of subsamplings, kmax = maximum number of clusters
1. f = 0.7
2. For b = 1 to B do
3.     T_b = subsample(T, f)
4.     L_b = Cluster(T_b, K)
5.     for each cluster c ∈ L_b do
6.         K_{b,c} = EstimateClusterCount(c, kmax)
7.         A_{b,c} = 1(K_{b,c} == 1)
8.     end do
9. end do
10. A = (1/B) * sum_{b=1..B} [ ( sum_{c=1..number of clusters(L_b)} A_{b,c} ) / number of clusters(L_b) ]

Figure 4: Validity using individual clusters algorithm
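
To make the classifier-based estimate of Figure 1 concrete, the following is a minimal Python sketch rather than the authors' implementation. It assumes the cluster labels from step 2 have already been produced by some clustering algorithm and are passed in, and it estimates the shared diagonal covariance of Eq. (4) by pooled within-class variances. The function names dlda_fit, dlda_predict and validity_by_classifier are hypothetical and chosen only for illustration.

```python
import random


def dlda_fit(X, y):
    """Estimate class means and a shared per-feature variance (assumed pooled)."""
    classes = sorted(set(y))
    p = len(X[0])
    means = {}
    for k in classes:
        rows = [x for x, lab in zip(X, y) if lab == k]
        means[k] = [sum(r[j] for r in rows) / len(rows) for j in range(p)]
    dof = max(len(X) - len(classes), 1)
    var = []
    for j in range(p):
        ss = sum((x[j] - means[lab][j]) ** 2 for x, lab in zip(X, y))
        var.append(max(ss / dof, 1e-12))  # guard against zero variance
    return means, var


def dlda_predict(model, x):
    """Eq. (4): choose the class with the smallest variance-scaled squared distance."""
    means, var = model
    return min(
        means,
        key=lambda k: sum((xj - m) ** 2 / v for xj, m, v in zip(x, means[k], var)),
    )


def validity_by_classifier(X, labels, B=50, f=0.7, seed=0):
    """Figure 1, steps 3-8: mean held-out accuracy over B subsamples."""
    rng = random.Random(seed)
    n = len(X)
    cut = round(f * n)
    if not 0 < cut < n:
        raise ValueError("f must leave both training and held-out patterns")
    accuracies = []
    for _ in range(B):
        idx = list(range(n))
        rng.shuffle(idx)
        train, test = idx[:cut], idx[cut:]  # L_b and L - L_b
        model = dlda_fit([X[i] for i in train], [labels[i] for i in train])
        hits = sum(dlda_predict(model, X[i]) == labels[i] for i in test)
        accuracies.append(hits / len(test))
    return sum(accuracies) / B


# Toy usage: two well-separated blobs whose labels match the true grouping.
toy_rng = random.Random(1)
toy_X = [[toy_rng.gauss(0, 1), toy_rng.gauss(0, 1)] for _ in range(60)]
toy_X += [[toy_rng.gauss(10, 1), toy_rng.gauss(10, 1)] for _ in range(60)]
toy_labels = [0] * 60 + [1] * 60
print(validity_by_classifier(toy_X, toy_labels, B=20))
```

In this sketch, a labelling that matches real structure should be easy to predict on the held-out patterns, so the estimate is high. A labelling unrelated to the data should score close to chance.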
