A Topological Constraints Based Sequential Data Mining Approach on Telecom Networks Alarm Data ver 4
ABSTRACT

Design and manufacturing are the core activities for realizing a marketable and profitable product. A number of evolutionary changes have taken place over the past couple of decades in both design and manufacturing. First we explore the developments in what is called CAD. The major focus in CAD technology development has been on advancing representation completeness. First came two-dimensional (2D) drafting systems in the 1960s. The extension of 2D drafting to three-dimensional (3D) models then led to wireframe-based modeling systems. However, these could not represent higher-order geometric data such as surfaces. To bridge this gap, surface-based models were developed in the early 1970s. Even though surface models provided some higher-level information, such as surface data for boundary representation, this was still not sufficient to represent solid or volume-enclosure information. The need for solid modeling intensified with the development of application programs such as numerical control (NC) verification codes and automatic mesh generation: a volume representation of the part is needed for performing topological validity checks. Solid modeling technology has evolved only since the mid-1970s. A large number of comprehensive software products are now available that integrate geometric modeling with design analysis and computer-aided manufacturing. The latest evolutionary development in the CAD/CAM industry has been knowledge-based engineering systems that capture both geometric and nongeometric product information, such as engineering rules, part dependencies, and manufacturing constraints, resulting in more informationally complete product definitions.

Optimum Design

In the design of any component, certain desirable and undesirable effects are always associated with the design. It is possible to obtain design solutions without paying much attention to these effects (other than casually checking that the component will perform its required function without failure); such a solution might be termed an adequate design. In many instances, however, it is necessary to give more than casual consideration to the various effects: either to maximize a desirable one or to minimize an undesirable one. The design solution may then be termed an optimum design. For example, it may be required to minimize the cost of a component (particularly if the design is for mass production), to minimize weight or deflection, or to obtain maximum power transmission capability or load-carrying capacity. When any component is designed, certain functional requirements must be satisfied, and there are usually many design solutions that will satisfy them. It is the purpose of the optimum design method to present a design procedure that gives an optimum solution, taking account of all the factors involved. Any idealized engineering system can be described by a finite set of quantities. For example, an elastic structure modeled by finite elements is characterized by the nodal coordinates … Some of these quantities are fixed in advance and will not be changed by the redesign process (they are often called prescribed parameters). The others are the design variables; they are modified during each redesign cycle in order to gradually optimize the mechanical system.
A function of the design variables must be defined whose value permits comparing different feasible designs; this is the objective function (e.g., the weight of an aerospace structure). A design is said to be feasible if it satisfies all the requirements imposed on the mechanical system when performing its tasks. Usually, requiring that a design be feasible amounts to assigning upper or lower limits to quantities characterizing the system behavior (inequality constraints). Sometimes given values, rather than lower or upper bounds, are imposed on these quantities (equality constraints). Formally, the task is to minimize an objective f(x) over the design variables x, subject to inequality constraints g_j(x) ≤ 0 and equality constraints h_k(x) = 0. Taking again the case of structural optimization, the behavior constraints are placed on stresses, displacements, frequencies, buckling loads, etc.

Reliability Design

Consumer products, industrial machinery, and military equipment are intensively evaluated for reliability of performance and life expectancy. Although the military and particular industrial users (for example, power plants, both fossil-fuel and nuclear) have always followed some sort of reliability program, consumer products have of late received the widest attention and publicity. One of the most important foundations of product reliability is its design, and it is apparent that the designer should at least be acquainted with some of the guidelines. The article entitled "A Manual of Reliability" offers the following definition: "Reliability is the probability that a device will perform without failure a specific function under given conditions for a given period of time." From this definition, we see that a thorough and in-depth analysis of reliability will involve statistics and probability theory.

All products, systems, assemblies, components, and parts exhibit different failure rates over their service lives. Although the shape of the curve varies, most exhibit a low failure rate during most of their useful lives and higher failure rates at the beginning and end. The curve is usually shaped like a bathtub, as shown in Figure 1. Infant mortality of manufactured parts occurs because a certain percentage, however small, of seemingly identical parts are defective; if those parts are included in a system, the system will fail early in its service life. Product warranties are usually designed to reduce customer losses due to infant mortality. Parts wear out due to friction, overload, plastic deformation, fatigue, changes in composition due to excessive heat, corrosion, fouling, abuse, etc.

The design function of engineering should include an examination of reliability and should seek to provide adequate reliability in a part or system commensurate with its use. When the safety of people is concerned, product reliability with respect to potential injury-producing failure must be very high. Human health and safety cannot be compromised for the sake of profit.

Computer-Aided Design

The computer has grown to become essential in the operations of business, government, the military, engineering, and research. It has also demonstrated itself, especially in recent years, to be a very powerful tool in design and manufacturing. In this chapter, we consider the application of computer technology to the design of a product; that is, computer-aided design, or CAD. Computer-aided design involves any type of design activity that makes use of the computer to develop, analyze, or modify an engineering design.
Modern CAD systems (also often called CAD/CAM systems) are based on interactive computer graphics (ICG). Interactive computer graphics denotes a user-oriented system in which the computer is employed to create, transform, and display data in the form of pictures or symbols. The user of the computer graphics design system is the designer, who communicates data and commands to the computer through any of several input devices. The computer communicates with the user via a cathode ray tube (CRT). The designer creates an image on the CRT screen by entering commands that call the desired software subroutines stored in the computer. In most systems, the image is constructed out of basic geometric elements: points, lines, circles, and so on. It can be modified according to the designer's commands: enlarged, reduced in size, moved to another location on the screen, rotated, and subjected to other transformations. Through these various manipulations, the required details of the image are formulated.

The typical ICG system is a combination of hardware and software. The hardware includes a central processing unit (CPU), one or more workstations (including the graphics display terminals), and peripheral devices such as printers, plotters, and drafting equipment. The software consists of the computer programs needed to implement graphics processing on the system. The software would also typically include additional specialized application programs to accomplish the particular engineering functions required by the user company.

It is important to note that the ICG system is only one component of a computer-aided design system; the other major component is the human designer. Interactive computer graphics is a tool used by the designer to solve a design problem. In effect, the ICG system magnifies the powers of the designer, which has been referred to as the synergistic effect. The designer performs the portion of the design process that is most suited to human intellectual skills (conceptualization, independent thinking); the computer performs the tasks best suited to its capabilities (speed of calculation, visual display, storage of large amounts of data); and the resulting system exceeds the sum of its components.

There are many benefits of computer-aided design, only some of which can be easily measured. Some of the benefits are intangible, reflected in improved work quality, more pertinent and usable information, and improved control, all of which are difficult to quantify. Other benefits are tangible, but the savings from them show up far downstream in the production process, so it is difficult to assign a dollar figure to them in the design phase. Some of the benefits that derive from implementing CAD/CAM can be directly measured. In the subsections that follow, we elaborate on some of the potential benefits of an integrated CAD/CAM system.

Increased productivity translates into a more competitive position for the firm because it reduces staff requirements on a given project. This leads to lower costs, in addition to improving response time on projects with tight schedules. Surveying some of the larger CAD/CAM vendors, one finds that the productivity improvement ratio for a designer/draftsman is usually given as a range, typically from a low end of 3:1 to a high end in excess of 10:1 (often far in excess of that figure).
Productivity improvement in computer-aided design, as compared to the traditional design process, depends on such factors as: the complexity of the engineering drawing; the level of detail required in the drawing; the degree of repetitiveness in the designed parts; the degree of symmetry in the parts; and the extensiveness of the library of commonly used entities. As each of these factors increases, the productivity advantage of CAD tends to increase.

Interactive computer-aided design is inherently faster than the traditional design process. It also speeds up the task of preparing reports and lists (e.g., assembly lists), which are normally prepared manually. Accordingly, it is possible with a CAD system to produce a finished set of component drawings and the associated reports in a relatively short time. Shorter lead times in design translate into shorter elapsed time between receipt of a customer order and delivery of the final product.

The design analysis routines available in a CAD system help to consolidate the design process into a more logical work pattern. Rather than having a back-and-forth exchange between design and analysis groups, the same person can perform the analysis while remaining at a CAD workstation. This helps to improve the concentration of designers, since they interact with their designs in real time. Because of this analysis capability, designs can be created that are closer to optimum. There is a time saving to be derived from the computerized analysis routines, both in designer time and in elapsed time; it results from the rapid response of the design analysis and from the time no longer lost while the design finds its way from the designer's drawing board to the design analyst's queue and back again.

An example of this success is drawn from the experience of the General Electric Company with the T700 engine. In designing a jet engine, weight is an important design consideration, and during the design of the engine the weight of each component for each design alternative must be determined. In the past this had been done manually, by dividing each part into simple geometric shapes to conveniently compute the volumes and weights. Through the use of CAD and its mass-properties analysis function, the mass properties were obtained in 25% of the time formerly required.
Fluorescence Spectroscopy Studies of Polymer Chain Conformation
Xue Qi, Department of Polymer Science and Engineering, School of Chemistry and Chemical Engineering, Nanjing University

Among fluorescence spectroscopy techniques there is a method known as "non-radiative energy transfer" (NRET). If a fluorescent donor and a fluorescent acceptor are attached to different polymer chains, then when the two chains are far apart (>3 nm) only the donor signal appears in the fluorescence spectrum, whereas when the two are close together (<3 nm) both the donor and the acceptor signals appear. This method can therefore be used to measure the distance between chain segments on different molecules. If the acceptor and the donor are attached to the same polymer chain, the distance between intramolecular segments can be measured, indicating whether the chain has collapsed. Because NRET is extremely sensitive, it is an important method for studying molecular conformation.
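The sharp distance dependence behind this sensitivity is conventionally described by the Förster relation (a standard photophysics result, added here for context; it is not stated in the original abstract), in which the transfer efficiency E falls off with the sixth power of the donor-acceptor separation r:

```latex
E = \frac{1}{1 + (r/R_0)^6}
```

Here R_0 is the Förster radius of the donor-acceptor pair, typically a few nanometers, which is consistent with the ~3 nm threshold quoted above.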
Our group has used NRET to study the conformation of polymer chains in solution. Conformational transitions of polymer chains in solution can be detected, and a new concept concerning the volume effect of solvent molecules has been proposed. Recently, we have used NRET to study chain conformation in ultrathin films and its correlation with the glass transition, providing important molecular-level concepts for understanding the glass transition of polymers in the confined state.
DI-02 Fluorescence Correlation Spectroscopy of Systems with Strong Multi-Charge Repulsive Interactions: Theory, Simulation, and Experiment
Zhao Jiang, Institute of Chemistry, Chinese Academy of Sciences, No. 2 North First Street, Zhongguancun, Haidian District, Beijing

Multi-level dynamic processes are ubiquitous in multiply charged soft-matter systems such as colloids, emulsions, polyelectrolyte solutions, and gels. How to obtain single-molecule and single-particle information in the presence of such multi-level dynamics has long been an important and difficult research topic in soft-matter physics. We have successfully introduced the dual-color cross-correlation spectroscopy method into this area of research and developed the corresponding theoretical analysis. Working at both the simulation and the experimental level on systems with strong multi-charge repulsive interactions, such as charged colloids and polyelectrolyte solutions, we have effectively achieved molecule and particle discrimination and obtained single-molecule and single-particle dynamic information as well as interaction parameters.
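For reference, the quantity measured in dual-color cross-correlation experiments is conventionally the cross-correlation of the intensity fluctuations of the two detection channels (the standard FCCS definition, supplied here for context rather than taken from the abstract):

```latex
G_{\times}(\tau) = \frac{\langle \delta F_g(t)\, \delta F_r(t+\tau) \rangle}{\langle F_g(t) \rangle \, \langle F_r(t) \rangle}
```

where F_g and F_r are the fluorescence intensities in the two color channels and δF = F − ⟨F⟩; only species carrying both labels produce a nonzero cross-correlation amplitude, which is the basis for discriminating molecules and particles.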
235 Applications of Analytical Ultracentrifugation in Polymer Science
Zhang Guangzhao, South China University of Technology, 381 Wushan Road, Guangzhou 510640

Analytical ultracentrifugation can measure, through sedimentation velocity, the sedimentation coefficient, diffusion coefficient, hydrodynamic radius, and molar mass of polymers in solution. In this report we introduce the principles of analytical ultracentrifugation and its applications in the study of the dynamic behavior of polyelectrolytes and neutral polymers.
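The quantities listed above are tied together by the Svedberg equation (a standard relation in analytical ultracentrifugation, added here for context): combining the sedimentation coefficient s and the diffusion coefficient D yields the molar mass

```latex
M = \frac{s\,R\,T}{D\,(1 - \bar{v}\,\rho)}
```

where R is the gas constant, T the absolute temperature, v̄ the partial specific volume of the polymer, and ρ the solvent density.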
AppNote 10655: Programmable Electrical Rule Checking (PERC)
By: Dina Medhat. Last modified: 28-Oct-2008.
© Copyright Mentor Graphics Corporation 1995-2008. All rights reserved.

Introduction

Reliability is a growing concern for integrated circuit designers. Calibre PERC (Programmable Electrical Rule Checking) can address reliability challenges that arise during the circuit and electrical verification process. PERC is specifically designed to perform electrostatic discharge (ESD) and multiple power domain checks. Calibre PERC allows you to customize ERC checks at the schematic level, as well as geometrical and electrical checks at layout, which gives you more power and flexibility to handle emerging circuit verification demands for design implementation.

ESD, advanced ERC, and multiple power domains are top issues on a long list of complex new geometrical and electrical verification requirements. All of these advanced requirements can only be described by a topological view rather than a single device/pin-to-net relation. A topological view incorporates many layout-related parameters as well as circuitry-dependent checks.

PERC in the Design Flow

PERC allows you to verify both the source and the layout sides of a design. It can work in hierarchical mode to provide high speed and maximum capacity. Moreover, it can perform netlist transformation: device reduction, logic injection, and gate recognition.

Inputs to PERC are:
1. A rule file including:
   a. TVF function statements containing the PERC rule check procedures
   b. SVRF statements:
      i. PERC commands: PERC REPORT, PERC NETLIST, PERC PROPERTY, and PERC LOAD
      ii. Some LVS commands
2. A SPICE netlist (or a GDSII layout database, from which the SPICE netlist extractor generates a netlist)

Outputs from PERC are:
1. A report file, listing the checks that were run and their results
2. An SVDB database for RVE support

By running PERC on the schematic, you can identify electrical errors earlier in the design cycle and correct them before layout is implemented. Once schematic checking has been completed, layout-based verification with geometrical constraints can be performed to achieve more complete checks. In a post-layout run, complex geometrical parameters can be calculated and incorporated into the PERC check, thereby combining both electrical and geometrical data in one verification step.

PERC Naming Conventions

PERC commands follow the naming conventions established in LVS. Hence, PERC uses the same names for built-in devices (e.g., MN, MP, R, …) and pins (e.g., g, s, d, p, n, …). Moreover, equivalent device types established with the "LVS DEVICE TYPE" specification statement are supported as built-in devices. If netlist transformation is performed, PERC also recognizes the logic gates and/or logic injections formed by LVS. These are treated as built-in devices (e.g., INV, NAND2, _invv, …) with built-in pins (e.g., output, input, …).
Besides the individual device types, PERC provides four reserved keywords for referencing generic logic gates and logic injection devices:
1. lvsGate: device type referring to all logic gates
2. lvsInjection: device type referring to all logic injection devices
3. lvsIn: pin name referring to all input pins of logic gates and gate-based injection devices
4. lvsOut: pin name referring to all output pins of logic gates and gate-based injection devices

There are more useful reserved keywords:
1. lvsPower: list of power nets defined by the "LVS POWER NAME" statement
2. lvsGround: list of ground nets defined by the "LVS GROUND NAME" statement
3. lvsTopPorts: list of nets connected to ports in the top cell
4. lvsTop: generic cell name referring to the top-level cell

Case sensitivity of device types, subtypes, pin names, and net names follows the LVS rules defined by the "LVS COMPARE CASE" statement; consequently, they are case insensitive by default.

PERC Commands

In order for PERC commands to be available to the Tcl interpreter, this statement must be included in every TVF function definition:

package require CalibreLVS_PERC

Initialization commands:
• These commands allow the user to initialize the netlist before executing any rule checks.
• There are three kinds:
  o Net type commands: used to label nets with net types
  o Net path commands: used to create net paths across devices
  o Parameter commands: used to customize the runtime environment

[Figure: example netlist annotated with power, pad, and ground net types, and a path through a resistor.]

Low-level rule checking commands:
• These commands do not output results to the report file.
• They provide access to data in the input netlist (cells, placements, instances, nets, pins, properties).
• They use an iterator mechanism: an iterator is a Tcl handle that points to an element in the input netlist.

[Figure: iterator access to individual devices — pmos, nmos, and resistor elements.]

High-level rule checking commands:
• These commands provide a way to write complex rule checks.
• There are two basic categories:
  o Rule commands: used to define rule checks and output results to the report file and RVE
  o Math commands: used to compute parameters over a list of devices

PERC Invocation

The command-line switches behave similarly to LVS, except -hier, which controls the hcell list for PERC. If -hier is not specified, PERC runs flat; otherwise, PERC runs hierarchically. The hcell list is accumulated using the following four methods:
1. –automatch: causes all cells in the input netlist to be added to the hcell list
2. –hcell: causes all cells that are listed in cell_file_name and belong to the input netlist to be added to the hcell list
3. HCELL: this SVRF statement causes all cells that are listed using "HCELL" statements and belong to the input netlist to be added to the hcell list
4. LVS EXCLUDE HCELL: this SVRF statement causes all cells that are listed using "LVS EXCLUDE HCELL" statements to be removed from the hcell list

Examples:
• Run PERC hierarchically on a netlist, with hcells recognized automatically:
  calibre –perc –hier –auto rules
• Run PERC hierarchically on a netlist, with hcells taken from a user-specified hcell list:
  calibre –perc –hier –hcell cell_list rules
• Run PERC flat on a netlist:
  calibre –perc rules
• Run PERC hierarchically on a layout, with hcells recognized automatically:
  calibre –spice –perc –hier –auto rules
• Run PERC flat on a layout:
  calibre –spice –perc rules

PERC & ElectroStatic Discharge (ESD)

ESD rule verification is needed to prevent catastrophic chip failures.
In complex designs, sophisticated ESD protection device structures have to be verified. Usually, these structures are formed from a group of devices that together construct better ESD prevention circuitry. Other considerations also have to be added, such as geometrical constraints on device dimensions, number of fingers, distance from supply pads, etc. Calibre PERC handles these complex requirements with a straightforward methodology.

Example 1: ESD protection for a gate connected to an IO pad requires the existence of:
• A resistor with resistance value greater than 100 ohms

Example 2: ESD protection for IO pads with a DTSCR (Diode String Triggered SCR) requires the existence of:
• Bipolar transistors
• A resistor

PERC & Multiple Power Domains

With multiple power domains, system integration and IP reuse complicate the circuit verification problem because of the various connections between different domains. Design hierarchy and constraints need to be considered, where specific rules are applied on a top cell and/or pad frame, while others are applied between blocks that cross multiple power domains. PERC helps you identify inappropriate connections between different power domains.

Example: serially connected gates cannot be on different supplies; a level shifter should bridge the two gates, because otherwise faulty switching or gate burnout might occur.

Summary

Calibre PERC provides the capability to achieve advanced circuit verification, allowing designers to perform checks that address ESD (ElectroStatic Discharge) issues, errors arising from designing across multiple power domains, and advanced ERC concerns. PERC integrates topology, geometry, and circuit data to perform user-configurable verification.
Computer Integrated Manufacturing Systems (计算机集成制造系统), Vol. 24, No. 7, July 2018. DOI: 10.13196/j.cims.2018.07.028

Minimum Sequence Identification of High-Level Change Operations Changing Constraints Between Data-Aware Processes

ZHANG Xuewei (1), LIU Mingju (2), XING Jianchun (1), ZHOU Qizhen (1)
(1. Department of Defense Engineering, PLA Army Engineering University, Nanjing 210007, China; 2. China Luoyang Electronic Equipment Testing Center, Luoyang 471003, China)

Abstract: Owing to the importance of the minimum sequence of high-level change operations changing constraints between data-aware processes for data-aware process transformation, merging, and version control, an identification method is proposed. In this method, the activity constraint graph of a data-aware process is defined. Then the constraint matrices of a pair of data-aware processes are constructed based on their activity constraint graphs. Finally, the constraint matrices and the concept of digital logic are used to identify a minimum sequence of high-level change operations changing constraints between the data-aware processes. Extensive experiments on real and synthetic data were conducted to evaluate the accuracy and efficiency of the proposed approach and previous approaches. Experimental results demonstrate that the proposed approach achieves a higher average accuracy (89.89%) than previous approaches.

Keywords: data-aware process; minimum sequence of high-level change operations changing constraints; activity constraint graph; constraint matrix
CLC classification: TP311; Document code: A

1 Problem Statement

With the deep integration of industrialization and informatization in the field of intelligent manufacturing, business process management (BPM) [13], which builds on information technology and management technology, takes improving the efficiency of business process management as its starting point, and aims at enhancing enterprise competitiveness, has broad application prospects and research space.
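As a rough illustration of the constraint-matrix idea described in the abstract (a speculative sketch: the activity names, the Boolean encoding, and the XOR step are choices made here, not the paper's actual encoding or change-operation taxonomy), one can encode each process's activity constraints as a Boolean matrix and XOR the two matrices to locate the constraints that any transformation sequence must add or remove:

```python
import numpy as np

# Hypothetical activity sets and constraint edges for two data-aware processes.
activities = ["A", "B", "C", "D"]
idx = {a: i for i, a in enumerate(activities)}

def constraint_matrix(edges):
    """Boolean matrix M with M[i, j] = True iff activity i constrains activity j."""
    m = np.zeros((len(activities), len(activities)), dtype=bool)
    for src, dst in edges:
        m[idx[src], idx[dst]] = True
    return m

p1 = constraint_matrix([("A", "B"), ("B", "C"), ("C", "D")])
p2 = constraint_matrix([("A", "B"), ("A", "C"), ("C", "D")])

diff = p1 ^ p2                      # XOR: constraints present in exactly one process
to_remove = np.argwhere(diff & p1)  # constraints to delete when going from p1 to p2
to_add = np.argwhere(diff & p2)     # constraints to insert
print("remove:", [(activities[i], activities[j]) for i, j in to_remove])
print("add:   ", [(activities[i], activities[j]) for i, j in to_add])
```

Each differing entry corresponds to at least one change operation, so the number of set bits in the XOR matrix gives a natural lower bound on the length of any change sequence under an encoding of this kind.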
Topology Optimization Design of Continuum Structures

1. Overview of This Article

With the continuous progress of technology and the increasing demands of engineering, the topology optimization design of continuum structures has become a research hotspot in the field of modern engineering. Topology optimization aims to optimize structural performance by changing the internal layout and connectivity of a structure, thereby improving the load-bearing capacity and efficiency of engineering structures. This article conducts an in-depth study of the topology optimization design of continuum structures, exploring its basic principles, methods, applications, and future development trends. It first introduces the basic concepts and principles of topology optimization of continuum structures, including the definition of topology optimization, objective functions, and constraint conditions.
I. What are CAD, CAPP, CAM, and CIMS? Briefly describe the relationships among them.

1. CAD (computer-aided design) mainly refers to the use of the computer to complete the entire process of product design, generally including the two aspects of design and analysis.
2. CAPP (computer-aided process planning) refers to the use of computer technology to complete the process planning for part machining.
3. CAM (computer-aided manufacturing) refers to the use of computers and numerically controlled equipment (such as CNC machine tools and machining centers) to manufacture parts.
4. CAD/CAM (computer-aided design and manufacturing) refers to using the computer as the primary means to generate and use various kinds of digital and graphical information for the design and manufacture of a product.
5. The relationship among CAD, CAPP, and CAM: generally speaking, CAD/CAM means CAD/CAPP/CAM; that is, a CAD/CAM system is understood to contain CAPP. The CAD system produces the part model, including geometric information and process/manufacturing information such as tolerance requirements and surface roughness. The CAPP system accepts this information from CAD and, applying process design knowledge, produces a reasonable process plan with optimized processing parameters and equipment. The CAM system accepts the part information from CAD and the process plan and parameters from CAPP, generates NC code, and sends it to the numerically controlled equipment, which machines the part automatically.
6. CAD/CAM technology trends: integration, intelligence, standardization, networking, and three-dimensional modeling.

II. CAD/CAM

1. Fill in the blanks (20 questions, 1.5 points per blank, 30 points total)
(1) FMS is the abbreviation of Flexible Manufacturing System.
(2) APT is the abbreviation of Automatically Programmed Tool.
(3) CAM is the abbreviation of computer-aided manufacturing.
(4) IMS is the abbreviation of Intelligent Manufacturing System.
(5) RPM is the abbreviation of Rapid Prototyping/Parts Manufacturing.
(6) AM is the abbreviation of Agile Manufacturing.
(7) CAD functions can be grouped into four categories: geometric modeling, engineering analysis, dynamic simulation, and automatic drafting.
(8) The engineering database inputs, outputs, and manages the data, drawings, and documents used and generated in the design process.
(9) In the narrow sense, CAM usually refers only to NC program preparation, including planning of the tool path, generation of the cutter location file, simulation of the tool path, and NC code generation.
(10) Application software developed with the user programming language (UPL) supported by, and integrated with, a CAD/CAM software system has a good user interface; such enhancements can extend the functions of the CAD/CAM software system.
(11) Automatic programming application software includes software that recognizes and processes source programs written in an NC language (such as APT) and the various CAD/CAM packages.
(12) Optimized design includes optimization of the overall scheme, optimization of the part structure, and optimization of the process parameters.
(13) CAD/CAM is a human-computer interaction process: from product shaping, through ideas and schemes, to structural analysis and process simulation, the system allows the user to view and modify intermediate results at any time and to edit the process in real time.
(14) Modeling technology is the core of a CAM system; it provides all the basic data and original information for the design and manufacture of the product and is the basis of all subsequent processing.
(15) A CAD/CAM system computes the geometric and physical characteristics of the corresponding objects from the three-dimensional model.
(16) A CAD/CAM system has good information transmission, management, and exchange functions, supporting the transmission and exchange of information through the whole design and manufacturing process and the sharing of information among multiple designers and design teams.
(17) The CIMS subsystems include the management information system, the manufacturing automation system, and the CAD/CAPP/CAM integrated system.
(18) Networking equipment is necessary for setting up a CAD/CAM system network, which is composed of computers, network hubs, network cards, and the transmission medium.
(19) System software is responsible for managing hardware and software resources; it is public computer software shared by all users, including operating systems, compilers, graphics interfaces, and interface standards.
(20) Boolean operations include three kinds: intersection, union, and difference.

2. Explain the following terms (6 questions, 18 points total)

(1) Parametric modeling: parametric modeling first establishes the constraint relationships between the geometry and its dimensional parameters, and then uses the constraints to define and modify the geometric model. Dimensional constraints and topological constraints reflect the factors to be considered in the design. Because the parameters maintain fixed relationships through these constraints, the initial solid naturally satisfies them, and entering new parameter values preserves the constraints and yields a new geometric model.

(2) Planar contour sweeping: planar contour sweeping is closely tied to two-dimensional systems and is commonly used to generate prismatic or rotational shapes. A closed planar contour (the sweeping profile) is translated through space by some distance, or rotated about a fixed axis, to sweep out a solid. The prerequisite is a closed planar contour, which is moved along one coordinate direction or rotated around a given axis.

(3) Forming manufacturing technology: forming (molding) manufacturing technology is the collection of unit techniques comprising casting, plastic working, joining, and powder metallurgy. Forming technologies manufacture workpiece blanks close to the shape of the part, or precision-form the workpiece directly. Precision molding combined with grinding will replace much of the machining of small parts.

(4) Boolean operations: Boolean operations are the basis of geometric modeling technology; they are the set operations of Boolean algebra. Boolean operations combine primitive volumes into complex shapes, two objects being combined to construct a new object, which makes it convenient to build complex geometric entities. Boolean operations are therefore very important in geometric modeling. The Boolean operators are intersection, union, and difference.

(5) Contour milling: in contour milling the tool finish-machines the peripheral contour of the workpiece after layered roughing. When milling a part contour, climb milling should be used as much as possible; this improves the surface finish and machining accuracy of the part and reduces machine chatter. Reasonable cut-in and cut-out positions should be selected, avoiding pauses for cutting or feeding partway along the part contour; the cut-in/cut-out position should be chosen at a less critical location; and when the workpiece boundary is open, the tool should cut in and retract outside the boundary to guarantee the quality of the machined surface.

(6) Virtual manufacturing: virtual manufacturing realizes the essence of the actual manufacturing process on a computer; that is, using computer simulation and virtual reality technology, supported by group collaboration, it carries out product design, process planning, manufacturing, performance analysis, quality inspection, and process management and control at every level of the enterprise, in order to enhance decision-making and control at all levels of the manufacturing process.

(7) Modification technology: modification technology includes heat treatment and surface engineering. The major trend is to achieve, through various new precision heat treatments and composite treatments, accurate part performance, precise shapes and sizes, and surfaces (coatings) meeting special performance requirements, while significantly reducing energy consumption and completely eliminating environmental pollution.

(8) Placed features: placed features include holes, rounds (fillets), chamfers, and pattern (array) features. A placed feature is parametric: changing the feature's location, size, and shape parameters changes its shape. Placed features are generally added late in part modeling, because they supplement and refine the part; adding them too early makes modeling inconvenient.

(9) Feature-based modeling: a feature model usually consists of a shape feature model, a precision feature model, and a material feature model, of which the shape feature model is the core and foundation. A feature is an integrated concept serving as an information carrier in the product development process: besides the geometric and topological information of the part, it also contains non-geometric information required in design and manufacturing. Feature-based modeling is built on solid modeling and uses the feature concept to support design oriented toward the entire design and manufacturing process of the product; it not only contains production-related information but also describes the relationships among these pieces of information.

(10) Tool-concentrated operation sequencing: in tool-concentrated sequencing, operations are divided according to the tool used: all the areas of the part that can be finished with the same tool are machined first, and the remaining areas are then completed with the second and third tools. This reduces the number of tool changes, compresses idle travel time, and reduces unnecessary positioning errors.

(11) Geometric modeling: geometric modeling produces a solid model from geometric elements such as points, lines, surfaces, and bodies, through geometric transformations such as translation and rotation and through the Boolean operations of intersection, union, and difference. As the foundation of CAD/CAM technology, geometric modeling is widely applied in mechanical engineering.

(12) Feature tree: in feature-based modeling, features are added to the model one by one; each subsequent feature attaches to earlier ones, and changes to an earlier feature propagate to the later features. To record the feature-based modeling process properly, the "feature tree" concept treats modeling as the growth of a tree, starting from the root (the base feature) and gradually growing branches (additional features). The more complex the part structure, the more complex the feature tree.

3. Brief answers (4 questions, 25 points total)

(1) Human-computer interaction in a CAD/CAM system (6 points)
1. All kinds of CAD/CAM systems are fundamentally based on human-computer interaction (also known as man-machine dialogue). (1 point)
2. The operator issues instructions to the computer according to the specific requirements. (1 point)
3. The computer displays the results of the operation on the screen in the form of graphics or data. (1 point)
4. The operator issues new commands to the computer via the input devices. (1 point)
5. The computer carries out the new instructions. (1 point)
6. In this way, the job information in CAD/CAM is continuously modified, exchanged, and accessed. (2 points)

(2) What should be considered when clamping the workpiece?
1. Modular fixtures should be used as much as possible, but when the workpiece batch is large and the precision requirements are high, special fixtures may be designed. (1 point)
2. The positioning and clamping of the part should not interfere with the machining of any region, with measurement, or with tool changes, and collisions between the tool and the workpiece or between the tool and the fixture must be avoided. (2 points)
3. The clamping force should act close to the main support points, or within the triangle formed by the support points; it should act close to the cutting zone, where rigidity is high; and it should preferably not act above a machined aperture, to reduce warping. (2 points)
4. Clamping and positioning should ensure consistency over repeated installations, to reduce tool-setting time and improve the consistency of parts machined in the same batch; generally, the same batch of parts should use the same positioning datum and the same clamping scheme. (2 points)

(3) How are the tool-setting point and the tool-change point determined? (6 points)
1. The tool-setting point is the starting point of the tool's motion relative to the workpiece in NC machining; it fixes the position of the tool in the workpiece coordinate system. (1 point)
2. The principles for choosing the tool-setting point are: it should facilitate mathematical processing and simplify programming; it should be easy to align on the machine; it should be convenient for in-process inspection; and it should cause small machining errors. (2 points)
3. The tool-setting point may be set on the part, the fixture, or the machine, but it must have a definite coordinate relationship to the positioning datum of the part, so that the relationship between the machine coordinate system and the workpiece coordinate system can be determined. (1 point)
4. When higher tool-setting accuracy is required, the tool-setting point should be chosen on the design datum or process datum of the part. For parts positioned by a hole, the center of the hole may be chosen as the tool-setting point. (1 point)
5. During tool setting, the tool-setting point should coincide with the tool reference point: for an end mill, the center of the bottom face; for a ball-end mill, the center of the ball; for turning and boring tools, the tool tip; for a drill, the drill point. (1 point)
6. The tool-change point should be determined according to the process content. To prevent the tool from striking the workpiece during tool changes, the tool-change point should be located outside the part or fixture. (1 point)

(4) Differences and connections between the feature tree and the history tree (7 points)
1. In a "feature tree", subsequent features attach to earlier ones, and changes to earlier features affect the later features. A complex part is built from many features with complex dependencies. (2 points)
2. The feature tree is used by parametric modeling systems. (2 points)
3. The history tree is used by variational modeling systems. (2 points)
4. In the "history tree", features remain associated with the earlier features while also being related to the system's global coordinate system. The history tree also allows several parts to be merged to construct a complex part. It clearly documents the design process, making it easy to modify and convenient for multi-user collaborative design. (1 point)

4. Analysis questions (27 points total)

(1) Into what categories is feature-based solid modeling divided, and what are its modeling steps? (13 points)
1. Modeling plan: analyze the features of the part, the relationships among the features, the construction method of each feature, and the construction sequence. (4 points)
2. Create the base feature: the base feature is the building block of the part. (3 points)
3. Create additional features: add the additional features one by one according to the modeling plan. (2 points)
4. Edit and modify features: at any time in the feature-based modeling process, features can be modified, including their shape, size, location, and dependency relationships; constructed features can even be deleted. (3 points)
5. Generate drawings: use interactive 3D-to-2D techniques to generate two-dimensional drawings. (2 points)

(2) What aspects are involved in NC machining process decisions? Explain. (14 points)
1. Determine the machining scheme. The machining scheme should make reasonable and economical use of the CNC machine tool and give full play to its functions. (2 points)
2. Design and select fixtures. Particular attention should be paid to completing the positioning and clamping of the workpiece quickly, to reduce auxiliary time. With modular fixtures, the production preparation period is short and fixture elements can be reused, which is economical. In addition, the fixture should be easy to install, and it should be easy to coordinate the dimensional relationship between the workpiece and the machine coordinate system.
3. Select the feed path. A reasonable feed path is very important in NC machining. The following aspects should be considered: shorten the feed path as far as possible, reducing idle tool travel and improving productivity; choose reasonable tool-setting, cut-in, and cut-out positions to ensure smooth cutting without impact; guarantee the accuracy and surface roughness requirements of the machined parts; guarantee process safety, avoiding interference between the tool and non-machined surfaces; and help simplify the numerical calculation, reducing the number of program blocks and the programming workload. (5 points)
4. Select reasonable tools. Tools should be selected according to the workpiece material properties, the machine's capability, the type of operation, the cutting parameters, and other process-related factors, including the tool's structural type, material grade, and geometric parameters. (3 points)
5. Determine reasonable cutting parameters.
Structural Topology Optimization Based on ANSYS
Lin Danyi, Li Fang (College of Mechanical Engineering, Zhejiang University of Technology, Hangzhou 310014, China)
Journal of Mechanical & Electrical Engineering (机电工程), 2012, 29(8): 898-901, 915

Abstract: To address the practical application of topology optimization technology, topology optimization was applied to the optimal design of a bicycle frame and a multi-arch bridge. After an analysis of the various topology optimization methods, a mathematical model was established that takes the element material densities as the design variables, minimization of structural compliance as the objective function, and the volume reduction percentage as the constraint function. The topology optimization design module of the commercial finite element software ANSYS was then used to carry out the topology optimization design of the bicycle frame and the multi-arch bridge. The resulting topologies are clear and closely resemble actual bicycle frames and multi-arch bridges. The results indicate that this structural topology optimization method is correct and effective and has a certain prospect of engineering application.

Keywords: topology optimization; ANSYS; bicycle frame; multi-arch bridge
CLC classification: TH112; U484

0 Introduction

According to the type of design variables and the difficulty of solution, continuum structural optimization can be divided into three levels: size optimization, shape optimization, and topology optimization, corresponding respectively to three different product design stages, namely detailed design, basic design, and conceptual design.
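The mathematical model named in the abstract (element material densities as design variables, minimum compliance as objective, a volume constraint) is commonly written in the SIMP form below (a standard density-based formulation, supplied for context; the paper's exact notation is not reproduced here):

```latex
\begin{aligned}
\min_{\boldsymbol{\rho}} \quad & c(\boldsymbol{\rho}) = \mathbf{U}^{T}\mathbf{K}\,\mathbf{U} = \sum_{e=1}^{N} \rho_e^{\,p}\, \mathbf{u}_e^{T} \mathbf{k}_0\, \mathbf{u}_e \\
\text{s.t.} \quad & V(\boldsymbol{\rho})/V_0 \le f, \qquad \mathbf{K}\mathbf{U} = \mathbf{F}, \qquad 0 < \rho_{\min} \le \rho_e \le 1,
\end{aligned}
```

where ρ_e is the material density of element e, p (typically around 3) is the penalization power, KU = F is the finite element equilibrium equation, and f is the prescribed volume fraction; a density-based model of this general kind underlies topology optimization modules such as the one used in the paper.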
Part 1: Common Algorithm Terms (English-Chinese Glossary)

Data Structures 基本数据结构; Dictionaries 字典; Priority Queues 堆; Graph Data Structures 图; Set Data Structures 集合; Kd-Trees 线段树; Numerical Problems 数值问题; Solving Linear Equations 线性方程组; Bandwidth Reduction 带宽压缩; Matrix Multiplication 矩阵乘法; Determinants and Permanents 行列式; Constrained and Unconstrained Optimization 最值问题; Linear Programming 线性规划; Random Number Generation 随机数生成; Factoring and Primality Testing 因子分解/质数判定; Arbitrary Precision Arithmetic 高精度计算; Knapsack Problem 背包问题; Discrete Fourier Transform 离散Fourier变换; Combinatorial Problems 组合问题; Sorting 排序; Searching 查找; Median and Selection 中位数; Generating Permutations 排列生成; Generating Subsets 子集生成; Generating Partitions 划分生成; Generating Graphs 图的生成; Calendrical Calculations 日期; Job Scheduling 工程安排; Satisfiability 可满足性; Graph Problems -- polynomial 图论-多项式算法; Connected Components 连通分支; Topological Sorting 拓扑排序; Minimum Spanning Tree 最小生成树; Shortest Path 最短路径; Transitive Closure and Reduction 传递闭包; Matching 匹配; Eulerian Cycle / Chinese Postman Euler回路/中国邮路; Edge and Vertex Connectivity 割边/割点; Network Flow 网络流; Drawing Graphs Nicely 图的描绘; Drawing Trees 树的描绘; Planarity Detection and Embedding 平面性检测和嵌入; Graph Problems -- hard 图论-NP问题; Clique 最大团; Independent Set 独立集; Vertex Cover 点覆盖; Traveling Salesman Problem 旅行商问题; Hamiltonian Cycle Hamilton回路; Graph Partition 图的划分; Vertex Coloring 点染色; Edge Coloring 边染色; Graph Isomorphism 同构; Steiner Tree Steiner树; Feedback Edge/Vertex Set 最大无环子图; Computational Geometry 计算几何; Convex Hull 凸包; Triangulation 三角剖分; Voronoi Diagrams Voronoi图; Nearest Neighbor Search 最近点对查询; Range Search 范围查询; Point Location 位置查询; Intersection Detection 碰撞测试; Bin Packing 装箱问题; Medial-Axis Transformation 中轴变换; Polygon Partitioning 多边形分割; Simplifying Polygons 多边形化简; Shape Similarity 相似多边形; Motion Planning 运动规划; Maintaining Line Arrangements 平面分割; Minkowski Sum Minkowski和; Set and String Problems 集合与串的问题; Set Cover 集合覆盖; Set Packing 集合配置; String Matching 模式匹配; Approximate String Matching 模糊匹配; Text Compression 压缩; Cryptography 密码; Finite State Machine Minimization 有穷自动机简化; Longest Common Substring 最长公共子串; Shortest Common Superstring 最短公共父串; DP -- Dynamic Programming 动态规划; recursion 递归.

Part 2: Programming Vocabulary

A2A integration A2A整合; abstract 抽象的; abstract base class (ABC) 抽象基类; abstract class 抽象类; abstraction 抽象、抽象物、抽象性; access 存取、访问; access level 访问级别; access function 访问函数; account 账户; action 动作; activate 激活; active 活动的; actual parameter 实参; adapter 适配器; add-in 插件; address 地址; address space 地址空间; address-of operator 取地址操作符; ADL (argument-dependent lookup); ADO (ActiveX Data Object) ActiveX数据对象; advanced; aggregation 聚合、聚集; algorithm 算法; alias 别名; align 排列、对齐; allocate 分配、配置; allocator 分配器、配置器; angle bracket 尖括号; annotation 注解、评注; API (Application Programming Interface) 应用(程序)编程接口; app domain (application domain) 应用域; application 应用、应用程序; application framework 应用程序框架; appearance 外观; append 附加; architecture 架构、体系结构; archive file 归档文件、存档文件; argument 引数(传给函式的值).
For a given road network with topological structure G(N, A), N is the set of nodes and A the set of directed arcs, i.e., the set of road segments in the network. The main goal of the upper-level programming model is to minimize the saturation of the road network after guided evacuation, both while a large event is proceeding smoothly and when an emergency requires evacuation. On the one hand, to ensure that a large event proceeds smoothly, the network around the venue should remain uncongested, with saturation kept as low as possible; on the other hand, during evacuation the evacuation time should be as short as possible.

When a large event proceeds normally, travelers follow the guidance information and choose the shortest evacuation paths in a mainly user-optimal manner, minimizing individual evacuation times. When an emergency occurs during the event, an emergency plan is activated to prevent secondary accidents; the plan requires mainly system-optimal route choice during evacuation, so as to minimize the total evacuation time.

Therefore, this paper establishes a bilevel programming model in which the upper-level problem takes minimum saturation of the surrounding road network as its objective function and the lower-level problem takes minimum evacuation time as its objective function. In this model, the objective function and constraints of the upper-level problem depend on the optimal solution of the lower-level problem, while the optimal solution of the lower-level problem is in turn affected by the upper-level decision variables.
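In generic bilevel form (an illustrative sketch; the symbols x_a, C_a, t_a and the equilibrium lower level are choices made here, not the paper's exact formulation), the model reads:

```latex
\begin{aligned}
\min_{\mathbf{y}} \quad & \max_{a \in A} \; x_a(\mathbf{y}) / C_a && \text{(upper level: minimize network saturation)} \\
\text{s.t.} \quad & \mathbf{x}(\mathbf{y}) = \arg\min_{\mathbf{x} \in \Omega(\mathbf{y})} \sum_{a \in A} \int_{0}^{x_a} t_a(w)\, dw && \text{(lower level: user-optimal route choice)}
\end{aligned}
```

where y denotes the upper-level guidance decisions, x_a the flow and C_a the capacity of arc a in A, t_a(.) the arc travel-time function, and Omega(y) the feasible flow set; under the emergency plan the lower level switches to the system-optimal objective, minimizing the total travel time sum_a x_a t_a(x_a), matching the two evacuation goals described above.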
A Brief Analysis of Topological Relation Reasoning and Its Application in GIS

Abstract: Topological relations are the most important spatial relations at the semantic level. There are two basic approaches to topological reasoning: the region-based RCC (region connection calculus) method and the point-set-based "n-intersection" model. A key problem in GIS spatial reasoning is how to use the basic data stored in the database, together with the relevant spatial constraints, to obtain the required unknown spatial information. Reasoning over topological relations is the foundation of GIS spatial reasoning, querying, and analysis, and it directly affects the development and application of GIS. The trend in spatial topological reasoning is to combine human cognitive patterns with spatio-temporal, fuzzy, and hierarchical topological relations when performing GIS spatial reasoning, so that models describe topological information in a way that better matches how people express and perceive it, and to move toward networked, widely accessible systems.

Keywords: topological relations, spatial reasoning, spatial query, spatial analysis

Introduction: In recent years, spatial relation theory has attracted broad attention in fields such as geographic information systems, intelligent navigation, robotics, computer vision, image understanding, image databases, and CAD/CAM.
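The point-set "n-intersection" model is easy to demonstrate concretely: the dimensionally extended 9-intersection model (DE-9IM) is implemented by the GEOS-based shapely library, whose relate method returns the 3x3 intersection matrix as a nine-character string (a minimal sketch; the two squares are made up for illustration):

```python
from shapely.geometry import Polygon

# Two overlapping squares: interiors, boundaries, and exteriors intersect pairwise.
a = Polygon([(0, 0), (2, 0), (2, 2), (0, 2)])
b = Polygon([(1, 1), (3, 1), (3, 3), (1, 3)])

# DE-9IM matrix: rows = interior/boundary/exterior of a,
# columns = interior/boundary/exterior of b; entries give intersection dimension.
print(a.relate(b))    # e.g. '212101212'
print(a.overlaps(b))  # True: named predicates are patterns over the matrix
print(a.touches(b))   # False
```

Named topological predicates such as overlaps, touches, contains, and disjoint are simply pattern matches against this matrix, which is what makes the intersection model a practical basis for spatial querying.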
A New Cluster Isolation Criterion Based on Dissimilarity Increments

Ana L. N. Fred, Member, IEEE, and José M. N. Leitão, Member, IEEE
(The authors are with the Instituto de Telecomunicações, Instituto Superior Técnico, Av. Rovisco Pais, 1049-001 Lisbon, Portugal. E-mail: {afred, jleitao}@lx.it.pt.)

Abstract—This paper addresses the problem of cluster defining criteria by proposing a model-based characterization of interpattern relationships. Taking a dissimilarity matrix between patterns as the basic measure for extracting group structure, dissimilarity increments between neighboring patterns within a cluster are analyzed. Empirical evidence suggests modeling the statistical distribution of these increments by an exponential density; we propose to use this statistical model, which characterizes context, to derive a new cluster isolation criterion. The integration of this criterion in a hierarchical agglomerative clustering framework produces a partitioning of the data, while exhibiting data interrelationships in terms of a dendrogram-type graph. The analysis of the criterion is undertaken through a set of examples, showing the versatility of the method in identifying clusters with arbitrary shape and size; the number of clusters is intrinsically found without requiring ad hoc specification of design parameters or engaging in a computationally demanding optimization procedure.

Index Terms—Clustering, hierarchical methods, context-based clustering, cluster isolation criteria, dissimilarity increments, model-based clustering.

1 INTRODUCTION

In this section, we review existing clustering methodologies and algorithms, and outline the goals and the main ideas proposed in this paper.

1.1 Review of Clustering Approaches

Clustering has been applied in a variety of domains, whose main goals are exploratory pattern analysis and data mining, decision-making, and machine learning. Most of the existing work in clustering deals with developing new clustering algorithms. Two main strategies have been adopted: hierarchical methods and partitional methods [1], [2].

Partitional methods organize patterns into a small number of clusters. Model-based techniques assume that patterns belonging to a cluster can be given a simple and compact description in terms of a parametric distribution (such as a Gaussian), a representative element (the centroid or the median, for instance), or some geometrical primitive (lines, planes, circles, ellipses, curves, surfaces, etc.). Such approaches assume particular cluster shapes, partitions being obtained, in general, as the result of an optimization process using a global criterion. Parametric density approaches, such as mixture decomposition techniques [3], [4], [5], [6], and prototype-based methods, such as central clustering [7], square-error clustering [8], K-means [2], [1], or K-medoids clustering [9], emphasize compactness, imposing hyperspherical clusters on the data. Model order selection is sometimes left as a design parameter or is incorporated in the clustering procedure [10], [11], [5]. K-means is probably the best known and most widely used algorithm in this category. Assuming a priori knowledge of the number of classes, and based on the square-error criterion, it is a computationally efficient clustering technique that identifies hyperspherical clusters. Extensions of the basic method include the use of the Mahalanobis distance to deal with hyperellipsoidal clusters [2], fuzzy algorithms [12], and adaptations to straight-line fitting [13]. Optimization-based clustering algorithms adopting shape-fitting approaches include [14], [15], [16]. Cost-functional clustering methods based on a minimum variance criterion favor spherical clusters.
Other optimization-based clustering algorithms do not assume particular cluster shapes, such as the work in [17], which proposes a pairwise clustering cost function emphasizing cluster connectedness. Nonparametric density-based clustering methods attempt to identify high-density clusters separated by low-density regions, either by exploiting regions of high sample density [18] or regions with less data, as in valley-seeking clustering algorithms [19], [20].

Hierarchical methods, mostly inspired by graph theory [21], consist of a sequence of nested data partitions in a hierarchical structure that can be represented graphically as a dendrogram [2]. Both agglomerative [2], [22] and divisive approaches [23] (such as those based on the minimum spanning tree—MST [2]) have been attempted. Variations of the algorithms are obtained depending on the definition of the similarity measures between patterns and between clusters [24], the latter ultimately determining the structure of the clusters identified. The single-link (SL) and complete-link (CL) methods [2] are the best-known techniques in this class, emphasizing, respectively, connectedness and compactness. Prototype-based hierarchical methods define similarity between clusters based on cluster representatives, such as the centroid or the median; like the prototype-based partitional algorithms, these techniques fail to identify clusters of arbitrary shapes and sizes, imposing spherical structure on the data. Variations of prototype-based hierarchical clustering include the use of multiple prototypes per cluster, as in the CURE algorithm [25]. Other algorithms compute similarity between clusters by the aggregate of the similarities (emphasizing interconnectivity, such as the group-average method [2]) among pairs of patterns belonging to distinct clusters, or by selecting a particular pair. Other hierarchical agglomerative clustering algorithms follow a split-and-merge technique, the data being initially split into a high number of small clusters, with merging based on intercluster similarity; a final partition is selected from the clustering hierarchy by thresholding techniques or based on measures of cluster validity. Density-based techniques usually define initial clusters by seeking high-density points (by simple use of K-means clustering [28], by applying kernel-based density estimation [18], or by using density gradient estimation, the modes being detected with the hill-climbing mean shift procedure [29], [30]), density similarity guiding the merging process; simple thresholding [28] or cluster validity indices weighting intercluster connectivity and cluster isolation (low-density regions separating clusters) [18] are used to select a clustering. In the work in [30], an initial random space tessellation is produced, to which a mean shift procedure is applied to detect cluster centers. A two-phase clustering algorithm is presented in [31], in which initial subclusters are obtained by applying a graph partitioning technique to the K-nearest-neighbor graph of the data set, followed by dynamic merging of the subclusters under a hierarchical agglomerative framework. The density-based clustering algorithm presented in [32] explores the idea of intracluster homogeneity and uniformity, working on links from a complete graph.

1.2 Goals and Outline of the Paper

In this paper, we address the problem of cluster defining criteria under a model-based framework. A new cluster isolation criterion, briefly outlined in [33] and underlying a hypothesis of smooth dissimilarity increments between neighboring patterns, is presented and discussed. It is shown that dissimilarity increments between neighboring patterns within a cluster have a smooth evolution, whose statistical distribution can be modeled by an exponential density function. Dissimilarity increments, by means of their statistical model, characterize context. The proposed isolation criterion is supported by a pairwise context analysis. This isolation criterion is merged into a hierarchical agglomerative clustering algorithm, producing a data partitioning and simultaneous access to the intrinsic data interrelationships in terms of a dendrogram-type graph. The structure of the obtained dendrogram, unlike that of conventional hierarchical clustering methods, is constrained by the isolation criterion, expanding the range of pattern structures handled by these methods, namely, situations containing both sparse and dense clusters. Additionally, the problem of deciding the number of clusters is subsumed and intrinsically dictated by the criterion.

Section 2 studies the distribution of dissimilarity increments, supporting the smooth evolution hypothesis, and outlines the new cluster isolation criterion (Section 2.2). Critical evaluation and mathematical manipulation of the parametric context model—the exponential distribution—lead to the definition of an intrinsic isolation parameter (Section 2.3). A hierarchical agglomerative algorithm adopting this criterion is described in Section 3. The novelty of the proposed method and its relation to work in the literature are outlined in Section 4. The characteristics of the new method are analyzed and illustrated through a set of examples (Section 5), covering synthetic data (random data, Gaussian mixtures, concentric patterns, and clusters of arbitrary shape and size) and examples from the UCI Machine Learning Repository [34] (the Iris data and the Wisconsin Breast Cancer data set). Results are compared with the single-link method and the K-means algorithm. A discussion of the proposed method in relation to the SL and K-means algorithms is presented in Section 6.
Conclusions are drawn in Section 7.

2 SMOOTHNESS HYPOTHESIS AND CLUSTER ISOLATION CRITERION
Let X be a set of patterns, and let $x_i$ represent an element of this set. Assume that interpattern relationships are measured by some dissimilarity function, $d(\cdot,\cdot)$. The definition of $d(\cdot,\cdot)$ is problem and data-representation dependent; it may be, for instance, the Euclidean distance for patterns in multidimensional feature spaces, while string edit distances [35], [36], [37], [38] are commonly used for quantifying resemblance between string patterns. The proposed cluster isolation criterion is derived from the following intuitive concepts and assumptions:
- A cluster is a set of patterns sharing important characteristics, defining a context.
- Dissimilarity between neighboring patterns within a cluster should not occur with abrupt changes.
- The merging of well-separated clusters results in abrupt changes in dissimilarity values.
The first concept states that a cluster gathers interrelated patterns, the pattern dependence profile being a characteristic of the cluster, thus defining a context; this enables its distinction from other clusters. The last two items state a hypothesis of smooth evolution of dissimilarity changes, or increments, between neighboring patterns within a cluster, nonsatisfaction of this condition being associated with cluster isolation. This smoothness hypothesis is the genesis of the proposed cluster isolation criterion, the dissimilarity increments measuring continuity within a cluster.

2.1 Distribution of Dissimilarity Increments
Consider a set of patterns X. Given $x_i$, an arbitrary element of X, and some dissimilarity measure $d(\cdot,\cdot)$ between patterns, let $(x_i, x_j, x_k)$ be the triplet of nearest neighbors, obtained as follows:

$x_j : \; j = \arg\min_l \{ d(x_l, x_i), \; l \neq i \}$
$x_k : \; k = \arg\min_l \{ d(x_l, x_j), \; l \neq i, \; l \neq j \}.$

The dissimilarity increment between the neighboring patterns is defined as

$d_{inc}(x_i, x_j, x_k) = | d(x_i, x_j) - d(x_j, x_k) |,$

which can be seen as the first derivative of the dissimilarity function at the first point of the ordered list of neighboring samples. There is experimental evidence that the increments of the dissimilarity measure between neighboring patterns, as defined above, typically exhibit an exponential distribution, $p(x) = \lambda e^{-\lambda x}$, $x > 0$, as illustrated in Fig. 1. This figure plots histograms and corresponding fitted distributions of dissimilarity increments for a variety of data sets. Two-dimensional examples were chosen for simplicity of representation:

Fig. 1. Histograms (bar graphs) and fitted exponential distributions (solid-line curves) of the dissimilarity increments computed over neighboring patterns in the data, using the Euclidean distance as the dissimilarity measure. (a) 2,000 patterns uniformly distributed within a square; (b) 500 patterns generated from a Gaussian distribution $N([0,0], \mathrm{diag}(10, 10))$; (c) ring-shaped data (1,000 random patterns); (d) 1,000 patterns generated according to the stochastic model $y(k+1) = y(k) + n_1(k)$, $x(k+1) = x(k) + n_2(k)$, with $n_1(k)$, $n_2(k)$ being noise uniformly distributed in the interval $[-0.25, 0.25]$; (e) directional expanding data generated by the model $x(k+1) = x(k) + n_s(k)k$, $y(k+1) = y(k) + n(k)$, where $n_s(k)$ and $n(k)$ represent uniform noise in the ranges $[-10, 10]$ and $[0, 10]$, respectively; (f) a grid corrupted by zero-mean Gaussian noise with standard deviation 0.1.

The data sets are:
- random samples (uniform distribution),
- a 2D Gaussian process,
- a noisy ring-shaped pattern,
- a 2D stochastic process,
- a directional expanding pattern, and
- a grid
corrupted by Gaussian noise. The Euclidean distance is used as the dissimilarity measure in these examples.

As shown in Fig. 2d, the statistical distribution of the dissimilarity increments within the same context or data-formation model (cluster) has a smooth evolution, where the parameter of the fitted exponential probability density function characterizes data sparseness. It can be observed that distinct data-generation models lead to very similar curves (for instance, the patterns in Figs. 1c and 1d), while an increasing number of observations from the same process (corresponding to decreasing data dispersion levels) results in increasing values of the parameter of the exponential distribution (see Fig. 2). Thus, by adopting the dissimilarity derivatives as features for context characterization, a single parametric model (the exponential distribution) is obtained for distinct cluster shapes and data-generation paradigms. When considering well-separated clusters, it is clear that dissimilarity increments between patterns in different clusters are positioned far out on the tail of the distribution associated with the other cluster. We explore this property in defining a cluster isolation criterion in the next section.

2.2 Isolation Criterion
We extend the previous concept of dissimilarity increments between neighboring patterns to define the concept of a gap between clusters. Let $C_i$, $C_j$ be two clusters that are candidates for merging, such as the ones shown in Fig. 3, and consider the nearest pattern pair $(x_i, x_j)$ linking these clusters, with $x_i \in C_i$ and $x_j \in C_j$ ($x_i \equiv x_{12}$ and $x_j \equiv x_{18}$ in Fig. 3). We shall represent the dissimilarity between these patterns, $d(x_i, x_j)$, as $d(C_i, C_j)$ (corresponding to the distance between the two clusters according to the nearest-neighbor rule). Let $x_k$ be the nearest neighbor of $x_i$ within $C_i$ (pattern $x_3$ in Fig. 3), and let $d_t(C_i) = d(x_i, x_k)$. The triplet $(x_k, x_i, x_j)$ therefore corresponds to neighboring patterns. We define the dissimilarity increment, or gap, between clusters i and j as the asymmetric increase in the dissimilarity value needed in order to allow the data association into a single cluster:

$gap_i = | d(C_i, C_j) - d_t(C_i) |. \quad (1)$

Fig. 2. Fitted exponential distributions for the dissimilarity increments of 2D data randomly generated from a uniform distribution: (a) 2,000 samples, (b) 1,000 samples, and (c) 500 samples. (d) Steep exponentials (higher parameter) correspond to high-density patterns.

Fig. 3. Definition of gap. The figure shows 18 two-dimensional patterns grouped in two clusters. The patterns are linked by the minimum spanning tree, adopting the Euclidean distance as the edge weight.

In a similar way, we find $x_l$ ($x_{17}$ in Fig. 3), the nearest pattern to $x_j$ belonging to $C_j$, and define the gap between clusters j and i: $gap_j = |d(C_i, C_j) - d_t(C_j)| = |d(C_i, C_j) - d(x_j, x_l)|$. Dissimilarity increments between neighboring patterns within a cluster are a measure of pattern continuity. The statistical distribution of the dissimilarity increments is modeled by an exponential distribution. Let $\hat\lambda_i$ and $\hat\lambda_j$ be the averages of the dissimilarity increments in clusters $C_i$ and $C_j$, respectively. The tails of these distributions correspond to patterns in frontier or borderline situations, where continuity is broken. The gaps, $gap_i$ and $gap_j$, represent the increase in neighboring-pattern distances needed in order to join the two clusters, measuring intercluster continuity as seen from each cluster's perspective. If the two clusters are well separated, these gaps will have high values
(compared to intracluster statistics), being located on the tails of each cluster's statistic and corresponding to a discontinuity in both clusters' structure. In situations of touching clusters with distinct densities, as in the example shown in Fig. 4, context analysis is needed in order to identify the clusters. The dashed line in Fig. 4 links the nearest-neighbor patterns connecting the two clusters; the remaining lines link the intracluster nearest neighbors of each of these elements. From this figure, it is intuitive to see that the element from the cluster on the right could naturally be included in the left cluster, since the increment ($gap_1 = 0.0150$) is small compared to the intracluster statistic ($\hat\lambda_1 = 0.0268$). From the context of the cluster on the right, however, the dissimilarity increment ($gap_2 = 0.0542$) is large compared to the average dissimilarity increment within this cluster: $\hat\lambda_2 = 0.0068$. Therefore, taking the one-sided perspective of cluster $C_1$, the two clusters could be merged; from the context of $C_2$, the clusters are isolated.

The cluster isolation criterion consists of setting a limit on the dissimilarity increments, such that most of the patterns exhibiting the same statistical structure or model (densely or sparsely connected) are included in the same cluster, while all others, not satisfying this smoothness hypothesis, are rejected:
- Let $C_i$, $C_j$ be two clusters which are candidates for merging, and let $\lambda_i$, $\lambda_j$ be the respective mean values of the dissimilarity increments in each cluster. Compute the increments for each cluster, $gap_i$ and $gap_j$, as defined in (1). If $gap_i \geq \alpha\lambda_i$ ($gap_j \geq \alpha\lambda_j$), isolate cluster $C_i$ ($C_j$) and continue the clustering strategy with the remaining patterns. If neither cluster exceeds the gap limit, merge them.

Notice that the above criterion can be regarded as a context-dependent cluster isolation rule, where the context is modeled by the parametric distribution of dissimilarity increments. The isolation rule consists of comparing the value of the dissimilarity increment, seen from the context of each cluster, with a dynamic threshold, $\alpha\lambda_i$, computed from this context; inconsistency of gap values in a given context (cluster) determines the isolation of that cluster. The design parameter $\alpha$ constrains the degree of isolation; values in the range 3 to 5 provide reasonable choices, as justified in the next section.

2.3 Setting the Isolation Parameter
As seen previously, the structure of the dissimilarity increments within a cluster is summarized by an exponential distribution; the parameter of this distribution thus characterizes each cluster. Well-separated clusters are clearly identified by the analysis of these distributions, as samples not belonging to a given cluster will be placed far out on the tail of the cluster's distribution. A reasonable choice for the isolation parameter $\alpha$ is to set it at a point on the tail that neither rejects a significant amount of data nor allows the grouping of patterns that are clearly atypical. Theoretical analysis of the exponential distribution leads to the following interesting result (see Appendix A): the crossing with the x axis of the tangent line taken at a point which is a multiple of the distribution's mean value, $i \times \frac{1}{\lambda}$, is given by $(i+1) \times \frac{1}{\lambda}$; this is shown in Fig. 5. Therefore, setting the threshold $\alpha$ to some multiple of the distribution mean, i.e., $\alpha$ inside the interval 3 to 5, is a reasonable choice. In the examples throughout the paper, the typical value used is $\alpha = 3$.

3 HIERARCHICAL CLUSTERING ALGORITHM
In this section, we incorporate the cluster isolation criterion described in Section 2.2 into a hierarchical
agglomerative clustering algorithm. Each cluster $C_i$ is characterized by $\hat\lambda[i]$, the estimate of the mean value of the dissimilarity increments within the cluster, and by $jumps[i]$, the number of elements used in this estimate.

Fig. 4. Touching classes with distinct densities.

Fig. 5. Defining a threshold on the gap values (x axis). Dots are located at points which are multiples of the distribution mean, $\frac{1}{\lambda} = 0.05$, and dashed lines are tangents at those points. The crossings with the x axis occur at the points $i \times \frac{1}{\lambda}$, i being a positive integer. Values of i in the range [3, 5] cover the most significant part of the distribution.

The algorithm starts with each pattern in a cluster, the dissimilarity matrix between pattern pairs being computed. It evolves by selecting the most similar pair of clusters and applying the cluster isolation criterion from each cluster's context; clusters are thus either isolated (one or both) and frozen on the dendrogram, or merged; frozen clusters are not available for further merging. The statistics $\hat\lambda[i]$ are updated along the merging process.

Estimates of the mean values $\hat\lambda[i]$ are not reliable for very small cluster sizes; this may lead to premature isolation of clusters. In order to overcome this situation, a widening of the isolation parameter for small cluster sizes may be adopted [39]; alternatively, inhibition of cluster isolation actions may be implemented when clusters are very small [33]. In this paper, we replace the term $\alpha\hat\lambda_i$ by the dynamic threshold

$t^{dyn}_{C_i}(\alpha, \hat\lambda_i, ni, nj) = \alpha\hat\lambda_i \cdot widen\_fact(ni, nj) + delta\_fact(ni). \quad (2)$

Expression (2) has two terms. The first term increases the value of the estimate $\hat\lambda_i$ by multiplying it by a factor greater than or equal to 1, $widen\_fact(ni, nj)$, where $ni \equiv jumps[i]$ and $nj \equiv jumps[j]$ are the numbers of elements available for the computation of the distribution means of clusters $C_i$ and $C_j$, respectively. We define the amplifying factor $widen\_fact(ni, nj)$ as a monotonously decreasing function of $ni$, $nj$:

$widen\_fact(ni, nj) = 1 + \beta \, f_1(ni) \, f_2(nj), \quad (3)$

with $f_1(ni) = 1 - \frac{1}{1 + e^{-0.4(ni - 10)}}$ and $f_2(nj) = 2 - \frac{1}{1 + e^{-0.4(nj - 10)}}$.

The reasoning underlying (3) is the following (see Fig. 6). If cluster $C_i$ has few samples, the estimate $\hat\lambda(C_i)$ should be enlarged to compensate for a possible underestimation of the true distribution mean; this widening effect smoothly vanishes as the number of terms $ni$ used in the computation of the estimate $\hat\lambda(C_i)$ increases (Fig. 6a), which is modeled by the term $f_1(ni)$, a sigmoid-like function. The term $f_2(nj)$ expresses the reinforcement of the widening effect when the number of elements in the competing cluster $C_j$ is also low (Fig. 6b), taking values greater than or equal to 1. When both clusters have low cardinality, the combined action of $f_1$ and $f_2$ favors cluster merging. When cluster $C_i$ already has a sufficiently large number of elements, the estimate $\hat\lambda(C_i)$ is considered to be reliable and the term $f_1(ni)$ tends to zero, thus annihilating the influence of the term $f_2$ (the size of cluster $C_j$ becomes irrelevant; see Fig. 6a, $ni \to 25$). In (3), $\beta$ is a scaling parameter (default value: 3).

When the number of elements available for the estimation of the dissimilarity increments statistic, $ni$, is extremely low (such as when the number of the cluster's samples is less than 10), the estimate of the parameter is very poor.
Applying a multiplicative factor to the threshold term may not solve the underestimation problem in this situation, in particular when $\hat\lambda$ is near zero. The second term in (2), which takes large values vanishing at $ni = 10$, boosts near-zero estimates for extremely small clusters:

$delta\_fact(ni) = big\_val \cdot \left(1 - \frac{1}{1 + e^{-10(ni - 5)}}\right), \quad (4)$

where $big\_val$ is a large positive number.

In order to compute the gap between clusters, one needs to know the distances between nearest-neighbor patterns. Using the nearest-neighbor rule for updating the intercluster dissimilarity, $d(C_i, C_j)$ gives the desired distance between nearest neighbors in each cluster. Considering that the most similar patterns are joined first, dissimilarity values growing along the evolution of the clustering algorithm, we approximate the exact value of the gap by $gap_i = d(C_i, C_j) - d_t[i]$, with $d_t[i]$ representing the dissimilarity in the last merging performed in cluster $C_i$ (see Fig. 7). This approximation prevents further computation of nearest neighbors in each cluster, leading to a computationally more efficient algorithm.

Fig. 6. Amplification term widen_fact associated with the estimate $\hat\lambda$ for cluster $C_i$. (a) Amplification factor as a function of the number of terms used in the computation of the gaps' distribution mean for cluster $C_i$ ($\beta = 1$). (b) Reinforcement of the amplifying term as a function of the number of elements in cluster $C_j$ ($\beta = 1$).

The following gives a schematic description of the clustering algorithm.

Input: N samples; $\alpha$ (default value is 3).
Output: Data partitioning.
Steps:
1. Set: Final_clusters = ∅; n = N;
   put the i-th sample in cluster $C_i$, i = 1, ..., n;
   Clusters = $\bigcup_i C_i$, i = 1, ..., n;
   $d_t[i] = \hat\lambda[i] = jumps[i] = 0$, i = 1, ..., n.
2. If (Clusters == ∅) or (n == 1)
on the comparison of intracluster average distances;in our method,the distribution of increments within a cluster is modeled by a parametric model (exponential distribution),the parameter summariz-ing cluster structure being the average value of increments between neighboring patterns.Increment values computed from the nearest pair of patterns in distinct clusters are compared to each cluster statistic to decide for merging.The proposed cluster isolation criterion has been evaluated in the context of hierarchical agglomerative clustering,adopting a nearest-neighbor rule for measuring the similarity between clusters.This new algorithm is therefore closely related to graph-theoretical methods,in particular,with the single-link method:both methods start with single element clusters,merging most similar clusters first,and updating the similarity matrix according to the nearest-neighbor rule.A major distinction between the two methods is that the standard SL method uses a fixed threshold on dissimilarity values for cutting the resulting dendrogram,while the herein proposed method uses an adaptive threshold on dissimilarity first derivatives,based on the computation of intracluster statistics of dissimilarity increments.These statistics are scatter measures,characterizing density of clusters.With the proposed cluster isolation criterion,the new algorithm is able to identify clusters with different densities,which requires special treatment when using graph-theoretical methods,such as detecting and removing denser clusters,and then clustering the remaining patterns.With the proposed approach,this situation is easily handled as,according to the asymmetric isolation criterion,denser clusters are identified and frozen on the dendrogram,the clustering process based on dissimilarity increments proceeding with the remaining data.Some authors have adopted postproces-sing of the dendrogram produced by the SL method [26],[27]or,equivalently,processing of the minimum spanning tree (MST),in order to obtain a final data partition.Zhan [21]proposed a technique for the identification of clusters from a minimum spanning tree by removing inconsistent links based on the comparison of the link distance (dissimilarity between linked patterns)with the average of nearby link distances on both sides of the link.Inconsistent links removal is therefore based on local dissimilarity statistics;our method,however,evaluates overall clusters statistics (of dissimilarity incre-ments instead of distances)along the clustering process,eventually conditioning the final form of the dendrogram.This dynamic construction of the dendrogram,the final topology being conditioned by intracluster statistics,opposes to the static behavior of the above methods,based on postprocessing of structures.A dynamic hierarchical agglom-erative procedure is proposed in [31].In that work,however,similarity between clusters combines interconnectivity and relative closeness measures based on the K-nearest neighbor graph of the data set,isolation criteria consisting of the comparison of the similarity value with a user specified parameter,controlling,simultaneously with the K parameter,the characteristics of the desired clusters.FREDAND LEIT ~AO:A NEW CLUSTER ISOLATION CRITERION BASED ON DISSIMILARITY INCREMENTS 7Fig.7.Definition of gap on the dendrogram produced by the single-linkmethod for the data in Fig.3.。
1. Software theory includes: computation models and computability theory, the theoretical foundations of algorithms, algorithm design and analysis, the theoretical foundations of programming languages, programming language design and compilation techniques, mathematical logic, data abstraction and the implementation techniques of basic data types, and so on.
Software engineering, by contrast, is full of methodology: how to analyze, how to design, how to program, how to test, how to maintain, and so on. The "quality" of software models and software systems depends to a large extent on the experience and skill of the developers themselves.
Because a theoretical characterization of the software development process is lacking, and the numerous methodologies have not been distilled and elevated into theory, the discipline can only be called "engineering".
2. What "architectures" have in common:
– a set of basic constituent elements: components;
– the connection relations among these elements: connectors;
– the topological structure formed once these elements are connected: the physical distribution;
– the restrictions imposed on these elements or on their connection relations: constraints;
– quality: performance.
3. Software architecture (SA):
– provides a high-level abstraction of structure, behavior, and attributes;
– considers, from a higher level, the components that make up the system, the connections between the components, and the topology formed by component interactions;
– these elements should satisfy certain restrictions, follow certain design rules, and be able to evolve in a given environment.
– It reflects the design decisions that have an important influence on system development, facilitates communication among the various stakeholders, and reflects multiple concerns; a system developed on this basis can fulfill the system's prescribed functional and performance requirements.
4. Architecture = components + connectors + topology + constraints + quality (a code sketch of this view follows item 5 below).
5. Goal: to improve software quality.
– Functional properties: the ability to fulfill the users' functional requirements.
– Non-functional properties: the ability to implement the various functional requirements reasonably and efficiently, at the performance standard the users require.
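As a rough illustration of item 4, the five elements can be rendered directly as a data structure. This is only a sketch: all names are invented for the example, and "quality" is represented by nothing more than a list of checkable constraint predicates.

    from dataclasses import dataclass, field

    @dataclass
    class Component:          # a basic constituent element
        name: str

    @dataclass
    class Connector:          # a connection relation between two components
        kind: str             # e.g. "rpc", "pipe", "event"
        source: str
        target: str

    @dataclass
    class Architecture:
        components: dict = field(default_factory=dict)
        connectors: list = field(default_factory=list)
        constraints: list = field(default_factory=list)  # predicates over self

        def add(self, name):
            self.components[name] = Component(name)

        def connect(self, kind, source, target):
            self.connectors.append(Connector(kind, source, target))

        def topology(self):
            # the topology is induced by the connectors
            return {(c.source, c.target) for c in self.connectors}

        def check(self):
            # "quality" gate: every constraint must hold over the topology
            return all(rule(self) for rule in self.constraints)

    # Example: a layering rule forbidding the UI from reaching storage directly.
    arch = Architecture()
    for n in ("ui", "service", "storage"):
        arch.add(n)
    arch.connect("rpc", "ui", "service")
    arch.connect("sql", "service", "storage")
    arch.constraints.append(lambda a: ("ui", "storage") not in a.topology())
    print(arch.check())   # True: the layering constraint holds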
6. Software architecture is concerned with: how to divide a complex software system into modules, how to standardize the composition and performance of those modules, and how to organize the modules into a complete system.
Main goal: to establish a consistent system and its set of views, expressed in the structural forms needed by end users and software designers, supporting communication and understanding between users and designers.
This goal has two aspects:
– the outward goal: to establish system requirements that satisfy the end users;
– the inward goal: to establish a composition of system components that meets the system designers' needs and makes the system easy to implement, maintain, and extend.
电路基础常用英语单词 (Common English terms in circuit fundamentals):
component 部件; device 器件; integrated circuit (IC) 集成电路; electric diagram 电气图; model 模型; lumped parameter element 集总参数元件; distributed 分布; resistive 电阻性; dynamic 动态; periodic 周期性; direct current (DC) 恒定电流(直流); time-varying 时变; alternating (AC) 交变; terminal 端钮; reference direction 参考方向; accessible 可接通; associated 关联; signal 信号; branch 支路; node 节点; loop 回路; mesh 网孔; network 电网络; Kirchhoff's current law (KCL) 基尔霍夫电流定律; constraint 约束; graph 图、图形; linearly dependent 线性相关; Kirchhoff's voltage law (KVL) 基尔霍夫电压定律; directed graph 定向图; voltage-current relation (VCR) 电压电流关系; resistor 电阻元件; linear 线性; conductance 电导; memoryless 无记忆; nonlinear 非线性; bilateral 双向性; unilateral 单向性; passivity 无源性; active 有源; companion network 伴侣网络; backward Euler 后向欧拉; controlled source 受控源; transfer 转移; ground 地; node voltage 节点电压; potentiometer 电位器; topological constraints 拓扑约束; element constraints 元件约束; planar circuit 平面电路; tableau analysis 表格分析; branch analysis 支路分析; mesh current 网孔电流; self-resistance 自电阻; mutual resistance 互电阻; reciprocity 互易; self-conductance 自电导; mutual conductance 互电导; operational amplifier 运算放大器; op-amp 运放; virtual ground 虚地; voltage follower 电压跟随器; duality 对偶性; excitation 激励; response 响应; homogeneity 齐次性; proportionality 比例性; driving point 策动点; superposition 叠加性; bit 比特; binary 二进制; digit 数字; digital-analog converter (DAC) 数模转换器; digital-analog conversion (DAC) 数模转换; partition 分解; subnetwork 子网络; one-port 单口; substitution theorem 置换定理; equivalence 等效; redundant 多余; match 匹配; instantaneous 即时的; dynamic element 动态元件; memory 记忆; capacitor 电容元件; capacitance 电容; inductor 电感元件; flux linkage 磁链; inductance 电感; self-inductance 自感; state 状态; step 阶跃; first-order circuit 一阶电路; zero-state response 零状态响应; zero-input response 零输入响应; complete response 全响应; DC steady state 直流稳态; time constant 时间常数; unit step function 单位阶跃函数; delayed 延时; pulse 脉冲; piecewise-constant signal 分段常量信号; pulse train 脉冲串; unit impulse function 单位冲激函数; sampling property 取样性质; related source 相关电源; inspection method 视察法; transient state 瞬态; steady state 稳态; transition 过渡; forced 强制; natural 固有; free 自由; sinusoidal 正弦; amplitude 振幅; initial phase 初相; sinusoidal steady state (SSS) 正弦稳态响应; damped oscillation 阻尼振荡; characteristic equation 特征方程; overdamped 过阻尼; critically damped 临界阻尼; underdamped 欠阻尼; resonant 谐振; overshoot 上冲; complex number 复数; phasor 相量; impedance 阻抗; admittance 导纳; reactance 电抗; susceptance 电纳; phasor model 相量模型; effective value 有效值; root-mean-square value 方均根值; average power 平均功率; active power 有功功率; reactive power 无功功率; apparent power 视在功率; power factor 功率因数; var 乏; load 负载; phase voltage 相电压; phase sequence 相序; neutral point 中性点; line voltage 线电压; harmonics 谐波; frequency response 频率响应; low-pass (LP) 低通; bandwidth (BW) 通频带; cut-off frequency 截止频率; resonance 谐振; selectivity 选择性; band-pass 带通; peak-to-valley 峰谷值; peak-to-peak 峰峰值; coupled inductor 耦合电感; turns ratio 匝比; reflected impedance 反映阻抗; referred 折合; auto-transformer 自耦变压器; ideal transformer 理想变压器; transformer ratio 变比; leakage flux 漏磁通; mutual flux 互磁通; iron-core 铁芯; magnetizing 磁化; Laplace 拉普拉斯; leakage inductance 漏感.
INTEGRATION, the VLSI Journal 38 (2005) 541–548
Automatic cell placement for quantum-dot cellular automata
Ramprasad Ravichandran (College of Computing, Georgia Institute of Technology, Atlanta, GA 30332, USA), Sung Kyu Lim (School of Electrical and Computer Engineering, Georgia Institute of Technology, 777 Atlantic Drive NW, Atlanta, GA 30332, USA), Mike Niemier (College of Computing, Georgia Institute of Technology, Atlanta, GA 30332, USA)
Received 14 July 2004; accepted 21 July 2004. A short version (Ravichandran et al., 2004) was published in the Proceedings of the ACM Great Lakes Symposium on VLSI, 2004.

Abstract
Quantum-dot cellular automata (QCA) is a novel nano-scale computing mechanism that can represent binary information based on the spatial distribution of the electron charge configuration in chemical molecules. In this paper we develop the first cell-level placement of QCA circuits under buildability constraints. We formulate QCA cell placement as a unidirectional geometric embedding of k-layered bipartite graphs. We then present an analytical and a stochastic solution for minimizing the wire crossings and wirelength in these placement solutions.

MSC: 94C15; 68W35; 03G12
Keywords: Quantum-dot cellular automata; Placement

1. Introduction
One approach to computing at the nano-scale is the quantum-dot cellular automata (QCA) [1,2] concept, which represents information in a binary fashion but replaces a current switch with a cell having a bi-stable charge configuration. A wealth of experiments have been conducted with metal-dot QCA, with individual devices, logic gates, wires, latches, and clocked devices all having been realized. In this article, we develop the first cell-level placement of QCA circuits. We formulate QCA cell placement as a unidirectional geometric embedding of k-layered bipartite graphs. We then present an analytical and a stochastic solution for minimizing the wire crossings and wirelength in these placement solutions. Our goal is to identify several objectives and constraints that enhance the buildability of QCA circuits and to use them in our placement optimization process. The results are intended to define what is computationally interesting and could actually be built within a set of predefined placement constraints.

A QCA cell is illustrated in Fig. 1a. Two mobile electrons are loaded into this cell and can move to different quantum dots by means of electron tunneling. Coulombic repulsion causes the electrons to occupy only the corners of the QCA cell, resulting in two specific polarizations. The fundamental QCA logic gate is the three-input majority gate. It consists of five cells and implements the logical equation AB + BC + AC, as shown in Fig. 1b. A QCA wire is a horizontal row of QCA cells, and a binary signal propagates from left to right because of electrostatic interactions between adjacent cells, as shown in Fig. 1c. A QCA wire can also be composed of cells rotated by 45 degrees; here, as a binary signal propagates down the length of the wire, it alternates between binary 1 and binary 0 polarizations. QCA wires are able to cross in the plane without destroying the value being transmitted on either wire, as shown in Fig. 1c.

Our work focuses on the following undesirable design-schematic characteristics associated with a near-to-midterm buildability point: large amounts of deterministic device placement, long wires, clock skew, and wire crossings. We will use
CAD to: (1) identify logic gates and blocks that can be duplicated to reduce wire crossings; (2) rearrange logic gates and nodes to reduce wire crossings; (3) create shorter routing paths to logic gates (to reduce the risk of clock skew and the susceptibility to defects and errors); and (4) reduce the area of a circuit (making it easier to physically build). Some of these problems have been individually considered in existing work on silicon-based VLSI design but, in combination, they form a set of constraints unique to QCA, requiring a unique toolset to solve them.

2. Problem formulation
QCA placement is divided into three steps: zone partitioning, zone placement, and cell placement. An illustration is shown in Fig. 2. The purpose of zone partitioning is to decompose an input circuit such that a single potential modulates the inner-dot barriers in all of the QCA cells that are grouped within a clocking zone. The zone placement step takes as input a set of zones, with each zone assigned a clocking label obtained from zone partitioning. The output of zone placement is the best possible layout for arranging the zones on a two-dimensional chip area. Finally, cell placement visits each zone to determine the location of each individual logic QCA cell (a cell used to build majority gates). Our recent work on zone partitioning and zone placement is available in [3]. The focus of this article is on cell placement, which is formally defined as follows:

Fig. 1. Illustration of QCA device, majority gate, and wires.

Fig. 2. Illustration of QCA placement steps: (a) input circuit represented as a DAG (directed acyclic graph), (b) zone partitioning, (c) wire block insertion, (d) zone placement, (e) wire crossing minimization at zone level, (f) cell placement.

Definition 1 (Cell placement). We seek a placement of the individual logic gates in each logic block such that area, wire crossings, and wirelength are minimized.

The following set of constraints exists during QCA cell placement: (1) the timing constraint: the signal propagation delay from the beginning of a zone to the end of a zone should be less than the clock period established by zone partitioning; (2) the terminal constraint: the I/O terminals are located on the top and bottom boundaries of each logic block; (3) the signal direction constraint: the signal flow among the logic QCA cells needs to be unidirectional, from the input to the output boundary of each zone.
The signal direction constraint is caused by QCA's clocking scheme, in which an electric field E created by an underlying CMOS wire propagates unidirectionally within each block. Thus, cell placement needs to be done in such a way that the logic outputs propagate in the same direction as E. In order to balance the length of intra-zone wires, we construct a cell-level k-layered bipartite graph for each zone and place this graph. We define the k-layered bipartite graph as follows:

Definition 2 (K-layered bipartite graph). A directed graph G(V, E) is a k-layered bipartite graph if (i) V is divided into k disjoint partitions, (ii) each partition p is assigned a level, denoted lev(p), and (iii) for every edge e = (x, y), lev(y) = lev(x) + 1.

3. Cell placement algorithm
This section presents our cell placement algorithm, which consists of feed-through insertion, row folding, and wire crossing and wirelength optimization steps.

3.1. Feed-through insertion
In order to satisfy the relative ordering and the signal direction constraint, the original graph G(V, E) is mapped into a k-layered bipartite graph G'(V', E'), obtained by the insertion of feed-through gates, where V' is the union of the original vertex set V and the set of feed-through gates, and E' is the corresponding edge set. The following algorithm performs feed-through insertion.

feed-through_insertion(G(V, E))
  if (V is empty) return;
  n = V.pop();
  if (n has no child with bigger level) return;
  g = new feed-through;
  lev(g) = lev(n) + 1;
  for (each child c of n with bigger level)
    parent(c) = g; child(g) = c;
  parent(g) = n; child(n) = g;
  add g into G;
  feed-through_insertion(G(V, E));

In this algorithm, we traverse every vertex in the graph. For a given vertex, if any of the outgoing edges terminates at a vertex whose topological order is more than one level apart, a new feed-through vertex is added to the vertex set. The parent of the feed-through is set to the current vertex, and all children of the current vertex whose topological-order difference is more than one are set as children of the feed-through. We do not need to worry specifically about the exact level difference between the feed-through and the child nodes, since feed-through insertion is a recursive process. This algorithm runs in O(k|V'|) time, where k is the maximum degree of V'. Fig. 3 shows the graph before and after feed-through insertion.

Fig. 3. Illustration of feed-through insertion, where a cell-level k-layered bipartite graph is formed via feed-through nodes.

3.2. Row-folding algorithm
After the feed-through insertion stage, some rows may have more gates than the average number of gates per row. The row with the largest number of gates defines the width of the entire zone, and hence the width of the global column that the zone belongs to. This increases the circuit area by a huge factor. Hence, rows with a large number of cells are folded into two or more rows. This is done by inserting feed-through gates in place of the logic gates and moving the gates to the next row. Row folding decreases the width of the row, since feed-throughs have a lower width than the gates they replace. A gate g is moved into the next existing row if it belongs to the row that needs to be folded and all paths that g belongs to contain at least one feed-through with a higher topological order than g. The reason for the feed-through condition is that g, along with all gates between g and the feed-through, can be pushed to a higher row, and the feed-through can then be deleted without violating the topological ordering constraint. The following algorithm performs row folding.

row_folding(G, w)
  if (w is a
feed-through) return (TRUE);
  if (w.level == G.max_level) return (FALSE);
  RETVAL = TRUE;
  k = w.outdegree; i = 0;
  while (RETVAL and i < k)
    RETVAL = row_folding(G, w.CHILD(i));
    i = i + 1;
  return (RETVAL);

This algorithm returns true if a node can be moved, and false if a new row has to be inserted. If the feed-through criterion is not met, and the row containing g has to be folded, then a new row is inserted and g is moved into that row.

3.3. Wirelength and wire crossing minimization
A width-balanced k-layered bipartite graph is formed via the feed-through insertion and row folding stages. This graph is placed in such a way that all cells with the same longest-path length are placed in the same row. The next step is then to rearrange the cells in each row to reduce wire crossings. Wire crossing minimization is already NP-hard for bipartite graphs with only two rows [4]. Our approach to wire crossing minimization in k-layered bipartite graphs is to use the well-known barycenter heuristic [4] to build an initial solution and to refine it with Simulated Annealing. In the barycenter heuristic, the nodes in the top layer are fixed and used to rearrange the nodes in the bottom layer. For each node v in the bottom layer, we compute the center of mass, i.e., $m(v) = \sum_{u \in FI(v)} column(u) / |FI(v)|$, where FI(v) denotes the fan-in nodes of v. These nodes are then sorted in increasing order of m(v) and placed from the left-most column.

During Simulated Annealing, a move is performed by swapping two randomly chosen gates in the same row in order to minimize the total wirelength and wire crossing count. We compute the wirelength and wire crossings once initially and then incrementally update these values after each move, so that the update can be done much faster. This speedup allows us to explore a greater number of candidate solutions and, as a result, to obtain better-quality solutions. We use the adjacency matrix to compute the number of wire crossings. In a bipartite graph, there is a wire crossing between two layers v and u if $v_i$ connects to $u_j$ and $v_x$ connects to $u_y$, where i, j, x, and y denote the relative positional orderings of the nodes and, without loss of generality, either i < x and y < j, or x < i and j < y. In terms of the adjacency matrix, this can be regarded as the point (i, j) being included in the lower-left sub-matrix of (x, y), or vice versa. Fig. 4 shows an example of the wire crossing computation. The total crossing count is computed by adding, over all matrix elements, the product of each element and the sum of its lower-left sub-matrix entries, i.e., $\sum \big( A_{ij} \times \sum\sum A_{xy} \big)$, where $i + 1 \leq x \leq n$ and $1 \leq y \leq j - 1$. However, this method is computationally expensive if it has to be performed frequently. In our incremental wire crossing calculation, we first take the row-wise sum of all entries, as in Fig. 4c. Then we use this to compute the column-wise sum, as in Fig. 4d. Finally, we multiply the entries of the original matrix by the corresponding entries of the column-wise sum matrix to compute the total wire crossings: each entry (r, c) of the original matrix is multiplied by the entry (r+1, c-1) of the column-wise sum matrix. In the Simulated Annealing process, swapping two nodes is identical to swapping the corresponding rows in the above matrices. Hence, it is enough to update just the values of the rows between the two rows being swapped.

4. Experimental results
Our algorithms were
implemented in C++/STL, compiled with gcc v2.96, and run on a Pentium III 746 MHz machine. The benchmark set consists of the seven biggest circuits from the ISCAS89 suite and the five biggest circuits from the ITC99 suite, chosen for the availability of signal-flow information. Table 1 shows our cell placement results, where we report the net wirelength and the number of wire crossings for each circuit using our analytical solution and all three flavors of our Simulated Annealing algorithm.

Fig. 4. Illustration of the incremental wire crossing computation: (a) a bipartite graph with 3 wire crossings, (b) the adjacency matrix of (a), (c) the row-wise sum of (b) from left to right, (d) the column-wise sum of (c) from bottom to top. Each entry in (d) represents the total sum of the entries in its lower-left sub-matrix. Using (b) and (d), the wire crossing count is $A_2 \times B_1 + B_3 \times C_2 = 3$, where $A_2$ and $B_3$ are taken from (b), and $B_1$ and $C_2$ from (d).

Table 1. QCA cell placement results: net wirelength ("wire") and wire crossing counts ("xing") on the benchmarks b14, b15, s13207, s15850, s38417, s38584, s5378, and s9234 for the analytical solution and the SA+WL, SA+WC, and SA+WL+WC flavors of Simulated Annealing, together with the averages, ratios, and runtimes of each method.

We observe that, in general, the analytical solution is better than all three flavors of the Simulated Annealing method, except for the wirelength of SA+WL+WC. The tradeoff in wire crossings, however, makes the analytical solution more viable, since wire crossings pose a bigger barrier than wirelength in the QCA architecture. One interesting note is that, comparing among the three flavors of Simulated Annealing, we find that SA+WC has the best wire crossing count. Surprisingly, in terms of wirelength, SA+WL does not outperform SA+WL+WC. We speculate that this behavior arises because a lower number of wire crossings has a strong influence on wirelength, whereas a smaller wirelength does not necessarily imply fewer crossings.

5. Conclusions and ongoing work
In this article, we proposed the QCA cell placement problem and presented an algorithm that helps automate the design process within the constraints imposed by physical scientists. Work addressing QCA routing and node duplication for wire crossing minimization is under way. The outputs from this work and the work discussed here will be used to generate computationally interesting and optimized designs for experiments by QCA physical scientists.

References
[1] R. Ravichandran, S. Ladiwala, J. Nguyen, M. Niemier, S.K. Lim, Automatic cell placement for quantum-dot cellular automata, in: Proceedings of the Great Lakes Symposium on VLSI, 2004.
[2] I. Amlani, A. Orlov, G. Toth, G. Bernstein, C. Lent, G. Snider, Digital logic gate using quantum-dot cellular automata, Science (1999) 289–291.
[3] J. Nguyen, R. Ravichandran, S.K. Lim, M. Niemier, Global placement for quantum-dot cellular automata based circuits, Technical Report GIT-CERCS-03-20, Georgia Institute of Technology, 2003.
[4] K. Sugiyama, S. Tagawa, M. Toda, Methods for visual understanding of hierarchical system structures, IEEE Trans. Syst. Man Cybern. (1981) 109–125.
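As a supplementary illustration of the crossing computation in Section 3.3, the following is a small Python sketch (ours; the authors' implementation is in C++/STL). It gives the direct count over the adjacency matrix, a brute-force pairwise check of the crossing condition, and the barycenter ordering; the incremental row-sum/column-sum update of Fig. 4 is omitted.

    def count_crossings(adj):
        # Sum of adj[i][j] times the total of its lower-left sub-matrix,
        # i.e. entries (x, y) with x > i and y < j.
        n, m = len(adj), len(adj[0])
        total = 0
        for i in range(n):
            for j in range(m):
                if adj[i][j]:
                    total += adj[i][j] * sum(adj[x][y]
                                             for x in range(i + 1, n)
                                             for y in range(j))
        return total

    def count_crossings_brute(edges):
        # Two edges (i, j) and (x, y) cross iff their endpoint orders disagree.
        total = 0
        for k in range(len(edges)):
            for l in range(k + 1, len(edges)):
                (i, j), (x, y) = edges[k], edges[l]
                if (i - x) * (j - y) < 0:
                    total += 1
        return total

    def barycenter_order(fan_in):
        # fan_in: bottom node -> columns of its fixed top-layer parents;
        # sort bottom nodes by the mean column of their fan-in.
        return sorted(fan_in, key=lambda v: sum(fan_in[v]) / len(fan_in[v]))

    edges = [(0, 1), (1, 0), (1, 2), (2, 1)]      # (top column, bottom column)
    adj = [[0] * 3 for _ in range(3)]
    for i, j in edges:
        adj[i][j] += 1
    print(count_crossings(adj), count_crossings_brute(edges))  # both print 2

The brute-force test simply confirms the sub-matrix formulation on a toy instance; on real nets only the matrix form (and its incremental update) would be used inside the annealing loop.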
A Topological Constraints Based Sequential Data Mining Approach on Telecom Network Alarm Data
Zhaogang Wang, Bin Zhang, Guohui Li
Pattern Recognition and Intelligent System Lab, Beijing University of Posts and Telecommunications (BUPT), Beijing, China
WangZhaogang928@

Abstract
The issue of fault location has received extensive concern in the field of telecom network management. Data mining approaches are introduced to extract clues for fault location from telecom alarm data. Aiming at the key problems in telecom data mining, we have made a comprehensive analysis of the telecom network and its data, as well as of fault propagation, and some important characteristics have been discovered. A fault-location-oriented network model is built to serve the traditional approaches in data transforming and data mining. An enhanced data mining algorithm based on the topological constraints of the telecom network is proposed, introducing real-world constraints into the data mining procedures. A data mining tool (PRISMiner) is implemented to benchmark the new algorithm, and our experiments show that the new algorithm is quite effective in improving the accuracy and efficiency of the PreFixSpan mining algorithm.

1 Introduction
1.1 Backgrounds
With the evolution of modern communication systems, fault location has become a focus for many researchers, and many fault location solutions have been presented. Expert systems [1] and graph-theory-based approaches [2] have been discussed, and many of them have been adopted in industrial network management systems. The above approaches are applicable in their respective fields and situations, but their drawbacks are also significant. To locate the root fault in a modern telecom network with a large number of network elements and heterogeneous infrastructures, we need to introduce data mining approaches to fault location. Devices with faults generate many correlated alarms along the propagation path, and these correlated alarms can be extracted as frequent sequences by data mining algorithms. The regularity of fault propagation can thus be inferred from the frequent sequences obtained through data mining.

In telecom networks (especially mobile communication networks), the reciprocity and interaction among network devices are the root cause of fault propagation. Thus, discovering sequential patterns from the alarm sequences of correlated devices promises to be more accurate and efficient [3]. Therefore, special techniques focused on data preprocessing and on optimizations and constraints for the mining algorithms are in demand.

1.2 Key Problems of Data Mining on Telecom Alarm Data
Data mining algorithms apply only statistical constraints to the data, neglecting the inherent relationships between the items [4]. Topologically connected devices or nodes are more likely to propagate faults than those that are not. As a result, taking the actual constraints into account is a key technique for improving the accuracy and efficiency of the mining algorithms. In order to apply such constraints to telecom alarm mining, a proper model describing the topological relations of the telecom network needs to be set up in advance. This introduces a new problem: telecom network modeling.
A proper model should act like an abstract view of the network, hiding the details irrelevant to our study and emphasizing the attributes we focus on [5].

As discussed above, an accurate and efficient telecom alarm mining system should take measures to introduce actual constraints, in order to remove redundancy in pattern or sequence generation, and should build a proper telecom model to exploit the topological information of the network. Our motivation and main idea are to overcome these two key problems by proposing new approaches to constraint-based mining and telecom network modeling.

1.3 Paper organization
In this section, we introduced the background of data mining for telecom network fault location and the key problems it involves. In Section 2, a new topological-constraint-based mining algorithm is proposed, including the special preprocessing approaches and the topological modeling of the telecom network. Next, in Section 4, a telecom alarm mining tool adopting the above approaches, PRISMiner, is introduced, along with its implementation features; a series of experiments is also designed and performed there to benchmark the new approach. Future work on the mining task and on fault location is discussed in the last section.

2 Topological Based Sequential Mining Algorithm on Telecom Alarm Data
The characteristics of the alarm data pose some obstacles to scanning telecom network alarm data for fault location clues. As a result, we propose a particular approach to telecom network modeling and data preprocessing to overcome the difficulties of imperfect and noisy data. To improve the accuracy and efficiency of the mining algorithm, an improved PreFixSpan algorithm equipped with the topological-constraint-based algorithm is proposed. The following sections give detailed descriptions of the network modeling, the preprocessing approach, and the improved PreFixSpan.

2.1 Topological Constraints Oriented Telecom Network Modeling
According to the analysis of the telecom network structure and its fault propagation, we conclude that alarms are propagated through the network topology. To exploit the structural information and the regularity of fault propagation in preprocessing and in the mining algorithm, we need to set up a model indicating the topological relationships of the network. As we are using the model to indicate fault propagation, the model should be an abstraction of the connectivity of the network elements, especially those that are topologically connected. Topologically connected network elements may generate alarms because of the same fault, and hence make up the propagation path. An alarm may propagate from lower layers to higher layers, and vice versa. However, no matter how the alarms propagate, the path is limited to a network element cluster, defined as a set of topologically connected network elements. The network element cluster is thus a model of the network fault propagation path, and to determine whether some specific alarms are topologically connected is to determine whether the alarms are included in the same cluster. Our aim is to design an algorithm to generate clusters from telecom structures.

We can divide the telecom network elements into several subnets according to their functions. In fact, network elements of different subnets have different ways of interconnecting. In general, network elements can be divided into two main categories: common network elements and privately connected network elements.
The common network elements are connected to all other network elements, logically or physically. The privately connected network elements, on the other hand, are connected only with elements of the same subnet. Firstly, we filter out the common network elements, whose device types are recorded to form the public connected cluster. Secondly, we deal with the privately connected elements. The way privately connected elements are interconnected differs between subnets. As for the voice subnet, which has the most elements, connections do not exist between arbitrary pairs of elements, but only among specific groups of elements. According to the analysis of the propagation path, we find the topologically connected elements from the lowest BTS up to the elements of the upper layers. Each group of elements on the same propagation path is treated as a voice subnet cluster, led by a BTS. The clusters then consist of the device IDs of the topologically connected elements.

Subnets other than the voice subnet, such as the GPRS subnet and the Intelligent subnet, have fewer elements, so each of them is treated as a special cluster. Special clusters also include those groups of elements that are probably topologically connected. For instance, the high-level elements of the voice subnet might be horizontally connected (not through a common fault propagation path) and are thus treated as special clusters.

These categories of clusters make up the topological-constraints-oriented telecom network model. The model can help to determine whether some specific network elements are topologically connected via the following algorithm (a Python sketch of this routine is given at the end of Section 2.3):

Topological Constraint Algorithm:
Function IsTopoConnected
Input: Array, Precision, CommonCluster, VoiceClusters, SpecialClusters
Output: whether the Array of elements is topologically connected

  Temp ← ∅
  For i ← 1 to Array.size
    If (Array[i] not in CommonCluster)
      Temp.add(Array[i])
  N ← 0
  For i ← 1 to Temp.size
    If (Temp[i] in SpecialClusters)
      N ← N + 1
  If (N >= Temp.size * Precision)
    Return true
  Else
    N ← 0
    For i ← 1 to Temp.size
      If (Temp[i] in VoiceClusters)
        N ← N + 1
    If (N >= Temp.size * Precision)
      Return true
    Else
      Return false

The algorithm ignores the common-cluster elements of the input, as they are definitely connected with the other elements. The return value indicates whether the input elements are partially or completely connected within the same cluster. This algorithm can easily be used to generate transactions and sequences in data transforming as well as in data mining.

2.2 Telecom Network Alarm Data Preprocessing
The raw telecom alarm data needs to be transformed into well-formed data ready for mining. However, the input formats of typical pattern and sequence mining algorithms are not relational: pattern mining algorithms require transaction data, while sequence mining algorithms require sequence data.

The traditional method of transforming relational data into transactions and sequences is the fixed sliding window method. Given a specific time window length, the alarms occurring within the time window are put into a transaction, and the order of occurrence of the alarms within the same transaction is ignored. In order to avoid continuous alarms being separated by a window border, the overlapped-window technique is introduced: the sliding step of the fixed window is one half or one third of the window length, so some alarms may occur twice in different transactions, but continuous alarms are not likely to be separated. Based on the transactions, a smaller fixed sliding window is then used to generate sequences.
This second fixed window, shorter than the transaction window, is adopted inside the transaction alarms: the alarms occurring in the same small sliding window are put into an element, the order of occurrence of alarms within an element is ignored, and the order of the elements is not exchangeable.

The fixed sliding window is simple to implement, but it might not be suitable for telecom alarms. Although the overlapped window can keep adjacent alarms from being separated, not all correlated alarms are temporally adjacent. According to the analysis in the previous section, alarms propagate through the topological connections between network elements and are not limited to a fixed time span: some propagations may take a long time, while others may be shorter. As a result, the selection of alarms for generating transactions and sequences should be based on the topological constraint rather than on a fixed time window.

Thus, we propose a topology-based overlapped sliding window method to transform relational data into transactions, and further into sequences. The start point of the window slides down the time line with a small time step, in order to include any possible start of a propagation sequence. The end of each window is no longer determined by a fixed window length, but by the topological relationship between adjacent alarms: newly covered alarms are included in the transaction if and only if they are topologically connected with the alarms already included in the transaction. If a newly covered alarm is topologically irrelevant, the window continues to grow until it reaches a specific threshold. Thus, the transactions contain only topologically connected alarms, which are far more refined than the ones generated by the fixed window method. The same method can be applied to the transactions to generate sequences.

2.3 Topological Constraints Data Mining on Telecom Alarm Data
We apply data mining algorithms to the telecom alarm data to extract the frequent alarm sequences that help fault location. After the preprocessing procedure discussed above, the telecom data are ready for mining. However, the mining procedure can be very time-consuming if the mining algorithm itself is inefficient. Therefore, we have to choose among the various mining algorithms to select the best one for the mining task.

The telecom alarm data is a sparse but large dataset. The frequent patterns are mainly short patterns, while long patterns exist only with low support. Telecom alarm data is always huge in size: in China, a provincial telecom operator generates about 900,000 alarm records per week, with a size of about 40 MB, and about 5 million alarm records per year, with a size of 2 GB.

For sparse datasets, the Apriori-like algorithms [7] as well as the pattern-growth-based algorithms [8] are both suitable. But when facing such a huge database, trying to use the Apriori-like algorithms is obviously not a wise decision, as the repeated database scanning they require becomes a nightmare at this scale. The pattern-growth-based algorithms do not need to scan the entire database in every iteration; instead, the information about frequent patterns is stored in particular data structures whose size is always limited. The pattern-growth-based algorithms are much more efficient than the Apriori-like algorithms, especially at low support. All in all, we choose the PreFixSpan [9] algorithm.
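Before turning to how the constraint is embedded in PreFixSpan, a Python sketch of the IsTopoConnected routine of Section 2.1 and of the topology-based window of Section 2.2 may help. This is our illustrative reading of the pseudocode: in particular, we interpret "in VoiceClusters" and "in SpecialClusters" as membership in a single cluster, per the accompanying text, and the window parameters (step, max_span) are hypothetical names for the sliding step and the growth threshold.

    def is_topo_connected(elements, precision, common_cluster,
                          voice_clusters, special_clusters):
        # Elements of the common cluster connect to everything: ignore them.
        temp = [e for e in elements if e not in common_cluster]
        if not temp:
            return True
        need = len(temp) * precision
        # Try the special clusters first, then the voice subnet clusters.
        for cluster_family in (special_clusters, voice_clusters):
            for cluster in cluster_family:
                if sum(1 for e in temp if e in cluster) >= need:
                    return True
        return False

    def topo_window_transactions(alarms, step, max_span, connected):
        # alarms: (time, device_id) pairs sorted by time;
        # connected: predicate over a list of device ids.
        transactions = []
        if not alarms:
            return transactions
        start = alarms[0][0]
        while start <= alarms[-1][0]:
            txn = []
            for t, dev in alarms:
                if t < start:
                    continue
                if t - start > max_span:      # growth threshold reached
                    break
                # include the alarm only if it stays topologically
                # connected with the ones already in the transaction
                if not txn or connected([d for _, d in txn] + [dev]):
                    txn.append((t, dev))
            if len(txn) > 1:
                transactions.append(txn)
            start += step                      # overlapped sliding start
        return transactions

    # Example wiring: one voice cluster {1, 2, 3}; device 9 is common.
    conn = lambda devs: is_topo_connected(devs, 0.8, {9}, [{1, 2, 3}], [])
    alarms = [(0, 1), (1, 2), (2, 7), (3, 3), (9, 9)]
    print(topo_window_transactions(alarms, 2, 5, conn))

On this toy input, the unrelated device 7 is excluded and the first window yields the single transaction of devices 1, 2, and 3, illustrating how the predicate, rather than a fixed window length, decides the transaction contents.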
The PreFixSpan algorithm is an efficient sequential mining algorithm that exploits the projected database structure to store frequent patterns, thus avoiding repeated scans of the original database. Besides, PreFixSpan divides the result set into several disjoint subsets, and the mining iterations within the respective subsets can be performed concurrently.

The main idea of the PreFixSpan algorithm is to search for the frequent patterns in the projected database rather than in the original database. The PreFixSpan algorithm employs the pattern-growth methodology and the divide-and-conquer strategy, and it is hence very efficient. However, the PreFixSpan algorithm only obtains the statistically frequent patterns, which do not necessarily indicate meaningful rules: some items, i.e., noise, merely happen to occur together many times. The traditional mining algorithms, including PreFixSpan, cannot distinguish the really associated items from those occasionally frequent co-occurring items. Thus, as discussed above, constraints from real-world models have to be introduced to solve the problem. We embed the Topological Constraint Algorithm in the PreFixSpan algorithm in the following routine (a compact Python rendering is given at the end of the paper):

Procedure: Topological Constraint Based PreFixSpan
Input: Prefix, ProjectionDB, Min.Support
Output: Frequent Patterns

  N ← all frequent items in ProjectionDB
  For each item x in N
    p ← append x to Prefix
    If support(p) >= Min.Support and IsTopoConnected(p)
      Output p
      Call Topological Constraint Based PreFixSpan
        with p, ProjectionDB w.r.t. p, Min.Support

The IsTopoConnected condition is the core step of our approach. Differing from the original PreFixSpan, a newly grown pattern has to be both frequent (above the minimum support) and topologically connected before it can grow recursively into longer patterns. With this added constraint, patterns formed from occasionally co-occurring items are pruned from the recursion tree; the huge computation time that would be consumed on these meaningless patterns is saved, and the accuracy is improved as well. In the next section, experiments are performed to benchmark the new approach.

4 Experiments and Conclusions
4.1 Introduction to the Experiment Platform
To implement and benchmark the algorithms and approaches we proposed, a data mining tool was designed and developed as the experiment platform. The tool, named PRISMiner, is designed to run and benchmark data mining algorithms for fault location tasks. The tool consists of three components: data preprocessing, data mining, and algorithm benchmarking.

The data preprocessing component refines the raw alarm data to remove redundancy and noise and transforms the data into the format ready for mining. The data mining component is an implementation of six data mining algorithms, comprising four non-sequential ones (Apriori [7], FP-Growth [8], Eclat [9], Relim [10]) and two sequential ones (GSP [11] and PreFixSpan [12]). Each algorithm can be equipped with the Topological Constraint Algorithm if the user prefers. The algorithms are implemented in C++, with STL containers employed to build their basic data structures. The outputs as well as the performance statistics of the algorithms are stored in specific files for benchmarking. The benchmarking component sets a unified standard for all algorithms: the outputs and the performance figures are listed in a predefined table to make clear comparisons.
In the next section, experiments benchmark the new approach.

4 Experiments and Conclusions

4.1 Introduction to the Experiment Platform

To implement and benchmark the proposed algorithms and approach, a data mining tool was designed and developed as the experiment platform. The tool, named PRISMiner, performs and benchmarks data mining algorithms for fault location tasks. It consists of three components: data preprocessing, data mining, and algorithm benchmarking.

The data preprocessing component refines the raw alarm data, removing redundancy and noise and transforming the data into a format ready for mining. The data mining component implements six algorithms: four non-sequential ones, Apriori [7], FP-Growth [8], Eclat [9], and Relim [10], and two sequential ones, GSP [11] and PrefixSpan [12]. Each algorithm can be equipped with the Topological Constraint Algorithm if the user prefers. The algorithms are implemented in C++, with STL containers providing their basic data structures. The outputs and the performance statistics of the algorithms are stored in dedicated files for benchmarking. The benchmarking component applies a unified standard to all algorithms: the outputs and performance figures are listed in a predefined table for clear comparison, and the benchmark table can easily be exported to a text file for chart drawing with other tools.

The following experiments were performed with PRISMiner to benchmark the proposed approaches against the baseline approaches.

4.2 Experiment Design and Result Analysis

Experiment 1: Fixed sliding window vs. topology based method for data transforming

As we claimed, the sliding window method is easy to implement, but too many of the transactions or sequences it generates are meaningless, since their alarms merely happen to co-occur. The topology based method we propose combines only alarms that occur on the same fault propagation path and is thus more reasonable. We compared the two ways of transforming the same raw data table.

Table 1 shows that the topology based method is more efficient than the sliding window method. It generates far fewer sequences than the fixed window, yet more refined frequent patterns are mined from them in turn. Its execution time is also much shorter, because the meaningless sequences are suppressed.

Table 1  Fixed sliding window method vs. topology based method

Sup.   Transform method   Time for transform   Time for execution   #Original records   #Seqs   #Patterns
0.5    F.S.W.             23 s 109 ms          4 m 1 s 609 ms       69,715              1,349   24
0.3    F.S.W.             23 s 109 ms          17 m 17 s 94 ms      69,715              1,349   188
0.1    Topo based         20 s 750 ms          125 ms               69,713              796     13
0.05   Topo based         20 s 750 ms          172 ms               69,713              796     36
0.03   Topo based         20 s 750 ms          250 ms               69,713              796     102
0.01   Topo based         20 s 750 ms          704 ms               69,713              796     636

Figure 1  Results of PrefixSpan using the fixed sliding window method vs. the topology based method for data transforming (number of patterns per support level)

Figure 1 shows that, for low support mining, the data generated by the topology based method contribute many more patterns than those of the fixed window, while the latter can hardly produce any result in tolerable time. The constraint based method is therefore the more appropriate choice for sparse datasets such as telecom alarm records.

Experiment 2: Constraint based PrefixSpan vs. original PrefixSpan

We compared the topological constraint based PrefixSpan with the original version on the same dataset. The result is shown in Table 2.

Table 2  Constraint based PrefixSpan vs. original PrefixSpan

Sup.   Topological constraint   Time for execution   #Patterns
0.5    No                       3 m 34 s 390 ms      24
0.5    Yes                      4 m 1 s 609 ms       19
0.3    No                       71 m 31 s 140 ms     195
0.3    Yes                      17 m 17 s 94 ms      188

The table shows that at support 0.3 the Topological Constraint Algorithm greatly reduces the time PrefixSpan consumes, whereas at support 0.5 the time increases. The reason is that at high support the cost of determining whether patterns are topologically connected exceeds the processing time of the patterns being pruned. At low support, which is the more usual situation, the time that would be spent on the pruned patterns far exceeds the cost of deciding whether to prune them, so the constraint based algorithm is useful in most situations. In addition, the frequent patterns that the algorithm outputs are further refined.
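The trade-off above hinges on the cost of the connectivity test. The following is a minimal sketch of one plausible implementation, again in C++: it assumes the topology is kept as an adjacency-set graph over network elements, that the elements raising a pattern's alarms are known, and that "connected" means those elements form one connected group using only edges among themselves (a deployed system might instead allow intermediate elements outside the pattern).

#include <map>
#include <queue>
#include <set>
#include <string>

// Topology kept as an adjacency-set graph over network elements.
using Topology = std::map<std::string, std::set<std::string>>;

// Breadth-first search restricted to the pattern's own elements:
// true if they all fall into a single connected group.
bool isTopoConnected(const Topology& topo,
                     const std::set<std::string>& nodes)
{
    if (nodes.size() <= 1) return true;
    std::set<std::string> visited{*nodes.begin()};
    std::queue<std::string> frontier;
    frontier.push(*nodes.begin());
    while (!frontier.empty()) {
        std::string cur = frontier.front();
        frontier.pop();
        auto it = topo.find(cur);
        if (it == topo.end()) continue;
        for (const std::string& next : it->second)
            if (nodes.count(next) && !visited.count(next)) {
                visited.insert(next);
                frontier.push(next);
            }
    }
    return visited.size() == nodes.size();
}

Each test is roughly linear in the number of pattern elements times their degree, which is consistent with the measurements: at support 0.5 few patterns are pruned and the test is pure overhead, while at support 0.3 and below the pruned recursion subtrees dwarf its cost.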
We also compared the above algorithms with the GSP algorithm [11], an Apriori-like sequential mining algorithm. The result is shown in Figure 2: the set of output patterns is slightly refined for both PrefixSpan and GSP.

Figure 2  The effect of the Topological Constraint Algorithm on sequential algorithms

4.3 Conclusions

The above experiments show that the Topological Constraint Based Algorithm is useful in both data transforming and data mining.

When adopted in data transforming, the method greatly reduces the number of generated sequences. With the meaningless sequences removed, the mining algorithm has an easier job finding meaningful patterns. The topology based method also lets the algorithms obtain results at low support, where the fixed window method cannot.

When the topological constraint is added to the mining procedure itself, the accuracy of the results improves: fewer but more meaningful results are obtained. The time cost is also reduced, because the meaningless patterns are pruned; the lower the support, the more significant the effect of the algorithm.

We therefore claim that the Topological Constraint Based Algorithm is an effective way of improving both the accuracy and the efficiency of telecom alarm data mining. From this conclusion we can further infer that the network model we built for the telecom network is a proper model for fault location, as it serves the data transformation and mining quite well.

5 Future Work

Once the frequent sequences in the alarm data have been mined accurately, it is easy to generate correlation rules, but it is still difficult to infer the root fault cause from the mining results. A frequent sequence indicates that its alarms have a very high probability of occurring in the given order; the cause must take place before the effect, but an earlier event is not necessarily the cause of a later one. The frequent sequences therefore only give a clue, and approaches for inferring the root cause of telecom network faults remain a popular research topic.

6 Acknowledgement

This research work has been supported by IBM China Research Laboratory. I thank my mentors, Zhang Bin and Li Guohui, for their guidance, and my classmates for their cooperation during the research and development.

7 References

[1] A. Patel, G. McDermott, C. Mulvihill. Integrating network management and artificial intelligence. In: B. Meandzija, J. Westcott (Eds.), Integrated Network Management I, North-Holland, Amsterdam, 1989, pp. 647–660.
[2] B. Gruschke. Integrated event management: Event correlation using dependency graphs. In: A. S. Sethi (Ed.), Ninth International Workshop on Distributed Systems: Operations and Management, University of Delaware, Newark, DE, October 1998.
[3] M. Klemettinen. A Knowledge Discovery Methodology for Telecommunication Network Alarm Databases. 1999.
[4] J. Han, L. V. S. Lakshmanan, R. T. Ng. Constraint-based, multidimensional data mining. Computer, 1999.
[5] D. M. Meira. A Model for Alarm Correlation in Telecommunication Networks. Dissertation, Federal University of Minas Gerais.
[6] D. M. Meira, J. M. S. Nogueira. Modelling a telecommunication network for fault management applications. IEEE.
[7] R. Agrawal, R. Srikant. Fast algorithms for mining association rules. In: Proceedings of the 20th International Conference on Very Large Data Bases, Santiago, Chile, 1994, pp. 487–499.
[8] J. Han, J. Pei, Y. Yin. Mining frequent patterns without candidate generation. In: SIGMOD '00, May 2000.
[9] M. Zaki, S. Parthasarathy, M. Ogihara, W. Li. New algorithms for fast discovery of association rules. In: Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining (KDD '97), AAAI Press, Menlo Park, CA, USA, 1997, pp. 283–296.
[10] C. Borgelt. Keeping things simple: Finding frequent item sets by recursive elimination. 2005.
[11] R. Srikant, R. Agrawal. Mining sequential patterns: Generalizations and performance improvements. In: Proceedings of the 5th International Conference on Extending Database Technology (EDBT '96), 1996.
[12] J. Pei, J. Han, et al. Mining sequential patterns by pattern-growth: The PrefixSpan approach. IEEE Transactions on Knowledge and Data Engineering, 2004.