Journal of Visual Languages and Computing (2002) 13, 601-622
doi:10.1006/S1045-926X(02)00026-5

The Role of Visual Perception in Data Visualization

Mehdi Dastani
Institute of Information and Computing Sciences, Universiteit Utrecht, Padualaan 14, De Uithof, 3584 CH Utrecht, The Netherlands, e-mail: mehdi@cs.uu.nl

Received 12 July 2001; accepted 27 March 2002

This paper presents a perceptually motivated formal framework for effective visualization of relational data. In this framework, the intended structure of data and the perceptual structure of visualizations are formally and uniformly defined in terms of relations that are induced on data and visual elements by data and visual attributes, respectively. Visual attributes are analyzed and classified from the perceptual point of view and in terms of the perceptual relations that they induce on visual elements. The presented framework satisfies a necessary condition for effective data visualizations. This condition is formulated in terms of a structure-preserving map between the intended structure of data and the perceptual structure of the visualization. © 2002 Published by Elsevier Science Ltd.

1. Introduction

Theories of effective data visualization aim to analyze and explain the relation between data and their effective visualizations [1-6]. This relation is essentially a denotation relation in the sense that visual elements and relations denote data elements and relations, respectively. For example, the number of cars produced by different factories can be visualized by a bar chart. Each bar represents a car factory and its length represents the number of cars produced by that factory. The perceivable length relations between bars denote relations between the numbers of cars produced by different factories.
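The bar-chart example can be made concrete with a minimal sketch (my own illustration, not from the paper; the factory names and counts are hypothetical). The point is that the denotation is structure-preserving: relations between bar lengths mirror relations between the data values they denote.

```python
# A minimal sketch (not from the paper) of the bar-chart example: bar lengths
# are chosen so that perceivable length relations mirror the data relations.
# The factory names and production counts below are hypothetical.

counts = {"factory_a": 40000, "factory_b": 50000, "factory_c": 30000}

# Linear scale: bar length in pixels, proportional to the number of cars.
PIXELS_PER_CAR = 1 / 500
lengths = {f: n * PIXELS_PER_CAR for f, n in counts.items()}

# The denotation is structure-preserving: any order relation that holds
# between the data values also holds between the bar lengths.
for f1 in counts:
    for f2 in counts:
        assert (counts[f1] < counts[f2]) == (lengths[f1] < lengths[f2])

# Ratios are preserved as well under a linear scale.
print(lengths["factory_b"] / lengths["factory_a"])  # 1.25, the ratio of counts
```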
The effectiveness of a visualization depends on many conditions, such as conventions, subjective user preferences, cultural settings, and the structural correspondence with its denoting data [4,5,7-9]. For example, the effectiveness of a temperature map in which temperature values are visualized by color hue values increases when conventional color hue values are used for high and low temperatures, i.e. red for high temperatures and blue for low temperatures. Although the satisfaction of some conditions enhances the effectiveness of visualizations, the satisfaction of other conditions is necessary for effective data visualization.

In general, we distinguish conditions for effective data visualization into necessary and sufficient conditions. This distinction is closely related to the well-known distinction between expressiveness and effectiveness [2,3]. There are at least two reasons for not using the expressiveness/effectiveness distinction. First, the expressiveness criterion is defined with respect to a graphical language, from which we want to abstract. The aim of this paper is to propose a general framework for effective data visualization which is independent of graphical languages and thus not limited to their expressions. Second, the expressiveness criterion is claimed to be satisfied if and only if data are represented in the visual structure, and the effectiveness criterion is claimed to be satisfied if the mechanism of the human visual system is taken into account [2]. We believe that the expressiveness criterion only makes sense if visual structures are perceivable by human viewers. This implies that both the expressiveness and effectiveness criteria depend on the human visual system, and that the expressiveness/effectiveness distinction is not a distinction between perceptually dependent and perceptually independent conditions.

In the context of effective data visualization, the structural correspondence condition is a
necessary condition without which the effectiveness of visualizations cannot be guaranteed. This condition should be defined between the structure of data and the perceptual structure of its visualization. The emphasis on perceptual structure in this formulation is based on the assumption that the effectiveness of visualizations depends on the effective use of the capabilities of the human visual system to perceive visual structures. We argue that a theory of effective data visualization should therefore be based on theories of the human visual system that describe human capabilities of perceiving visual relations. For example, the human capability to perceive relations between color hue values is limited to the identity relation. For this reason, a visualization in which the age of individuals is visualized by color hue values is not effective if the intended data relations are quantitative relations between ages.

In this paper, we focus on the structural correspondence condition for effective data visualization and propose a formal framework for effective data visualization which satisfies the structural correspondence condition. This paper is organized as follows. In Section 2, we propose a process model for effective data visualization and identify the steps through which the structural correspondence between data and visualizations should be established. For this model, we discuss the class of input data structures, the projection step from data structures to perceptual structures, the layout of perceptual structures, and finally the necessary condition for effective data visualization. In Section 3, some general types of data attributes that are important for data visualization are introduced and a classification of attribute-based data structures is proposed. Similarly, various visual attributes are analyzed from the perceptual point of view and a classification of perceptual structures is proposed. In Section 4, we build on formal definitions of attributes and specify data and perceptual
structures in a formal way. The structural correspondence condition for effective data visualization is then defined as a structure-preserving relation between data and perceptual structures. Finally, in Section 5, we conclude the paper and discuss future research directions.

2. A Process Model for Effective Data Visualization

The process of data visualization is considered here as the inverse of the interpretation process, i.e. the visualization process generates for an input data set a visualization whose interpretation results in the original data. Since the effectiveness of a visualization can be measured in terms of the easiness and directness of acquiring its intended interpretation, and because the interpretation of a visualization depends on human visual perception, an effective visualization should strongly benefit from the capabilities of the human visual system. We therefore formulate the effectiveness of visualization as follows: a visualization presents the input data effectively if the intended structure of the data and the perceptual structure of the visualization coincide.

Based on this view, a process model for effective data visualization [10] is schematically illustrated in Figure 1. This process starts with input data which is assumed to consist of data elements. The first step in data visualization is to determine the structure of the data, i.e. how data elements are related to each other. This is not a trivial decision process since, for example, a set of integers may be related to each other nominally (i.e. the integers are used as identifiers), ordinally (i.e. the integers are used as ordinals), or quantitatively (the integers are used as quantities). This step in the process of data visualization transforms the input data into what is called structured data. The second step in data visualization is to determine visual elements that represent data elements in such a way that the perceptual structure of the decided visual elements represents the structure of the data. We assume that the result of this step is
not necessarily a drawable visualization, but an abstract perceptual structure. Therefore, the third and last step in data visualization is the layout process, which transforms the abstract perceptual structure into a drawable visualization.

On the other hand, the interpretation process starts with a visualization, possibly together with a legend. Since the input of the interpretation process is a visualization, and because visualizations must be perceived before they can be understood as denoting data, the first interpretation step is perception. The process of perception determines perceivable relations among visual elements on the basis of their visual attribute values. As visual attribute values are assigned to visual elements by the layout process, perception is considered as the inverse of the layout process. Subsequently, visual elements are mapped to data elements. This mapping is possibly supported by a legend that determines the correspondence between visual and data attributes and their values. In the case of effective data visualization, the relations between data elements are structurally identical to the perceivable relations between their representing visual elements. Finally, given the structured data, the table of attribute values can be trivially generated. In the rest of this section, we elaborate on the different steps of this process model.

[Figure 1. A process model for effective data visualization]

2.1. Structuring the Input Data

In this paper, the input data for the visualization process is assumed to be nested relational data. This class of data can be represented as $m \times n$ data tables consisting of $m$ columns and $n$ rows. A column $A_i$ $(i = 1, \ldots, m)$ consists of values of one data attribute, and a row $B_j$ $(j = 1, \ldots, n)$ consists of $m$ values, each of which belongs to one of the attributes $A_1, \ldots, A_m$. A row of a table is called a data entry.* The values of attributes $A_1, \ldots, A_m$ may themselves be rows of another $k \times l$ table that consists of columns $A'_1, \ldots, A'_k$ and rows $B'_1, \ldots, B'_l$. In such a
case, attributes $A_1, \ldots, A_m$ are called aggregated attributes. An attribute value which is a single value is called an atomic value. For example, consider Table 1, which describes the cooperation degree between pairs of car companies. In this table, company1 and company2 are aggregated attributes, since their values are rows from Table 2. Table 2 describes the number of sold cars of a certain type produced in a certain country. In this data table, car-type, sold-number, and country are attributes that have atomic values. These two data tables can be combined into one single data table by means of aggregated attributes. Table 3 is obtained by combining Tables 1 and 2.

According to the proposed data visualization process model, the first step is to determine the intended structure of the input data. In general, we assume that input data can be structured in many different ways and that the intended structure of the input data is determined by human users. We also assume that human users have access to different transformations that provide different structures of the input data [2].

Two types of data structures are distinguished. The structures of the first type are constituted by relations that are intended to exist among values of individual data attributes. These relations characterize the type of the corresponding attributes. For instance, values of the sold-number attribute are from the domain of integers and they may be intended to be related to each other by some arithmetical relations. These arithmetical relations characterize the type of the sold-number attribute. A data attribute together with its type is called a typed data attribute, and a table that is defined on typed data attributes is called a typed data table. In a typed data table, relations that are defined on values of individual attributes are induced on data entries, in the sense that two data entries are related in the same way as their attribute values are related. Relations on data entries may be
induced by different data attributes. For instance, data entries in Table 2 may be considered as equivalent or non-equivalent according to the car-type attribute, while they are quantitatively related to each other according to the sold-number attribute.

Table 1. A data table describing the cooperation degrees between car companies

      company1   cooperation-degree   company2
d1    d01        good                 d02
d2    d01        best                 d03
d3    d02        good                 d01
d4    d03        best                 d01
d5    d03        normal               d04
d6    d04        normal               d03

* Many researchers in this field, such as Bertin [1] and Card et al. [2], have used tables as the format of the input data for the visualization process. However, they have used different terminology than is used in this paper. For example, Bertin uses the term data characteristics for data attributes and data objects for data entries, while Card et al. use data variables for data attributes and data cases for data entries. The terminology used in this paper is motivated by relational database terminology.

The structures of the second type are constituted by binary relations that are defined on values of two different aggregated attributes. The values of these aggregated attributes should be the entries of one and the same table. In fact, the characterizing relation for each single aggregated attribute is the identity relation, but since the values of two aggregated attributes are from one and the same domain (i.e. the entries of one and the same table), two aggregated attributes together define a binary relation between domain elements. These binary relations are implicit in the data table and do not characterize any single attribute. For instance, the binary relation defined on the values of the company1 and company2 attributes in Table 1 can be represented as:

{(d01, d02), (d01, d03), (d02, d01), (d03, d01), (d03, d04), (d04, d03)}

This binary relation relates data entries from Table 2. Binary relations can be characterized by their structural properties, such as the functional (injective, surjective, bijective), transitive, symmetric, and
reflexive properties. In Table 1, the values of the company1 and company2 attributes define such a binary relation, which is non-functional, symmetric, non-reflexive, and non-transitive. Notice that this type of structure can be extended by allowing $n$-ary relations between aggregated attribute values. In such a case, data tables consist of $n$ aggregated attributes.

Table 2. A data table describing car companies in terms of car types, the number of sold cars, and the countries in which they produce cars

      car-type   sold-number   country
d01   vw         40000         Germany
d02   toyota     50000         Japan
d03   opel       30000         Germany
d04   mazda      20000         Japan

Table 3. Tables 1 and 2 combined

company1                      cooperation-   company2
car-type sold-number country  degree         car-type sold-number country
vw       40000       Germany  good           toyota   50000       Japan
vw       40000       Germany  best           opel     30000       Germany
toyota   50000       Japan    good           vw       40000       Germany
opel     30000       Germany  best           vw       40000       Germany
opel     30000       Germany  normal         mazda    20000       Japan
mazda    20000       Japan    normal         opel     30000       Germany

2.2. Projecting Structured Data to Perceptual Structure

The second step in visualizing data is the projection of the structured data onto a perceptual structure. A perceptual structure is defined as a typed visual table, which is basically a table constituted by typed visual attributes. The type of a visual attribute is determined by the relations that can be perceived among values of that attribute. As we will explain in the next section, the perceivable relation for each visual attribute is determined beforehand, based on theoretical and empirical studies of human visual perception. The rows of a typed visual table will be called visual entries.

The projection of the structured data onto a perceptual structure consists of a bijective function from typed data attributes to typed visual attributes and another bijective function from rows of the data table (i.e. data entries) to rows of the visual table (i.e. visual entries). There are different functions that map typed data attributes to typed visual attributes. These different
functions specify alternative visualizations of the data. We assume that the second function does not map data attribute values to visual attribute values, but to variables that stand for visual attribute values. These variables will be instantiated with actual values by the layout process. As we will explain in Section 4, for effective data visualization the projection should be a structure-preserving projection. This can be guaranteed by ensuring that the first function maps each typed data attribute to a visual attribute with an identical type, and that the second function maps data attribute values to visual variables in such a way that whenever two data attribute values are in a certain relation (characterizing the type of that data attribute), their corresponding visual variables are in the corresponding perceptual relation as well.

This projection specifies abstract visual entries that are related to each other by perceptual relations induced by the typed visual attributes. For example, consider Table 1 once again. The first function maps the company1 data attribute to the connection1 visual attribute, company2 to connection2, cooperation-degree to size, car-type to xpos, sold-number to ypos, and country to color-hue. The second function maps data entries to abstract visual entries such that the structure-preserving condition is satisfied. For instance, consider the data attribute country, which is characterized by the identity relation. Whenever two values of the country attribute are identical, their corresponding color-hue variables are identical as well. This projection results in the typed visual tables illustrated in Tables 4 and 5.

2.3. The Layout of Visualization

The variables in the typed visual table should be instantiated with visual attribute values to generate a drawable visualization of the data. However, the typed visual table consists of only those visual attributes onto which a data attribute is projected. Therefore, it may be the case that some visual attributes, the values of which are
necessary to generate a drawable visualization, are not used by the projection. For instance, if the position attribute is not used by a projection, then visual entries should receive position values in order to become drawable. These unused visual attributes will be called undecided visual attributes, in contrast to the decided attributes onto which data attributes are projected.

Table 4. A visual table for the data described in Table 1

      connection1   size    connection2
v1    v01           size2   v02
v2    v01           size3   v03
v3    v02           size2   v01
v4    v03           size3   v01
v5    v03           size1   v04
v6    v04           size1   v03

Consider the visualizations illustrated in Figure 2. These are three possible visualizations for the data represented in Tables 1 and 2. In visualization 2-A, shape and size are undecided visual attributes. Moreover, in visualization 2-B position and shape, and in visualization 2-C position, are undecided visual attributes. In visualization 2-C, the label attribute is decided for the car-type, sold-number, and country attributes. This example shows that more than one data attribute can be projected onto a visual attribute. Of course, this does not hold for all visual attributes. For example, only one data attribute can be projected onto the color-hue attribute, since otherwise the effectiveness of the visualization cannot be guaranteed. We assume that the projection function is defined based on this general visualization knowledge. Visualization 2-C also shows that a drawable visualization does not need to have values for all visual attributes. For example, in this visualization visual entries do not need shape values to become drawable. We assume that the combination of decided visual attributes determines which undecided visual attribute values should be specified for visual entries to become drawable.

The generation of values for decided and undecided visual attributes is a constraint satisfaction process. The constraints on the values of the decided visual attributes are imposed by the perceptual structure determined at the previous
visualization step. Although the perceptual structure does not impose any constraints on the values of the undecided visual attributes, these values cannot be generated arbitrarily, since otherwise unwanted visual implicature [11] may occur. In the next section, we will discuss this issue in more detail. The determination of values for decided and undecided visual attributes will be called the layout process.

[Figure 2. Visualizations of the data described by Table 1]

Table 5. A visual table for the data described in Table 2

      xpos   ypos   color-hue
v01   seg1   pos4   hue1
v02   seg2   pos5   hue2
v03   seg3   pos3   hue1
v04   seg4   pos2   hue2

It should be noted that the textual information inserted in these visualizations may either be values of the label attribute or part of the interpretation function. In order to distinguish these two kinds of textual information, we use capital letters only to indicate that a piece of textual information is a value of the visual label attribute.

2.4. Effective Data Visualization

A data table can be visualized in many different ways, by deciding different visual attributes for each data attribute and by specifying different visual entries for each data entry. In Figure 3, two different visualizations for Tables 1 and 2 are illustrated. In visualization 3-A, the xpos, ypos, and color-hue attributes visualize the car-type, sold-number, and country attributes, respectively. Moreover, the links between visual elements, defined by the connection1 and connection2 attributes, visualize the company1 and company2 attributes, respectively. Finally, the thickness attribute of the links visualizes the cooperation-degree attribute. In visualization 3-B, the shape, color-hue, and label attributes visualize the sold-number, country, and car-type attributes, respectively. Like visualization 3-A, the links and their thickness visualize the companies and their degrees of cooperation, respectively. In visualizations 3-A and 3-B, the inserted numbers are parts of the interpretation
function (i.e. the legend). Although these two visualizations represent the same data, visualization 3-B is not an effective visualization if the quantitative relations between the numbers of sold cars are the intended data relations. In this visualization, the perceivable relation induced by the shape attribute is the identity relation, which implies that one can only perceive whether the numbers of sold cars are equal or not. Of course, one may deduce the intended quantitative data relations indirectly, by means of the interpretation information inserted into visualization 3-B. For example, the interpretation values 20000 and 40000 imply that the second is twice the first. However, this inference is based on the knowledge of the perceiver about the structure of the real numbers, rather than being based on perceivable relations. The effectiveness of visualization is the ability to perceive data relations directly, through structurally similar perceivable relations.

[Figure 3. Alternative visualizations of the data described in Table 1]

In order to illustrate the direct perception of relations, consider visualizations 4-A and 4-B (Figure 4). These visualizations are the same as visualizations 3-A and 3-B, respectively, except that in these visualizations the interpretation information is not presented. Knowing that the shape attribute visualizes the sold-number attribute in visualization 4-B, it is impossible to perceive any quantitative relation between visual elements, and therefore impossible to infer any quantitative data relation. In contrast, knowing that the ypos attribute visualizes the sold-number attribute in visualization 4-A, it is easy to perceive quantitative relations between visual elements, and therefore easy to infer quantitative data relations. In the rest of this paper, whenever we refer to the perceptual structure of a visualization we mean the structure of the visualization without using any interpretation information. The
above examples show that the choice of visual attributes for data attributes may influence the effectiveness of visualizations. However, the specification of visual entries by the layout process may influence the effectiveness of visualizations as well. We argued that the layout process should specify values for both decided and undecided visual attributes. In the context of effective data visualization, the generation of values for the decided visual attributes is constrained by the perceivable relations. For example, in visualization 3-A, where the color-hue attribute is decided for the country attribute, the layout process has generated identical color-hue values whenever the corresponding data attribute values are identical.

The generation of values for the undecided visual attributes may also influence the effectiveness of visualizations. For example, consider again visualization 2-A. Although this visualization represents the same information as visualization 3-A, it is not an effective visualization. The reason is that a human viewer perceives differences between the shapes of visual elements and may infer that these differences visualize differences in the presented data. In order to avoid this type of unwanted visual implicature, the values of undecided visual attributes should be specified in such a way as to induce an identity relation on the involved visual elements. In this way, visual elements will be perceived as being identical to each other with respect to the undecided visual attributes. This issue will be further explained in Section 4.3.

[Figure 4. Visualizations without any interpretation values]

3. An Attribute-based Classification of Structures

An important step in data visualization is the projection of data structures (typed data tables) onto perceptual structures (typed visual tables). In this section, a unifying approach is proposed to analyze different classes of data and perceptual structures. This approach is based on a formalization of attributes in terms of their
characterizing relations. The advantage of this approach is that data and perceptual structures are both formalized in a uniform way, such that the projection between them can be formulated in a formal and intuitive way. Based on this formalization, a classification of data and perceptual structures is proposed.

In order to define attributes in a formal way, we use some notions from measurement theory [12,13]. The theory of measurement provides both a classification of attributes and their mathematical definitions by introducing different types of measurement scales. In fact, a measurement scale is a map from a structured set of elements to a structured set of values, such as the set of real numbers, the set of integers, or a set of strings. In order to model aggregated attributes, we also allow an attribute to map a structured set of elements to another structured set of elements. In measurement theory, relational systems are used to represent structured sets.

Definition 1. A relational system is a pair $\langle A; R_1, \ldots, R_n \rangle$, where $A$ is a set of elements and $R_1, \ldots, R_n$ are relations defined on $A$.

We consider an attribute as a measurement scale, which maps one structured set into another.

Definition 2. An attribute is a homomorphism $H$ from a relational system $\langle A; R_1, \ldots, R_n \rangle$ into a relational system $\langle B; S_1, \ldots, S_n \rangle$. The set $A$ is a set of elements, and the set $B$ is either a set of elements or a set of attribute values, such as the set of real numbers, the set of integers, or a set of strings.

In the context of data visualization, $A$ can be either a set of visual elements or a set of data elements, $R_1, \ldots, R_n$ can be either perceptual relations or data relations, $B$ is either a set of elements or a set of values, and $S_1, \ldots, S_n$ are characterizing relations defined on $B$. These characterizing relations are abstract mathematical relations such as $=$, $\leq$, and $+$. The homomorphism guarantees that the relations that an attribute induces on elements have identical structural properties to its
characterizing relations. In the next subsections, we introduce data attributes and their corresponding data structures, followed by visual attributes and their corresponding perceptual structures.

3.1. Data Attributes

Below are five different types of data attributes that are relevant for data visualization.

Nominal attributes: A nominal attribute is a homomorphic map $H$ from a relational system $\langle A; E \rangle$ into a relational system $\langle B; = \rangle$. Note that the homomorphic map assigns values such as real numbers or strings to the elements of $A$.

Ordinal attributes: An ordinal attribute is a homomorphic map $H$ from a relational system $\langle A; \preceq \rangle$ into the relational system $\langle B; \leq \rangle$. The homomorphic map matches the ordinal relation among attribute values with the ordinal relation among the elements of $A$.

Interval attributes: An interval attribute is a homomorphic map $H$ from a relational system $\langle A; \preceq, \circ \rangle$ into the relational system $\langle \mathbb{R}^k; \leq, \sim \rangle$, where $\sim$ is a quaternary metrical relation defined on real numbers. In the context of data visualization, we may use the quaternary metrical relation $\sim$ defined as: $ab \sim cd \iff |a - b| \leq |c - d|$. This attribute
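Definition 2 can be made concrete with a small sketch (my own illustration, not from the paper): an attribute is a map $H$ such that whenever a relation $R$ holds between two elements, the characterizing relation $S$ holds between their images. The ranking example below is hypothetical.

```python
# A small sketch (not from the paper) of Definition 2: an attribute is a
# homomorphism H from <A; R> into <B; S>, i.e. whenever R(x, y) holds for
# two elements, S(H(x), H(y)) must hold for their attribute values.

def is_homomorphism(H, A, R, S):
    """Check that H : A -> B maps the relation R into the relation S."""
    return all(S(H(x), H(y)) for x in A for y in A if R(x, y))

# Hypothetical ordinal example: elements ordered by a preference ranking,
# mapped to integers ordered by <= (an ordinal attribute in the paper's sense).
A = ["low", "medium", "high"]
rank = {"low": 0, "medium": 1, "high": 2}
R = lambda x, y: rank[x] <= rank[y]   # ordinal relation on the elements
H = lambda x: rank[x] * 10            # the attribute (a measurement scale)
S = lambda u, v: u <= v               # characterizing relation on the values

assert is_homomorphism(H, A, R, S)

# A map that scrambles the order is not an ordinal attribute:
H_bad = {"low": 5, "medium": 1, "high": 3}.get
assert not is_homomorphism(H_bad, A, R, S)
```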
Convergence and Divergence
Lecture Notes

It is not always possible to determine the sum of a series exactly. For one thing, it is common for the sum to be a relatively arbitrary irrational number:

$$\sum_{n=1}^{\infty} \frac{1}{n^n} \;=\; 1 + \frac{1}{2^2} + \frac{1}{3^3} + \frac{1}{4^4} + \cdots \;=\; 1.291286\ldots$$

The sum of this series isn't something simple like $\sqrt{2}$ or $\pi^2/6$; it's just some arbitrary real number, whose digits can be determined only by adding together the terms of the series. (For example, the decimal approximation above was obtained by adding together the first hundred terms.)

We have encountered this sort of problem before. Recall that there is no way to find the exact value of the integral:

$$\int_0^1 e^{x^2}\,dx \;=\; 1.462652\ldots$$

The best you can do is a decimal approximation using rectangles or trapezoids.

For series, these sorts of problems are ubiquitous. There are very few general techniques for finding the exact sum of a series, and even relatively simple series such as $\sum_{n=1}^{\infty} 1/n^3$ cannot be summed exactly. One of the hardest problems in mathematics, the Riemann hypothesis, essentially just asks for what values of $p$ the series

$$\zeta(p) \;=\; 1 + \frac{1}{2^p} + \frac{1}{3^p} + \frac{1}{4^p} + \cdots$$

sums to exactly zero. This problem was first proposed nearly 150 years ago, and it still has not been solved. (In the year 2000, the Clay Mathematics Institute announced a $1 million prize for a solution to this problem.)

Because finding the exact sum of a series is so hard, we will usually not concern ourselves with adding up a given series exactly. Instead, we will focus on a much easier problem: can we at least figure out whether a given series converges?

Positive Series

For various reasons, it is simpler to understand convergence and divergence for series whose terms are all positive numbers. We shall refer to such series as positive series. Because each partial sum of a positive series is greater than the last, every positive series either converges or diverges to infinity.
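The decimal approximation quoted above can be reproduced by adding terms numerically. Here is a quick sketch (the series is from the text; the code itself is my own, not part of the original notes):

```python
# A quick sketch (not part of the original notes) reproducing the decimal
# approximation above: add together the first hundred terms of sum 1/n^n.

def partial_sum(term, count):
    """Add together the first `count` terms given by the function `term`."""
    return sum(term(n) for n in range(1, count + 1))

s = partial_sum(lambda n: 1 / n**n, 100)
print(round(s, 6))  # 1.291286

# The terms shrink so fast that ten terms already agree with a hundred
# to nine decimal places.
assert abs(partial_sum(lambda n: 1 / n**n, 10) - s) < 1e-9
```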
(As we shall see later on, series with negative terms have other possible behaviors.)

RULE FOR POSITIVE SERIES
If $\sum a_n$ is a positive series, then either
1. $\sum a_n$ converges to a positive number, or
2. $\sum a_n$ diverges to infinity.

We have seen many examples of convergent series, the most basic being:

$$\frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \frac{1}{16} + \cdots \;=\; 1$$

This series is geometric, with each term a constant multiple of the last. (In this case, each term is half as big as the previous one.) This repeated multiplication causes the terms of a geometric series to become small very quickly. For example, the 100th term of the above series is:

$$\frac{1}{2^{100}} \;=\; \frac{1}{1267650600228229401496703205376} \;\approx\; 0.789 \times 10^{-30}$$

We also know that a series diverges if its terms don't approach zero. For example:

$$\frac{1}{2} + \frac{2}{3} + \frac{3}{4} + \frac{4}{5} + \cdots \;=\; \infty$$

The terms in the above series get closer and closer to $1$, causing the series to diverge to $\infty$. However, there are also lots of divergent series whose terms do approach zero. Here is an illustrative example:

EXAMPLE 1
Consider the series

$$1 + \frac{1}{2} + \frac{1}{2} + \frac{1}{3} + \frac{1}{3} + \frac{1}{3} + \frac{1}{4} + \frac{1}{4} + \frac{1}{4} + \frac{1}{4} + \cdots$$

Even though the individual terms of this series converge to zero, the sum of the entire series is infinite. To see this, consider what happens if we group similar terms together:

$$1 + \left(\frac{1}{2} + \frac{1}{2}\right) + \left(\frac{1}{3} + \frac{1}{3} + \frac{1}{3}\right) + \left(\frac{1}{4} + \frac{1}{4} + \frac{1}{4} + \frac{1}{4}\right) + \cdots$$

Because each group of terms adds up to $1$, the total sum must be infinite:

$$1 + 1 + 1 + 1 + \cdots \;=\; \infty. \qquad \blacksquare$$

The idea is that a series only converges if its terms become small quickly. For the series in the last example, the hundredth term is $1/14$, and the thousandth term is $1/45$, much larger than the hundredth or thousandth term of a geometric series.

Here is a picture illustrating the sum of the series in the last example:

[Figure: the partial sums of the series in Example 1, growing without bound (DIVERGENCE).]

As you can see, the individual terms of the series are getting smaller, but not quickly enough for the sum to be finite.
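Example 1 can also be checked numerically. The sketch below (my own, not from the notes) generates the series, confirms the hundredth and thousandth terms quoted in the text, and shows that the partial sums pass any bound you like: after the block of $k$ copies of $1/k$, the partial sum is exactly $k$.

```python
# A numerical check (my own, not from the notes) of Example 1: the series
# 1 + 1/2 + 1/2 + 1/3 + 1/3 + 1/3 + ... repeats 1/k exactly k times, so its
# partial sums grow without bound even though the terms shrink to zero.
from fractions import Fraction
from itertools import count, islice

def terms():
    for k in count(1):
        for _ in range(k):
            yield Fraction(1, k)

seq = list(islice(terms(), 1000))

# The hundredth term is 1/14 and the thousandth term is 1/45, as in the text.
assert seq[99] == Fraction(1, 14)
assert seq[999] == Fraction(1, 45)

# Each block of k copies of 1/k adds up to 1, so the first
# 1 + 2 + ... + 10 = 55 terms already sum to exactly 10.
assert sum(seq[:55]) == 10
```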
By contrast, here is a picture of the geometric series 1 + 2/3 + 4/9 + 8/27 + ⋯:

[Figure: the terms 1, 2/3, 4/9, 8/27, … shrink rapidly, and the stacked sum approaches a finite total. CONVERGENCE]

The terms of this series become very small very quickly, forcing the sum to converge.

If you want to know whether a series ∑aₙ with positive terms converges, the main question is how quickly the terms aₙ approach zero as n → ∞.

The Comparison Test

The basic technique for understanding positive series is to compare them with each other. This is based on the following principle:

THE COMPARISON THEOREM
Let ∑aₙ and ∑bₙ be positive series, and suppose that aₙ ≤ bₙ for each term. Then ∑aₙ ≤ ∑bₙ.

That is, if the terms of ∑aₙ are smaller than the corresponding terms of ∑bₙ, then the sum of ∑aₙ must be less than the sum of ∑bₙ.

EXAMPLE 2
Determine whether the series ∑_{n=1}^∞ 1/n^n converges or diverges.

SOLUTION
Recall that n^n becomes large very quickly as n → ∞. Then the reciprocals 1/n^n must become small very quickly, which ought to cause the series to converge. To show this, let's examine the first few terms of the series:

∑_{n=1}^∞ 1/n^n = 1 + 1/4 + 1/27 + 1/256 + 1/3125 + ⋯

As expected, the terms become very small, very quickly. For example, each term of this series is smaller than the corresponding term of the series ∑ 1/2^(n−1):

1 + 1/4 + 1/27 + 1/256 + 1/3125 + ⋯
1 + 1/2 + 1/4 + 1/8 + 1/16 + ⋯

It follows that the sum of the top series must be smaller than the sum of the bottom series:

1 + 1/4 + 1/27 + ⋯ ≤ 1 + 1/2 + 1/4 + 1/8 + ⋯ = 2

This proves that the series ∑_{n=1}^∞ 1/n^n converges to some number smaller than two. ∎

In general, if you know that a series converges, then any smaller series must converge as well. On the other hand, if you know that a series diverges, then any larger series must diverge as well. This is the basic test for convergence:

COMPARISON TEST
Let ∑aₙ and ∑bₙ be positive series.
1. If ∑bₙ converges and aₙ ≤ bₙ, then ∑aₙ must converge as well.
2.
If ∑bₙ diverges and aₙ ≥ bₙ, then ∑aₙ must diverge as well.

EXAMPLE 3
Determine whether the series ∑_{n=1}^∞ 1/(n·2ⁿ) converges or diverges.

SOLUTION
We know that the sum of ∑ 1/2ⁿ converges:

∑_{n=1}^∞ 1/2ⁿ = 1/2 + 1/4 + 1/8 + 1/16 + ⋯ = 1

How do things change when we introduce the extra factor of n? The answer is that the new factor makes the denominator bigger, n·2ⁿ ≥ 2ⁿ, and therefore makes the whole fraction smaller:

1/(n·2ⁿ) ≤ 1/2ⁿ

Since ∑_{n=1}^∞ 1/2ⁿ converges, the smaller series ∑_{n=1}^∞ 1/(n·2ⁿ) must converge as well. ∎

The argument in the above example was entirely algebraic, and therefore somewhat abstract. In case you weren't convinced, here's a numerical comparison of the two series:

∑_{n=1}^∞ 1/(n·2ⁿ) = 1/2 + 1/8 + 1/24 + 1/64 + 1/160 + ⋯
∑_{n=1}^∞ 1/2ⁿ = 1/2 + 1/4 + 1/8 + 1/16 + 1/32 + ⋯

Each term in the first series is smaller than the corresponding term of the second series. Since the second series adds up to 1, the first series must add up to some number less than 1.

In the next example, we use the comparison test to show that a series diverges:

EXAMPLE 4
The series

∑_{n=2}^∞ 1/n = 1/2 + 1/3 + 1/4 + 1/5 + ⋯

is known as the harmonic series. It is a very important fact that the harmonic series diverges. There are several different ways to show this, and one of them is to use a comparison.

What series should we compare to? Well, if we want to show that the sum of the harmonic series is infinite, we must show that its terms are bigger than the terms of another series that sums to infinity. This requires quite a bit of cleverness. Here's one comparison that works:

harmonic series:  1/2 + 1/3 + 1/4 + 1/5 + 1/6 + 1/7 + 1/8 + 1/9 + ⋯
new series:       1/2 + 1/4 + 1/4 + 1/8 + 1/8 + 1/8 + 1/8 + 1/16 + ⋯

As you can see, each term of the harmonic series is bigger than the corresponding term in the new series.
But the new series diverges, which can be seen by grouping the terms together:

1/2 + (1/4 + 1/4) + (1/8 + 1/8 + 1/8 + 1/8) + (1/16 + ⋯) + ⋯ = 1/2 + 1/2 + 1/2 + 1/2 + ⋯ = ∞.

Since the harmonic series is even larger than this divergent series, it must diverge as well. ∎

Another way of thinking about the reasoning above is that, if we group together certain terms of the harmonic series, then the sum of each grouping is at least 1/2:

1/2 + (1/3 + 1/4) + (1/5 + 1/6 + 1/7 + 1/8) + (1/9 + ⋯ + 1/16) + ⋯ ≥ 1/2 + 1/2 + 1/2 + 1/2 + ⋯

In some sense, the harmonic series is the most basic series that diverges. Now that we know about it, we can use it for comparisons:

EXAMPLE 5
Determine whether the series ∑_{n=2}^∞ 1/ln n converges or diverges.

SOLUTION
Recall that ln n goes to infinity very slowly as n → ∞. This makes the terms of the above series go to zero very slowly (e.g. the 1000th term of this series is slightly greater than 1/7), which indicates that the series ought to diverge.

We can verify this by comparing with the harmonic series. It is easy to check that ln n ≤ n for any value of n, and so

1/ln n ≥ 1/n.

Since the harmonic series ∑_{n=2}^∞ 1/n diverges, the larger series ∑_{n=2}^∞ 1/ln n must diverge as well. ∎

P-Series

A p-series is a series of the form

∑_{n=1}^∞ 1/n^p

where p is a constant. For example, when p = 1, this is the harmonic series:

∑_{n=1}^∞ 1/n = 1 + 1/2 + 1/3 + 1/4 + 1/5 + ⋯     (p = 1)

When p = 2, it is the sum of the reciprocals of the squares:

∑_{n=1}^∞ 1/n² = 1 + 1/2² + 1/3² + 1/4² + 1/5² + ⋯     (p = 2)

Higher values of p would correspond to cubes, fourth powers, and so forth. The exponent p could also be a fraction. For example, p = 1/2 gives the reciprocals of the square roots:

∑_{n=1}^∞ 1/√n = 1 + 1/√2 + 1/√3 + 1/√4 + 1/√5 + ⋯     (p = 1/2)

The larger the value of p, the more quickly the terms of a p-series go to zero. The following theorem tells us how the convergence of a p-series depends on p:

CONVERGENCE OF P-SERIES
The p-series ∑_{n=1}^∞ 1/n^p converges for p > 1, and diverges for p ≤ 1.

That is, the p-series behaves just like the improper integral

∫₁^∞ 1/x^p dx,

which is finite for p > 1, and infinite for p ≤ 1. This similarity between p-series and improper integrals is not a coincidence.
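The two behaviors in the theorem are visible in plain numerics. A sketch (the function name and cutoffs are my own choices): for p = 2 the partial sums settle near π²/6 ≈ 1.6449, while for p = 1 they keep creeping upward without bound:

```python
# Partial sums of the p-series for p = 2 and p = 1.
def p_series_partial(p, terms):
    return sum(n ** -p for n in range(1, terms + 1))

# p = 2: bounded partial sums (the full sum is pi^2/6 = 1.6449...).
print(round(p_series_partial(2, 100000), 4))      # 1.6449

# p = 1: the harmonic partial sums grow like ln(terms).
for terms in (10, 1000, 100000):
    print(round(p_series_partial(1, terms), 2))   # 2.93, 7.49, 12.09
```

Note how slowly the harmonic sums grow; divergence here is real but glacial, which is why a numerical check alone can be misleading and the integral comparison is the honest argument.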
Indeed, the simplest way to show that a p-series converges is to compare it with an integral.

EXAMPLE 6
Consider the following series:

∑_{n=2}^∞ 1/n² = 1/4 + 1/9 + 1/16 + 1/25 + ⋯

We shall show that this series converges by comparing it with an improper integral. To start with, consider the following arrangement of rectangles:

[Figure: rectangles of heights 1/4, 1/9, 1/16, … placed over consecutive unit intervals starting at x = 1.]

The areas of the rectangles are 1/4, 1/9, 1/16, and so on, with the total area being the sum of the series. However, all of the rectangles lie under the graph of y = 1/x²:

[Figure: the same rectangles drawn beneath the curve y = 1/x².]

In particular, the total area of the rectangles must be less than the area under the curve. But the area under y = 1/x² is finite:

∫₁^∞ 1/x² dx = 1

It follows that

∑_{n=2}^∞ 1/n² ≤ 1.

In particular, this series must converge. ∎

Using this exact same argument, it is possible to show that

∑_{n=2}^∞ 1/n^p ≤ ∫₁^∞ 1/x^p dx

for any positive exponent p. In particular, the p-series must converge whenever the integral does.

The Hierarchy of Series

Most of the series we have been discussing fit into a hierarchy:

1/n! ≪ ⋯ ≪ 1/3ⁿ ≪ 1/2ⁿ ≪ ⋯ ≪ 1/n² ≪ 1/n ≪ 1/√n ≪ ⋯ ≪ 1/ln n
     (geometric series)          (p-series)

Each series in this hierarchy is the reciprocal of a function from the asymptotic hierarchy (see the Limits at Infinity notes). Because we have taken reciprocals, the order of the functions has reversed (so 1/ln n is the largest, and 1/n! is the smallest). The series on the left side of this hierarchy converge (since they are the smallest), while the series on the right side diverge.
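The ordering in the hierarchy can be seen numerically by evaluating every reciprocal at one large value of n (the sampling point below is my own choice, not from the notes):

```python
import math

# Evaluate each reciprocal in the hierarchy at one large n; the terms
# come out in strictly increasing order, fastest-decaying first.
n = 50
terms = [
    1 / math.factorial(n),   # 1/n!
    1 / 3 ** n,              # geometric, ratio 1/3
    1 / 2 ** n,              # geometric, ratio 1/2
    1 / n ** 2,              # p-series, p = 2
    1 / n,                   # harmonic
    1 / math.sqrt(n),        # p-series, p = 1/2
    1 / math.log(n),         # 1/ln n
]
print(all(s < t for s, t in zip(terms, terms[1:])))   # True
```

A single sample point is not a proof, of course; the hierarchy is an asymptotic statement, and the notes' ≪ relations are what actually justify the comparisons.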
The barrier between convergence and divergence is in the middle of the p-series:

1/n! ≪ ⋯ ≪ 1/3ⁿ ≪ 1/2ⁿ ≪ ⋯ ≪ 1/n² ≪ 1/n^1.1 ≪ ⋯ | 1/n ≪ 1/√n ≪ ⋯ ≪ 1/ln n
                 convergent                        |        divergent

Note that the harmonic series is the first p-series that diverges. Many complicated series can be handled by determining where they fit on the hierarchy:

LIMIT COMPARISON TEST
Let ∑aₙ and ∑bₙ be series of positive numbers, and suppose that aₙ ~ bₙ. Then the two series either both converge or both diverge.

Remember that aₙ ~ bₙ means lim_{n→∞} aₙ/bₙ = 1.

EXAMPLE 7
Determine whether the following series converge or diverge:

(a) ∑_{n=1}^∞ 1/(n² + 5)   (b) ∑_{n=1}^∞ √(n+1)/(n^(3/2) + 4)   (c) ∑_{n=1}^∞ (2ⁿ + 5)/3ⁿ

SOLUTION
For series (a), observe that 1/(n² + 5) ~ 1/n². Since ∑_{n=1}^∞ 1/n² converges, series (a) must converge as well.

For series (b), observe that √(n+1)/(n^(3/2) + 4) ~ √n/n^(3/2) = 1/n. Since the harmonic series ∑ 1/n diverges, series (b) must diverge as well.

For series (c), observe that (2ⁿ + 5)/3ⁿ ~ 2ⁿ/3ⁿ = (2/3)ⁿ. Since ∑ (2/3)ⁿ is a convergent geometric series, series (c) must converge as well. ∎

Some series fall between the spots shown on the hierarchy. In this case, one must deduce where the series lies:

EXAMPLE 8
Determine whether the series ∑_{n=1}^∞ (ln n)/n² converges or diverges.

SOLUTION
Remember that ln n goes to infinity, but more slowly than any positive power of n. For example, ln n ≪ n^0.01. Therefore, the ln n in the numerator has no more effect than an n^0.01 would:

(ln n)/n² ≪ n^0.01/n² = 1/n^1.99

Since ∑_{n=1}^∞ 1/n^1.99 converges, the smaller series ∑_{n=1}^∞ (ln n)/n² must converge as well. ∎

Based on our reasoning in the last example, (ln n)/n² must fit into the hierarchy after 1/n², but before any smaller power of n:

⋯ ≪ 1/n² ≪ (ln n)/n² ≪ 1/n^1.999 ≪ 1/n^1.99 ≪ 1/n^1.9 ≪ ⋯

There are many other spots like this in the hierarchy. For example, n/3ⁿ is bigger than 1/3ⁿ, but smaller than any other geometric series:

⋯ ≪ 1/3ⁿ ≪ n/3ⁿ ≪ 1/2.999ⁿ ≪ 1/2.99ⁿ ≪ 1/2.9ⁿ ≪ ⋯

EXERCISES
1–4. Determine whether the given p-series converges. [The series for these exercises are illegible in this copy of the notes.]
5–14. Use the Limit Comparison Test to determine whether the given series converges or diverges.
[The series for Exercises 5–14 are illegible in this copy of the notes.]
15–20. Find the position of the given series on the hierarchy, and determine whether it converges or diverges. [These series are likewise illegible.]
21. Does the series ∑_{n=2}^∞ 1/(n ln n) converge or diverge?
22. Does the series ∑_{n=1}^∞ 1/√(n!) converge or diverge?
23. Does the series ∑_{n=1}^∞ tan(1/n) converge or diverge?
24. Does the series ∑_{n=1}^∞ 1/2^(ln n) converge or diverge?
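For the limit-comparison exercises, it can help to watch the defining ratio aₙ/bₙ settle down to 1. A sketch using the pair aₙ = 1/(n² + 5) against bₙ = 1/n² from Example 7(a) (the function name is mine):

```python
# The ratio a_n / b_n for a_n = 1/(n^2 + 5) and b_n = 1/n^2 tends to 1,
# so by the Limit Comparison Test the two series converge together.
def ratio(n):
    return (1 / (n ** 2 + 5)) / (1 / n ** 2)

for n in (10, 100, 10000):
    print(round(ratio(n), 6))   # 0.952381, 0.9995, 1.0
```

Seeing the ratio approach 1 numerically is a good sanity check before writing down the algebraic limit.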
Gradient-Based Learning Applied to Document Recognition
YANN LECUN, MEMBER, IEEE, LÉON BOTTOU, YOSHUA BENGIO, AND PATRICK HAFFNER
Invited Paper

Multilayer neural networks trained with the back-propagation algorithm constitute the best example of a successful gradient-based learning technique. Given an appropriate network architecture, gradient-based learning algorithms can be used to synthesize a complex decision surface that can classify high-dimensional patterns, such as handwritten characters, with minimal preprocessing. This paper reviews various methods applied to handwritten character recognition and compares them on a standard handwritten digit recognition task. Convolutional neural networks, which are specifically designed to deal with the variability of two-dimensional (2-D) shapes, are shown to outperform all other techniques.

Real-life document recognition systems are composed of multiple modules including field extraction, segmentation, recognition, and language modeling. A new learning paradigm, called graph transformer networks (GTN's), allows such multimodule systems to be trained globally using gradient-based methods so as to minimize an overall performance measure.

Two systems for online handwriting recognition are described. Experiments demonstrate the advantage of global training, and the flexibility of graph transformer networks.

A graph transformer network for reading a bank check is also described. It uses convolutional neural network character recognizers combined with global training techniques to provide record accuracy on business and personal checks. It is deployed commercially and reads several million checks per day.
Keywords: Convolutional neural networks, document recognition, finite state transducers, gradient-based learning, graph transformer networks, machine learning, neural networks, optical character recognition (OCR).

NOMENCLATURE
GT Graph transformer.
GTN Graph transformer network.
HMM Hidden Markov model.
HOS Heuristic oversegmentation.
K-NN K-nearest neighbor.
NN Neural network.
OCR Optical character recognition.
PCA Principal component analysis.
RBF Radial basis function.
RS-SVM Reduced-set support vector method.
SDNN Space displacement neural network.
SVM Support vector method.
TDNN Time delay neural network.
V-SVM Virtual support vector method.

Manuscript received November 1, 1997; revised April 17, 1998. Y. LeCun, L. Bottou, and P. Haffner are with the Speech and Image Processing Services Research Laboratory, AT&T Labs-Research, Red Bank, NJ 07701 USA. Y. Bengio is with the Département d'Informatique et de Recherche Opérationelle, Université de Montréal, Montréal, Québec H3C 3J7 Canada. Publisher Item Identifier S 0018-9219(98)07863-3.

I. INTRODUCTION
Over the last several years, machine learning techniques, particularly when applied to NN's, have played an increasingly important role in the design of pattern recognition systems. In fact, it could be argued that the availability of learning techniques has been a crucial factor in the recent success of pattern recognition applications such as continuous speech recognition and handwriting recognition.
The main message of this paper is that better pattern recognition systems can be built by relying more on automatic learning and less on hand-designed heuristics. This is made possible by recent progress in machine learning and computer vision. Using character recognition as a case study, we show that hand-crafted feature extraction can be advantageously replaced by carefully designed learning machines that operate directly on pixel images. Using document understanding as a case study, we show that the traditional way of building recognition systems by manually integrating individually designed modules can be replaced by a unified and well-principled design paradigm, called GTN's, which allows training all the modules to optimize a global performance criterion.

Since the early days of pattern recognition it has been known that the variability and richness of natural data, be it speech, glyphs, or other types of patterns, make it almost impossible to build an accurate recognition system entirely by hand. Consequently, most pattern recognition systems are built using a combination of automatic learning techniques and hand-crafted algorithms.

PROCEEDINGS OF THE IEEE, VOL. 86, NO. 11, NOVEMBER 1998 (0018–9219/98 $10.00 © 1998 IEEE)

[Fig. 1. Traditional pattern recognition is performed with two modules: a fixed feature extractor and a trainable classifier.]

The usual method of recognizing individual patterns consists in dividing the system into two main modules shown in Fig. 1. The first module, called the feature extractor, transforms the input patterns so that they can be represented by low-dimensional vectors or short strings of symbols that: 1) can be easily matched or compared and 2) are relatively invariant with respect to transformations and distortions of the input patterns that do not change their nature. The feature extractor contains most of the prior knowledge and is rather specific to the task. It is also the focus of most of the design effort, because it is often entirely hand crafted. The classifier, on the other
hand, is often general purpose and trainable. One of the main problems with this approach is that the recognition accuracy is largely determined by the ability of the designer to come up with an appropriate set of features. This turns out to be a daunting task which, unfortunately, must be redone for each new problem. A large amount of the pattern recognition literature is devoted to describing and comparing the relative merits of different feature sets for particular tasks.

Historically, the need for appropriate feature extractors was due to the fact that the learning techniques used by the classifiers were limited to low-dimensional spaces with easily separable classes [1]. A combination of three factors has changed this vision over the last decade. First, the availability of low-cost machines with fast arithmetic units allows for reliance on more brute-force "numerical" methods than on algorithmic refinements. Second, the availability of large databases for problems with a large market and wide interest, such as handwriting recognition, has enabled designers to rely more on real data and less on hand-crafted feature extraction to build recognition systems. The third and very important factor is the availability of powerful machine learning techniques that can handle high-dimensional inputs and can generate intricate decision functions when fed with these large data sets. It can be argued that the recent progress in the accuracy of speech and handwriting recognition systems can be attributed in large part to an increased reliance on learning techniques and large training data sets. As evidence of this fact, a large proportion of modern commercial OCR systems use some form of multilayer NN trained with back propagation.

In this study, we consider the tasks of handwritten character recognition (Sections I and II) and compare the performance of several learning techniques on a benchmark data set for handwritten digit recognition (Section III).
While more automatic learning is beneficial, no learning technique can succeed without a minimal amount of prior knowledge about the task. In the case of multilayer NN's, a good way to incorporate knowledge is to tailor its architecture to the task. Convolutional NN's [2], introduced in Section II, are an example of specialized NN architectures which incorporate knowledge about the invariances of two-dimensional (2-D) shapes by using local connection patterns and by imposing constraints on the weights. A comparison of several methods for isolated handwritten digit recognition is presented in Section III.

To go from the recognition of individual characters to the recognition of words and sentences in documents, the idea of combining multiple modules trained to reduce the overall error is introduced in Section IV. Recognizing variable-length objects such as handwritten words using multimodule systems is best done if the modules manipulate directed graphs. This leads to the concept of trainable GTN, also introduced in Section IV. Section V describes the now classical method of HOS for recognizing words or other character strings. Discriminative and nondiscriminative gradient-based techniques for training a recognizer at the word level without requiring manual segmentation and labeling are presented in Section VI.
Section VII presents the promising space-displacement NN approach that eliminates the need for segmentation heuristics by scanning a recognizer at all possible locations on the input. In Section VIII, it is shown that trainable GTN's can be formulated as multiple generalized transductions based on a general graph composition algorithm. The connections between GTN's and HMM's, commonly used in speech recognition, are also treated. Section IX describes a globally trained GTN system for recognizing handwriting entered in a pen computer. This problem is known as "online" handwriting recognition since the machine must produce immediate feedback as the user writes. The core of the system is a convolutional NN. The results clearly demonstrate the advantages of training a recognizer at the word level, rather than training it on presegmented, hand-labeled, isolated characters. Section X describes a complete GTN-based system for reading handwritten and machine-printed bank checks. The core of the system is the convolutional NN called LeNet-5, which is described in Section II. This system is in commercial use in the NCR Corporation line of check recognition systems for the banking industry. It is reading millions of checks per month in several banks across the United States.

A. Learning from Data
There are several approaches to automatic machine learning, but one of the most successful approaches, popularized in recent years by the NN community, can be called "numerical" or gradient-based learning. The learning machine computes a function Y^p = F(Z^p, W), where Z^p is the p-th input pattern and W represents the collection of adjustable parameters in the system. A loss function E^p = D(D^p, F(W, Z^p)) measures the discrepancy between D^p, the "correct" or desired output for pattern Z^p, and the output produced by the system. The average loss function E_train(W) is the average of the errors E^p over a set of labeled training samples. In the simplest setting, the learning problem consists in finding the value of W that minimizes E_train(W). In practice, however, the performance that matters is the error rate on a test set of samples not seen during training. The gap between the expected error rate on the test set and the error rate on the training set decreases with the number of training samples approximately as

E_test − E_train = k (h/P)^α

where P is the number of training samples, h is a measure of the "effective capacity" of the machine, α is a number between 0.5 and 1.0, and k is a constant. As the capacity h increases, E_train decreases. Therefore, when increasing the capacity h, there is a tradeoff between the decrease of E_train and the increase of the gap, with an optimal value of the capacity h that achieves the lowest generalization error E_test. Most learning algorithms attempt to minimize E_train as well as some estimate of the gap. A formal version of this is called structural risk minimization [6], [7], and it is based
on defining a sequence of learning machines of increasing capacity, corresponding to a sequence of subsets of the parameter space such that each subset is a superset of the previous subset. In practical terms, structural risk minimization is implemented by minimizing E_train + βH(W), where the function H(W) is called a regularization function and β is a constant. H(W) is chosen such that it takes large values on parameters W that belong to high-capacity subsets of the parameter space. Minimizing H(W) in effect limits the capacity of the accessible subset of the parameter space.

The loss function can be minimized with gradient-based procedures because W is a real-valued vector, with respect to which the loss function is continuous and differentiable almost everywhere. In the simplest procedure, gradient descent, W is iteratively adjusted as follows:

W_k = W_{k−1} − ε (∂E/∂W)

A popular variant is the stochastic gradient procedure, in which W is updated on the basis of a single sample:

W_k = W_{k−1} − ε (∂E^{p_k}/∂W)

For machines composed of several layers of processing, these gradients can be computed efficiently with the back-propagation algorithm. Its usefulness was not widely appreciated until three events occurred: the realization that local minima are rarely a problem in practice, the popularization by Rumelhart and others of an efficient procedure to compute the gradient in a nonlinear system composed of several layers of processing (i.e., the back-propagation algorithm), and the demonstration that the back-propagation procedure applied to multilayer NN's with sigmoidal units can solve complicated learning tasks.

The basic idea of back propagation is that gradients can be computed efficiently by propagation from the output to the input. This idea was described in the control theory literature of the early 1960's [16], but its application to machine learning was not generally realized then. Interestingly, the early derivations of back propagation in the context of NN learning did not use gradients but "virtual targets" for units in intermediate layers [17], [18], or minimal disturbance arguments [19]. The Lagrange formalism used in the control theory literature provides perhaps the best rigorous method for deriving back propagation [20] and for deriving generalizations of back propagation to recurrent networks [21] and networks of heterogeneous modules [22]. A simple derivation for generic multilayer systems is given in Section I-E.

The fact that local minima do not seem to be a problem for multilayer NN's is somewhat of a theoretical mystery.
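Gradient descent and its stochastic variant can be sketched in a few lines. This toy (a one-parameter least-squares fit; the data, names, and step size are illustrative choices of mine, not anything from the paper's systems) updates W from one sample at a time, exactly in the spirit of the stochastic update rule:

```python
import random

# Toy setup: data generated as d = 3*z + noise; the "machine" is
# F(z, w) = w*z and the per-sample loss is (d - w*z)^2.
random.seed(0)
data = [(0.1 * i, 3.0 * 0.1 * i + random.gauss(0.0, 0.01)) for i in range(1, 21)]

def grad_single(w, z, d):
    # derivative of the single-sample loss (d - w*z)^2 with respect to w
    return -2.0 * z * (d - w * z)

# Stochastic gradient descent: w is updated from one sample at a time.
w, eps = 0.0, 0.05
for epoch in range(200):
    for z, d in data:
        w -= eps * grad_single(w, z, d)

print(round(w, 1))   # 3.0 (the noise is tiny)
```

The parameter fluctuates around an average trajectory sample to sample, but converges to the underlying slope, which is the behavior the paper describes for stochastic gradient on redundant training sets.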
It is conjectured that if the network is oversized for the task (as is usually the case in practice), the presence of "extra dimensions" in parameter space reduces the risk of unattainable regions. Back propagation is by far the most widely used neural-network learning algorithm, and probably the most widely used learning algorithm of any form.

D. Learning in Real Handwriting Recognition Systems
Isolated handwritten character recognition has been extensively studied in the literature (see [23] and [24] for reviews), and it was one of the early successful applications of NN's [25]. Comparative experiments on recognition of individual handwritten digits are reported in Section III. They show that NN's trained with gradient-based learning perform better than all other methods tested here on the same data. The best NN's, called convolutional networks, are designed to learn to extract relevant features directly from pixel images (see Section II).

One of the most difficult problems in handwriting recognition, however, is not only to recognize individual characters, but also to separate out characters from their neighbors within the word or sentence, a process known as segmentation. The technique for doing this that has become the "standard" is called HOS. It consists of generating a large number of potential cuts between characters using heuristic image processing techniques, and subsequently selecting the best combination of cuts based on scores given for each candidate character by the recognizer. In such a model, the accuracy of the system depends upon the quality of the cuts generated by the heuristics, and on the ability of the recognizer to distinguish correctly segmented characters from pieces of characters, multiple characters, or otherwise incorrectly segmented characters. Training a recognizer to perform this task poses a major challenge because of the difficulty in creating a labeled database of incorrectly segmented characters. The simplest solution consists of running the images of
character strings through the segmenter and then manually labeling all the character hypotheses. Unfortunately, not only is this an extremely tedious and costly task, it is also difficult to do the labeling consistently. For example, should the right half of a cut-up four be labeled as a one or as a noncharacter? Should the right half of a cut-up eight be labeled as a three?

The first solution, described in Section V, consists of training the system at the level of whole strings of characters rather than at the character level. The notion of gradient-based learning can be used for this purpose. The system is trained to minimize an overall loss function which measures the probability of an erroneous answer. Section V explores various ways to ensure that the loss function is differentiable and therefore lends itself to the use of gradient-based learning methods. Section V introduces the use of directed acyclic graphs whose arcs carry numerical information as a way to represent the alternative hypotheses and introduces the idea of GTN.

The second solution, described in Section VII, is to eliminate segmentation altogether. The idea is to sweep the recognizer over every possible location on the input image, and to rely on the "character spotting" property of the recognizer, i.e., its ability to correctly recognize a well-centered character in its input field, even in the presence of other characters besides it, while rejecting images containing no centered characters [26], [27]. The sequence of recognizer outputs obtained by sweeping the recognizer over the input is then fed to a GTN that takes linguistic constraints into account and finally extracts the most likely interpretation. This GTN is somewhat similar to HMM's, which makes the approach reminiscent of classical speech recognition [28], [29]. While this technique would be quite expensive in the general case, the use of convolutional NN's makes it particularly attractive because it allows significant savings in computational cost.

E. Globally Trainable Systems
As stated earlier, most practical pattern recognition systems are composed of multiple modules. For example, a document recognition system is composed of a field locator (which extracts regions of interest), a field segmenter (which cuts the input image into images of candidate characters), a recognizer (which classifies and scores each candidate character), and a contextual postprocessor, generally based on a stochastic grammar (which selects the best grammatically correct answer from the hypotheses generated by the recognizer). In most cases, the information carried from module to module is best represented as graphs with numerical information attached to the arcs. For example, the output of the recognizer module can be represented as an acyclic graph where each arc contains the label and the score of a candidate character, and where each path represents an alternative interpretation of the input string. Typically, each module is manually optimized, or sometimes trained, outside of its context. For example, the character recognizer would be trained on labeled images of presegmented characters. Then the complete system is assembled, and a subset of the parameters of the modules is manually adjusted to maximize the overall performance.
This last step is extremely tedious, time consuming, and almost certainly suboptimal. A better alternative would be to somehow train the entire system so as to minimize a global error measure such as the probability of character misclassifications at the document level. Ideally, we would want to find a good minimum of this global loss function with respect to all the parameters in the system. If the loss function E measuring the performance can be made differentiable with respect to the system's tunable parameters W, we can find a local minimum of E using gradient-based learning. However, at first glance, it appears that the sheer size and complexity of the system would make this intractable. To ensure that the global loss function is differentiable, the overall system is built as a feedforward network of differentiable modules, so that gradients can be back propagated through the whole system.

[Fig. 2. Architecture of LeNet-5, a convolutional NN, here used for digits recognition. Each plane is a feature map, i.e., a set of units whose weights are constrained to be identical.]

Before being sent to the fixed-size input layer of an NN, images of characters, or other 2-D or one-dimensional (1-D) signals, must be approximately size normalized and centered in the input field. Unfortunately, no such preprocessing can be perfect: handwriting is often normalized at the word level, which can cause size, slant, and position variations for individual characters. This, combined with variability in writing style, will cause variations in the position of distinctive features in input objects. In principle, a fully connected network of sufficient size could learn to produce outputs that are invariant with respect to such variations. However, learning such a task would probably result in multiple units with similar weight patterns positioned at various locations in the input so as to detect distinctive features wherever they appear on the input. Learning these weight configurations requires a very large number of training instances to cover the space of possible variations. In convolutional networks, as described below, shift invariance is automatically obtained by forcing the replication of weight configurations across space.
Secondly, a deficiency of fully connected architectures is that the topology of the input is entirely ignored. The input variables can be presented in any (fixed) order without affecting the outcome of the training. On the contrary, images (or time-frequency representations of speech) have a strong 2-D local structure: variables (or pixels) that are spatially or temporally nearby are highly correlated. Local correlations are the reasons for the well-known advantages of extracting and combining local features before recognizing spatial or temporal objects, because configurations of neighboring variables can be classified into a small number of categories (e.g., edges, corners, etc.). Convolutional networks force the extraction of local features by restricting the receptive fields of hidden units to be local.

A. Convolutional Networks
Convolutional networks combine three architectural ideas to ensure some degree of shift, scale, and distortion invariance: 1) local receptive fields; 2) shared weights (or weight replication); and 3) spatial or temporal subsampling. A typical convolutional network for recognizing characters, dubbed LeNet-5, is shown in Fig. 2. The input plane receives images of characters that are approximately size normalized and centered. Each unit in a layer receives inputs from a set of units located in a small neighborhood in the previous layer. The idea of connecting units to local receptive fields on the input goes back to the perceptron in the early 1960's, and it was almost simultaneous with Hubel and Wiesel's discovery of locally sensitive, orientation-selective neurons in the cat's visual system [30]. Local connections have been used many times in neural models of visual learning [2], [18], [31]–[34]. With local receptive fields neurons can extract elementary visual features such as oriented edges, endpoints, corners (or similar features in other signals such as speech spectrograms). These features are then combined by the subsequent layers in order to detect higher order features. As
stated earlier, distortions or shifts of the input can cause the position of salient features to vary. In addition, elementary feature detectors that are useful on one part of the image are likely to be useful across the entire image. This knowledge can be applied by forcing a set of units, whose receptive fields are located at different places on the image, to have identical weight vectors [15], [32], [34]. Units in a layer are organized in planes within which all the units share the same set of weights. The set of outputs of the units in such a plane is called a feature map. Units in a feature map are all constrained to perform the same operation on different parts of the image. A complete convolutional layer is composed of several feature maps (with different weight vectors), so that multiple features can be extracted at each location. A concrete example of this is the first layer of LeNet-5 shown in Fig. 2. Units in the first hidden layer of LeNet-5 are organized in six planes, each of which is a feature map. A unit in a feature map has 25 inputs connected to a 5×5 area in the input, called the receptive field of the unit. In the case of LeNet-5, at each input location six different types of features are extracted by six units in identical locations in the six feature maps. A sequential implementation of a feature map would scan the input image with a single unit that has a local receptive field and store the states of this unit at corresponding locations in the feature map. This operation is equivalent to a convolution, followed by an additive bias and squashing function, hence the name convolutional network. The kernel of the convolution is the set of connection weights used by the units in the feature map.

Once a feature has been detected, its exact location becomes less important. Only its approximate position relative to other features is relevant. For example, once we know that the input image contains the endpoint of a roughly horizontal segment in the upper left area, a corner in the upper right area, and the endpoint of a roughly vertical segment in the lower portion of the image, we can tell the input image is a
seven. Not only is the precise position of each of those features irrelevant for identifying the pattern, it is potentially harmful because the positions are likely to vary for different instances of the character. A simple way to reduce the precision with which the position of distinctive features are encoded in a feature map is to reduce the spatial resolution of the feature map. This can be achieved with a so-called subsampling layer, which performs a local averaging and a subsampling, thereby reducing the resolution of the feature map and reducing the sensitivity of the output to shifts and distortions. The second hidden layer of LeNet-5 is a subsampling layer. This layer comprises six feature maps, one for each feature map in the previous layer. The receptive field of each unit is a 2×2 area in the previous layer's corresponding feature map.

The input is a 32×32 pixel image. This is significantly larger than the largest character in the database (at most 20×20 pixels centered in a 28×28 field). The reason is that it is desirable that potential distinctive features such as stroke endpoints or a corner can appear in the center of the receptive field of the highest-level feature detectors. In LeNet-5, the set of centers of the receptive fields of the last convolutional layer (C3, see below) form a 20×20 area in the center of the 32×32 input. The values of the input pixels are normalized so that the background level (white) corresponds to a value of −0.1 and the foreground (black) corresponds to 1.175. This makes the mean input roughly zero and the variance roughly one, which accelerates learning.

In the following, convolutional layers are labeled Cx, subsampling layers are labeled Sx, and fully connected layers are labeled Fx, where x is the layer index.

Layer C1 is a convolutional layer with six feature maps. Each unit in each feature map is connected to a 5×5 neighborhood in the input. The size of the feature maps is 28×28, which prevents connection from the input from falling off the boundary.
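Combining the description above with the S2 detail given further on (the four inputs of a 2×2 neighborhood are summed, multiplied by a trainable coefficient, offset by a trainable bias, and squashed), a subsampling layer can be sketched as follows; coefficient, bias and map values are illustrative:

```python
import math

def subsample(fmap, coeff, bias):
    """LeNet-style subsampling sketch: sum each nonoverlapping 2x2
    neighborhood, multiply by a trainable coefficient, add a trainable
    bias, and pass the result through a sigmoid. Each spatial dimension
    of the feature map is halved, so feature positions become coarser."""
    sigmoid = lambda s: 1.0 / (1.0 + math.exp(-s))
    H, W = len(fmap), len(fmap[0])
    out = []
    for i in range(0, H - 1, 2):
        row = []
        for j in range(0, W - 1, 2):
            s = fmap[i][j] + fmap[i][j + 1] + fmap[i + 1][j] + fmap[i + 1][j + 1]
            row.append(sigmoid(coeff * s + bias))
        out.append(row)
    return out

fmap = [[1.0, 1.0, 0.0, 0.0],
        [1.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 1.0],
        [0.0, 0.0, 1.0, 1.0]]
pooled = subsample(fmap, coeff=0.25, bias=0.0)  # 4x4 -> 2x2
```

With `coeff=0.25` the sum becomes a local average, matching the "local averaging and subsampling" wording; in LeNet-5 the coefficient and bias are learned per feature map.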
.C 1c o n t a i n s 156t r a i n a b l 122304c o n n e c t i o n s .L a y e r S 2i s a s u b s a m p l i n g l a y e r w i t h s i s i z e 142n e i g h b o r h o o d i n t h e c o r r e s p o n d i n g f T h e f o u r i n p u t s t o a u n i t i n S 2a r e a d d e d ,2284P R O C E E D I N G S O F T H E I E E E ,V O L .86,N O .11,N O VTable 1Each Column Indicates Which Feature Map in S2Are Combined by the Units in a Particular Feature Map ofC3a trainable coefficient,and then added to a trainable bias.The result is passed through a sigmoidal function.The25neighborhoods at identical locations in a subset of S2’s feature maps.Table 1shows the set of S2feature maps combined by each C3feature map.Why not connect every S2feature map to every C3feature map?The reason is twofold.First,a noncomplete connection scheme keeps the number of connections within reasonable bounds.More importantly,it forces a break of symmetry in the network.Different feature maps are forced to extract dif-ferent (hopefully complementary)features because they get different sets of inputs.The rationale behind the connection scheme in Table 1is the following.The first six C3feature maps take inputs from every contiguous subsets of three feature maps in S2.The next six take input from every contiguous subset of four.The next three take input from some discontinuous subsets of four.Finally,the last one takes input from all S2feature yer C3has 1516trainable parameters and 156000connections.Layer S4is a subsampling layer with 16feature maps of size52neighborhood in the corresponding feature map in C3,in a similar way as C1and yer S4has 32trainable parameters and 2000connections.Layer C5is a convolutional layer with 120feature maps.Each unit is connected to a55,the size of C5’s feature maps is11.This process of dynamically increasing thesize of a convolutional network is described in Section yer C5has 48120trainable connections.Layer F6contains 84units (the reason for this number comes from the design of 
the output layer, explained below) and is fully connected to C5. It has 10,164 trainable parameters.

As in classical NN's, units in layers up to F6 compute a dot product between their input vector and their weight vector, to which a bias is added. This weighted sum, denoted a_i for unit i, is then passed through a sigmoid squashing function to produce the state of unit i:

    x_i = f(a_i)

The squashing function is a scaled hyperbolic tangent

    f(a) = A tanh(S a)    (6)

where A is the amplitude of the function and S determines its slope at the origin. The amplitude A is chosen to be 1.7159. The rationale for this choice of a squashing function is given in Appendix A.

Finally, the output layer is composed of Euclidean RBF units, one for each class, with 84 inputs each. The output y_i of each RBF unit is computed as

    y_i = sum_j (x_j - w_ij)^2    (7)

In other words, each output RBF unit computes the Euclidean distance between its input vector and its parameter vector. The further away the input is from the parameter vector, the larger the RBF output. The output of a particular RBF can be interpreted as a penalty term measuring the fit between the input pattern and a model of the class associated with the RBF. In probabilistic terms, the RBF output can be interpreted as the unnormalized negative log-likelihood of a Gaussian distribution in the space of configurations of layer F6. Given an input pattern, the loss function should be designed so as to get the configuration of F6 as close as possible to the parameter vector of the RBF that corresponds to the pattern's desired class. The parameter vectors of these units were chosen by hand and kept fixed (at least initially). The components of those parameter vectors were set to −1 or +1. While they could have been chosen at random with equal probabilities for −1 and +1, or even chosen to form an error-correcting code as suggested by [47], they were instead designed to represent a stylized image of the corresponding character class drawn on a 7×12 bitmap (hence the number 84).
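The squashing function and the RBF output above can be sketched numerically. The amplitude A = 1.7159 comes from the text; the slope S = 2/3 is an assumption here, since the text defers the rationale to Appendix A:

```python
import math

A = 1.7159       # amplitude, from the text
S = 2.0 / 3.0    # slope at the origin: assumed value, not quoted in the text

def squash(a):
    """Scaled hyperbolic tangent f(a) = A tanh(S a)."""
    return A * math.tanh(S * a)

def rbf_output(x, w):
    """Euclidean RBF unit: squared distance between the input vector x
    and the unit's parameter vector w; small when x fits the prototype."""
    return sum((xj - wj) ** 2 for xj, wj in zip(x, w))

# A state vector near a +/-1 prototype yields a small penalty.
x = [squash(a) for a in (-1.0, 0.0, 1.0)]
penalty = rbf_output(x, [-1.0, 0.0, 1.0])
```

With these constants, f(1) is very close to 1 and f(−1) to −1, so unit activations of ±1 map almost exactly onto the ±1 components of the hand-designed parameter vectors.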
GenGED: A Development Environment for Visual Languages

Roswitha Bardohl, Magnus Niemann, Manuel Schwarze
Department of Computer Science, Technical University Berlin, Germany
rosi,maggi,omanuels@cs.tu-berlin.de

Abstract

Within this contribution GenGED is presented, a development environment for visual languages. GenGED offers a hybrid language for defining the syntax of visual languages, consisting of an alphabet and a grammar. So far, the main components of GenGED are given by an alphabet and a grammar editor. The syntax description is the input of a diagram editor allowing the syntax-directed manipulation of diagrams. The grammar definition as well as the manipulation of diagrams is based on algebraic graph transformation and graphical constraint solving.

Keywords: visual language, algebraic graph transformation, constraint solving, rule- and constraint-based editor.

1 Introduction

Visual languages (VLs) emerge with increasing frequency (cf. [VLxx] for more details). Usually, they are implemented in a fixed way in a visual environment (VE). This is the main disadvantage when regarding VLs for, e.g., software modeling or programming. Whenever the concepts of a language or its visual means are changed, a partial reimplementation of the supporting VE is necessary, leading to a time- and cost-expensive affair. This is the reason why generators of VEs emerge more and more, whose input is at least a textual syntax description of a specific VL. However, visual statements are difficult to describe textually because of their graphical structure. Conversely, visual descriptions are mostly not sufficient to define all necessities [Sch98]. To avoid such problems we propose GenGED, a development environment for VLs.

GenGED mainly allows the visual definition of VLs. The textual parts are concerned with graphical constraints and common datatypes like strings or integers. Within GenGED the syntax of a VL is defined by an alphabet and a grammar.
Accordingly, the main components of GenGED, as illustrated by Figure 1, are given by an alphabet and a grammar editor. The constraint-based alphabet editor allows the definition of a VL-alphabet. Such an alphabet is a type description for instances as they occur within diagrams. The grammar editor is rule- and constraint-based because we use attributed graph grammars, respectively graph transformation, as underlying formalisms. The defined VL-alphabet is the input of both a pure constraint-based editor for testing the alphabet description and the grammar editor. Within the grammar editor, so-called alphabet rules are generated from the alphabet. These rules are the edit commands of the grammar editor. The user-defined VL-grammar is the input of a rule- and constraint-based diagram editor allowing the syntax-directed manipulation of VL-diagrams by applying grammar rules.

Figure 1: The GenGED components.

Parsing facilities as provided by DiaGen [MV95] or VLCC [CODL95] are not yet supported within GenGED. In contrast to GenGED, the input of DiaGen is a textual syntax description, not a visual one. VLCC supports the visual definition of an alphabet, but in its current state the grammar must be defined textually. In addition, within a generated diagram editor the movement of graphics does not lead to the movement of inherent graphics, which is one of the main necessities of a diagram editor. Usually, a diagram comprises several symbols and connections between symbols.
In GenGED a diagram is represented by an attributed graph structure consisting of a logical level (abstract syntax) which is connected with the layout level (concrete syntax) by suitable operations [BTMS99]. In the implementation, the logical level of a diagram is mapped onto an attributed graph, where the nodes represent the symbols of the diagram and the edges the connections. The logical level of common datatypes like strings or integers, which are also interpreted as symbols, is mapped onto the attributes of a node. Based on attributed graphs, the manipulation of diagrams is supported by the integrated graph transformation machine AGG [TER99]. The layout level of a diagram is given by a set of graphics and graphical constraints. These constraints mainly treat positions and sizes of graphics. The solving of graphical constraints is supported by the integrated constraint solver ParCon [Gri96].

To make the work with GenGED as easy as possible, the editors have a similar GUI. I.e., the GUI mainly comprises a structure view and a drawing area for the elements belonging to the VL-alphabet, VL-grammar or diagram, respectively. The structure view holds the (textual) logical level of symbols and connections, which can be selected for further progress.

This paper is organized as follows: In Section 2 the components of the alphabet editor are discussed together with the test editor. The grammar editor is the topic of Section 3, together with the rule- and constraint-based diagram editor. Some concluding remarks can be found in Section 4.

2 Alphabet and Test Editor

The alphabet editor supports the definition of a VL-alphabet consisting of symbols and connections. Accordingly, the alphabet editor comprises a symbol and a connection editor. E.g., symbols of a VL for class diagrams are given by a class represented by three rectangles, or an association represented by an arrow. Usually, the association arrow is connected with classes, i.e., it begins and ends at a class symbol. In order to test the defined alphabet
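The mapping of a diagram's logical level onto an attributed graph can be sketched as follows. The class names and methods here are illustrative assumptions, not GenGED's actual implementation; symbols become attributed nodes, connections become typed edges, and datatype symbols become node attributes:

```python
class Diagram:
    """Minimal sketch of a diagram's logical level as an attributed graph."""

    def __init__(self):
        self.nodes = {}     # node id -> (symbol type, attribute dict)
        self.edges = set()  # (connection type, source id, target id)

    def add_symbol(self, nid, sym_type, **attrs):
        # datatype symbols (e.g. a string) are stored as node attributes
        self.nodes[nid] = (sym_type, attrs)

    def connect(self, conn_type, src, tgt):
        # a connection requires both end symbols to exist already,
        # mirroring the attributed-graph-structure formalism
        assert src in self.nodes and tgt in self.nodes
        self.edges.add((conn_type, src, tgt))

d = Diagram()
d.add_symbol("c1", "Class", name="Customer")   # string datatype as attribute
d.add_symbol("c2", "Class", name="Order")
d.add_symbol("a1", "Association")
d.connect("begin", "a1", "c1")
d.connect("end", "a1", "c2")
```

The assertion in `connect` illustrates the semantic point made later for class diagrams: an association can only be inserted when its classes are already present.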
we developed a simple test editor. It supports the pure constraint-based drawing of diagrams, i.e., without regarding any VL-rules.

Symbol Editor  The symbol editor works similarly to a common graphical editor: Several primitive graphics are available, like rectangle, circle, ellipse, polygon, line, etc. The user can change the form of every primitive and change default properties like fill and line color, style, etc. As usual, primitives can be translated, scaled, rotated and stretched. Not only the drawing of symbol graphics is supported but also the import of bitmap graphics (GIF, JPG).

In contrast to a common graphical editor, the grouping of graphical primitives in order to build up a symbol graphic is done via constraints. E.g., the three rectangles of a class symbol must be connected by constraints. A constraint menu is available where suitable constraints can be selected. Constraints must also be used to define certain attachment points, which are useful for defining connections within the subsequently opened connection editor. Such attachment points are treated as primitives, too, and thus they are connected with other primitives by constraints.

Datatypes like strings, integers or lists of strings are predefined symbols, i.e., they are given by implemented components. A corresponding button is given, like for common primitives. For every chosen datatype the predefined default value can be changed by the user. Similar to the changeable properties of primitives, datatype-specific visual properties can also be changed, e.g., color or font for strings.

For each symbol a unique name must be defined, which is interpreted as the symbol type. The symbol graphic, i.e., primitives and constraints, defines the default layout which is used for instances. The same is true wrt. datatype symbols, where the default layout is given by the default value and visual properties.

Connection Editor  Connections between symbols can be defined using the connection editor. On the logical level each connection is represented by a
unary operation (connection type) and a set of constraints according to the layout level. I.e., for every connection a source and target symbol must be selected from the structure view, and suitable constraints must be defined using the same constraint menu as is available within the symbol editor. Connections can be defined under consideration of semantic aspects of a VL. Consider, e.g., the begin and end connection between an association and a class symbol. For both connections, i.e., operations according to the logical level, defining the association symbol as the source and the class as the target already requires the presence of classes when an association is to be inserted. This fact results from the underlying formalism of attributed graph structures [BTMS99].

Test Editor  In order to test the behaviour of defined constraints, a pure constraint-based test editor is available. Here instances are derived from the templates of the VL-alphabet when elements of the editor's structure view are selected. Then the corresponding symbol graphics are visualized in the drawing area. Connections between symbols can be established under consideration of the corresponding types. The connection constraints are solved automatically.

High-Level-Constraints  Within the alphabet editor graphical constraints are used intensively, both to connect primitive graphics in order to build up symbol graphics and to define the connections. The constraint solving is done by ParCon [Gri96], a constraint solver for so-called Low-Level-Constraints (LLCs). Such LLCs are not very user-friendly, which is the reason that High-Level-Constraints (HLCs) are available, whose names and parameters are visualized in the constraint menu.

For the declaration of HLCs we developed a textual definition language, which supports mathematical functions, intervals, overloading of constraints, calling of "subconstraints", casting of types, etc. [Sch99]. Internally, a HLC is mapped onto arbitrarily many LLCs processable by ParCon. A
LLC can have one of the following forms: a normal linear formula, a product formula, a formula to set a point-to-point distance, or a formula to set the equality of two points. In every LLC we can use the parameters of a HLC as well as local variables. For instance, the following simple HLC sets a minimal length for a line and uses a local variable, a linear formula, as well as an expression for a point-to-point distance:

constraint MinLength(Line l, ConstValue minLength) {
    Value length;            # local variable
    length >= minLength;     # linear formula
    l.s R= length * l.t;     # distance between start and end point
}

There are already a lot of predefined HLCs stored within the HLC database, which can be extended by the user's own HLCs. Nevertheless, although the declaration of HLCs can only be done textually, the constraint menu allows the user to select and apply suitable HLCs. Consistency checks are also done by the constraint component of GenGED, i.e., some contradictory constraints can be detected.

3 Grammar and Diagram Editor

The grammar of a VL comprises a start diagram and a set of VL-rules. Each VL-rule consists of a left-hand side (LHS), a right-hand side (RHS) and an optional set of negative application conditions (NACs). NACs describe certain diagram constellations which must not exist when a rule is to be applied. The LHS, the RHS, and each NAC are themselves diagrams. The elements of a diagram (symbols and connections) are instances of the corresponding templates in the VL-alphabet, i.e., they are typed over the alphabet [BE98].

The grammar editor is based on a given VL-alphabet, i.e., it uses the defined symbols and connections to generate a set of edit commands, called alphabet rules. Alphabet rules allow the insertion and deletion of symbols and connections in the corresponding diagram.
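To illustrate how such a HLC could expand into solver-checkable low-level facts, here is a hypothetical sketch: `min_length_llcs` and its expansion are assumptions for illustration, not the actual ParCon mapping of the MinLength constraint above:

```python
import math

def min_length_llcs(start, end, min_length):
    """Expand a MinLength-style HLC into two checkable low-level facts:
    a point-to-point distance (binding the local variable 'length') and
    a linear inequality over that variable."""
    length = math.dist(start, end)     # local variable of the HLC
    return [("distance", length),
            ("linear", length >= min_length)]

llcs = min_length_llcs((0.0, 0.0), (3.0, 4.0), min_length=2.0)
satisfied = all(ok for kind, ok in llcs if kind == "linear")
```

A real solver would treat the start and end points as unknowns to be adjusted until all formulas hold; the sketch only evaluates the formulas for fixed points.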
Such diagrams are the start diagram as well as the LHS, RHS, and NACs of the aimed VL-rules. As in the alphabet editor, the names of the alphabet rules are listed in the structure view of the grammar editor. They can be selected to visualize them in the corresponding drawing area.

A match must be defined between the alphabet rule's LHS and the current diagram when a rule is to be applied. Concretely, symbols can be selected for a match, but not connections, which are represented by constraints; they are mapped implicitly. It is possible to add and remove mappings to and from a match. During a transformation step, mainly the logical level of the diagram is taken into account and thereafter the layout level, where graphical constraints are solved. The transformation of the logical level is done by the integrated graph transformation machine AGG. After transformation, the correctness of the resulting diagram is checked. I.e., because of the mapping from the diagram's graph structure to a simple graph, some side effects may occur after transformation. Therefore, we have to make sure that for every symbol all outgoing connections with the appropriate target symbols exist. If the resulting diagram is not correct, the transformation is cancelled and the user gets a corresponding message. The layout level of the resulting transformation is built up according to the logical level: For each added (deleted) element a graphical object, respectively constraint set, is added (deleted). After constraint solving via ParCon the layout level is displayed.

Once the diagrams of a VL-rule (LHS, RHS, NACs) are finished, the mappings between their symbols can be defined in the same way as matches for rule application. I.e., it is possible to add and remove mappings to and from the rule, even to complete the rule or NAC mappings in order to make them total.

Datatypes and Rule Parameters  Datatypes are predefined symbols which are defined through algebraic operations and constants. A list-of-strings datatype, for
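A transformation step as described above (match the LHS against the current diagram, check the NACs, delete LHS-only elements, add RHS-only elements) can be sketched on plain sets. This is an illustration only, not AGG's actual algorithm: matches are taken as identities and connections are ignored:

```python
def apply_rule(diagram, lhs, rhs, nacs=()):
    """One syntax-directed transformation step on a diagram modeled as a
    set of elements. Returns the new diagram, or None if the rule is not
    applicable (no match, or a negative application condition is present)."""
    if not lhs <= diagram:
        return None                   # LHS does not match the diagram
    for nac in nacs:
        if nac <= diagram:
            return None               # forbidden constellation present
    # delete elements only in the LHS, add elements only in the RHS
    return (diagram - (lhs - rhs)) | (rhs - lhs)

diagram = {"class:A", "class:B"}
after = apply_rule(diagram,
                   lhs={"class:A", "class:B"},
                   rhs={"class:A", "class:B", "assoc:A->B"},
                   nacs=[{"assoc:A->B"}])   # NAC: association must not exist yet
```

Applying the same rule to `after` again fails because the NAC now matches, which is exactly the role NACs play in preventing forbidden constellations.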
example, can be defined by the empty list as a constant and operations like first or add. Default datatype values set in the symbol editor are constants. When defining a VL-rule, these constants can be replaced by variables or complex expressions.

In order to make the treatment of datatypes as easy and extendable as possible, we use Java objects and Java expressions as available in the attribute component of AGG [Mel99]. The only extension to the AGG concepts lies in the graphical representation, so datatype objects are represented by graphics, not only by textual expressions. Datatype objects (constants, variables or complex expressions) have to be declared using a corresponding menu. On the LHS of a VL-rule only variables may occur as attributes of a symbol. When a rule is applied, the variables are matched implicitly, and the values are transformed in the way given by the expression on the RHS.

VL-rules may additionally have rule parameters for datatype values, which can be defined textually using a corresponding menu. A rule parameter is, similar to a variable, given by a name and a type, like x:String. Such parameters may already express some semantic aspects of a VL. In the case of class diagrams this may be classes, where each one must have a certain user-defined class name. This fact can be expressed by a rule parameter. Whenever the VL-rule allowing to insert a class is applied, the user is forced to define a class name.

Diagram Editor  The user-defined VL-grammar is the input of the aimed diagram editor, allowing the syntax-directed manipulation of diagrams. The edit commands are the VL-rules. In order to apply a VL-rule, a rule name appearing in the structure view must be selected. Matching is then done by successively asking the user to provide the symbols corresponding to the symbols occurring in the rule's LHS. When all symbols are mapped, the match is completed by connection mappings, and the transformation takes place (including the check for correctness). Whenever a VL-rule
comprises rule parameters, the user is successively asked to define suitable values for these parameters. These values are (graphically) placed after transformation under consideration of the defined connection constraints.

4 Conclusion

In this paper we introduced the GenGED environment supporting the syntax definition of VLs. This definition is twofold: it consists of an alphabet and a grammar definition supported by corresponding editors. The user-defined grammar is the input of a syntax-directed diagram editor whose edit commands are the VL-rules. GenGED can be used to define several kinds of diagrammatic VLs because it allows any kind of connections. Furthermore, GenGED offers a hybrid language allowing the user to annotate common graphical symbols with datatype symbols, and to connect graphics by constraints.

In this framework, VLs are represented by algebraic graph grammars. The manipulation of diagrams is done by the integrated graph transformation system AGG [TER99] together with the constraint solver ParCon [Gri96]. The current implementation of GenGED is done using Java 1.1, as is the graph transformation system AGG. ParCon is used as a server; it is implemented in Objective-C. The implementation of GenGED as well as some case studies are described in [Sch99] (alphabet and test editor) and in [Nie99] (grammar and diagram editor). GenGED is available at http://tfs.cs.tu-berlin.de/genged.

We have to mention that the current implementation is not complete concerning several desirable features. E.g., parsing is not provided, nor is code generation. With respect to industrial relevance, both features are important, as is the connection of several generated diagram editors. These are the main topics for future work.

References

[BE98] R. Bardohl and H. Ehrig. Conceptual Model of the Graphical Editor GenGED. In Proc. Theory and Application of Graph Transformation, Paderborn, Germany, November 1998.

[BTMS99] R. Bardohl, G. Taentzer, M. Minas, and A. Schürr. Application of Graph
Transformation to Visual Languages. In [Roz99], 1999.

[CODL95] G. Costagliola, S. Orefice, and A. De Lucia. Automatic Generation of Visual Programming Environments. IEEE Computer, 28(3):56–66, March 1995.

[Gri96] P. Griebel. ParCon: Paralleles Lösen von grafischen Constraints [Parallel solving of graphical constraints]. PhD thesis, Paderborn University, February 1996.

[Mel99] B. Melamed. Grundkonzeption und -implementierung einer Attributkomponente für ein Graphtransformationssystem [Basic design and implementation of an attribute component for a graph transformation system]. Diploma Thesis, Computer Science, TU Berlin, 1999.

[MV95] M. Minas and G. Viehstaedt. DiaGen: A Generator for Diagram Editors Providing Direct Manipulation and Execution of Diagrams. In Proc. Symp. on Visual Languages, Darmstadt, Germany, September 5–9, 1995. IEEE Computer Society Press.

[Nie99] M. Niemann. Konzeption und Implementierung eines generischen Grammatikeditors für visuelle Sprachen [Design and implementation of a generic grammar editor for visual languages]. Diploma Thesis, Computer Science, TU Berlin, 1999.

[Roz99] G. Rozenberg, editor. Handbook of Graph Grammars and Computing by Graph Transformation, Volume 2: Applications. World Scientific Publishing, 1999.

[Sch98] J. Schiffer. Visuelle Programmiersprachen [Visual Programming Languages]. Addison-Wesley, 1998.

[Sch99] M. Schwarze. Konzeption und Implementierung eines generischen Alphabeteditors für visuelle Sprachen [Design and implementation of a generic alphabet editor for visual languages]. Diploma Thesis, Computer Science, TU Berlin, 1999.

[TER99] G. Taentzer, C. Ermel, and M. Rudolf. The AGG Approach: Language and Tool Environment. In [Roz99], 1999.

[VLxx] Proc. IEEE Symposium/Workshop on Visual Languages, Los Alamitos, CA, 19xx. IEEE Computer Society Press.
REALIZABLE MIMO DECISION FEEDBACK EQUALIZERS

Claes Tidestav, Anders Ahlén and Mikael Sternad
Signals and Systems, Uppsala University, PO Box 528, SE-751 20 Uppsala, Sweden.
URL: http://www.signal.uu.se/Publications/abstracts/c991.html

ABSTRACT

A general MMSE decision feedback equalizer (DFE) for multiple input-multiple output channels is presented. It is derived under the constraint of realizability, requiring a finite smoothing lag and causal filters both in the forward and feedback link. The proposed DFE, which provides an optimal structure as well as optimal filter degrees, is obtained as an explicit solution in terms of the channel and noise description. A generalization of the scalar zero-forcing DFE is also presented, and conditions for the existence of such an equalizer are discussed. We argue that when no zero-forcing equalizer exists, poor near-far resistance of the corresponding optimal minimum mean square error DFE can be expected. A numerical example indicates the potential advantages of using a decision feedback equalizer with an appropriate structure. The main improvements are obtained at moderate to high signal-to-noise ratios.

1. INTRODUCTION

During the last years, communication channels with several inputs and outputs have received increasing interest. For example, such channels occur in cellular systems with antenna arrays at the receiver. A detector employed for this purpose must be able to cancel the effect of the dispersive channel as well as the interfering cross-talk. Multivariable generalizations of equalizers are potential candidates for such a detector.

One equalizer which has good performance at moderate complexity is the decision feedback equalizer (DFE). Traditionally, DFE derivations result either in a DFE with a non-causal feedforward filter [1], or in a DFE where the structure is fixed prior to the design [2]. In the first case, the DFE cannot be accurately realized, whereas in the second case, the structure of the DFE may be inappropriate for the channel, leading to suboptimal
performance.

In this paper we shall use a general multiple input-multiple output (MIMO) decision feedback equalizer (GDFE) for the detection. In contrast to a conventional MIMO DFE, the GDFE allows for IIR filters in both the feedforward and the feedback link. We derive closed-form expressions for the parameters of a GDFE which minimizes the mean square error under the constraint of realizability, i.e. under the constraint of a finite decision delay and causal filters. We also derive a general condition for the existence of a zero-forcing MIMO DFE. It turns out that the existence of a zero-forcing DFE can be used as an indication of the near-far resistance of the corresponding minimum mean square DFE.

In a numerical example, we indicate the potential performance improvement of the general DFE as compared to a DFE with fixed structure. For high SNR, the improvement is significant.

2. MULTIVARIABLE CHANNEL MODELS

2.1. Prerequisites

A communication channel having multiple inputs and multiple outputs will be called a multivariable channel or a multiple input-multiple output (MIMO) channel. Multivariable channels will be described by rational matrices, i.e. matrices whose elements are causal rational functions in the unit delay operator. In addition, we assume that the channels are time-invariant and stable. In some cases, the denominators of all the matrix elements will be constants, rather than polynomials. In these cases, the multivariable channel can be described by a polynomial matrix. For any polynomial matrix we also define (1), where the argument represents the forward shift operator. The degree of a polynomial matrix equals the highest degree of any of its elements.

2.2. Detection using antenna arrays

In cellular systems, multi-element antennas, also known as antenna arrays, are frequently used to reject interference and to reduce the effect of fading and noise [3]. The presence of an antenna array at the receiver leads to a channel model with multiple outputs, one for each antenna element. When several
users transmit simultaneously, we can explicitly incorporate multiple users into the model. Collecting the symbol sequence transmitted by each user, the signal received at each antenna, and the additive noise received at the same antenna, we define the corresponding stacked vectors in (2a)–(2c). We also denote the scalar channel from each transmitter to each receiver antenna and collect these channels in the rational matrix defined in (3). Using (2a), (2b), (2c) and (3), we can now express the signal received at the antenna array by the MIMO model (4). Winters [4] used this model to design a detector which simultaneously detects the signals from all users. The channel model (4) can also be used to describe fractionally spaced sampling [5], in which a scalar received signal is sampled several times during a symbol period.

2.3. Multiuser detection in DS-CDMA

In direct sequence code division multiple access (DS-CDMA) systems, the different users will interfere with each other, since the spreading sequences used to multiplex the signals cannot be chosen completely orthogonal. The magnitude of this multiple access interference (MAI) is determined by the cross-correlation between the spreading sequences of the different users. The presence of MAI implies that we can formulate a multivariable model (5) to describe this scenario [6]. In (5), the matrices contain partial cross-correlations between the spreading sequences, which vanish outside the symbol interval; the propagation delay of each user is expressed in terms of the symbol period, and complex conjugation enters the correlations. The elements of the diagonal matrix represent the energies and phase shifts of the respective users. The output of the channel model is the sampled output from the correlators used to despread the received signal, and the remaining term constitutes noise. Obviously this model can be expressed by means of (4). The design of multiuser detectors for DS-CDMA based on the model (5) has been studied extensively over the last decade, see e.g. [7].

3. PROBLEM STATEMENT

Consider the received sequence of measurement vectors. Assume that each vector can be described as
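The MIMO model (4) can be sketched under the common convention y(t) = Σ_k H_k u(t−k) + v(t), with matrix channel taps H_k, transmitted symbol vector u(t) and noise v(t). The symbol names and tap values below are assumptions for illustration, since the extraction lost the paper's original notation:

```python
def mimo_output(H_taps, u_seq, v_seq):
    """Output of a dispersive MIMO channel: y(t) = sum_k H_k u(t-k) + v(t).
    H_taps: list of m x n matrices (lists of lists), one per channel tap.
    u_seq:  transmitted n-vectors, one per time step.
    v_seq:  additive-noise m-vectors, one per time step."""
    m = len(H_taps[0])
    y_seq = []
    for t in range(len(u_seq)):
        y = list(v_seq[t])                       # start from the noise term
        for k, Hk in enumerate(H_taps):
            if t - k < 0:
                break                            # no symbols before t = 0
            u = u_seq[t - k]
            for i in range(m):
                y[i] += sum(Hk[i][j] * u[j] for j in range(len(u)))
        y_seq.append(y)
    return y_seq

# Two receive antennas, one user, a two-tap dispersive channel, no noise:
H = [[[1.0], [0.5]],    # H_0
     [[0.3], [0.0]]]    # H_1 (intersymbol interference)
u = [[1.0], [-1.0]]
v = [[0.0, 0.0], [0.0, 0.0]]
y = mimo_output(H, u, v)
```

The second output sample mixes the current and the previous symbol through H_0 and H_1, which is exactly the intersymbol interference an equalizer must undo.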
a sum of the output from a dispersive multivariable channel and a multivariate noise term, as depicted in Fig. 1. Both the channel and the noise model

Figure 1: The multivariable system model.

are parameterized by left matrix fraction descriptions (MFDs), as in (6). The numerator polynomial matrix of the channel has as many rows as there are outputs and as many columns as there are inputs, whereas the remaining matrices in (6) are square polynomial matrices. These matrices are assumed to be stably invertible, i.e. their zeros all lie inside the unit circle. We assume that the leading matrix coefficient of the channel numerator is non-singular and that the denominator matrices are monic. To simplify the presentation and the design equations, the denominator matrices are also assumed to be diagonal. The polynomial elements in the matrices may have complex coefficients, and are assumed to be correctly estimated.

Each element in the transmitted vector is taken from a finite set of values, the so-called symbol alphabet. Each symbol is considered to be a stochastic variable with zero mean, which is uncorrelated with the disturbance vector. Finally, we assume that the transmitted symbol vectors are white, with covariance matrix given by (7). The noise vector in (6) is a possibly complex-valued, white stochastic process with zero mean and covariance matrix given by (8). For future reference, some additional quantities are also defined.

Figure 2: The general IIR decision feedback equalizer (GDFE).

It is important to note that the GDFE (10) is required to be realizable. This constraint implies that the smoothing lag must be finite, and the filters must be causal and stable. In [5] it is illustrated what happens when the constraint of realizability is relaxed. Given the received sequence of symbol vectors and the channel model (6), we want to find the stable and causal rational matrices which minimize the estimation error covariance matrix (11), where the estimation error is defined in (12). We are also interested in finding the conditions under which a zero-forcing solution to the equalization problem exists. While a scalar zero-forcing equalizer removes all intersymbol interference from the
symbol estimate, a natural extension to the multivariable case would be to require that both the intersymbol interference and the co-channel interference can be removed [8]. A multivariable zero-forcing (ZF) equalizer can then be defined accordingly:

Definition 1. Consider the channel model (6) and a multivariable equalizer which forms the estimate of a transmitted symbol vector. If (13), where is uncorrelated with all transmitted symbol vectors, then the equalizer is said to be zero-forcing.

To obtain a closed-form expression for the parameters of the optimal GDFE, we adopt the usual assumption that all decisions affecting the current estimate are correct.

4. OPTIMUM GENERAL DECISION FEEDBACK EQUALIZERS

4.1. The optimum MMSE GDFE

Introduce the polynomial matrices (14a) (14b) and define and by the coprime factorization (15), where constitutes an irreducible MFD. The general MMSE DFE is then given by the following theorem:

Theorem 1. Assume that a multivariable channel can be described by (6), and that the transmitted data and the noise are described by (7) and (8), respectively. Assuming correct past decisions, the general multivariable DFE (10) minimizes the estimation error covariance matrix (11) if and only if (16a) (16b). Above, and, together with the polynomial matrices and, satisfy the two coupled polynomial matrix equations (17a) (17b), where the degrees of the unknown polynomials satisfy (18).

Proof: See [5].

Equation (19) may have several solutions. However, in some cases no solution to (19) will exist. The precise condition for this is stated in Lemma 1.

Lemma 1. There exists a solution to (19) if and only if every common right factor² of and is also a right factor of.

Proof: The result follows from the general theory of Diophantine equations. See [9].

If we define (20), the minimal delay of user in any channel, then the second corollary can be stated as follows:

Corollary 2. If for some, and will always have a common right factor which is not a right factor of.

Proof: See [5].

²The right factors should be members of the ring of stable and
causal rational matrices [9].

Figure 3: BER of the conventional DFE (dashed) for correct decisions and real decisions.

5. A NUMERICAL EXAMPLE

To illustrate the potential performance improvements of the GDFE, we have performed a Monte Carlo simulation. Consider the two-input two-output FIR channel with entries 0.979+0.204, 0.826+0.563, -0.843-0.538 and 0.403+0.915. Over this channel, we transmit two BPSK modulated signals. At the receiver, noise is added. The noise is Gaussian and can be described by the MA model with entries -0.481-0.148, -0.629+0.592, -0.53+0.265 and 0.795-0.132. This noise model has zeros in 0.298 ± 0.321i and represents rather weakly colored noise. We compare the performance of two DFEs with smoothing lag: the GDFE, designed using Theorem 1, and the fixed-structure, or conventional, DFE described in [2] with feedforward filter degree and feedback filter degree. In Fig. 3, the bit error rate (BER) is shown as a function of the signal-to-noise ratio (SNR) of a single user. The SNRs of the two users are identical. The simulations are performed using either correct decisions or decisions from the decision device. From Fig. 3, we conclude that it is clearly advantageous to take the noise model into account. The GDFE does this in an optimal way, whereas the conventional DFE does not. The performance of the two DFEs is almost identical for low SNR, but for high SNR the difference is significant. From Fig. 3, we also note that with real decisions, the performance of both DFEs worsens, but that the difference in performance between the two DFEs remains, except at low SNRs.
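A comparison of this kind can be mimicked with a toy scalar experiment. The sketch below is my own illustration, not the paper's GDFE: it runs a Monte Carlo BER estimate for BPSK over a hypothetical channel with a single postcursor ISI tap, equalized by an idealized decision feedback structure (the channel tap value, noise model, and symbol count are all assumptions).

```python
import math
import random

def simulate_dfe_ber(snr_db, n_symbols=20000, seed=1):
    """Monte Carlo BER of a toy scalar decision feedback equalizer.

    Hypothetical channel: y[n] = x[n] + 0.5*x[n-1] + e[n], with BPSK
    symbols x in {-1, +1} and white Gaussian noise e.  The equalizer
    subtracts the known postcursor contribution using the previous
    decision -- the same "feedback cancels past-symbol ISI" idea the
    paper uses, reduced to its simplest scalar form.
    """
    rng = random.Random(seed)
    tap = 0.5                                 # postcursor tap (assumed)
    noise_std = math.sqrt(10 ** (-snr_db / 10))
    errors = 0
    prev_sym = prev_dec = 0.0
    for _ in range(n_symbols):
        x = rng.choice((-1.0, 1.0))
        y = x + tap * prev_sym + rng.gauss(0.0, noise_std)
        z = y - tap * prev_dec                # feedback cancellation
        dec = 1.0 if z >= 0 else -1.0
        errors += dec != x
        prev_sym, prev_dec = x, dec
    return errors / n_symbols
```

As in the paper's Figure 3, the estimated BER falls sharply with SNR; with real (possibly wrong) decisions, error propagation through the feedback path degrades performance relative to the correct-decision assumption.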
1) What is the difference between a compiler and an interpreter?

∙ A compiler is a program that reads a program in one language (the source language), translates it into an equivalent program in another language (the target language), and reports any errors in the source program that it detects during the translation process.

∙ An interpreter directly executes the operations specified in the source program on inputs supplied by the user.

2) What are the advantages of:

(a) a compiler over an interpreter? The machine-language target program produced by a compiler is usually much faster than an interpreter at mapping inputs to outputs.

(b) an interpreter over a compiler? An interpreter can usually give better error diagnostics than a compiler, because it executes the source program statement by statement.

3) What advantages are there to a language-processing system in which the compiler produces assembly language rather than machine language?

The compiler may produce an assembly-language program as its output, because assembly language is easier to produce as output and is easier to debug.

4.2.3 Design grammars for the following languages:

a) The set of all strings of 0s and 1s such that every 0 is immediately followed by at least one 1.

S -> SS | 1 | 01 | ε

4.3.1 The following is a grammar for the regular expressions over symbols a and b only, using + in place of | for unions, to avoid conflict with the use of the vertical bar as a meta-symbol in grammars:

rexpr -> rexpr + rterm | rterm
rterm -> rterm rfactor | rfactor
rfactor -> rfactor * | rprimary
rprimary -> a | b

a) Left factor this grammar.

No two alternatives share a common prefix, so left factoring leaves the grammar unchanged:

rexpr -> rexpr + rterm | rterm
rterm -> rterm rfactor | rfactor
rfactor -> rfactor * | rprimary
rprimary -> a | b

b) Does left factoring make the grammar suitable for top-down parsing?

No, left recursion is still in the grammar.

c) In addition to left factoring, eliminate left recursion from the original grammar.

rexpr -> rterm rexpr'
rexpr' -> + rterm rexpr' | ε
rterm -> rfactor rterm'
rterm' -> rfactor
rterm' | ε
rfactor -> rprimary rfactor'
rfactor' -> * rfactor' | ε
rprimary -> a | b

d) Is the resulting grammar suitable for top-down parsing?

Yes.

Exercise 4.4.1 For each of the following grammars, derive predictive parsers and show the parsing tables. You may left-factor and/or eliminate left-recursion from your grammars first.

A predictive parser may be derived by recursive descent or by the table-driven approach. Either way you must also show the predictive parse table.

a) The grammar of exercise 4.2.2(a): S -> 0S1 | 01

This grammar has no left recursion. It could possibly benefit from left factoring. Here is the recursive descent PP code:

s() {
  match('0');
  if (lookahead == '0')
    s();
  match('1');
}

Or, left factoring the grammar first:

S -> 0S'
S' -> S1 | 1

s() {
  match('0'); s'();
}

s'() {
  if (lookahead == '0') {
    s(); match('1');
  } else {
    match('1');
  }
}

Now we will build the PP table for

S -> 0S'
S' -> S1 | 1

First(S) = {0}
First(S') = {0, 1}
Follow(S) = {1, $}

The predictive parsing algorithm on page 227 (Fig. 4.19 and 4.20) can use this table for non-recursive predictive parsing.

b) The grammar of exercise 4.2.2(b): S -> +SS | *SS | a, with string +*aaa.

Left factoring does not apply and there is no left recursion to remove.

s() {
  if (lookahead == '+') {
    match('+'); s(); s();
  } else if (lookahead == '*') {
    match('*'); s(); s();
  } else if (lookahead == 'a') {
    match('a');
  } else {
    report("syntax error");
  }
}

First(S) = {+, *, a}
Follow(S) = {$, +, *, a}

The predictive parsing algorithm on page 227 (Fig. 4.19 and 4.20) can use this table for non-recursive predictive parsing.

5.1.1 a, b, c: Investigating GraphViz as a solution to presenting trees

5.1.2: Extend the SDD of Fig. 5.4 to handle expressions as in Fig. 5.1:

1. L -> E N
   1. L.val = E.syn
2. E -> F E'
   1. E.syn = E'.syn
   2. E'.inh = F.val
3. E' -> + T E1'
   1. E1'.inh = E'.inh + T.syn
   2. E'.syn = E1'.syn
4. T -> F T'
   1. T'.inh = F.val
   2. T.syn = T'.syn
5. T' -> * F T1'
   1. T1'.inh = T'.inh * F.val
   2. T'.syn = T1'.syn
6. T' -> ε
   1. T'.syn = T'.inh
7. E' -> ε
   1. E'.syn = E'.inh
8. F -> digit
   1. F.val = digit.lexval
9. F -> ( E )
   1. F.val = E.syn
10. E -> T
    1. E.syn = T.syn

5.1.3 a, b, c: Investigating GraphViz as a solution to presenting trees

5.2.1: What are all the topological sorts for the dependency graph of Fig. 5.7?

1. 1, 2, 3, 4, 5, 6, 7, 8, 9
2. 1, 2, 3, 5, 4, 6, 7, 8, 9
3. 1, 2, 4, 3, 5, 6, 7, 8, 9
4. 1, 3, 2, 4, 5, 6, 7, 8, 9
5. 1, 3, 2, 5, 4, 6, 7, 8, 9
6. 1, 3, 5, 2, 4, 6, 7, 8, 9
7. 2, 1, 3, 4, 5, 6, 7, 8, 9
8. 2, 1, 3, 5, 4, 6, 7, 8, 9
9. 2, 1, 4, 3, 5, 6, 7, 8, 9
10. 2, 4, 1, 3, 5, 6, 7, 8, 9

5.2.2 a, b: Investigating GraphViz as a solution to presenting trees

5.2.3: Suppose that we have a production A -> BCD. Each of the four nonterminals A, B, C, and D has two attributes: s is a synthesized attribute, and i is an inherited attribute.
For each of the sets of rules below, tell whether (1) the rules are consistent with an S-attributed definition, (2) the rules are consistent with an L-attributed definition, and (3) whether the rules are consistent with any evaluation order at all.

a) A.s = B.i + C.s
   1. No -- contains an inherited attribute
   2. Yes -- "from above or from the left"
   3. Yes -- L-attributed, so no cycles

b) A.s = B.i + C.s and D.i = A.i + B.s
   1. No -- contains inherited attributes
   2. Yes -- "from above or from the left"
   3. Yes -- L-attributed, so no cycles

c) A.s = B.s + D.s
   1. Yes -- all attributes synthesized
   2. Yes -- all attributes synthesized
   3. Yes -- S- and L-attributed, so no cycles

d) A.s = D.i; B.i = A.s + C.s; C.i = B.s; D.i = B.i + C.i
   1. No -- contains inherited attributes
   2. No -- B.i uses A.s, which depends on D.i, which depends on B.i (a cycle)
   3. No -- a cycle implies no topological sorts (evaluation orders) using the rules

5.3.1: Below is a grammar for expressions involving operator + and integer or floating-point operands. Floating-point numbers are distinguished by having a decimal point.

1. E -> E + T | T
2. T -> num . num | num

a) Give an SDD to determine the type of each term T and expression E.

1. E -> E1 + T
   1. E.type = if (E1.type == float || T.type == float) { float } else { integer }
2. E -> T
   1. E.type = T.type
3. T -> num1 . num2
   1. T.type = float
4. T -> num
   1. T.type = integer

b) Extend your SDD of (a) to translate expressions into postfix notation. Use the binary operator intToFloat to turn an integer into an equivalent float.

Note: the character ',' separates floating-point numbers in the resulting postfix notation. The symbol "||" denotes concatenation.

1. E -> E1 + T
   1. E.val = E1.val || ',' || T.val || '+'
2. E -> T
   1. E.val = T.val
3. T -> num1 . num2
   1. T.val = num1.val || '.' || num2.val
4. T -> num
   1. T.val = intToFloat(num.val)

5.3.2 Give an SDD to translate infix expressions with + and * into equivalent expressions without redundant parentheses.
For example, since both operators associate from the left, and * takes precedence over +, ((a*(b+c))*(d)) translates into a*(b+c)*d. Note: the symbol "||" denotes concatenation.

1. S -> E
   1. E.iop = nil
   2. S.equation = E.equation
2. E -> E1 + T
   1. E1.iop = E.iop
   2. T.iop = E.iop
   3. E.equation = E1.equation || '+' || T.equation
   4. E.sop = '+'
3. E -> T
   1. T.iop = E.iop
   2. E.equation = T.equation
   3. E.sop = T.sop
4. T -> T1 * F
   1. T1.iop = '*'
   2. F.iop = '*'
   3. T.equation = T1.equation || '*' || F.equation
   4. T.sop = '*'
5. T -> F
   1. F.iop = T.iop
   2. T.equation = F.equation
   3. T.sop = F.sop
6. F -> char
   1. F.equation = char.lexval
   2. F.sop = nil
7. F -> ( E )
   1. if (F.iop == '*' && E.sop == '+') { F.equation = '(' || E.equation || ')' } else { F.equation = E.equation }
   2. F.sop = nil

5.3.3: Give an SDD to differentiate expressions such as x * (3*x + x * x) involving the operators + and *, the variable x, and constants. Assume that no simplification occurs, so that, for example, 3*x will be translated into 3*1 + 0*x. Note: the symbol "||" denotes concatenation. Also, differentiation(x*y) = (x * differentiation(y) + differentiation(x) * y) and differentiation(x+y) = differentiation(x) + differentiation(y).

1. S -> E
   1. S.d = E.d
2. E -> T
   1. E.d = T.d
   2. E.val = T.val
3. T -> F
   1. T.d = F.d
   2. T.val = F.val
4. T -> T1 * F
   1. T.d = '(' || T1.val || ") * (" || F.d || ") + (" || T1.d || ") * (" || F.val || ')'
   2. T.val = T1.val || '*' || F.val
5. E -> E1 + T
   1. E.d = '(' || E1.d || ") + (" || T.d || ')'
   2. E.val = E1.val || '+' || T.val
6. F -> ( E )
   1. F.d = E.d
   2. F.val = '(' || E.val || ')'
7. F -> char
   1. F.d = 1
   2. F.val = char.lexval
8. F -> constant
   1. F.d = 0
   2. F.val = constant.lexval
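The 5.3.3 translation scheme can be exercised directly: the sketch below (my own illustration, not part of the original solutions) is a small recursive-descent evaluator whose semantic actions mirror rules 4, 5, 7 and 8 by plain string concatenation, with no simplification.

```python
def differentiate(expr):
    """Evaluate the SDD of exercise 5.3.3 over an expression string.

    Grammar: E -> E + T | T,  T -> T * F | F,  F -> ( E ) | x | constant.
    Returns (val, d): the expression text and its derivative text,
    built exactly as the string-concatenation rules prescribe.
    Assumes no whitespace in the input (an assumption of this sketch).
    """
    pos = [0]  # mutable cursor shared by the nested parsers

    def peek():
        return expr[pos[0]] if pos[0] < len(expr) else ''

    def eat():
        ch = expr[pos[0]]
        pos[0] += 1
        return ch

    def parse_F():
        if peek() == '(':
            eat()
            val, d = parse_E()
            eat()                       # consume ')'
            return '(' + val + ')', d   # rule 6: F.d = E.d
        ch = eat()
        if ch == 'x':
            return 'x', '1'             # rule 7: F.d = 1
        num = ch
        while peek().isdigit():
            num += eat()
        return num, '0'                 # rule 8: F.d = 0

    def parse_T():
        val, d = parse_F()
        while peek() == '*':
            eat()
            fval, fd = parse_F()
            # rule 4 (product rule): T.d = (T1.val)*(F.d) + (T1.d)*(F.val)
            d = '(' + val + ') * (' + fd + ') + (' + d + ') * (' + fval + ')'
            val = val + '*' + fval
        return val, d

    def parse_E():
        val, d = parse_T()
        while peek() == '+':
            eat()
            tval, td = parse_T()
            d = '(' + d + ') + (' + td + ')'   # rule 5 (sum rule)
            val = val + '+' + tval
        return val, d

    return parse_E()
```

For example, `differentiate("3*x")` returns `('3*x', '(3) * (1) + (0) * (x)')`, matching the exercise's claim that 3*x is translated into 3*1 + 0*x.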
Mining Decision Trees from Data Streams in a Mobile Environment

Hillol Kargupta and Byung-Hoon Park
Department of Computer Science and Electrical Engineering
1000 Hilltop Circle, University of Maryland Baltimore County
Baltimore, MD 21250, USA
hillol,park1@

Abstract

This paper presents a novel Fourier analysis-based technique to aggregate, communicate, and visualize decision trees in a mobile environment. The Fourier representation of a decision tree has several properties that are particularly useful for mining continuous data streams from small mobile computing devices. This paper presents algorithms to compute the Fourier spectrum of a decision tree and vice versa. It offers a framework to aggregate decision trees in their Fourier representations. It also describes a touch-pad/ticker-based approach to visualize decision trees using their Fourier spectrum, and an implementation for PDAs.

1. Introduction

Analyzing and monitoring time-critical data streams using mobile devices in a ubiquitous manner is important for many applications in finance, defense, process control and other domains. These applications demand the ability to quickly analyze large amounts of data. Decision trees (e.g., CART [4], ID3 [21], and C4.5 [22]) are fast, and scalable.
So decision tree-based data mining is a natural candidate for monitoring data streams from ubiquitous devices like PDAs, palmtops, and wearable computers. However, there are several problems.

Mining time-critical data streams usually requires on-line learning that often produces a series of models [6, 8, 15, 23] like decision trees. From a data mining perspective it is important that these models are properly aggregated. This is because different data blocks observed at different time frames may generate different models that may actually belong to a single model, which could be generated if all the data blocks were combined and mined together. Even for decision trees [5, 24] that are capable of incrementally modifying themselves based on new data, in many applications (e.g., multiple data streams observed at different distributed locations) we end up with an ensemble of trees. [Patent application pending.] Apart from better understanding of the model, communication of a large number of trees over a wireless network also poses a major problem. We need on-line data mining algorithms that can easily aggregate and evolve models in an efficient representation.

Visualization of decision trees [1] in a small display is also a challenging task. Presenting a decision tree with even a moderate number of features in a small display screen is not easy. Since the number of nodes in a decision tree may grow exponentially with respect to the number of features defining the domain, drawing even a small tree in the display area of a palmtop device or a cell phone is a difficult thing to do. Reading an email in a cell phone is sometimes annoying; so imagine browsing over a large number of tree diagrams in a small screen. It simply does not work. We need an alternate approach. We need to represent trees in such a way that they can be easily and intuitively presented to the user using a small mobile device. This paper takes a small step toward that possibility.

It considers manipulation and visualization of decision
trees for mining data streams from small computing devices. It points out that the Fourier basis offers an interesting representation of decision trees that can facilitate quick aggregation of a large number of decision trees [9, 18] and their visualization in a small screen using a novel "decision tree ticker". The efficient Fourier representation of decision trees also allows quicker communication of tree ensembles over low-bandwidth wireless networks. Although we present the material in the context of mobile devices, the approach is also useful for desktop applications.

Section 2 explains the relation between the Fourier representation and decision trees. It also presents an algorithm to compute the Fourier spectrum of a decision tree. Section 3 considers aggregation of multiple trees in the Fourier representation. Section 4 presents a decision tree visualization technique using a touch-pad and a ticker. Section 5 describes an application of this technology for mining stock data streams. Finally, Section 6 concludes this paper.

2. Decision Trees as Numeric Functions

This paper adopts an algebraic perspective of decision trees. Note that a decision tree is a function that maps the domain members to a range of class labels. Sometimes it is a symbolic function, where features take symbolic (non-numeric) values. However, a symbolic function can be easily converted to a numeric function by simply replacing the symbols with numeric values in a consistent manner. See Figure 1 for an example. A numeric function-representation of a decision tree may be quite useful. For example, we may be able to aggregate a collection of trees (often produced by ensemble learning techniques) by simply performing basic arithmetic operations (e.g. adding two decision trees, weighted averaging) in their numeric form. Later in this paper we will see that a numeric representation is also suitable for visualizing and aggregating decision trees. Once the tree is converted to a numeric discrete function, we can also apply any
appropriate analytical transformation that we want. Fourier transformation is one such possibility, and it is an interesting one. The Fourier basis offers an additively decomposable representation of a function. In other words, the Fourier representation of a function is a weighted linear combination of the Fourier basis functions. The weights are called Fourier coefficients. The coefficients completely define the representation. Each coefficient is associated with a Fourier basis function that depends on a certain subset of features defining the domain of the data set to be mined. The following section presents a brief review of the Fourier representation.

2.1. A Brief Review of the Fourier Basis

Fourier bases are orthogonal functions that can be used to represent any function. Consider the function space over the set of all n-bit Boolean feature vectors. The Fourier basis set that spans this space is comprised of 2^n Fourier basis functions; for the time being let us consider only the discrete Boolean Fourier basis. Each Fourier basis function is defined as ψ_j(x) = (−1)^{j·x}, where j and x are binary strings of length n, i.e. j, x ∈ {0,1}^n, and j·x denotes the inner product of j and x. ψ_j(x) can either be equal to 1 or −1. The string j is called a partition. The order of a partition j is the number of 1-s in j. A Fourier basis function ψ_j depends on some x_i only when j_i = 1. Therefore, a partition can also be viewed as a representation of a certain subset of the x_i-s; every unique partition corresponds to a unique subset of the x_i-s. If a partition has exactly α number of 1-s then we say the partition is of order α, since the corresponding Fourier function depends only on those α features corresponding to the 1-s in the partition. A function f that maps an n-dimensional space of binary strings to a real-valued range can be written using the Fourier basis functions: f(x) = Σ_j w_j ψ_j(x), where w_j is the Fourier coefficient corresponding to the partition j. The Fourier coefficient w_j can be viewed as the relative contribution of the partition j to the function value of f(x). Therefore, the absolute value of w_j can be
used as the "significance" of the corresponding partition. If the magnitude of some w_j is very small compared to other coefficients then we may consider the j-th partition to be insignificant and neglect its contribution.

2.2. Fourier Analysis with Non-binary Features

The Fourier basis can be easily extended to the non-Boolean case. Consider a domain defined by n possibly non-Boolean features, where the i-th feature can take λ_i distinct values. The generalized Fourier basis function over such an n-ary feature space is defined by (1), where j and x are n-ary strings of length n. The Fourier coefficients for a non-Boolean domain can be defined by (2), using the complex conjugate of the basis function. Since a decision tree is a function defined over a discrete space (inherently discrete or some discretization of a continuous space), we can compute its Fourier transformation. We shall discuss the techniques to compute the Fourier spectrum of a decision tree later. It turns out that the Fourier representation of a decision tree with bounded depth has some very interesting properties [12, 14]. These observations are discussed in the following section.

2.3. Fourier Spectrum of a Decision Tree: Why Bother?

For almost all practical applications decision trees have bounded depths. The Fourier spectrum of a bounded-depth decision tree has some interesting properties.

1. The Fourier representation of a bounded-depth (say d) Boolean decision tree has only a polynomial number of non-zero coefficients; all coefficients corresponding to partitions involving more than d feature variables are zero. The proof is relatively straightforward.

2. If the order of a partition is its number of defining features, then the magnitudes of the Fourier coefficients decay exponentially with the order of the corresponding partition; in other words, low-order coefficients are exponentially more significant than the higher-order coefficients. This was proved in [14] for Boolean decision trees. Its counterpart for trees with
non-Boolean features can be found elsewhere [18].

These observations suggest that the spectrum of the decision tree can be approximated by computing only a small number of low-order¹ coefficients. So the Fourier basis offers an efficient numeric representation of a decision tree in the form of an algebraic function that can be easily stored, communicated, and manipulated.

2.4. From Fourier Coefficients to a Decision Tree

Fourier coefficients have important physical meanings. Recall that every coefficient is associated with a Fourier basis function. Any given basis function depends on a unique subset of features defining the domain. For an n-bit Boolean domain there are 2^n unique feature subsets. There are also 2^n different Fourier basis functions, and each of them is associated with a unique subset. The magnitude of a coefficient represents the "strength" of the contribution of the corresponding subset of features to the overall function value.

¹The order of a coefficient is the number of features defining the corresponding partition. Low-order coefficients are the ones for which the orders of the partitions are relatively small.
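The definitions of Section 2.1 and the bounded-depth sparsity property can be made concrete with a small self-contained sketch (my own illustration; the three-feature "tree" below is a hypothetical stand-in for a learned decision tree):

```python
from itertools import product

def fourier_spectrum(f, n):
    """Fourier coefficients of f: {0,1}^n -> R over the Boolean basis
    psi_j(x) = (-1)^{j.x} of Section 2.1:

        w_j = (1/2^n) * sum_x f(x) * (-1)^{j.x}
    """
    coeffs = {}
    for j in product((0, 1), repeat=n):
        total = 0.0
        for x in product((0, 1), repeat=n):
            dot = sum(a * b for a, b in zip(j, x))   # inner product j.x
            total += f(x) * (-1) ** dot
        coeffs[j] = total / 2 ** n
    return coeffs

# A depth-2 "decision tree" over 3 features that tests only features
# 0 and 1 (hypothetical example): class 1 iff x0 = 1 and x1 = 1.
tree = lambda x: 1 if x[0] == 1 and x[1] == 1 else 0
spectrum = fourier_spectrum(tree, 3)

# Every partition that involves the untested feature 2 gets a zero
# coefficient -- the sparsity property of Section 2.3 in miniature.
high = [j for j, w in spectrum.items() if j[2] == 1 and abs(w) > 1e-12]
```

Here only the four partitions over features 0 and 1 carry non-zero weight, so the whole tree is captured by a handful of low-order coefficients.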
So if the magnitude of a coefficient is relatively large, then the corresponding features together have a strong influence on the function value. For example, consider a linear function, i.e. a weighted sum of the individual features. All terms in this function are linear; the features together (in a multiplicative sense) do not make any contribution to the function value. This linearity is also reflected in its Fourier spectrum. It is easy to show that all Fourier coefficients of this function corresponding to basis functions that depend on more than one variable are zero. This connection between the structure of the function and its spectrum is a general property of the Fourier basis. The magnitudes of the Fourier coefficients expose the underlying structure of the function by identifying the dependencies and correlations among different features.

However, the Fourier spectrum of a decision tree tells us more than the interactions among the features. The coefficients can also tell us about the distribution of class labels at any node of the decision tree. Recall that any node in the tree is associated with some feature, say the i-th. A downward link from the node is labeled with an attribute value of this i-th feature. As a result, a path from the root node to a successor node represents the subset of the domain that satisfies the different feature values labeled along the path. These subsets are essentially similarity-based equivalence classes.
In this paper we shall call them schemata (schema in singular form). A schema in a Boolean domain is a string over {0, 1, *}, where * denotes a wild-card that matches any value of the corresponding feature. For example, a path in Figure 1 (Bottom) represents a schema, since all members of the data subset at the final node of that path take the feature values labeled along the path.

The distribution of class labels in a schema h is an important aspect, since it identifies the utility of the schema as a decision rule. For example, if all the members of some schema h have a class label value of 1, then h can be used as an effective decision rule. On the other hand, if the proportions of label values 1 and 0 are almost equal, then h cannot be used as an effective decision rule. In decision tree learning algorithms like ID3 and C4.5 this "skewedness" in the distribution for a given schema is measured by computing the information gain,

Gain(S, A) = Entropy(S) − Σ_{v ∈ Values(A)} (|S_v| / |S|) Entropy(S_v),

where Values(A) is the set of all possible values for attribute A, and S_v is the set of those members of S that have value v for attribute A; if p_+ and p_- are the proportions of the positive and negative instances in some S_v, then Entropy(S_v) = −p_+ log p_+ − p_- log p_-. The gain computation clearly depends on the proportion of class labels in a schema, which can be determined directly from the Fourier spectrum of the tree. For example, consider a problem with Boolean class labels. The total number of members of a schema h with class label 1 is given by (3), where the summation term depends on whether the corresponding position of h is 0, 1, or *. Let o(h) be the total number of features that are set to specific values in order to define the schema and n be the total number of features defining the entire domain. Then the total number of members covered by the schema is 2^{n−o(h)}. Clearly, we can compute this distribution information for any given schema using the spectrum of the tree. In other words, we can construct the tree from the Fourier spectrum via the information gain. The following section presents a fast technique for computing the Fourier spectrum of a decision tree.

2.5. Computing the Fourier
Transform of a Tree

The Fourier spectrum of a decision tree can be easily computed by traversing the leaf nodes in a systematic fashion. The proposed algorithm is extremely fast, and the overhead of computing these coefficients is minimal compared to the load of learning the tree. Let X be the complete instance space. Let us also assume that there are k leaf nodes in the decision tree and that the i-th leaf node covers a subset of X, denoted by X_i; therefore X is the union of the X_i. The j-th coefficient of the spectrum can then be computed by summing the contributions of the individual leaves. Now note that both the tree output and the basis function take a constant value for every member of X_i. Since the path to a leaf node represents a schema, with some abuse of symbols we can write each coefficient as a sum over the schemata defined by the paths to the leaves. For each path from the root to a leaf node (schema h), all non-zero Fourier coefficients are detected by enumerating all possible values for each attribute in h. Further details about this algorithm, including its running time, can be found elsewhere [18].

3. Aggregating Multiple Trees

The Fourier spectrum of a decision tree-ensemble classifier can also be computed using the algorithm described in the previous section. We first have to compute the Fourier spectrum of every tree and then aggregate the spectra using the chosen scheme for constructing the ensemble. Let f be an ensemble of m different decision trees whose output is a weighted linear combination of the outputs of these trees, where tree k has weight a_k, and let W_k be the set of non-zero Fourier coefficients detected by decision tree k. Then we can write (4): the ensemble coefficient for a partition is the weighted sum of the corresponding coefficients of the individual trees. Therefore the Fourier spectrum of f (an ensemble classifier) is simply the weighted sum of the spectra of the trees.

The spectrum of the ensemble can be directly useful for at least the following immediate purposes:

1. visualization of the ensemble through its Fourier spectrum,
2. minimizing the communication overhead,
3. understanding the ensemble classifier through its spectrum, and
4. construction of a simpler and smaller
ensemble (possibly a single concise tree) from the aggregated spectrum.

The following section considers the visualization aspect.

4. Visualization of the Fourier Representation

The Fourier representation offers a unique way to visualize decision trees. This section presents a ticker- and touch-pad-based approach to visualize decision trees that is especially suitable for smaller displays like the ones that we get in palmtop devices. However, the approach is also suitable for desktop applications.

4.1. Fourier Spectrum-based Touch-pad and Ticker

The Fourier coefficients of a function provide a lot of useful information. As we noted earlier, individual coefficients tell us about strongly mutually interacting features that significantly contribute toward the overall function value. Moreover, different subsets of these coefficients can also provide the distribution information (i.e. the "information-gain" value used in ID3/C4.5) at any node. This section develops a novel approach for visualizing decision trees that exploits both of these properties of the Fourier spectrum. The approach is based on two primary components:

1. a touch-pad that presents a 2D density plot of all the significant coefficients, and
2. a ticker that allows continuous monitoring of the information gain produced by a set of classifiers constructed by a certain group of features (possibly selected using the touch-pad). This is particularly important for time-critical applications.

Each of them is described in detail next.

4.2. The Touch-pad

The touch-pad is a rectangular region that presents a graphical interface to interact with the distribution of Fourier coefficients. Let us consider any arbitrary set of coefficients. These coefficients can be viewed as a graph G = (V, E), where every node in V is associated with a unique coefficient and E is the set of edges, each edge between a pair of nodes being associated with the "distance" between their associated partitions. The distance metric can be chosen in different ways. For example, if the domain is
binary, partitions are also going to be binary strings, so we may choose the Hamming distance as the metric for defining the graph. For discrete non-binary partitions we may choose any of the widely known distance metrics. The choice of distance metric does not fundamentally change the proposed approach. Any reasonable distance metric that captures the distance between a pair of integer strings (i.e. partitions) should work fine with the proposed visualization approach. Once the graph is defined, our next task is to project it to a two-dimensional space in such a way that "neighbors" in the original high-dimensional space remain "neighbors" in the projected space. In other words, we would like near-isometric projections that approximately preserve the pairwise distances between nodes. This work makes use of existing off-the-shelf algorithms to construct this projection. It uses the widely known self-organizing feature map (SOM) [10, 11, 13] for this purpose. The SOM takes the Fourier coefficients and maps them to a two-dimensional grid where coefficients with similar partitions are located in neighboring positions. Although this particular approach uses the SOM for the 2-D projection, any other technique (possibly greedy algorithms) should work fine as long as the relative distances between the nodes are reasonably preserved. The touch-pad is also color-coded in order to properly convey the distribution of significant and insignificant coefficients. Figure 2 shows a screen shot of the implementation of the touch-pad on an HP Jornada palmtop device.

The touch-pad is also interactive. Since the display area is usually small, our implementation of the touch-pad offers variable degrees of resolution. The user can interactively zoom in or out of a specific region. The user can also find out the subset of features defining a certain part of the touch-pad by interactively choosing a region from the screen. By marking a certain region of interest, the user will get
a better understanding of interesting patterns among the features residing there, along with their finer and more detailed visualization. This interactive feature of the touch-pad also gives the user an opportunity to explore the dependencies among a selected subset of features with the aid of a dynamically changing "ticker". The following section describes this in detail.

4.3. The Ticker

The ticker is provided to allow continuous monitoring of specific classifiers produced by the decision trees. This is particularly important for time-critical applications. The ticker shows the "strengths" (measures like the information gain used frequently in the decision tree literature) of a subset of "good" decision rules produced by the trees. Figure 2 shows a screen shot of the ticker depicting the strengths of a set of classifiers at a particular time. As new data arrive, the strengths are modified. The ticker is inherently designed to complement the touch-pad-based approach to visualizing the spectrum of the decision tree.

Recall that the information gain (i.e. the difference in entropies) at a particular node of a decision tree is computed using the distribution of the domain members that it covers. As we noted earlier, the information gain can also be computed from the spectrum of the decision tree. The user's selection of a certain region from the touch-pad identifies the current interest of the user in a certain subset of features.
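The gain value the ticker reports can be reproduced from raw label counts. The sketch below is a minimal stand-in (my own illustration): it computes the standard ID3 information gain of Section 2.4 from a list of examples rather than from the Fourier spectrum, as the paper's implementation does.

```python
import math

def entropy(pos, neg):
    """Entropy of a two-class distribution, as used by ID3/C4.5."""
    total = pos + neg
    h = 0.0
    for c in (pos, neg):
        if c:
            p = c / total
            h -= p * math.log2(p)
    return h

def information_gain(examples, attr):
    """Gain(S, A) = Entropy(S) - sum_v |S_v|/|S| * Entropy(S_v).

    `examples` is a list of (feature_dict, label) pairs with 0/1
    labels; `attr` names the feature a candidate decision rule tests.
    Both names are hypothetical conveniences of this sketch.
    """
    def subset_entropy(subset):
        pos = sum(label for _, label in subset)
        return entropy(pos, len(subset) - pos)

    gain = subset_entropy(examples)
    values = {ex[attr] for ex, _ in examples}
    for v in values:
        sv = [(ex, lab) for ex, lab in examples if ex[attr] == v]
        gain -= len(sv) / len(examples) * subset_entropy(sv)
    return gain
```

A feature that perfectly determines the label yields a gain of 1 bit, while an irrelevant feature yields 0; the ticker's ranking of "good" decision rules follows the same ordering.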
The user can invoke the ticker and instruct it to monitor all the decision rules that can be constructed using those features. The user can also interactively select decision rules that are comprised of different features. The ticker in turn collects the relevant Fourier coefficients, computes the information gain associated with all the decision rules that can be constructed using those features, and shows only the good ones.

Figure 2. (Top) Fourier spectrum-based decision tree mining interface for HP Jornada. (Bottom) An enlarged view of the interface. The interface has three windows. The left-most window is the touch-pad. The bottom window shows the ticker. The right window is a small area to interactively construct and compute the accuracy of the classifiers.

5. Experiments with Financial Data Streams

This section presents the experimental performance of the proposed aggregation approach for combining multiple decision trees. It presents results using two ensemble learning techniques, namely Bagging and Arcing. The ensemble learning literature considers different ways to compute the output of the ensemble. Averaging the outputs of the individual models with uniform weight is probably the simplest possibility. Perrone and Cooper [19][17] refer to this method as the Basic Ensemble Method (BEM) or naive Bagging. Breiman proposed an Arcing method, Arc-fx [2][3], for mining large data sets and stream data. It is fundamentally based on the idea of Arcing: adaptive re-sampling that gives higher weights to those instances that are usually misclassified. We consider both of these ensemble learning techniques to create an ensemble of decision trees from the data stream. These trees are, however, combined using the proposed Fourier spectrum-based approach, which the regular ensemble learning techniques do not offer. We also performed experiments using an AdaBoost-based approach suggested elsewhere [7].
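The uniform-weight averaging behind BEM can be sketched as follows. In the paper it is applied to the Fourier spectra of the trees, but the same averaging applies to any ensemble of real-valued model outputs; the function below is our illustration, not code from the paper.

```python
def bem_average(model_outputs):
    """Basic Ensemble Method: average the outputs of the individual
    models with uniform weight (Perrone and Cooper's BEM)."""
    n = len(model_outputs)
    return [sum(vals) / n for vals in zip(*model_outputs)]

# Three models scoring the same four instances.
ensemble = bem_average([[0.9, 0.1, 0.6, 0.4],
                        [0.7, 0.3, 0.6, 0.2],
                        [0.8, 0.2, 0.6, 0.0]])
# ensemble[0] is (0.9 + 0.7 + 0.8) / 3
```

Arcing differs only in how the training samples are drawn, not in how outputs are combined, so the same aggregation step can sit on top of either ensemble.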
We chose not to report the AdaBoost results, however, since its performance appears to be considerably inferior to that of Bagging and Arcing for the data set we use. Not all the models generated from different data blocks should always be aggregated together. Sometimes we may want to use a pruning algorithm [16,20] for selecting the right subset of models. Sometimes a "windowing scheme" [7,3] can be used, where only the most recent classifiers are used for learning and classification.

Our experiments were performed using a semi-synthetic data stream with 174 Boolean attributes. The objective is to continuously evolve a decision tree-based predictive model for a Boolean attribute. The data-stream generator is essentially a C4.5 decision tree learned from three years of S&P 100 and Nasdaq 100 stock quote data. The original data are pre-processed and transformed to discrete data by assigning discrete values for "increase" and "decrease" in stock quotes between consecutive days (i.e., the local gradient). The decision trees predict whether the Yahoo stock is likely to increase or decrease based on the attribute values of the stocks.

We assume a non-stationary sampling strategy in order to generate the data. Every leaf in the decision tree-based data generator is associated with a certain probability value. Data samples are generated by choosing the leaves according to the assigned probability distribution. This distribution is changed several times during a single experiment.
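The pre-processing step, turning raw quotes into Boolean "increase"/"decrease" attributes via the day-to-day local gradient, can be sketched as below. This is our reconstruction of the transformation as described, not the authors' code.

```python
def local_gradient(quotes):
    """Discretize a series of closing quotes into Boolean attributes:
    True for an increase between consecutive days, False otherwise."""
    return [b > a for a, b in zip(quotes, quotes[1:])]

# Four days of quotes yield three Boolean increase/decrease values.
signals = local_gradient([100.0, 101.5, 99.0, 103.2])
# signals == [True, False, True]
```

Applying this to each of the tracked stocks yields the 174 Boolean attributes of the semi-synthetic stream.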
We also add white noise to the generator. The test data set is comprised of a separate collection of generated instances.

We implemented the naive Bagging and Arcing techniques and performed various tests over the validation data set, as described below. Decision tree models are generated from every data block collected from the stream, and their spectra are combined using BEM and Arcing. We studied the accuracy of each model with various sizes of data block at each update. We also studied the accuracy of each model generated using the "windowing" technique with various window sizes. All the results are measured over a number of iterations, where every iteration corresponds to a unique discrete time step.

Figure 3 plots the classification accuracies of Bagging and Arcing with various data block sizes. Bagging converges rapidly with all the different block sizes, before or around 50 iterations, while Arcing shows a gradual increase throughout all iteration steps.

Figure 3. Accuracy vs. iterations with various data block sizes: (Top) Bagging, (Bottom) Arcing.

Although we stopped at 300 iterations, the accuracy does not seem to reach its asymptotic value. Figure 4 plots the classification accuracies of Bagging and Arcing with various window sizes. With relatively large window sizes (80 and 100), we observed a small decrease in accuracy in both cases. Figure 5 plots the classification accuracies of Bagging and Arcing using different significant fractions of the Fourier spectrum. It shows that only a small fraction of all the coefficients in the spectrum is sufficient for accurate classification. Further details about these and additional experiments studying the complexity of the Fourier representations of decision trees can be found elsewhere [18].

6. Conclusion

The emerging domain of wireless computing opens up the possibility of making data mining ubiquitous. However, this new
breed of applications is likely to have different expectations, and it will have to work with different system resource requirements. This will demand dramatic changes in the current desktop technology for data mining. This paper considered a small but important aspect of this issue. It presented a novel Fourier analysis-based approach to enhance interaction with decision trees in a mobile environment.

Figure 4. Accuracy vs. iterations with various window sizes (W=50, W=80, W=100): (Top) Bagging, (Bottom) Arcing.

The paper observed that a decision tree is a function, and that its numeric functional representation in a Fourier basis has several utilities: the representation is efficient and easy to compute, and it is suitable for aggregating the multiple trees frequently generated by ensemble-based data stream mining techniques like boosting and bagging. This approach also offers a new way to visualize decision trees that is completely different from the traditional tree-based presentation used in most data mining software. The approach presented here is particularly suitable for portable applications with small display areas. The touch-pad and ticker-based approach is intuitive and can be easily implemented on the touch-sensitive screens often used in small wireless devices. We are currently extending the ticker interface, introducing additional application functionalities, and exploring related techniques to mine real-valued financial data online.

Acknowledgments

The authors acknowledge support from the United States National Science Foundation CAREER award IIS-0093353 and NASA (NRA) NAS2-37143.
Using Features of Arden Syntax with Object-Oriented Medical Data Models for Guideline Modeling

Mor Peleg, Ph.D.1, Omolola Ogunyemi, Ph.D.2, Samson Tu, M.S.1, Aziz A. Boxwala, M.B.B.S., Ph.D.2, Qing Zeng, Ph.D.2, Robert A. Greenes, M.D., Ph.D.2, Edward H. Shortliffe, M.D., Ph.D.3

1 Stanford Medical Informatics, Stanford University School of Medicine, Stanford, CA; 2 Decision Systems Group, Harvard Medical School, Brigham & Women's Hospital, Boston, MA; 3 Department of Medical Informatics, Columbia University, New York, NY

Computer-interpretable guidelines (CIGs) can deliver patient-specific decision support at the point of care. CIGs base their recommendations on eligibility and decision criteria that relate medical concepts to patient data. CIG models use expression languages for specifying these criteria, and define models for the medical data to which the expressions can refer. In developing version 3 of the GuideLine Interchange Format (GLIF3), we used existing standards as the medical data model and expression language. We investigated the object-oriented HL7 Reference Information Model (RIM) as a default data model. We developed an expression language, called GEL, based on Arden Syntax's logic grammar. Together with other GLIF constructs, GEL reconciles incompatibilities between the data models of Arden Syntax and the HL7 RIM. These incompatibilities include Arden's lack of support for complex data types and time intervals, and the mismatch between Arden's single primary time and the multiple time attributes of the HL7 RIM.

1 Introduction

Computer-interpretable guidelines (CIGs) that are linked to electronic medical records (EMRs) are a means for providing patient-specific advice automatically at the point of care. There are several methodologies for encoding guidelines to make them computer-interpretable.1-4 All of these methodologies have constructs for defining criteria that relate medical concepts to patient data.
While each methodology has different constructs, they all use some sort of expression language for specifying logical decision and eligibility criteria, and a data model for medical concepts and patient data.

GLIF is a format for encoding and sharing computer-interpretable clinical guidelines, developed by the InterMed Collaboratory, a joint project of the medical informatics groups at Harvard, Stanford, and Columbia universities. GLIF3 is GLIF's third version.4 This paper discusses the approach that InterMed has used to design a data model and expression language for GLIF3.

GLIF3 is designed to support computer-based execution, with integration into clinical systems environments and access to data in the EMR. While the control-flow of the GLIF model was already formally expressed in GLIF2, decision and eligibility criteria, and patient data, were expressed in natural language that did not support execution.4 Therefore, our goals for GLIF3 were to develop: (1) a formal expression language that allows computation on time-stamped data, and (2) a data model that specifies a hierarchy of classes which represent medical concepts and data, and that defines their characterizing attributes, including timing aspects.

GLIF3 enables a guideline specification to be computable by formally specifying patient data items, action specifications, decision criteria, and eligibility criteria. For patient data items, there must be a definition of the medical concepts to which the data refer. This definition is given by a medical ontology, which contains the clinical meaning of the concepts as well as their structure, or data model. As an example, a test result should have at least two attributes: one that represents the test value and one for the biologically relevant time of the test. Defining medical concepts in relation to standard medical vocabularies allows the guideline encoding to contain concepts that are institution-independent.
Mapping to institution-dependent EMR codes and procedures can therefore be specified at another level.

The GLIF3 guideline model uses the medical ontology in two ways. First, action specifications such as recommended test orders or notifications are defined in terms of formal medical concepts contained in the medical ontology. Second, a formal expression language uses data item names that are defined by the medical ontology. The expression language is used to express decision criteria, eligibility criteria, and patient states.

2 Background

In creating GLIF3, we have tried to build on standards so as to avoid duplicating the work of others. Toward this end, we determined that working with the HL7 organization would be desirable. The Clinical Guidelines Special Interest Group in HL7 was formed this year to move toward a standardized representation for computer-based guidelines. To further the harmonization with ongoing HL7 efforts, we developed an expression language derived from Arden Syntax's logic grammar. Arden Syntax is a standard of the American Society for Testing and Materials6 and is being further developed under HL7. We used standard controlled medical vocabularies to define medical concepts. We investigated the HL7 RIM version 1.0 as a default medical data model, as GLIF3 allows a guideline encoder to use any object-oriented (OO) model for defining the structure of the concepts used by the guideline.

2.1 HL7's Reference Information Model

The clinical part of version 1.0 of HL7's RIM can model medical knowledge and patient data uniformly.5 All clinical data are specified as Act objects, or Acts. Acts have attributes that provide information about the clinical concepts they represent and their timing. Relationships are used to model the Act circumstances and allow grouping the Acts and reasoning about them.
Acts are specialized into procedures performed on a patient, observations about the patient, medications given to the patient, and other kinds of services. Different sub-classes contain additional attributes that help characterize an Act. For example, Observation has a value attribute, whereas Medication has attributes for dosage and route of administration. Act objects have a mood that distinguishes the ways in which they can be conceived: as an event that occurred, a definition, an intent, an order, etc.5

2.2 Arden Syntax

Arden Syntax is used to compose Medical Logic Modules (MLMs), each of which represents a single medical decision. Arden Syntax assumes a simple data model. It supports the following simple types: null, Boolean, number, time (timestamp), duration, and string. In addition, it supports a list of simple types and a query result, which is a list whose values are associated with a primary time, the medically relevant time of occurrence. An element of a list or a query result can have only one data value; no complex data values are supported. For example, a drug's administration route and dose must be represented as different variables.

An MLM has logic and data slots that define its decision logic and the medical data items it uses. These slots use expressions whose syntax is defined by a grammar that we refer to as the "Arden Syntax logic grammar". The data slot of an MLM in Arden Syntax specifies how data are retrieved from an EMR. However, only part of this specification is defined by the syntax. Mapping the MLM variables to the institutional EMR must be defined in a non-standard way within curly braces, a difficulty that is called the curly braces problem in the Arden community.
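To make the query result data type concrete, it can be modeled as a list of (value, primary time) pairs, over which operators such as latest select the element with the most recent primary time. This is an illustrative sketch of the semantics just described; the names are ours, not part of the Arden standard.

```python
from datetime import datetime

# Each query result element pairs one simple data value with its
# primary time, the medically relevant time of occurrence.
potassium = [
    (3.1, datetime(2002, 1, 5)),
    (4.2, datetime(2002, 1, 8)),
    (3.8, datetime(2001, 12, 30)),
]

def latest(query_result):
    """Arden-style 'latest': the value whose primary time is most recent."""
    return max(query_result, key=lambda pair: pair[1])[0]

# latest(potassium) -> 4.2, the value timestamped 2002-01-08
```

Note that the curly braces problem concerns a separate issue: how such variables are bound to institutional EMR data, which the syntax leaves unstandardized.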
The curly braces problem implies that sharing an MLM among different institutions requires adapting the MLMs to local systems and terminology standards.

3 GLIF3's expression language and its interaction with the medical data model

We used Arden Syntax's logic grammar as our basis for developing GLIF3's expression language, as Arden Syntax has been accepted as a standard for delivering alerts and reminders, and it is a mature language that has been implemented and tested by multiple academic groups and commercial vendors.

In developing GLIF3's domain ontology, we used an OO data model because of the advantages it provides. One advantage is that an OO data model encapsulates the attributes of a patient data item into a single complex data object. For example, an HL7 Medication object can specify the particular medication used, the dose, the route of administration, the time period for which it was prescribed, and the date of first prescription, among other things. Encapsulation improves data integrity and makes conceptualizing easier. Another advantage of an OO data model is that it enables a declarative way of specifying the medical concepts and data items used by a guideline. This method could facilitate mapping concepts and data items to institutional EMRs, thus addressing Arden's curly braces problem, as suggested by Jenders et al.7
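The encapsulation advantage can be illustrated by a single complex object in place of Arden's separate route and dose variables. The class below is a simplified stand-in for the HL7 Medication object, with attribute names chosen for illustration only.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class Medication:
    """Simplified stand-in for an HL7 RIM Medication object: one object
    encapsulates what Arden's flat model would split across variables."""
    service_cd: str                      # coded medication concept
    dose_mg: float
    route: str
    first_prescribed: Optional[date] = None

acei = Medication(service_cd="ACEI", dose_mg=10.0, route="oral",
                  first_prescribed=date(2002, 1, 8))
```

Grouping these attributes into one object is what makes the declarative mapping to an institutional EMR possible: the mapping can be specified once per class rather than once per variable.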
The HL7 RIM is suitable as a medical data model for GLIF3 because it is general enough to represent the data structures for a wide range of medical data and concepts in a uniform manner while using a small number of classes.5 Moreover, HL7 is creating standards for messaging interfaces for EMRs based on the RIM. Figure 1 shows an example of the definition of a GLIF3 data item representing an ACE Inhibitor treatment that uses the HL7 RIM Medication object.

3.1 Incompatibilities between Arden Syntax logic grammar and the HL7 RIM

Using Arden Syntax's logic grammar in the context of a complex OO data model is not straightforward, as there are several incompatibilities between the two models. As mentioned, the major incompatibility is that Arden Syntax has a simple data model, whereas the HL7 RIM model is object-oriented. Thus, many of the standard operators of Arden Syntax's logic grammar cannot be used compatibly with the HL7 RIM.

Another incompatibility is that Arden's data model does not contain intervals as a data type, whereas the data models in the HL7 RIM and in GLIF3 allow the expression of temporal and other types of intervals. In addition, HL7 RIM data model classes can have many different times associated with them, while Arden Syntax is limited to a single timestamp. For example, in the HL7 RIM, an Act data model class may be associated with an activity time interval that represents the time that the medical action took place, a critical time interval that represents the biologically relevant time period for the action, and a recording (availability) timestamp of the action. Arden Syntax defines many operators (e.g., latest) that assume that each data value can be associated with at most one time stamp.
This mismatch between Arden's single-timestamp operators and the RIM's multiple time attributes is clearly problematic.

(Instance of Variable_Data_Item) {
  name: ACEI_Item
  concept: { (instance of Concept)
    concept_name: ACEI;
    concept_id: C0003015;
    concept_source: UMLS }
  data_model_class_id: Medication
  data_model_source_id: HL7-RIM
  data_value: { (instance of Medication)
    service_cd: ACEI concept;
    mood_cd: event;
    critical_time: {
      low: null;
      high: null; }
    ... } }

Figure 1. A variable data item that defines ACE Inhibitor treatment. Attribute names are on the left, followed by their values. Complex values are in curly braces. The ACEI data specify the appropriate UMLS code and HL7 RIM class (Medication). The figure shows two attributes of Medication. Other attributes include dosage_quantity, rate_quantity, and route_code.

3.2 The GLIF3 approach for using Arden Syntax's logic grammar with the HL7 RIM

To address the incompatibilities, we created an expression language that we call the Guideline Expression Language (GEL). GEL is based on Arden Syntax's logic grammar but supports (1) complex data structures; (2) timestamp, numeric, and duration intervals; and (3) function definitions. In addition, we created constructs in GLIF3 that extract patient data taken from the EMR as HL7 RIM objects and transform them into GEL-compatible data structures. This solution allows an Arden-derived expression language to be used with an object-oriented data model. Examples of this approach follow.

3.2.1 GEL

In addition to supporting Arden Syntax's basic data types, GEL supports complex data types, intervals, lists of these new types, and query results whose data values can be a complex type or an interval. A complex data type can have any number of attributes, each of which is a GEL data type.

GEL supports most of the Arden Syntax logic grammar operators, and it also provides a mechanism for users to define functions. This feature is important for facilitating GLIF3's support for user-defined data model classes and instances.
A user may want to define a data model in which primary time has no special meaning, but in which other relationships are semantically important. For example, inheritance relationships among concepts (e.g., drug hierarchies) might be a central part of the data model. GEL supports complex data types, but their attributes are treated equally and do not convey any special semantic meaning. A way to restore such semantics is to define GEL functions. For example, a function can be defined that would deduce the children of a concept based on inheritance relationships. This approach would restore the semantics of a "parent" attribute of a concept class.

Adding several functions that can be used by GLIF3 when writing GEL expressions is useful. The most important of these functions, selectAttribute, is analogous to the "dot" operator of OO languages, which is used to access an object's attributes. For example, if currentACEI has a complex data type and one of its attributes is critical_time, then the value of critical_time can be extracted by the following GEL expression: SelectAttribute("critical_time", currentACEI).

3.2.2 GLIF3 constructs that return data consistent with the GEL data model

GLIF3's Get_Data_Action retrieves patient data from the EMR as HL7 RIM objects and transforms them into query result data types. It allows a mapping to be specified from GLIF3's default data model, the HL7 RIM, to GEL's data model. A guideline author can use Get_Data_Action to specify that an attribute of a complex RIM class is the source of data values for the query result, and that the values of another attribute serve as the primary time in the query result. Thus, the query result is a list of value and primary time pairs similar to Arden's query result data type. (GEL currently does not support all of the numeric functions of Arden Syntax, but they can be defined as GEL functions.)
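The behavior of selectAttribute, GEL's stand-in for the OO dot operator introduced in Section 3.2.1, can be sketched over nested values as follows; this is our illustration of the described semantics, not GLIF3 code.

```python
def select_attribute(name, obj):
    """GEL-style selectAttribute: extract one attribute from a complex
    value, analogous to the 'dot' operator of OO languages."""
    return obj[name]

# A complex currentACEI value with a nested critical_time interval.
current_acei = {
    "service_cd": "ACEI",
    "critical_time": {"low": "2002-01-08", "high": None},
}

interval = select_attribute("critical_time", current_acei)
start = select_attribute("low", interval)
# start == "2002-01-08"
```

Because the function simply returns the named attribute, nested access such as critical_time.low is expressed by composing two calls, exactly as in the GEL expressions shown later in Figure 3.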
How-ever, the value attribute in GEL’s query result holds a simple or a complex GEL type. Get_Data_Action specifies which data item from the EMR will serve as the source of data, and which attribute will be s e-lected from the data item. In this way, specific attrib-utes of the data item can serve as the source of the data, rather than the entire data item. For ex ample, Get_Data_Action can retrieve all instances of Medi-cation data items that refer to ACE Inhibitor treat-ments (Figure 2). It can assign the value of their data value attribute, which is a RIM Medication o bject (Figure 1), to the query result’s “value” attribute, and assign the end time of each Medication treatment (critical_time.high) to the “primary time” attribute of the query result elements.(Instance of Get_Data_Action)data_item: ACEI_Itemattribute_to_be_assigned: data_valuevariable_name: ACEIprimary_time: data_value.critical_time.high(Instance of Query_Result)value: (Medication instance) value: (Medication instance) .. primary_time: 2002-01-08 primary_time: 1999-03-02Figure 2. The Get_Data_Action and its query result that holds ACEI Medication objects data values and their primary times. In this example, the latest query result element has the primary_time 2002-01-08.Because GEL query results have the same data model as Arden query results, many Arden operators can be used in GEL expressions. For example, GEL expres-sions can be written to select the latest element in an ACEI query result, to check that the latest prescrip-tion applies to the current date, or to form a list of current medications given to a patient out of single current medications. The Assignment_Action of GLIF3 can be used to set a variable with a result of a GEL expression. 
As an example of Assignment_Action, a variable called currentACEI can be assigned with a GEL expression that returns the value of the "value" attribute of the latest ACEI element of the query result in Figure 2, provided its primary time is current (>= now), i.e., the medication prescription is still active. In the example shown in Figure 2, the primary time of the latest ACEI query result element is 2002-01-08. Figure 3(a) shows the GEL expression.

(a) CurrentACEI := SelectAttribute("value", latest ACEI where time of it >= now)
(b) ACEIStartTime := SelectAttribute("low", SelectAttribute("critical_time", currentACEI));
    ACEIStartTime < (now - 4 weeks)

Figure 3. GEL expressions.

The variables that hold the results of Get_Data_Action and Assignment_Action can be used within decision and eligibility criteria expressed in GEL. Figure 3(b) specifies an expression and a criterion in GEL. The expression extracts the start time (critical_time.low) of the current ACE Inhibitor prescription from the value of the variable currentACEI, which was extracted by the expression shown in Figure 3(a). The criterion checks whether an ACE Inhibitor was prescribed more than 4 weeks ago.

4 Discussion

Our solutions can help guide others who are seeking to use Arden Syntax as an expression language with object-oriented EMR models. HL7, within which both the RIM and Arden Syntax are being developed, is also seeking ways in which the two models can be harmonized. Our suggestions can provide a possible direction for this effort.

The InterMed Collaboratory has explored the possibility of reconciling Arden Syntax's logic grammar with domain concepts and patient data represented as complex objects. GEL takes a conservative approach to reconciling the difference between Arden Syntax's data model and object-oriented medical data models.
GEL does so by augmenting the data model in a limited way and adding several functions that enable (1) extraction of attributes from complex data types, and (2) operations on lists of complex data values, while maintaining the existing Arden operators. GEL's data model allows complex data types. However, GEL's expression language does not directly support the object-oriented model. Instead, the selectAttribute function of GEL is used to extract attribute values from complex types. In addition, user-defined functions can be used to convey the semantics of relationships provided by the OO model, such as the inheritance relationship discussed in Section 3.2.1. They can also be used to implement methods of object-oriented classes.

A disadvantage of GEL is its limitation in specifying expressions that relate to more than one of an object's attributes. For example, in a query result of AmoxicillinPrescriptions where the result values have a dose of 500 mg and an oral route, it is useful to write an expression that selects the start time from the values of the result. The "where" part of the expression would ideally be written as "where selectAttribute("route", it) = "oral"...". However, "it" cannot be an argument of a GEL function, because the function needs to get the entire list or query result as a parameter. This restriction bars writing "where" clauses that use GEL functions. As a result, "where" cannot be used to check constraints on specific attribute values other than primary_time. Instead, there is a need to specify a loop that looks at single elements in the list or query result and evaluates the criteria imposed on the element's attribute values. Alternatively, specific GEL functions can be written to implement such loops for special cases.

We used GLIF3 for encoding two clinical guidelines: management of chronic cough and hypertension management.
GEL was sufficient to encode the cough guideline, but we had to add a special GEL function to encode the hypertension guideline. While the special function allowed us to encode the second guideline, adding special-purpose functions is not a principled approach, and there are likely to be additional expressions that cannot be encoded without adding more functions to GEL.

Other studies have tried to reconcile different data models. One study described a translation between complex data models using an intermediate OO rule-based approach.8 When an OO model is translated into a non-OO model, the methods in the OO model cannot be translated, and they show up as a discrepancy between the two models. Another group developed a translation of high-order object queries to first-order relational queries.9 We did not find published work that tried to reconcile an OO data model with an expression language that does not support operators on relations between attributes of complex schemas or objects.

We are developing an OO expression language that can associate methods with each class in the OO medical data model. Binding methods to the data will provide a more systematic and scalable approach to defining an expression language. The OO expression language can have several predefined classes, such as List and Math, which contain Arden's list and numeric operators. For example, merge will be a method of the List class, whereas sin will be a method of the Math class. In addition to the built-in classes, the medical domain ontology classes will likely define methods to which the OO expression language can refer.

We have found that the Arden community also desires the development of an expression language that can work with the object-oriented HL7 RIM data model. As part of the Arden and Clinical Guidelines special interest groups in HL7, we will be further developing the expression language that will work with the RIM.
For the reasons we have outlined in this paper, we believe that designing an expression syntax that is compatible with the object-oriented data model from the outset is preferable to trying to reconcile two radically different data models.

Acknowledgements

Supported in part by Grant LM06594, with support from the Department of the Army, the Agency for Healthcare Research and Quality, and the National Library of Medicine.

References

1. Sugden B, Purves IN, Booth N, Sowerby M. The PRODIGY Project - the Interactive Development of the Release One Model. AMIA Symp 1999; 359-363.
2. Fox J, Thomson R. Decision support and disease management: a logic engineering approach. IEEE Transactions on Information Tech in Biomed 1998;2(4):1-12.
3. Musen MA, Tu SW, Das AK, Shahar Y. EON: A component-based approach to automation of protocol-directed therapy. JAMIA 1996;3:367-388.
4. Peleg M, Boxwala A, Ogunyemi O, et al. GLIF3: The Evolution of a Guideline Representation Format. Proc. AMIA Annual Symposium; 2000; p. 645-649.
5. Schadow G, Russler DC, Mead CN, McDonald CJ. Integrating Medical Information and Knowledge in the HL7 RIM. Proc. AMIA Annual Symposium; 2000; p. 764-768.
6. Hripcsak G, Ludemann P, Pryor TA, Wigertz OB, Clayton PD. Rationale for the Arden Syntax. Comput Biomed Res 1994;27(4):291-324.
7. Jenders RA, Sujansky W, Broverman C, Chadwick M. Towards Improved Knowledge Sharing: Assessment of the HL7 Reference Information Model to Support Medical Logic Module Queries. Proc AMIA Annu Fall Symp; 1997; p. 308-312.
8. Su SYW, Fang SC, Lam H. An object-oriented rule-based approach to data model and schema translation. Florida: Department of Computer and Information Sciences, University of Florida; 1992. Report No.: TR-92-015.
9. Qian X, Raschid L. Query Interoperation Among Object-Oriented and Relational Databases. In: Proc. of the 11th Intl. Conf. on Data Engineering; Taipei, Taiwan; 1995.