Finding Minimum Representative Pattern Sets

Guimei Liu (School of Computing, National University of Singapore, liugm@.sg), Haojun Zhang (School of Computing, National University of Singapore, zhanghao@.sg), Limsoon Wong (School of Computing, National University of Singapore, wongls@.sg)

ABSTRACT

Frequent pattern mining often produces an enormous number of frequent patterns, which imposes a great challenge on understanding and further analysis of the generated patterns. This calls for the need of finding a small number of representative patterns to best approximate all other patterns. An ideal approach should 1) produce a minimum number of representative patterns; 2) be able to restore the support of all patterns with error guarantee; and 3) have good efficiency. Few existing approaches can satisfy all three requirements. In this paper, we develop two algorithms, MinRPset and FlexRPset, for finding minimum representative pattern sets. Both algorithms provide error guarantee. MinRPset produces the smallest solution that we can possibly have in practice under the given problem setting, and it takes a reasonable amount of time to finish. FlexRPset is developed based on MinRPset. It provides one extra parameter K to allow users to make a trade-off between result size and efficiency. Our experiment results show that MinRPset and FlexRPset produce fewer representative patterns than RPlocal, an efficient algorithm that is developed for solving the same problem. FlexRPset can be slightly faster than RPlocal when K is small.

Categories and Subject Descriptors

H.2.8 [DATABASE MANAGEMENT]: Database Applications - Data Mining

Keywords

representative patterns, frequent pattern summarization

1. INTRODUCTION

Frequent pattern mining is an important problem in the data mining area. It was first introduced by Agrawal et al. in 1993 [4]. Frequent pattern mining is usually performed on a transaction database D = {t1, t2, ..., tn}, where tj is a transaction containing a set of items, j ∈ [1, n]. Let I = {i1, i2, ..., im} be the set of distinct items appearing in
D. A pattern X is a set of items in I, that is, X ⊆ I. If a transaction t ∈ D contains all the items of a pattern X, then we say t supports X and t is a supporting transaction of X. Let T(X) be the set of transactions in D supporting pattern X. The support of X, denoted as supp(X), is defined as |T(X)|. If the support of a pattern X is larger than a user-specified threshold min_sup, then X is called a frequent pattern. Given a transaction database D and a minimum support threshold min_sup, the task of frequent pattern mining is to find all the frequent patterns in D with respect to min_sup.

Many efficient algorithms have been developed for mining frequent patterns [9]. Now the focus has shifted from how to efficiently mine frequent patterns to how to effectively utilize them. Frequent patterns have the anti-monotone property: if a pattern is frequent, then all of its subsets must be frequent too. On dense datasets and/or when the minimum support is low, long patterns can be frequent. All the subsets of these frequent long patterns are frequent too based on the anti-monotone property. This leads to an explosion in the number of frequent patterns. The huge quantity of patterns can easily become a bottleneck for understanding and further analyzing frequent patterns.

It has been observed that the complete set of frequent patterns often contains a lot of redundancy. Many frequent patterns have similar items and supporting transactions. It is desirable to group similar patterns together and represent them using one single pattern. The frequent closed pattern is proposed for this purpose [14]. Let X be a pattern and S be the set of patterns appearing in the same set of transactions as X, that is, S = {Y | T(Y) = T(X)}. The longest pattern in S is called a closed pattern, and all the other patterns in S are subsets of it. The closed pattern of S is selected to represent all the patterns in S. The set of frequent closed patterns is a lossless representation of the complete set of frequent patterns. That is, all the
frequent patterns and their exact support can be recovered from the set of frequent closed patterns. The number of frequent closed patterns can be much smaller than the total number of frequent patterns, but it can still be tens of thousands or even more.

[To appear in Proc. KDD 2012.]

Frequent closed patterns group patterns supported by exactly the same set of transactions together. This condition is too restrictive. Xin et al. [23] relax this condition to further reduce pattern set size. They propose the concept of δ-covered to generalize the concept of the frequent closed pattern. A pattern X1 is δ-covered by another pattern X2 if X1 is a subset of X2 and (supp(X1) − supp(X2))/supp(X1) ≤ δ. The goal is to find a minimum set of representative patterns that can δ-cover all frequent patterns. When δ = 0, the problem corresponds to finding all frequent closed patterns. Xin et al. show that the problem can be mapped to a set cover problem. They develop two algorithms, RPglobal and RPlocal, to solve the problem. RPglobal first generates the set of patterns that can be δ-covered by each pattern, and then employs the well-known greedy algorithm [8] for the set cover problem to find representative patterns. The optimality of RPglobal is determined by the optimality of the greedy algorithm, so the solution produced by RPglobal is almost the best solution we can possibly have in practice. However, RPglobal is very time-consuming and space-consuming. It is feasible only when the number of frequent patterns is not large. RPlocal is developed based on FPClose [10]. It integrates frequent pattern mining with representative pattern finding. RPlocal is very efficient, but it produces many more representative patterns than RPglobal.

In this paper, we analyze the bottlenecks for finding a minimum representative pattern set and develop two algorithms, MinRPset and FlexRPset, to solve the problem. Algorithm MinRPset is similar to RPglobal, but it utilizes several techniques to reduce running time and memory usage. In
particular, MinRPset uses a tree structure called CFP-tree [13] to store frequent patterns compactly. The CFP-tree structure also supports efficient retrieval of patterns that are δ-covered by a given pattern. Our experiment results show that MinRPset is only several times slower than RPlocal, while RPglobal is several orders of magnitude slower. Algorithm FlexRPset is developed based on MinRPset. It provides one extra parameter K which allows users to make a trade-off between efficiency and the number of representative patterns selected. When K = ∞, FlexRPset is the same as MinRPset. With the decrease of K, FlexRPset becomes faster, but it produces more representative patterns. When K = 1, FlexRPset is slightly faster than RPlocal, and it still produces fewer representative patterns than RPlocal in almost all cases.

The rest of the paper is organized as follows. Section 2 introduces related work. Section 3 gives the formal problem definition. The two algorithms, MinRPset and FlexRPset, are described in Section 4 and Section 5 respectively. Experiment results are reported in Section 6. Finally, Section 7 concludes the paper.

2. RELATED WORK

The number of frequent patterns can be very large. Besides frequent closed patterns, several other concepts, such as generators [5], non-derivable patterns [7], maximal patterns [12], top-k frequent closed patterns [19] and redundancy-aware top-k patterns [22], have been proposed to reduce pattern set size. The number of generators is larger than that of closed patterns. Furthermore, the set of generators itself is not lossless. It requires a border to be lossless [6]. Non-derivable patterns are generalizations of generators. A border is also needed to make non-derivable patterns lossless.
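To make the notions of support and closed patterns from the preceding sections concrete, here is a small brute-force sketch. The toy dataset and helper names below are our own illustration, not the paper's example or implementation:

```python
from itertools import combinations

# Toy transaction database (illustrative only, not the paper's Table 1).
D = [{"a", "c", "m"}, {"a", "c", "m"}, {"a", "b"}, {"c", "m"}]

def supp(X, D):
    """Support of pattern X = number of transactions containing all items of X."""
    return sum(1 for t in D if X <= t)

items = set().union(*D)
min_sup = 2

# Enumerate all frequent patterns by brute force (fine for tiny data).
frequent = [set(c) for k in range(1, len(items) + 1)
            for c in combinations(sorted(items), k)
            if supp(set(c), D) >= min_sup]

# A frequent pattern is closed if no proper superset has the same support.
# (A superset with equal support is necessarily frequent, so scanning
# `frequent` suffices.)
closed = [X for X in frequent
          if not any(X < Y and supp(Y, D) == supp(X, D) for Y in frequent)]
```

On this toy data, {c} is not closed because {c, m} appears in exactly the same three transactions, which is precisely the redundancy that closed patterns remove.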
The number of maximal patterns is much smaller than the number of closed patterns. All frequent patterns can be recovered from maximal patterns, but their support information is lost. Another work that also ignores the support information is [3]. It selects k patterns that best cover a collection of patterns.

Frequent closed patterns preserve the exact support of all frequent patterns. In many applications, knowing the approximate support of frequent patterns is sufficient. Several approaches have been proposed to make a trade-off between pattern set size and the precision of pattern support. The work by Xin et al. [23] described in Section 1 is one such approach. Another approach proposed by Pei et al. [15] mines a minimal condensed pattern-base, which is a superset of the maximal pattern set. Pei et al. use heuristic algorithms to find condensed pattern-bases. All frequent patterns and their support can be restored from a condensed pattern-base with error guarantee.

Yan et al. [24] use profiles to summarize patterns. A profile consists of a master pattern, a support and a probability distribution vector which contains the probability of the items in the master pattern. The set of patterns represented by a profile are subsets of the master pattern, and their support is calculated by multiplying the support of the profile and the probability of the corresponding items. To summarize a collection of patterns using k profiles, Yan et al. partition the patterns into k clusters, and use a profile to describe each cluster. There are several drawbacks with this profile-based approach: 1) It makes contradictory assumptions. On one hand, the patterns represented by the same profile are supposed to be similar in both item composition and supporting transactions, thus the items in the same profile are expected to be strongly correlated. On the other hand, based on how the support of patterns is calculated from a profile, the items in the same profile are expected to be independent.
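The profile-based support estimate just described can be sketched as follows. This is only a minimal illustration of the idea as summarized above, not Yan et al.'s actual code, and the profile values are made up:

```python
def profile_support(profile, pattern):
    """Estimate supp(pattern) under a profile: the profile's support times
    the product of the per-item probabilities, treating items as independent
    (the independence assumption the paper criticizes)."""
    support, probs = profile["support"], profile["probs"]
    if not set(pattern) <= set(probs):
        raise ValueError("pattern is not a subset of the master pattern")
    est = support
    for item in pattern:
        est *= probs[item]
    return est

# Hypothetical profile: master pattern {a, c, m} with support 100.
p = {"support": 100, "probs": {"a": 0.9, "c": 0.8, "m": 0.5}}
```

For example, the estimate for {a, c} under this profile is 100 × 0.9 × 0.8 = 72, regardless of how correlated a and c actually are, which is why no error guarantee can be given.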
It is hard to strike a balance between the two contradicting requirements. 2) There is no error guarantee on the estimated support of patterns. 3) The proposed algorithm for generating profiles is very slow because it needs to scan the original dataset repeatedly. 4) The support of a pattern is not determined by a single profile, but by all the profiles whose master pattern is a superset of the pattern. Thus it is very costly to recover the support of a pattern using profiles. 5) The boundary between frequent patterns and infrequent patterns cannot be determined using profiles.

Several improvements have been made to the profile-based approach. Jin et al. [11] develop a regression-based approach to minimize restoration error. They cluster patterns based on restoration errors instead of similarity between patterns, thus their approach can achieve lower restoration error. However, there is still no error guarantee on the restored support. CP-summary [16] uses conditional independence to reduce restoration error. It adds one more component to each profile, a pattern base, and the new profile is called a c-profile. The items in a c-profile are expected to be independent with respect to the pattern base. CP-summary provides error guarantee on estimated support. However, the patterns of a c-profile often share little similarity, so a c-profile is not representative of its patterns any more.

Profiles can be considered as generalizations of closed patterns. Wang et al. [18] generalize another concise representation of frequent patterns, non-derivable patterns. They use a Markov Random Field (MRF) to summarize frequent patterns. The support of a pattern is estimated from its subsets, which is similar to non-derivable patterns.
The Markov Random Field model is not as intuitive as profiles, and it is also expensive to learn. It does not provide error guarantee on estimated support either.

3. PROBLEM STATEMENT

We follow the problem definition in [23]. The distance between two patterns is defined based on their supporting transaction sets.

Definition 1 (D(X1, X2)). Given two patterns X1 and X2, the distance between them is defined as D(X1, X2) = 1 − |T(X1) ∩ T(X2)| / |T(X1) ∪ T(X2)|.

Definition 2 (ε-covered). Given a real number ε ∈ [0, 1] and two patterns X1 and X2, we say X1 is ε-covered by X2 if X1 ⊆ X2 and D(X1, X2) ≤ ε.

In the above definition, condition X1 ⊆ X2 ensures that the two patterns have similar items, and condition D(X1, X2) ≤ ε ensures that the two patterns have similar supporting transaction sets and similar support. Based on the definition, a pattern ε-covers itself.

Lemma 1. Given two patterns X1 and X2, if pattern X1 is ε-covered by pattern X2 and we use supp(X2) to approximate supp(X1), then the relative error (supp(X1) − supp(X2)) / supp(X1) is no larger than ε.

Proof. (supp(X1) − supp(X2)) / supp(X1) = 1 − supp(X2)/supp(X1) = 1 − |T(X2)|/|T(X1)| ≤ 1 − |T(X1) ∩ T(X2)| / |T(X1) ∪ T(X2)| ≤ ε.

Lemma 2. If a frequent pattern X1 is ε-covered by pattern X2, then supp(X2) ≥ min_sup · (1 − ε).

Proof. Based on Lemma 1, 1 − supp(X2)/supp(X1) ≤ ε, so we have supp(X2) ≥ supp(X1) · (1 − ε) ≥ min_sup · (1 − ε).

Our goal here is to select a minimum set of patterns that can ε-cover all the frequent patterns. The selected patterns are called representative patterns. Based on Lemma 1, the restoration error of all frequent patterns is bounded by ε. We do not require representative patterns to be frequent.
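Definitions 1 and 2 can be checked directly from the supporting transaction sets. A minimal sketch (the toy transaction ids and helper names are ours, not from the paper):

```python
def distance(T1, T2):
    """D(X1, X2) = 1 - |T(X1) ∩ T(X2)| / |T(X1) ∪ T(X2)| (Definition 1)."""
    return 1 - len(T1 & T2) / len(T1 | T2)

def eps_covered(X1, X2, T1, T2, eps):
    """X1 is ε-covered by X2 iff X1 ⊆ X2 and D(X1, X2) ≤ ε (Definition 2)."""
    return X1 <= X2 and distance(T1, T2) <= eps

# Toy example: X2 = {a, m} is supported by 4 of the 5 transactions that
# support X1 = {a}, so the relative support error is 1/5 = 0.2.
T1, T2 = {1, 2, 3, 4, 5}, {1, 2, 3, 4}
assert eps_covered({"a"}, {"a", "m"}, T1, T2, eps=0.2)
assert not eps_covered({"a"}, {"a", "m"}, T1, T2, eps=0.1)
```

Note that when X1 ⊆ X2 we have T(X2) ⊆ T(X1), so the distance reduces to 1 − supp(X2)/supp(X1), which is exactly the relative error bounded in Lemma 1.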
Based on Lemma 2, the support of representative patterns must be no less than min_sup · (1 − ε). The problem is how to find a minimum representative pattern set. In the next two sections, we describe two algorithms to solve the problem.

4. THE MINRPSET ALGORITHM

Let F be the set of frequent patterns in a dataset D with respect to threshold min_sup, and let F̂ be the set of patterns with support no less than min_sup · (1 − ε) in D. Obviously, F ⊆ F̂. Given a pattern X ∈ F̂, we use C(X) to denote the set of frequent patterns that can be ε-covered by X. We have C(X) ⊆ F. If X is frequent, we have X ∈ C(X).

A straightforward algorithm for finding a minimum representative pattern set is as follows. First we generate C(X) for every pattern X ∈ F̂, and we get |F̂| sets. The elements of these sets are frequent patterns in F. Let S = {C(X) | X ∈ F̂}. Finding a minimum representative pattern set is now equivalent to finding a minimum number of sets in S that can cover all the frequent patterns in F. This is a set cover problem, and it is NP-hard. We use the well-known greedy algorithm [8] to solve the problem, which achieves an approximation ratio of Σ_{i=1..k} 1/i, where k is the maximal size of the sets in S. We call this simple algorithm MinRPset.

The greedy algorithm is essentially the best-possible polynomial time approximation algorithm for the set cover problem. Our experiment results have shown that it usually takes little time to finish. Generating the C(X)s is the main bottleneck of the MinRPset algorithm when F and F̂ are large, because we need to find C(X)s over a large F for a large number of patterns in F̂. We use the following techniques to improve the efficiency of MinRPset: 1) consider closed patterns only; 2) use a structure called CFP-tree to find C(X)s efficiently; and 3) use a light-weight compression technique to compress the C(X)s.

4.1 Considering closed patterns only

A pattern is closed if all of its supersets are less frequent than it. If a pattern X1 is non-closed, then there exists another pattern X2 such that X1 ⊂ X2 and
supp(X2) = supp(X1).

Lemma 3. Given two patterns X1 and X2 such that X1 ⊆ X2 and supp(X1) = supp(X2), if X2 is ε-covered by a pattern X, then X1 must be ε-covered by X too.

The above lemma directly follows from Definition 2. It implies that instead of covering all frequent patterns, we can cover frequent closed patterns only, which leads to the following lemma.

Lemma 4. Let F be the set of frequent patterns in a dataset D with respect to a threshold min_sup. If a set of patterns R ε-covers all the frequent closed patterns in F, then R ε-covers all the frequent patterns in F.

Lemma 5. Given two patterns X1 and X2 such that X1 ⊆ X2 and supp(X1) = supp(X2), if a pattern X is ε-covered by X1, then X must be ε-covered by X2 too.

This lemma also directly follows from Definition 2. It suggests that we can use closed patterns only to cover all frequent patterns.

The number of frequent closed patterns can be orders of magnitude smaller than the total number of frequent patterns. Considering only closed patterns improves the efficiency of the MinRPset algorithm in two aspects. On one hand, it reduces the size of individual C(X)s since now they contain only frequent closed patterns. On the other hand, it reduces the number of patterns whose C(X) needs to be generated, as now we need to generate C(X)s for closed patterns only.

4.2 Using CFP-tree to find C(X)s efficiently

The CFP-tree structure is specially designed for storing and querying frequent patterns [13]. It resembles a set-enumeration tree [17]. We use an example dataset D in Table 1 to illustrate its structure. Table 2 shows all the frequent patterns in D when min_sup = 3.

Table 1: An example dataset D
TID  Transactions
1    a, c, e, f, m, p
2    b, e, v
3    a, b, f, m, p
4    d, e, f, h, p
5    a, c, d, m, v
6    a, c, h, m, s
7    a, f, m, p, u
8    a, b, d, f, g

Table 2: Frequent patterns (min_sup = 3)
ID  Itemset  ID  Itemset  ID  Itemset
1   a:6      9   ac:3     17  acm:3
2   b:3      10  af:4     18  afm:3
3   c:3      11  am:5     19  afp:3
4   d:3      12  ap:3     20  amp:3
5   e:3      13  cm:3     21  fmp:3
6   f:5      14  fm:3     22  afmp:3
7   m:5      15  fp:4
8   p:4      16  mp:3

Figure 1: CFP-tree constructed on the frequent patterns in Table 2

The CFP-tree
constructed from the frequent patterns is shown in Figure 1.

Each node in a CFP-tree is a variable-length array. If a node contains multiple entries, then each entry contains exactly one item. If a node has only one entry, then it is called a singleton node. Singleton nodes can contain more than one item. For example, node 2 in Figure 1 is a singleton node with two items m and a. An entry E stores several pieces of information: (1) m items (m ≥ 1), (2) the support of E, (3) a pointer pointing to the child node of E and (4) the id of the entry, which is assigned using preordering. In the rest of this paper, we use E.items, E.support, E.child and E.preorder to denote the above fields.

Every entry in a CFP-tree represents one or more patterns with the same support, and these patterns contain the items on the path from the root to the entry. Items contained in singleton nodes are optional. Let E be an entry, and let X_m and X_s be the sets of items in the multiple-entry nodes and in the singleton nodes on the path from the root to the parent of E respectively. The set of patterns represented by E is {X_m ∪ Y ∪ Z | Y ⊆ X_s, Z ⊆ E.items, Z ≠ ∅}. The longest pattern represented by E is X_m ∪ X_s ∪ E.items. Let us look at an example. Node 4 contains only one entry. For this entry, we have X_m = {p}, X_s = {f} and E.items = {m, a}. Hence node 4 represents 6 itemsets: {p, m}, {p, a}, {p, m, a}, {p, f, m}, {p, f, a} and {p, f, m, a}. We use E.pattern to denote the longest pattern represented by E.

The above feature makes the CFP-tree a very compact structure for storing frequent patterns. The number of entries in a CFP-tree is much smaller than the total number of patterns stored in the tree. For each entry, we consider its longest pattern only, based on Lemma 3 and Lemma 5. For an entry E, only its longest pattern can be closed. Other patterns of E that are shorter than the longest pattern cannot be closed based on the definition of closed patterns. If the longest pattern of an entry is not closed, then we call the entry a non-closed
entry.

The CFP-tree structure has the following property.

Property 1. In a multiple-entry node, the item of an entry E can appear in the subtrees pointed by entries before E, but it cannot appear in the subtrees pointed by entries after E.

For example, in the root node of Figure 1, item p is allowed to appear in the subtrees pointed by entries b, c, d and e, but it is not allowed to appear in the subtrees pointed by entries f, m and a. This property implies the following lemma.

Lemma 6. In a CFP-tree, the supersets of a pattern cannot appear on the right of the pattern. They appear either on the left of the pattern or in the subtree pointed by the pattern.

4.2.1 Finding one C(X)

Given a pattern X, C(X) contains the subsets of X that can be ε-covered by X. The CFP-tree supports efficient retrieval of subsets of patterns. To find the subsets of a pattern X in a CFP-tree, we simply traverse the CFP-tree and match the items of the entries against X. For an entry E in a multiple-entry node, if its item appears in X, then entry E represents some subsets of X and the search is continued on its subtree. Otherwise, entry E and its subtree are skipped because all the patterns in the subtree of E contain the item of E, which is not in X, so these patterns cannot be subsets of X. An entry E in a singleton node can contain items not in X, and these items are simply ignored.

Algorithm 1 shows the pseudo-code for retrieving C(X). Initially, cnode is the root node of the CFP-tree. Parameter Y contains the set of items to be searched in cnode. It is set to X initially. Once an entry E is visited, the item of E is removed from Y when Y is passed to the subtree of E (lines 8 and 18). The item of E is also excluded when Y is passed to the entries after E (line 21). This is because the item of

Algorithm 1: Search_CX Algorithm
Input:
  cnode: a CFP-tree node; // cnode is the root node initially.
  Y: the set of items to be searched in cnode; // Y = X initially.
  supp(X): the support of X;
Output: C(X)
Description:
1:  if cnode contains only one
entry E then
2:    if E.support == supp(X) AND E.pattern ⊂ X then
3:      Mark E as non-closed;
4:    if E is not marked as non-closed then
5:      if E.items ∩ Y ≠ ∅ AND E.support ≤ supp(X)/(1 − ε) then
6:        Put E.preorder into C(X);
7:      if E.child ≠ NULL AND Y − E.items ≠ ∅ then
8:        Search_CX(E.child, Y − E.items, supp(X));
9:  else if cnode contains multiple entries then
10:   for each entry E ∈ cnode from left to right do
11:     if the item of E is in Y then
12:       if E.support == supp(X) AND E.pattern ⊂ X then
13:         Mark E as non-closed;
14:       if E is not marked as non-closed then
15:         if E.support ≤ supp(X)/(1 − ε) then
16:           Put E.preorder into C(X);
17:         if E.child ≠ NULL AND Y − E.items ≠ ∅ then
18:           Search_CX(E.child, Y − E.items, supp(X));
19:       if supp(E.pattern ∪ Y) > supp(X)/(1 − ε) then
20:         return;
21:     Y = Y − E.items;

E cannot appear in the subtrees pointed by entries after E, based on Property 1. During the search of C(X)s, we also mark non-closed patterns. If the longest pattern of E is a proper subset of X and E.support = supp(X), then E is marked as non-closed (lines 2-3 and 12-13), and it is skipped in subsequent search.

The early termination technique. If a pattern is ε-covered by X, then its support must be no larger than supp(X)/(1 − ε) based on Definition 2. We use this requirement to further improve the efficiency of Algorithm 1. Given an entry E in a multiple-entry node, after we visit the subtree of E, if we find supp(E.pattern ∪ Y) > supp(X)/(1 − ε), where Y is the set of items that is passed to E, then there is no need to visit the subtrees pointed by entries after E (lines 19-20). The reason is that all the subsets of X in these subtrees must be subsets of (E.pattern ∪ Y), and their support must be larger than supp(X)/(1 − ε) too based on the anti-monotone property. We call this pruning technique the early termination technique.

4.2.2 Finding C(X)s of all closed patterns

Algorithm 2 shows the pseudo-code for generating all C(X)s. It traverses the CFP-tree in depth-first order from left to right. Using this traversal order, the supersets of a pattern X that are on
the left of X are visited before X. If the support of X is the same as one of these supersets, then X should be marked as non-closed when Search_CX is called for that superset. If X is not marked as non-closed when X is visited, it means that X is more frequent than all its supersets on its left. Based on Lemma 6, the supersets of a pattern appear either on the left of the pattern or in the subtree pointed by the pattern. If X is also more frequent than its child entries, then X must be closed. The conditions listed at line 2 and line 3 ensure that Algorithm 2 generates C(X)s for only closed patterns.

Algorithm 2: DFS_Search_CXs Algorithm
Input:
  cnode: a CFP-tree node; // cnode is the root node initially.
Output: C(X)s
Description:
1:  for each entry E ∈ cnode from left to right do
2:    if E is not marked as non-closed then
3:      if E is more frequent than its child entries then
4:        X = E.pattern;
5:        C(X) = Search_CX(root, X, E.support);
6:      if E.child ≠ NULL then
7:        DFS_Search_CXs(E.child);

In Algorithm 2, if an entry E is marked as non-closed because it has the same support as one of its supersets on its left, then all the patterns in the subtree pointed by E are non-closed. We can safely skip E and its subtree in subsequent traversal (line 2). The same pruning is done in Algorithm 1 (lines 4 and 14). This observation has been used in almost all frequent closed pattern mining algorithms to prune non-closed patterns [10, 20].

4.3 Compressing C(X)s

In a CFP-tree, each entry E has an id, which is denoted as E.preorder. In Algorithm 1, we put the ids of entries into C(X)s if E is ε-covered by X (lines 6 and 16). Each id takes 4 bytes. Both the total number of C(X)s and the size of individual C(X)s grow with the number of frequent (closed) patterns. When the number of frequent closed patterns is large, the total size of the C(X)s can be very large. If the main memory cannot accommodate all C(X)s, the greedy set cover algorithm becomes very slow. To alleviate this problem, we compress C(X)s using a light-weight compression
technique [21]. Each entry id occupies one or more bytes depending on its value. To reduce the number of bytes needed for storing entry ids, we sort the entry ids in ascending order and store the differences between consecutive ids instead. Our experiment results show that this compression technique can reduce the space needed for storing C(X)s by about three quarters.

Algorithm 3: MinRPset Algorithm
Description:
1:  Mine patterns with support ≥ min_sup · (1 − ε) and store them in a CFP-tree; let root be the root node of the tree;
2:  DFS_Search_CXs(root);
3:  Remove non-closed entries from C(X)s;
4:  Apply the greedy set cover algorithm on C(X)s to find representative patterns;
5:  Output representative patterns;

Algorithm 3 shows the pseudo-code of the MinRPset algorithm, and it calls Algorithm 2 to find C(X)s. Note that we store all patterns, including non-closed patterns, with support no less than min_sup · (1 − ε) in a CFP-tree (line 1). Non-closed patterns are identified during the search of C(X)s. Hence it is possible that some C(X)s contain some non-closed entries. These non-closed entries are removed from C(X)s (line 3) before the greedy set cover algorithm is applied.

5. THE FLEXRPSET ALGORITHM

When the number of frequent patterns on a dataset is large, the MinRPset algorithm may become very slow since it needs to search subsets over a large CFP-tree for a large number of patterns. Furthermore, the set of C(X)s may become too large to fit into the main memory. To solve this problem, instead of searching C(X)s for all closed patterns, we can selectively generate C(X)s such that every frequent pattern is covered a sufficient number of times, in the hope that the greedy set cover algorithm can still find a near-optimal solution. Apparently, the fewer C(X)s generated, the more efficient the algorithm is. This is the basic idea of the FlexRPset algorithm.

The FlexRPset algorithm uses a parameter K to control the minimum number of times that a frequent pattern needs to be
covered. Algorithm 4 shows how FlexRPset selectively generates C(X)s. The other steps of FlexRPset are the same as those of MinRPset.

Algorithm 4: Flex_Search_CXs Algorithm
Input:
  cnode: a CFP-tree node; // cnode is the root node initially.
  K: the minimum number of times that a frequent closed pattern needs to be covered;
Output: C(X)s
Description:
1:  for each entry E ∈ cnode from left to right do
2:    if E is not marked as non-closed then
3:      if E.child ≠ NULL then
4:        Flex_Search_CXs(E.child);
5:      if E is more frequent than its child entries then
6:        if (E is frequent AND E is covered less than K times) OR (∃ an ancestor entry E′ of E such that E′ is frequent, E′ can be ε-covered by E and E′ is covered less than K times) then
7:          X = E.pattern;
8:          C(X) = Search_CX(root, X, E.support);

Algorithm 4 still uses the depth-first order to traverse a CFP-tree from left to right. It traverses the subtree of an entry E first (lines 3-4) before it processes entry E (lines 5-8), which means that when entry E is processed, all the supersets of E have already been processed based on Lemma 6, and entry E cannot be covered any more except by E itself. If E is frequent and it is covered less than K times, then we generate C(E.pattern) to cover E (the first condition at line 6). If E has already been covered at least K times when E is visited, then we look at the ancestor entries of E. For an ancestor entry E′ of E, most of its supersets have already been processed too when E is visited, hence not many remaining entries can cover E′. If E′ is frequent, E′ can be ε-covered by E, and E′ is covered less than K times, then we also generate C(E.pattern) to cover E′ (the second condition at line 6).

6. EXPERIMENTS

In this section, we study the performance of our algorithms.
The experiments were conducted on a PC with a 2.33GHz Intel Core Duo CPU and 3.25GB memory. Our algorithms were implemented using C++. We downloaded the source code of RPlocal from the IlliMine package [2]. All source code was compiled using Microsoft Visual Studio 2005.

6.1 Datasets

The datasets used in the experiments are shown in Table 3. They are obtained from the FIMI repository [1]. Table 3 shows some basic statistics of these datasets: number of transactions (|T|), number of distinct items (|I|), maximum length of transactions (MaxTL) and average length of transactions (AvgTL).

Table 3: Datasets
dataset      |T|     |I|   MaxTL  AvgTL
accidents    340183  468   52     33.81
chess        3196    75    37     37.00
connect      67557   129   43     43.00
mushroom     8124    119   23     23.00
pumsb        49046   2113  74     74.00
pumsb_star   49046   2088  63     50.48

6.2 Comparing with RPlocal

The first experiment compares MinRPset and FlexRPset with RPlocal. Let N be the number of representative patterns generated by an algorithm. Figure 2 shows the ratio of N to the number of representative patterns generated by RPlocal when min_sup is varied. Obviously, the ratio is always 1 for RPlocal. ε is set to 0.2 on mushroom and to 0.1 on the other datasets. In [23], the authors have shown that the number of representative patterns selected by RPlocal can be orders of magnitude smaller than the number of frequent closed patterns. MinRPset further reduces the number of representative patterns by 10%-65%. The FlexRPset algorithm generates a similar number of representative patterns to MinRPset when K is large. When K gets smaller, the number of representative patterns generated by FlexRPset increases. When K = 1, FlexRPset still generates fewer representative patterns than RPlocal in most of the cases.
Figure 3 shows the running time of the algorithms when min_sup is varied. The running time of MinRPset and FlexRPset includes the time for mining frequent patterns. The running time of all algorithms increases with the decrease of min_sup. MinRPset has similar running time to RPlocal on mushroom. On accidents and pumsb_star, when min_sup is relatively high, the running time of MinRPset and RPlocal is similar too. In other cases except for dataset pumsb, MinRPset is several times slower than RPlocal. RPglobal is often hundreds of times slower than RPlocal, as shown in [23]. This indicates that the techniques used in MinRPset are very effective in reducing running time. On pumsb, MinRPset is more than 10 times slower than RPlocal when min_sup ≤ 0.7, but it achieves the greatest reduction in the number of representative patterns on this dataset. FlexRPset has similar running time to RPlocal when K is small. When K = 10, the running time of FlexRPset is close to that of RPlocal, and the number of representative patterns generated by FlexRPset is close to that of MinRPset.
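As a recap of the selection step shared by MinRPset and FlexRPset (Section 4), the greedy set cover pass over the collected C(X)s can be sketched as follows. This is the textbook greedy algorithm [8], not the authors' exact implementation, and the C(X) contents below are made up:

```python
def greedy_set_cover(universe, candidate_sets):
    """Repeatedly pick the candidate set covering the most uncovered
    elements. candidate_sets maps a candidate representative pattern
    to its C(X); achieves the harmonic-number approximation ratio
    mentioned in Section 4."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        best = max(candidate_sets,
                   key=lambda x: len(candidate_sets[x] & uncovered))
        gained = candidate_sets[best] & uncovered
        if not gained:  # remaining elements cannot be covered by any set
            break
        chosen.append(best)
        uncovered -= gained
    return chosen

# Hypothetical C(X)s over frequent (closed) patterns labeled 1..5.
CX = {"X1": {1, 2, 3}, "X2": {3, 4}, "X3": {4, 5}, "X4": {1}}
reps = greedy_set_cover({1, 2, 3, 4, 5}, CX)
```

Here the greedy pass picks X1 first (it covers three uncovered patterns), then X3, and stops once every pattern is ε-covered by some chosen representative.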
2016年6月大学英语四级真题第二套2016年6月大学英语四级真题(第2套)该真题及答案摘自于巨微英语四级真题逐句精解,中间省略听力原文,详细资料可以关注公众号:巨微英语四六级或者官网渠道获得电子版。
Part I WritingDirections:For this part, you are allowed 30 minutes to write a letter to express your thanks to one of your school teachers upon entering college. You should write at least 120 words but no more than 180 words.Part ⅡListening Comprehensionwill hear two or three questions. Both the news report and the questions will be spoken only once. After you hear a question, you must choose the best answer from the four choices marked A), B), C) and D). Then mark the corresponding letter on Answer Sheet 1with a single line through the centre.Questions 1 and 2 are based on the news report you have just heard.1. A)How college students can improve their sleep habits.B)Why sufficient sleep is important for college students.D)Whether the Spanish company could offer better service.4. A)Inefficient management. B)Poor ownership structure.C)Lack of innovation and competition.D)Lack of runway and terminal capacity.Questions 5 to 7 are based on the news report you have just heard.5. A)Report the nicotine content of their cigarettes.B)Set a limit to the production of their cigarettes.C)Take steps to reduce nicotine in their products.D)Study the effects of nicotine on young smokers.6. A)The biggest increase in nicotine content tended to be in brands young smokers like.B)Big tobacco companies were frank withtheir customers about the hazards of smoking.C)Brands which contain higher nicotine content were found to be much more popular.D)Tobacco companies refused to discuss the detailed nicotine content of their products.7. A)They promised to reduce the nicotine content in cigarettes.B)They have not fully realized the harmful effect of nicotine.C)They were not prepared to comment on the cigarette study.D)They will pay more attention to the quality of their products.Section BDirections: In this section, you will hear two long conversations. At the end of each conversation, you will hear four questions. Both the conversation and the questions will be spoken only once. 
After you hear a question, you must choose the best answer from the four choices marked A), B), C) and D). Then mark the corresponding letter on Answer Sheet 1 with a single line through the centre.

Questions 8 to 11 are based on the conversation you have just heard.
8. A) Indonesia. B) Holland. C) Sweden. D) England.
9. A) Getting a coach who can offer real help.
B) Talking with her boyfriend in Dutch.
C) Learning a language where it is not spoken.
D) Acquiring the necessary ability to socialize.
10. A) Listening to language programs on the radio.
B) Trying to speak it as much as one can.
C) Making friends with native speakers.
D) Practicing reading aloud as often as possible.
11. A) It creates an environment for socializing.
B) It offers various courses with credit points.
C) It trains young people's leadership abilities.
D) It provides opportunities for language practice.

Questions 12 to 15 are based on the conversation you have just heard.
12. A) The impact of engine design on road safety.
B) The role policemen play in traffic safety.
C) A sense of freedom driving gives.
D) Rules and regulations for driving.
13. A) Make cars with automatic control.
B) Make cars that have better brakes.
C) Make cars that are less powerful.
D) Make cars with higher standards.
14. A) They tend to drive responsibly.
B) They like to go at high speed.
C) They keep within speed limits.
D) They follow traffic rules closely.
15. A) It is a bad idea.
B) It is not useful.
C) It is as effective as speed bumps.
D) It should be combined with education.

Section C
Directions: In this section, you will hear three passages. At the end of each passage, you will hear three or four questions. Both the passage and the questions will be spoken only once. After you hear a question, you must choose the best answer from the four choices marked A), B), C) and D). Then mark the corresponding letter on Answer Sheet 1 with a single line through the centre.

Questions 16 to 18 are based on the passage you have just heard.
16. A) The card got damaged.
B) The card was found invalid.
C) The card reader failed to do the scanning.
D) The card reader broke down unexpectedly.
17. A) By covering the credit card with a layer of plastic.
B) By calling the credit card company for confirmation.
C) By seeking help from the card reader maker Verifone.
D) By typing the credit card number into the cash register.
18. A) Affect the sales of high-tech appliances.
B) Change the life style of many Americans.
C) Give birth to many new technological inventions.
D) Produce many low-tech fixes for high-tech failures.

Questions 19 to 21 are based on the passage you have just heard.
19. A) They are set by the dean of the graduate school.
B) They are determined by the advising board.
C) They leave much room for improvement.
D) They vary among different departments.
20. A) By consulting the examining committee.
B) By reading the Bulletin of Information.
C) By contacting the departmental office.
D) By visiting the university's website.
21. A) They specify the number of credits students must earn.
B) They are harder to meet than those for undergraduates.
C) They have to be approved by the examining committee.
D) They are the same among various divisions of the university.

Questions 22 to 25 are based on the passage you have just heard.
22. A) Students majoring in nutrition.
B) Students in health classes.
C) Ph.D. candidates in dieting.
D) Middle and high school teachers.
23. A) Its overestimate of the effect of dieting.
B) Its mistaken conception of nutrition.
C) Its changing criteria for beauty.
D) Its overemphasis on thinness.
24. A) To illustrate her point that beauty is but skin deep.
B) To demonstrate the magic effect of dieting on women.
C) To explain how computer images can be misleading.
D) To prove that technology has impacted our culture.
25.
A) To persuade girls to stop dieting.
B) To promote her own concept of beauty.
C) To establish an emotional connection with students.
D) To help students rid themselves of bad living habits.

Part III Reading Comprehension
Section A
Directions: In this section, there is a passage with ten blanks. You are required to select one word for each blank from a list of choices given in a word bank following the passage. Read the passage through carefully before making your choices. Each choice in the bank is identified by a letter. Please mark the corresponding letter for each item on Answer Sheet 2 with a single line through the centre. You may not use any of the words in the bank more than once.

Contrary to popular belief, older people generally do not want to live with their children. Moreover, most adult children (26) every bit as much care and support to their aging parents as was the case in the "good old days", and most older people do not feel (27). About 80% of people 65 years and older have living children, and about 90% of them have (28) contact with their children. About 75% of elderly parents who don't go to nursing homes live within 30 minutes of at least one of their children.
However, (29) having contact with children does not guarantee happiness in old age. In fact, some research has found that people who are most involved with their families have the lowest spirits. This research may be (30), however, as ill health often makes older people more (31) and thereby increases contact with family members. So it is more likely that poor health, not just family involvement, (32) spirits.
Increasingly, researchers have begun to look at the quality of relationships, rather than at the frequency of contact, between the elderly and their children. If parents and children share interests and values and agree on childrearing practices and religious (33), they are likely to enjoy each other's company. Disagreements on such matters can (34) cause problems.
If parents are angered by their daughter's divorce, dislike her new husband, and disapprove of how she is raising their grandchildren, (35) are that they are not going to enjoy her visits.

Section B
Directions: In this section, you are going to read a passage with ten statements attached to it. Each statement contains information given in one of the paragraphs. Identify the paragraph from which the information is derived. You may choose a paragraph more than once. Each paragraph is marked with a letter. Answer the questions by marking the corresponding letter on Answer Sheet 2.

Could Food Shortages Bring Down Civilization?
[A] For many years I have studied global agricultural, population, environmental and economic trends and their interactions. The combined effects of those trends and the political tensions they generate point to the breakdown of governments and societies. Yet I, too, have resisted the idea that food shortages could bring down not only individual governments but also our global civilization.
[B] I can no longer ignore that risk. Our continuing failure to deal with the environmental declines that are undermining the world food economy forces me to conclude that such a collapse is possible.
[C] As demand for food rises faster than supplies are growing, the resulting food-price inflation puts severe stress on the governments of many countries. Unable to buy grain or grow their own, hungry people take to the streets. Indeed, even before the steep climb in grain prices in 2008, the number of failing states was expanding. If the food situation continues to worsen, entire nations will break down at an ever increasing rate. In the 20th century the main threat to international security was superpower conflict; today it is failing states.
[D] States fail when national governments can no longer provide personal security, food security and basic social services such as education and health care.
When governments lose their control on power, law and order begin to disintegrate. After a point, countries can become so dangerous that food relief workers are no longer safe and their programs are halted. Failing states are of international concern because they are a source of terrorists, drugs, weapons and refugees (难民), threatening political stability everywhere.
[E] The surge in world grain prices in 2007 and 2008, and the threat they pose to food security, has a different, more troubling quality than the increases of the past. During the second half of the 20th century, grain prices rose dramatically several times. In 1972, for instance, the Soviets, recognizing their poor harvest early, quietly cornered the world wheat market. As a result, wheat prices elsewhere more than doubled, pulling rice and corn prices up with them. But this and other price shocks were event-driven: drought in the Soviet Union, crop-shrinking heat in the U.S. Corn Belt. And the rises were short-lived: prices typically returned to normal with the next harvest.
[F] In contrast, the recent surge in world grain prices is trend-driven, making it unlikely to reverse without a reversal in the trends themselves. On the demand side, those trends include the ongoing addition of more than 70 million people a year, a growing number of people wanting to move up the food chain to consume highly grain-intensive meat products, and the massive diversion (转向) of U.S. grain to the production of bio-fuel.
[G] As incomes rise among low-income consumers, the potential for further grain consumption is huge. But that potential pales beside the never-ending demand for crop-based fuels. A fourth of this year's U.S. grain harvest will go to fuel cars.
[H] What about supply? The three environmental trends, the shortage of fresh water, the loss of topsoil and the rising temperatures, are making it increasingly hard to expand the world's grain supply fast enough to keep up with demand.
Of all those trends, however, the spread of water shortages poses the most immediate threat. The biggest challenge here is irrigation, which consumes 70% of the world's fresh water. Millions of irrigation wells in many countries are now pumping water out of underground sources faster than rainfall can refill them. The result is falling water tables (地下水位) in countries with half the world's people, including the three big grain producers: China, India and the U.S.
[I] As water tables have fallen and irrigation wells have gone dry, China's wheat crop, the world's largest, has declined by 8% since it peaked at 123 million tons in 1997. But water shortages are even more worrying in India. Millions of irrigation wells have significantly lowered water tables in almost every state.
[J] As the world's food security falls to pieces, individual countries acting in their own self-interest are actually worsening the troubles of many. The trend began in 2007, when leading wheat-exporting countries such as Russia and Argentina limited or banned their exports, in hopes of increasing local food supplies and thereby bringing down domestic food prices. Vietnam banned its exports for several months for the same reason. Such moves may eliminate the fears of those living in the exporting countries, but they are creating panic in importing countries that must rely on what is then left for export.
[K] In response to those restrictions, grain-importing countries are trying to nail down long-term trade agreements that would lock up future grain supplies. Food-import anxiety is even leading to new efforts by food-importing countries to buy or lease farmland in other countries. In spite of such temporary measures, soaring food prices and spreading hunger in many other countries are beginning to break down the social order.
[L] Since the current world food shortage is trend-driven, the environmental trends that cause it must be reversed.
We must cut carbon emissions by 80% from their 2006 levels by 2020, stabilize the world's population at eight billion by 2040, completely remove poverty, and restore forests and soils. There is nothing new about the four objectives. Indeed, we have made substantial progress in some parts of the world on at least one of these: the distribution of family-planning services and the associated shift to smaller families.
[M] For many in the development community, the four objectives were seen as positive, promoting development as long as they did not cost too much. Others saw them as politically correct and morally appropriate. Now a third and far more significant motivation presents itself: meeting these goals may be necessary to prevent the collapse of our civilization. Yet the cost we project for saving civilization would amount to less than $200 billion a year, 1/6 of current global military spending. In effect, our plan is the new security budget.

36. The more recent steep climb in grain prices partly results from the fact that more and more people want to consume meat products.
37. Social order is breaking down in many countries because of food shortages.
38. Rather than superpower conflict, countries unable to cope with food shortages now constitute the main threat to world security.
39. Some parts of the world have seen successful implementation of family planning.
40. The author has come to agree that food shortages could ultimately lead to the collapse of world civilization.
41. Increasing water shortages prove to be the biggest obstacle to boosting the world's grain production.
42. The cost for saving our civilization would be considerably less than the world's current military spending.
43. To lower domestic food prices, some countries limited or stopped their grain exports.
44. Environmental problems must be solved to ease the current global food shortage.
45.
A quarter of this year's American grain harvest will be used to produce bio-fuel for cars.

Section C
Directions: There are 2 passages in this section. Each passage is followed by some questions or unfinished statements. For each of them there are four choices marked A), B), C) and D). You should decide on the best choice and mark the corresponding letter on Answer Sheet 2 with a single line through the centre.

Passage One
Questions 46 to 50 are based on the following passage.
Declining mental function is often seen as a problem of old age, but certain aspects of brain function actually begin their decline in young adulthood, a new study suggests. The study, which followed more than 2,000 healthy adults between the ages of 18 and 60, found that certain mental functions, including measures of abstract reasoning, mental speed and puzzle-solving, started to dull as early as age 27. Dips in memory, meanwhile, generally became apparent around age 37. On the other hand, indicators of a person's accumulated knowledge, like performance on tests of vocabulary and general knowledge, kept improving with age, according to findings published in the journal Neurobiology of Aging.
The results do not mean that young adults need to start worrying about their memories.
Most people's minds function at a high level even in their later years, according to researcher Timothy Salthouse. "These patterns suggest that some types of mental flexibility decrease relatively early in adulthood, but that the amount of knowledge one has, and the effectiveness of integrating it with one's abilities, may increase throughout all of adulthood if there are no diseases," Salthouse said in a news release.
The study included healthy, educated adults who took standard tests of memory, reasoning and perception at the outset and at some point over the next seven years. The tests are designed to detect subtle (细微的) changes in mental function, and involve solving puzzles, recalling words and details from stories, and identifying patterns in collections of letters and symbols. In general, Salthouse and his colleagues found, certain aspects of cognition (认知能力) generally started to decline in the late 20s to 30s.
The findings shed light on normal age-related changes in mental function, which could aid in understanding the process of dementia (痴呆), according to the researchers. "By following individuals over time," Salthouse said, "we gain insight into cognition changes, and may possibly discover ways to slow the rate of decline." The researchers are currently analyzing the study participants' health and lifestyle to see which factors might influence age-related cognitive changes.

46. What is the common view of mental function?
A) It varies from person to person.
B) It weakens in one's later years.
C) It gradually expands with age.
D) It indicates one's health condition.
47. What does the new study find about mental functions?
A) Some diseases inevitably lead to their decline.
B) They reach a peak at the age of 20 for most people.
C) They are closely related to physical and mental exercise.
D) Some of them begin to decline when people are still young.
48.
What does Timothy Salthouse say about people's minds in most cases?
A) They tend to decline in people's later years.
B) Their flexibility determines one's abilities.
C) They function quite well even in old age.
D) Their functioning is still a puzzle to be solved.
49. Although people's minds may function less flexibly as they age, they _____.
A) may be better at solving puzzles
B) can memorize things with more ease
C) may have greater facility in abstract reasoning
D) can put what they have learnt into more effective use
50. According to Salthouse, their study may help us _____.
A) find ways to slow down our mental decline
B) find ways to boost our memories
C) understand the complex process of mental functioning
D) understand the relation between physical and mental health

Passage Two
Questions 51 to 55 are based on the following passage.
The most important thing in the news last week was the rising discussion in Nashville about the educational needs of children. The shorthand (简写) educators use for this is "pre-K", meaning instruction before kindergarten, and the big idea is to prepare 4-year-olds and even younger kids to be ready to succeed on their K-12 journey.
But it gets complicated. The concept has multiple forms, and scholars and policymakers argue about the shape, scope and cost of the ideal program. The federal Head Start program, launched 50 years ago, has served more than 30 million children. It was based on concepts developed at Vanderbilt University's Peabody College by Susan Gray, the legendary pioneer in early childhood education research.
A new Peabody study of the Tennessee Voluntary Pre-K program reports that pre-K works, but the gains are not sustained through the third grade. It seems to me this highlights quality issues in elementary schools more than pre-K, and indicates longer-term success must connect pre-K with all the other issues related to educating a child.
Pre-K is controversial. Some critics say it is a luxury and shouldn't be free to families able to pay.
Pre-K advocates insist it is proven and will succeed if integrated with the rest of the child's schooling. I lean toward the latter view.
This is, in any case, the right conversation to be having now as Mayor Megan Barry takes office. She was the first candidate to speak out for strong pre-K programming. The important thing is for all of us to keep in mind the real goal and the longer, bigger picture.
The weight of the evidence is on the side of pre-K: early intervention (干预) works. What government has not yet found is the political will to put that understanding into full practice with a sequence of smart schooling that provides the early foundation. For this purpose, our schools need both the talent and the organization to educate each child who arrives at the schoolhouse door. Some show up ready, but many do not at this critical time when young brains are developing rapidly.

51. What does the author say about pre-kindergarten education?
A) It should cater to the needs of individual children.
B) It is essential to a person's future academic success.
C) Scholars and policymakers have different opinions about it.
D) Parents regard it as the first phase of children's development.
52. What does the new Peabody study find?
A) Pre-K achievements usually do not last long.
B) The third grade marks a new phase of learning.
C) The third grade is critical to children's development.
D) Quality has not been the top concern of pre-K programs.
53. When does the author think pre-K works the best?
A) When it is accessible to kids of all families.
B) When it is made part of kids' education.
C) When it is no longer considered a luxury.
D) When it is made fun and enjoyable to kids.
54. What do we learn about Mayor Megan Barry?
A) She knows the real goal of education.
B) She is a mayor of insight and vision.
C) She has once run a pre-K program.
D) She is a firm supporter of pre-K.
55. What does the author think is critical to kids' education?
A) Teaching method.
B) Kids' interest.
C) Early intervention.
D) Parents' involvement.

Part IV Translation
Directions: For this part, you are allowed 30 minutes to translate a passage from Chinese into English. You should write your answer on Answer Sheet 2.
在山东省潍坊市，风筝不仅仅是玩具，而且还是这座城市文化的标志。(In Weifang, Shandong Province, kites are not just toys, but also a symbol of the city's culture.)
Senior Two English: 50 Multiple-Choice Questions on Machine Learning

1. Machine learning is a field of study that focuses on the development of algorithms that can learn from ___.
A. data
B. experience
C. intuition
D. opinion
Answer: A.
This question tests the basic concept of machine learning. Machine learning learns from data, so option A, "data", fits. Option B, "experience", usually refers to human experience; machine learning relies on data rather than human experience. Option C, "intuition", is wrong because machine learning is based on data and algorithms, not intuition. Option D, "opinion", is wrong because machine learning does not learn from opinions.
2. The main goal of machine learning is to ___.
A. predict future events
B. create new algorithms
C. solve complex equations
D. store large amounts of data
Answer: A.
The main goal of machine learning is to predict future events from existing data, so option A is correct. Option B, "create new algorithms", is not the main goal; new algorithms may emerge from research, but that is not the purpose. Option C, "solve complex equations", is a task for mathematics and related fields, not machine learning's main goal. Option D, "store large amounts of data", is merely data storage, not the goal of machine learning.
3. Machine learning algorithms can be used in ___.
A. image recognition
B. math calculations
C. physical experiments
D. literary analysis
Answer: A.
Machine learning algorithms can be used for image recognition, so option A is correct.
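The point made in the answers above, that a model learns a pattern from data (question 1) and uses it to predict a future outcome (question 2), can be shown with a minimal pure-Python sketch. The hours/score numbers below are invented purely for illustration:

```python
# Fit a least-squares line to observed (x, y) data, then use it to
# predict an unseen value: "learning from data" in miniature.

def fit_line(xs, ys):
    """Return slope and intercept of the least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
            / sum((x - mean_x) ** 2 for x in xs)
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical training data: hours studied vs. test score.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 70]

slope, intercept = fit_line(hours, scores)

# "Predict a future event": the expected score after 6 hours of study.
predicted = slope * 6 + intercept
print(round(predicted, 1))  # prints 73.9
```

The model is nothing more than two numbers estimated from the data, which is exactly why option A ("data") is the thing learned from and option A ("predict future events") is the goal.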
Filtering Frequent Spatial Patterns with Qualitative Spatial Reasoning
Vania Bogorny (a), Bart Moelans (a), Luis Otavio Alvares (a, b)
(a) Theoretical Computer Science Group, Hasselt University & Transnational University of Limburg, Belgium
(b) Instituto de Informática, Universidade Federal do Rio Grande do Sul, Brazil

The full version of this paper appeared in the 2007 IEEE Workshop Proceedings of the 23rd International Conference on Data Engineering (ICDE 2007) [6].

1 Introduction
The huge amount of patterns generated by frequent pattern mining algorithms has been extensively addressed in the last few years. Different measures have been proposed to evaluate how interesting association rules are. However, according to [3], it is difficult to come up with a single metric that quantifies the "interestingness" or "goodness" of an association rule. In most approaches, non-interesting rules are eliminated during the rule generation, i.e., a posteriori, when frequent itemsets have already been generated.
In spatial frequent pattern mining the number of non-interesting association rules can increase even further than in transactional pattern mining. Geographic data have semantic dependencies and spatial properties which in many cases are well known and non-interesting for data mining. In the geographic domain, not only well known geographic dependencies (e.g., is gasStation → touches street) generate a large number of patterns without novel and useful knowledge. Different spatial predicates may contain the same geographic object type. In transactional frequent pattern mining items have binary values, and either participate or not in a transaction. For instance, the item milk is either present or not in a transaction t. In spatial frequent pattern mining the same spatial object (item) may have different qualitative spatial relationships with the target feature (transaction), and by consequence participate more than once in the same transaction t. For instance, a city C may contain an instance i1 of river, be crossed by an instance i2 of
river, or even touch an instance i3 of river. Different spatial relationships with the same geographic object type will generate associations such as contains River → touches River when data are considered at general granularity levels [8]. It is well known that a city does not touch a river because it also contains a river. Such a rule is non-interesting for most applications. An interesting association rule would be the combination of any of these two predicates with a different geographic object type or some non-spatial attribute, for instance: contains River → WaterPollution = high, or touches River → exportationRate = high.
In [5], Apriori-KC is proposed. This method made some changes to Apriori [1] to eliminate well known geographic patterns using background knowledge. In [4] this method is extended with an additional step where not only frequent itemsets are reduced, but the input space is reduced as much as possible, since this is still the most efficient way of pruning frequent patterns. In [7] the closed frequent pattern mining approach is applied to the geographic domain, eliminating both well known dependencies and redundant frequent itemsets.
In this paper we extended the method presented in [5] to reduce the number of non-interesting spatial patterns using qualitative spatial reasoning. Not only the data is considered, but its semantics is taken into account, and frequent patterns that contain the same feature type are removed a priori. The method proposed in this paper is more effective and efficient than most other methods, which perform pruning after the rule generation, since our method exploits the anti-monotone constraint of Apriori [1] and prunes non-interesting patterns during the frequent set generation. Other transactional pattern mining approaches such as [2, 9] remove redundant frequent patterns and association rules exploiting support and confidence constraints, while our method is independent of such thresholds. In these approaches "redundant" rules are eliminated by frequent itemset pruning, but "non-interesting" and "meaningless" rules are still generated.
Figure 1: Experiments on real data. (a) Frequent geographic patterns with Apriori, Apriori-KC, and Apriori-KC+. (b) Computational time to generate frequent geographic patterns with Apriori, Apriori-KC, and Apriori-KC+.
In Figure 1(a) it is clear that there is a reduction of the number of frequent patterns, and in Figure 1(b) the efficiency of the method is confirmed.

Acknowledgements
This research has been funded by the Brazilian agency CAPES, the European Union (FP6-IST-FET programme, Project n. FP6-14915, GeoPKDD: Geographic Privacy-Aware Knowledge Discovery and Delivery), and by the Research Foundation Flanders (FWO-Vlaanderen), Research Project G.0344.05.

References
[1] Rakesh Agrawal and Ramakrishnan Srikant. Fast algorithms for mining association rules in large databases. In VLDB, pages 487–499, 1994.
[2] Yves Bastide, Nicolas Pasquier, Rafik Taouil, Gerd Stumme, and Lotfi Lakhal. Mining minimal non-redundant association rules using frequent closed itemsets. In CL '00: Proceedings of the First International Conference on Computational Logic, pages 972–986, London, UK, 2000. Springer-Verlag.
[3] Roberto J. Bayardo and Rakesh Agrawal. Mining the most interesting rules. In KDD, pages 145–154, 1999.
[4] Vania Bogorny, Sandro Camargo, Paulo Engel, and Luis Otavio Alvares. Mining frequent geographic patterns with knowledge constraints. In ACM-GIS, pages 139–146. ACM, 2006.
[5] Vania Bogorny, Sandro Camargo, Paulo Engel, and Luis Otavio Alvares. Towards elimination of well known patterns in spatial association rule mining. In IS, pages 532–537. IEEE Computer Society, 2006.
[6] Vania Bogorny, Bart Moelans, and Luis Otavio Alvares. Filtering frequent spatial patterns with qualitative spatial reasoning. In Proceedings of the 22nd International Conference on Data Engineering Workshops, ICDE 2007, 15–16 & 20 April 2007, Istanbul, Turkey, pages 527–535, 2007.
[7] Vania Bogorny, Joao Valiati, Sandro Camargo, Paulo Engel, Bart Kuijpers, and Luis Otavio Alvares. Mining maximal generalized frequent geographic patterns with knowledge constraints. In ICDM, pages 813–817. IEEE Computer Society, 2006.
[8] Jiawei Han. Mining knowledge at multiple concept levels. In CIKM, pages 19–24. ACM, 1995.
[9] Mohammed J. Zaki. Generating non-redundant association rules. In KDD '00: Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 34–43, New York, NY, USA, 2000. ACM Press.
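The pruning idea the paper describes, discarding during candidate generation any itemset whose predicates refer to the same geographic feature type so that Apriori's anti-monotone property keeps all supersets of it from ever being counted, can be sketched as follows. This is a simplified illustration and not the authors' Apriori-KC implementation; the `predicate_feature` naming convention and the toy city dataset are invented for the example:

```python
from itertools import combinations

def feature_type(predicate):
    # A predicate such as "contains_River" refers to feature type "River".
    return predicate.split("_", 1)[1]

def is_interesting(itemset):
    # Knowledge constraint: reject itemsets in which two predicates
    # mention the same feature type (e.g. contains_River + touches_River).
    types = [feature_type(p) for p in itemset]
    return len(types) == len(set(types))

def apriori_kc(transactions, min_support):
    """Frequent itemsets, pruning same-feature-type candidates a priori."""
    items = sorted({p for t in transactions for p in t})
    frequent, level = {}, [frozenset([i]) for i in items]
    while level:
        counts = {c: sum(c <= t for t in transactions) for c in level}
        survivors = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(survivors)
        # Candidate generation: join surviving k-sets into (k+1)-sets,
        # applying the knowledge constraint BEFORE support counting.
        keys = list(survivors)
        level = {a | b for a, b in combinations(keys, 2)
                 if len(a | b) == len(a) + 1 and is_interesting(a | b)}
    return frequent

# Toy transactions: spatial predicates observed for four cities.
cities = [frozenset(t) for t in (
    {"contains_River", "touches_River", "contains_Factory"},
    {"contains_River", "contains_Factory"},
    {"contains_River", "touches_River"},
    {"contains_Factory", "touches_Street"},
)]
result = apriori_kc(cities, min_support=2)
# The non-interesting pair {contains_River, touches_River} is never
# counted, even though it occurs in two transactions.
assert frozenset({"contains_River", "touches_River"}) not in result
```

Because the filter runs before support counting, every superset of a rejected candidate is also skipped, which is why pruning during frequent-set generation is cheaper than discarding rules afterwards.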
Academic English Writing (Zhihuishu/Zhidao online course), Post-Class Chapter Answers, 2023 (second half), Tianjin Foreign Studies University

Chapter 1 Test
1. What steps are involved in writing? ( )
A: Prewriting
B: Revising and editing
C: Writing the first draft
D: Rewriting
Answer: A, B, C
2. The objectives of the course include ( )
A: To improve students' thinking ability via choosing a topic, sourcing information, writing different parts of a project paper including introduction, literature review, methods, findings, discussions and conclusion
B: To improve students' writing and reading proficiency in English
C: To improve students' capability of carrying out research by going through the process of writing a project paper
D: To improve students' cultural competence through learning western ways of thinking in English writing
Answer: A, B, C, D
3. In freewriting, the writer needs to make everything perfect. ( )
A: False
B: True
Answer: False
4. The relationship between reading and writing can be described as ( )
A: When you start learning writing, you will read differently.
B: Reading is the input, while writing is the output of language.
C: When you start learning writing, you will improve the awareness of accumulating language, information and ideas for various topics, which will consequently change the way you read.
D: Reading and writing are reciprocal activities; the outcome of a reading activity can serve as input for writing, and writing can lead a student to further reading resources.
Answer: A, B, C, D
5. The functions of writing include ( )
A: The discipline of writing will strengthen your skills as a reader and listener
B: Writing will make you a strong thinker and writing allows you to organize your thoughts in clear and logical ways
C: Writing encourages you to seek worthwhile questions
D: Writing helps you refine and enrich your ideas based on feedback from readers
Answer: A, B, C, D

Chapter 2 Test
1. A thesis statement should be one single sentence, including only one idea. ( )
A: True
B: False
Answer: True
2. An essay should include at least three specific details to be well-developed. ( )
A: True
B: False
Answer: False
3. The topic sentence for each body paragraph should be fairly specific because each body paragraph deals with only one of the many things stated in the thesis statement. ( )
A: False
B: True
Answer: True
4. Living with my ex-roommate was unbearable. First, she thought everything she owned was the best. Secondly, she possessed numerous filthy habits. Finally, she constantly exhibited immature behavior.
Read this paragraph and decide which statement is NOT TRUE? ( )
December 2018 CET-6 Exam: Answers and Detailed Explanations
"Dreams never fail those who work hard." (Set 3)

Part I Writing
The topic of this essay, "How to Balance Academic Study and Extracurricular Activities," is closely related to students' study and daily life.
In terms of organization, the writer can begin by introducing the topic, then offer suggestions for addressing the problem, and finally conclude by restating the thesis or elevating the theme.
Writing outline:
1. Introduce the topic: appropriate participation in extracurricular activities not only promotes academic study but also enhances overall ability (key expressions: promote academic study; enhance our overall ability; arrange extracurricular activities more rationally).

Sample essay (English rendering of the Chinese model answer):
How to Balance Academic Study and Extracurricular Activities
Many students and their parents worry that spending time on extracurricular activities will get in the way of study, which is understandable. But in my view, as long as we can strike a balance between the two, appropriate participation in extracurricular activities will not only promote our studies but also improve our overall abilities.
First, it is wise to arrange our coursework reasonably and complete it efficiently, because only in this way can we set aside extra time and energy for extracurricular activities without letting them negatively affect our studies. Second, we should spend time only on the activities we genuinely want to join, because they bring us pleasure and relieve academic pressure to some extent. Third, we can also join clubs, where we can meet like-minded people and improve our abilities in ways that directly benefit our studies.
In short, only by studying more efficiently and arranging extracurricular activities more reasonably can we achieve a true balance between academic study and extracurricular activities.
主题词汇excessive 过度的in hot water 陷⼊困境under the spotlight 备受关注disapprove of 不赞同for the sake of...为了..............not to mention..,更不⽤说...............trigger 引发transform...to...把..............转换成.........pave the way for...为..............铺平道路Part HI Reading ComprehensionSection A在我曾写过的可能最为疯狂的新闻报道中,我曾报道过科学家在⽜屁股上画眼睛的实验,这是牲畜保护⽅⾯所取得的(26)进步。
2018 First-Half Primary and Secondary School Teacher Certification Examination: English Subject Knowledge and Teaching Ability (Senior High School) (Selected)

I. Multiple-choice questions (30 questions in total, 2 points each, 60 points)

1. The sound of "ch" in "teacher" is __________. A. voiceless, post-alveolar, and affricative B. voiceless, dental, and fricative C. voiced, dental, and fricative D. voiced, post-alveolar, and plosive
2. The main difference between /m/, /n/, and /ŋ/ lies in __________. A. manner of articulation B. sound duration C. place of articulation D. voicing
3. She is __________, from her recording, the diaries of Simon Forman. A. transcribing B. keeping C. paraphrasing D. recollecting
4. Neither the unpleasant experiences nor the bad luck __________ him discouraged. A. have caused B. has caused C. has made D. have made
5. Mr. Joe has worked very hard in the past two years and has paid all his debts __________ the last penny. A. by B. to C. until D. with
6. The message came to the villagers __________ the enemy had already fled the village. A. which B. who C. that D. where
7. We must improve the farming method __________ we may get high yields. A. in case B. in order that C. now that D. even if
8. --Do you mind if I smoke here? --__________. A. Yes, I don't B. Yes, you may C. No, not at all D. Yes, I won't
9. What is the main rhetorical device used in "The plowman homeward plods his weary way."? A. Metaphor. B. Metonymy. C. Synecdoche. D. Transferred epithet.
10. --A: Let's go to the movie tonight. --B: I'd like to, but I have to study for an exam. In the conversation above, B's decline of the proposal is categorized as a kind of __________. A. illocutionary act B. perlocutionary act C. propositional condition D. sincerity condition
11. Which of the following activities is NOT typical of the Task-Based Language Teaching method? A. Problem-solving activities. B. Opinion exchange activities. C. Information-gap activities. D. Pattern practice activities.
12. If a teacher shows students how to do an activity before they start doing it, he/she is using the technique of __________. A. presentation B. demonstration C. elicitation D. evaluation
13.
When a teacher asks students to discuss how a text is organized, he/she is most likely to help them __________. A. evaluate the content of the text B. analyze the structure of the passage C. understand the intention of the writer D. distinguish the facts from the opinions
14. Which of the following practices can encourage students to read an article critically? A. Evaluating its point of view. B. Finding out the facts. C. Finding detailed information. D. Doing translation exercises.
15. Which of the following is a display question used by teachers in class? A. If you were the girl in the story, would you behave like her? B. Do you like this story Girl the Thumb, why or why not? C. Do you agree that the girl was a kind-hearted person? D. What happened to the girl at the end of the story?
16. Which of the following would a teacher encourage students to do in order to develop their cognitive strategies? A. To make a study plan. B. To summarize a story. C. To read a text aloud. D. To do pattern drills.
17. Which of the following exercises would a teacher most probably use if he/she wants to help students develop discourse competence? A. Paraphrasing sentences. B. Translating sentences. C. Unscrambling sentences. D. Transforming sentences.
18. The advantages of pair and group work include all of the following EXCEPT __________. A. interaction with peers B. variety and dynamism C. an increase in language practice D. opportunities to guarantee accuracy
19. Which of the following should a teacher avoid when his/her focus is on developing students' ability to use words appropriately? A. Teaching both the spoken and written form. B. Teaching words in context and giving examples. C. Presenting the form, meaning, and use of a word. D. Asking students to memorize bilingual word lists.
20. Which of the following practices is most likely to encourage students' cooperation in learning? A. Doing a project. B. Having a dictation. C. Taking a test. D. Copying a text.
Read Passage 1 and complete questions 21-25.
USING VIEWPOINTS, FRAMEWORKS, AND DOMAIN-SPECIFIC LANGUAGES TO ENHANCE SOFTWARE REUSE
Sergio Crespo, Marcus Felipe M. C. da Fontoura, and Carlos José P. de Lucena
Departamento de Informática, Pontifícia Universidade Católica do Rio de Janeiro, Rua Marquês de São Vicente, 225, 22453-900 Rio de Janeiro, Brazil
e-mail: [crespo, mafe, lucena]@inf.puc-rio.br

SUMMARY
Case studies have shown that high levels of software reuse can be achieved through the use of object-oriented frameworks. This paper describes a viewpoint-based design and instantiation method for framework development. The method uses the concept of viewpoints and the hot-spot relation in object-oriented design to guide the designer in identifying hot-spots in the structure of the framework. Domain-specific languages are used to support framework instantiation. A method supporting environment is also presented. Software design through the use of this environment appears to be a very successful approach from the point of view of component reuse.

KEY WORDS: object-oriented frameworks, viewpoints, domain-specific languages, framework design and instantiation, domain-oriented software development.

1. INTRODUCTION
There are various application areas that are not yet established and for which new ideas and models are presently under development and evaluation. These are domains in which the ability to rapidly build new applications is essential and strategic from a practical point of view. Examples of application domains in this category include Web-based education, electronic commerce, biology, and the financial market. An interesting strategy for developing new applications for these domains is the development of object-oriented application frameworks [10, 13]. Frameworks can reduce the cost of developing applications since they allow designers and implementers to reuse their previous experience with problem solving at the design and code levels.
An object-oriented framework captures the common aspects of a family of applications, which can be seen as a domain theory [19]. Although object-oriented software development has experienced the benefits of using frameworks, a thorough understanding of how to identify, design, implement, and change them to meet evolving requirements is still an object of research [24]. Techniques such as design patterns [10, 17], meta-object protocols [15], refactoring [14], and class reorganization [3] have been proposed to support framework development and cope with some of these challenges. Framework development is therefore very expensive, not only because of the intrinsic difficulty of capturing the domain theory but also because of the lack of appropriate methods and techniques to support framework specification and design.

2. FRAMEWORK DESIGN AND INSTANTIATION METHOD
In [19], Roberts and Johnson state that "Developing reusable frameworks cannot occur by simply sitting down and thinking about the problem domain. No one has the insight to come up with the proper abstractions." They propose the development of concrete examples in order to understand the domain. The design strategy presented here is quite similar: it analyzes concrete applications as viewpoints [8, 6] of a domain and derives the final framework from the analysis of these viewpoints. The approach to framework design is based on the idea that any design can be divided into two parts: the kernel sub-system design and the hot-spot sub-system design. The kernel sub-system design is common to all the applications that the framework may generate, and the hot-spot sub-system design describes the different characteristics of each application that can be supported by the framework.
The hot-spot sub-system uses the methods and the information provided by the kernel sub-system and may extend it. The kernel structure is defined by analyzing the viewpoint design representations to produce a resulting design representation that reflects a structure common to all chosen viewpoints. This part of the design approach is based on a domain-dependent semantic analysis of the design diagrams to elicit the common features of the framework design structure. The elements that are not in the kernel are the ones that vary for each application and depend on the use of the framework. These elements define the framework hot-spots [17, 20] that must be adapted to each related application. A new relationship in object-oriented design, called the hot-spot relationship, has been defined to specify all the hot-spots in the system. The semantics of this new relationship is given by the design pattern essentials [18]. This implies that the hot-spot relationship is in fact a meta-relationship that is implemented through a design pattern generated according to the hot-spot's flexibility requirements. The hot-spot cards guide this generation process, providing a systematic way of generating design patterns based on flexibility properties. The most common way to instantiate a framework is to inherit from abstract classes defined in the framework hierarchy and write the code that is called by the framework itself. However, it is not always easy to identify which code should be written, and where, since framework class hierarchies can be very complex, especially for non-expert users. Therefore, the "common" instantiation process is not always the ideal way to describe a particular application.
This happens for several reasons:
• the language used to build and use the framework is a wide-spectrum language, in which it is not always easy to express the user's intentions;
• framework class hierarchies are complex, and it is difficult to find the points where the code should be written, that is, the framework hot-spots or flexible points.
We propose a different instantiation process [2], in which a particular application is described through domain-specific languages (DSLs) [12] designed precisely for the framework domain. The basic idea of the technique is to capture domain concepts in DSLs, which help the framework user build code more easily, without worrying about implementation decisions and while remaining focused on the problem domain. The specifications written in the DSLs are translated into the framework instantiation code through the use of transformational systems [16, 4].

3. SUPPORTING ENVIRONMENT
A supporting environment used in the design and instantiation phases has been fully implemented (Figure 1). The environment supports all the method's steps:
1. Viewpoint definition: the users define the domain-relevant viewpoints through the specification of OMT class diagrams;
2. Viewpoint unification: the environment applies the unification rules [9] to generate an abstract framework from the selected set of viewpoints;
3. Flexibility properties definition: flexibility requirements are defined to generate the appropriate design patterns that will compose the hot-spot sub-system structure;
4. DSL definition: the definition of the syntax (in a BNF-like notation) and semantics (through the specification of transformation rules) of the domain-specific languages used in the framework instantiation process. The Draco-PUC transformational system is used to perform the transformations and generate the framework instantiation code.
Through the use of this environment a framework for Web-based education (WBE), called ALADIN, was generated [5].
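The contrast between inheritance-based instantiation and DSL-driven instantiation can be sketched in a few lines. This is a minimal, hypothetical illustration, not code from the ALADIN framework or Draco-PUC: the class names, the one-line grading DSL, and the `instantiate` translator are all ours. The kernel fixes the common control flow, `grade()` is the hot-spot, and a small domain-specific description is translated into the subclass that a user would otherwise write by hand.

```python
# Hypothetical sketch: framework kernel, hot-spot, and DSL-driven instantiation.

class CourseKernel:                      # kernel sub-system: common to all apps
    def run(self, answers):
        # The framework calls the hot-spot; subclasses never touch this flow.
        return sum(self.grade(a) for a in answers)

    def grade(self, answer):             # hot-spot: varies per application
        raise NotImplementedError

# "Common" instantiation: the user subclasses and writes framework-called code.
class PassFailCourse(CourseKernel):
    def grade(self, answer):
        return 1 if answer == "correct" else 0

# DSL-driven instantiation: a domain-specific spec such as
# "grade correct=2 wrong=0" is translated into the same kind of subclass,
# so the user never navigates the class hierarchy.
def instantiate(spec):
    """Translate 'grade correct=<x> wrong=<y>' into a framework subclass."""
    tokens = dict(t.split("=") for t in spec.split()[1:])
    cls = type("GeneratedCourse", (CourseKernel,), {
        "grade": lambda self, a: int(tokens["correct"]) if a == "correct"
                 else int(tokens["wrong"])
    })
    return cls()

app = instantiate("grade correct=2 wrong=0")
print(app.run(["correct", "wrong", "correct"]))  # expect 4
```

A real transformational system would of course parse a richer DSL and emit source code rather than build a class reflectively; the sketch only shows how a domain-level description can stand in for manual subclassing at the hot-spot.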
We used various existing WBE applications as viewpoints and derived the appropriate abstractions through the viewpoint unification process [1]. Two domain-specific languages were defined to instantiate the ALADIN framework: an educational language and a navigational language. So far our studies show that the ALADIN framework can speed up the development of WBE applications by a factor of at least 3. We are currently working in the electronic commerce domain to further investigate the completeness and effectiveness of the method and of its supporting environment.

4. CONCLUSIONS
This paper shows how viewpoints, and particularly the hot-spot relationship, can be used as the main driving force in the design of reusable frameworks. The method provides mechanisms that help the framework developer identify the kernel structure and the flexible parts. The hot-spot card is a feature that can help bridge the gap between frameworks and design patterns, a still-challenging issue [18]. The unification step is responsible for harmonically combining the various viewpoints. The use of the hot-spot relationship in this step helps define the framework kernel and its hot-spots, introducing the necessary flexibility requirements.

Figure 1. Method supporting environment

Another important characteristic of the method is that it is a complementary approach that can be combined with other techniques such as refactoring and OOADM. The method supporting environment presented in Section 3 uses the refactoring algorithms presented in [14] for the viewpoint unification process and the OMT notation for the viewpoint specification. In [9] we present other examples of viewpoint unification and hot-spot instantiation. The method is currently completely formalized [9], based on the Z specification language [21, 22], object calculus [7], and category theory [11].
Through this formalization we could prove some important properties, such as loose coupling. In [1, 5] we present a case study of developing a framework for Web-based education. This framework is now fully implemented by the Software Engineering Laboratory (LES) of the Computer Science Department at PUC-Rio. An example of an application generated by this framework is the AulaNet™ environment (http://www.les.inf.puc-rio.br/aulanet).

REFERENCES
1. P. Alencar, D. Cowan, S. Crespo, M. F. Fontoura, and C. J. Lucena, "Using Viewpoints to Derive a Conceptual Model for Web-Based Education Environments", MCC17/98, Monografias em Ciência da Computação, Departamento de Informática, PUC-Rio, 1998 (also submitted to Journal of Systems and Software).
2. C. Braga, M. F. Fontoura, C. J. Lucena, and L. M. Moura, "Using Domain Specific Languages to Instantiate OO Frameworks", MCC28/98, Monografias em Ciência da Computação, Departamento de Informática, PUC-Rio, 1998 (also submitted to IEE Proceedings – Software Engineering).
3. E. Casais, "An incremental class reorganization approach", in ECOOP'92 Proceedings, volume 615 of Lecture Notes in Computer Science, pages 114-132, 1992.
4. J. Cordy and I. Carmichael, "The TXL Programming Language Syntax and Informal Semantics", Technical Report, Queen's University at Kingston, Canada, 1993.
5. S. Crespo, M. F. Fontoura, C. J. Lucena, and L. M. Moura, "ALADIN: An Architecture for Learningware Applications Design and Instantiation", MCC34/98, Monografias em Ciência da Computação, Departamento de Informática, PUC-Rio, 1998 (also submitted to WWW Journal).
6. J. Derrick, H. Bowman, and M. Steen, "Viewpoints and Objects", Technical Report, Computing Laboratory, University of Kent, Canterbury, UK, 1997.
7. J. Fiadeiro and T. Maibaum, "Sometimes 'Tomorrow' is 'Sometime'", Temporal Logic, volume 827 of Lecture Notes in Artificial Intelligence, pages 48-66, Springer-Verlag, 1994.
8. A. Finkelstein and I. Sommerville, "The viewpoints FAQ", Software Engineering Journal, 11(1):2-4, January 1996.
9. M. F. Fontoura, E. H. Hermann, and C. J. Lucena, "A Framework Design and Instantiation Method", MCC35/98, Monografias em Ciência da Computação, Departamento de Informática, PUC-Rio, 1998.
10. E. Gamma, R. Helm, R. E. Johnson, and J. Vlissides, "Design Patterns: Elements of Reusable Object-Oriented Software", Addison-Wesley, 1995.
11. R. Goldblatt, "Topoi – The Categorial Analysis of Logic", North-Holland Publishing Company, 1979.
12. P. Hudak, "Building Domain-Specific Embedded Languages", Computing Surveys, 28A(4), ACM, 1996.
13. R. Johnson, "Frameworks = (Components + Patterns)", Communications of the ACM, Volume 40, Number 10, October 1997.
14. R. Johnson and W. F. Opdyke, "Refactoring and aggregation", in Object Technologies for Advanced Software, First JSSST International Symposium, volume 742 of Lecture Notes in Computer Science, pages 264-278, Springer-Verlag, 1993.
15. G. Kiczales, J. des Rivieres, and D. G. Bobrow, "The Art of the Metaobject Protocol", MIT Press, 1991.
16. J. C. S. P. Leite, M. Sant'anna, and F. G. Freitas, "Draco-PUC: a Technology Assembly for Domain Oriented Software Development", Proceedings of the 3rd IEEE International Conference on Software Reuse, 1994.
17. W. Pree, "Design Patterns for Object-Oriented Software Development", Addison-Wesley, 1995.
18. W. Pree, "Framework Patterns", SIGS Management Briefings, 1996.
19. D. Roberts and R. Johnson, "Evolving Frameworks: A Pattern Language for Developing Object-Oriented Frameworks", in "Pattern Languages of Program Design 3", Addison-Wesley, 1997.
20. H. A. Schmid, "Systematic Framework Design by Generalization", Communications of the ACM, Volume 40, Number 10, October 1997.
21. J. M. Spivey, "Understanding Z: A Specification Language and Formal Semantics", Cambridge University Press, 1988.
22. J. M. Spivey, "The Z Notation: A Reference Manual", Prentice Hall, Hemel Hempstead, 1989.
23. W. Turski and T. S. E. Maibaum, "The Specification of Computer Programs", Addison-Wesley Publishing Company, 1987.
24. R. Wirfs-Brock and R. Johnson, "Surveying current research in object-oriented design", Communications of the ACM, 33(9), 1990.
Model Essays for the College English Test (Level A)

Essay 1
Over the years, the College English Level A examination has presented various types of composition topics. One common type is the chart description, where students are required to carefully observe and analyze the data presented in the chart and then describe the trends and patterns accurately. This demands a clear understanding of the chart and the ability to express the information logically! Another type is the opinion discussion. Here, students need to present their own viewpoints on a given topic, supported by solid reasons and examples. Isn't it challenging to form a coherent and persuasive argument? The problem-solving type is also frequent. It requires students to identify the problem, analyze its causes, and propose feasible solutions. How crucial it is to have a systematic thinking approach! In conclusion, these common types of composition topics all test students' language proficiency, logical thinking, and the ability to express themselves clearly. So, we must be well-prepared and equipped with the necessary skills to handle them effectively!

Essay 2
When it comes to the excellent model essays for the College English Test Level A, several remarkable writing techniques and strategies can be discovered. Firstly, a reasonable paragraph layout is of utmost significance! A well-structured essay makes it easy for readers to follow the author's train of thought. For instance, an introduction to catch the readers' attention, followed by detailed body paragraphs presenting arguments, and a conclusion to summarize and emphasize the main points. How crucial it is! Secondly, clear logical expression is indispensable. Every sentence and paragraph should be coherently connected, avoiding confusion and ambiguity. Why is this so important? Because it ensures that the message is conveyed accurately and effectively. Last but not least, the appropriate use of vocabulary plays a vital role. Rich and precise words can enhance the expressiveness and vividness of the essay. Isn't it amazing how a single word can make a big difference? In conclusion, mastering these writing skills and strategies is essential for achieving high scores in the College English Test Level A. Let's keep practicing and improving!

Essay 3
Oh dear students! Let's talk about the scoring standards of the College English Level A examination composition and how to achieve a high score. Firstly, avoiding grammar mistakes is crucial! How many times have we lost marks just because of a simple tense error or a wrong word form? So, always double-check your grammar. Secondly, the content must be substantial. Don't just write a few simple sentences and call it a composition. Provide detailed examples, explanations, and arguments to make your point clear. For instance, if you're writing about a topic like "The Importance of Reading", don't just say "Reading is important", but elaborate on how it enriches our knowledge and broadens our horizons. Moreover, language coherence is essential. Make sure your paragraphs and sentences flow smoothly. Use proper transitional words and phrases like "however", "therefore", "in addition" to connect your ideas. So, dear friends, if we can pay attention to these aspects, avoid grammar mistakes, enrich the content and ensure language coherence, surely we can get a high score in the exam! How exciting it would be to achieve that!

Essay 4
Oh dear friends, let me share with you my precious experiences in preparing for the composition part of the College English Level A examination! It's truly a challenging but rewarding journey. First and foremost, regular writing practice is indispensable. Set aside some time every day to write on various topics. This helps enhance your language expression and logical thinking. For instance, you could describe your daily life, express your opinions on current events, or imagine a fictional story. Secondly, the skill of reciting model essays is of great significance. By memorizing excellent examples, you can absorb the essence of good writing and internalize the structure and language usage. But remember, don't just copy blindly. Analyze and understand why these essays are outstanding. Also, the flexible application of templates is a key factor. Templates can provide a framework for your writing, but don't let them restrict your creativity. Adapt and modify them according to the specific topic and your own thoughts. Most importantly, persistence and accumulation are the cornerstones of success. Don't expect immediate improvement. Keep practicing, keep learning, and you will surely see remarkable progress in your writing ability. So, dear friends, let's embark on this journey with determination and enthusiasm!

Essay 5
University English Level A examination essays play a crucial role in our academic journey. But have you ever wondered how they relate to real-life applications? Let's take a look! In the fierce job market, a well-written essay in the exam showcases our language proficiency and logical thinking skills. It can make our resume stand out among countless others and increase our chances of landing that dream job. For instance, when writing a cover letter, the ability to organize ideas clearly and express them precisely, which we acquire through exam practice, is invaluable. Moreover, in academic exchanges, whether it's presenting research findings or participating in international conferences, the writing knowledge gained from the exams enables us to communicate effectively with scholars from all over the world. Don't you think it's amazing how what we learn in exams can have such a significant impact on our practical lives? So, let's attach great importance to the practicality of language and make the most of what we gain from these exams!
First-Half "English Knowledge and Teaching Ability" (Senior High School) Exam (with Answers)

Question 1 (multiple choice, 2.00 points each): Neither the unpleasant experiences nor the bad luck __________ him discouraged. A. have caused B. has caused C. has made D. have made. Correct answer: C
Question 2: Mr. Joe has worked very hard in the past two years and has paid all his debts __________ the last penny. A. by B. to C. until D. with. Correct answer: B
Question 3: The message came to the villagers __________ the enemy had already fled the village. A. which B. who C. that D. where. Correct answer: C
Question 4: We must improve the farming method __________ we may get high yields. A. in case B. in order that C. now that D. even if. Correct answer: B
Question 5: --Do you mind if I smoke here? --__________. A. Yes, I don't B. Yes, you may C. No, not at all D. Yes, I won't. Correct answer: C
Question 6: She is __________, from her recording, the diaries of Simon Forman. A. transcribing B. keeping C. paraphrasing D. recollecting. Correct answer: A
Question 7: The main difference between /m/, /n/, and /ŋ/ lies in __________. A. manner of articulation B. sound duration C. place of articulation D. voicing. Correct answer: C
Question 8: The sound of "ch" in "teacher" is __________. A. voiceless, post-alveolar, and affricative B. voiceless, dental, and fricative C. voiced, dental, and fricative D. voiced, post-alveolar, and plosive. Correct answer: A
Question 9 (constructed response, 40.00 points): Design task: read the student information and language material below and design a teaching plan for a 20-minute English reading lesson.
A Frequent Pattern Based Framework for Event Detection in Sensor Network Stream Data*

Li Wan (1,2), Jianxin Liao (1,2), Xiaomin Zhu (1,2)
(1) Beijing University of Posts and Telecommunications, Beijing, China 100876
(2) EBUPT Information Technology Co., Ltd, Beijing, China 100191
wanli@ / liaojx@ / zhuxm@

ABSTRACT
In this paper, we present a frequent pattern based framework for event detection in stream data. It consists of three phases: frequent pattern discovery, frequent pattern selection, and modeling. In the first phase, an MNOE (Mining Non-Overlapping Episode) algorithm is proposed to find the non-overlapping frequent patterns in a time series. In the frequent pattern selection phase, we propose an EGMAMC (Episode Generated Memory Aggregation Markov Chain) model to help select the episodes that describe the stream data significantly. We then define feature flows to represent the instances of the discovered frequent patterns and categorize the distributions of frequent pattern instances into three categories according to the spectra of their feature flows. Finally, we propose a clustering algorithm, EDPA (Event Detection by Pattern Aggregation), to aggregate strongly correlated frequent patterns. We argue that strongly correlated frequent patterns form events and that frequent patterns in different categories can be aggregated to form different kinds of events.
Experiments on real-world sensor network datasets demonstrate that the proposed MNOE algorithm is more efficient than the existing non-overlapping episode mining algorithm and that EDPA performs better when the input frequent patterns are maximal, significant, and non-overlapping.

Categories and Subject Descriptors
H.2.8 [Information Systems]: Database Management - Data Mining

General Terms
Algorithms

Keywords
Frequent pattern, temporal data mining, event detection, sensor network

1. INTRODUCTION
Stream data refers to object sequences annotated with ordered time stamps. In this paper, "stream data", "temporal data" and "time series" represent the same concept. Frequent patterns are subsequences of objects that appear in a time series with frequency no less than a user-specified threshold (i.e. the minimum support). Each occurrence of a frequent pattern is an instance of this frequent pattern in the given time series.
Abundant literature has been dedicated to finding frequent patterns in temporal data [1,2,3,4,5,6,7]. Frequent patterns are usually extracted from a time series as features for different data mining tasks, such as clustering, classification and mining association rules [8,9,10,11,12,13]. However, the methods developed for mining closed frequent patterns often generate a huge number of patterns, while people usually need only the interesting ones [14]. In a time series, frequent patterns can be divided into different categories; patterns in different categories represent different features of the stream data, and these may be the interesting patterns for different tasks. As shown in Fig. 1, the different categories of frequent patterns represent different relationships among frequent pattern instances (y, z axes) and their different distributions in the time series (x axis). The concepts mentioned in Fig. 1 are defined in Section 2.
There are different kinds of events, such as burst events, regular events and routine events.
For example, burst events are events that may occur frequently in a specific short period of time but rarely or never in any other period. As events consist of frequent patterns, the features of events can be described by the frequent patterns associated with them. Take the time series in Fig. 2 as an example: suppose we set the minimum support to 3 and constrain the length of an episode to no more than 5, and let A, B, C, D denote four episodes consisting of 5 objects each (see Def. 5). Episodes A, B, C, D are all frequent, namely each appears more than 3 times. But episodes A and D clearly do not occur frequently over the whole scope of the time series. If we split the time series into three pieces, we can say that episodes A and D burst in the second time frame. Episodes A and D are both burst episodes, and they form a burst event, namely "AD". We can consider frequent patterns as the elements that form events in a time series.

* This work was jointly supported by: (1) National Science Fund for Distinguished Young Scholars (No. 60525110); (2) National 973 Program (No. 2007CB307100, 2007CB307103); (3) Program for New Century Excellent Talents in University (No. NCET-04-0111); (4) Development Fund Project for Electronic and Information Industry (Mobile Service and Application System Based on 3G).

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. SensorKDD'09, June 28, 2009, Paris, France. Copyright 2009 ACM 978-1-60558-668-7…$5.00.

Different categories of frequent patterns form different categories of events.

Fig. 1: Frequent patterns are categorized along three dimensions, i.e.
the inclusion relationship (maximal or non-maximal), the occurrences in the time series (episode or sequential pattern) and the positions of instances in the time series (overlapping or non-overlapping).

Fig. 2: Episodes "A" and "D" burst in the second piece of time, and they form a burst event "AD".

In this paper, we focus on discovering non-overlapping episodes and selecting frequent patterns in different categories for detecting different kinds of events in temporal data. The contributions of this paper are as follows:
• We categorize frequent patterns in a time series by the relationships among their instances and the instance distribution. We define feature flows to represent frequent patterns, and the feature of a frequent pattern's instance distribution can be obtained by analyzing the spectrogram of the corresponding feature flow.
• We present the MNOE algorithm to discover non-overlapping episodes and define significant non-overlapping episodes. Based on the presented EGMAMC model, we derive the minimum support for discovering the defined significant non-overlapping episodes (see Section 3.2.2).
• We define different categories of frequent patterns and events in a time series and connect them through the instance distributions of frequent patterns. We present the EDPA algorithm to aggregate frequent patterns in different categories to obtain different categories of events.
• Experiments on sensor network datasets demonstrate that MNOE outperforms the existing algorithm. EDPA can detect events with high accuracy, and distinguishing non-overlapping episodes from overlapping episodes, and significant episodes from non-significant episodes, helps improve the accuracy of EDPA.
The rest of the paper is organized as follows: Section 2 gives preliminaries. Section 3 presents the frequent pattern based framework for event detection; the proposed algorithms (i.e. MNOE, EDPA) and the EGMAMC model are presented in three subsections respectively. Section 4 discusses related work.
Section 5 gives the experimental results, and we conclude in Section 6.

2. PRELIMINARIES

Definition 1 (Time Series). A sequence of objects T = {<o_1, t_1>, <o_2, t_2>, ..., <o_i, t_i>}, where O = {o_1, o_2, ..., o_i} represents the symbols of the different types of objects and t_i is the time of occurrence of the i-th object.

Definition 2 (Frequent Pattern). A subsequence of object symbols (i.e., P = {o_j, ..., o_i}) that appears in a time series with frequency no less than a user-specified threshold (the minimum support).

Definition 3 (Maximal Frequent Pattern). A frequent pattern P_i = {o_1, o_2, ..., o_m} such that there exists no other frequent pattern P_j = {o_1, o_2, ..., o_n} that includes P_i. If P_i ⊆ P_j, then P_i(o_i) = P_j(o_j) for o_i = o_j, where P_i(o_i) represents the index of symbol o_i in P_i.

By defining different measures of the contribution that occurrences of objects make to supports, we obtain different kinds of frequent patterns, namely sequential patterns and episodes.

Definition 4 (Sequential Pattern). Given a set of time segments and a minimum support γ_S, a sequence that occurs once or more in a segment contributes 1 to its support for that segment. If a sequence's support is larger than γ_S, it is a sequential pattern.

Definition 5 (Episode). Given a time series and a minimum support γ_E, each occurrence of a sequence in the time series contributes 1 to its support. If a sequence's support is larger than γ_E, it is an episode.

Laxman et al. [15] were the first to define non-overlapping episodes, as follows:

Definition 6 (Non-overlapping Episode). Suppose an episode occurs twice in the time series. If no object associated with either occurrence lies between the objects associated with the other occurrence, then these two instances of the episode are called non-overlapping.
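To make Def. 6 concrete, here is a minimal Python sketch (not from the paper; function names are ours) that tests whether two occurrences are non-overlapping and greedily counts the maximum number of pairwise non-overlapping occurrences. Scanning occurrences by end time is the classic interval-scheduling argument, which yields the maximum count.

```python
def non_overlapping(occ_a, occ_b):
    """Def. 6: two occurrences (sorted lists of time stamps) are
    non-overlapping if neither one has an object lying between the
    objects of the other."""
    return occ_a[-1] < occ_b[0] or occ_b[-1] < occ_a[0]

def max_non_overlapping(occurrences):
    """Maximum number of pairwise non-overlapping occurrences:
    scan occurrences by end time and keep each one that starts
    strictly after the last kept occurrence ends."""
    count, last_end = 0, float("-inf")
    for occ in sorted(occurrences, key=lambda o: o[-1]):
        if occ[0] > last_end:
            count += 1
            last_end = occ[-1]
    return count
```

For the occurrences [1,3], [2,5], [4,6], [7,9] this keeps [1,3], [4,6] and [7,9], giving a count of 3.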
The frequency of an episode is defined as the maximum number of non-overlapping occurrences of the episode in the time series.

To describe the distribution of frequent patterns in a time series, we define feature flows.

Definition 7 (Feature Flow). Suppose a time series T = {<o_1, t_1>, <o_2, t_2>, ..., <o_i, t_i>} is segmented into a segmentation Seg_T = {s(tf_1), s(tf_2), ..., s(tf_T)}. A flow of frequent pattern feature f can be written as the sequence y_f = {y_f(tf_1), y_f(tf_2), ..., y_f(tf_T)}, where y_f(tf_i) is a measure of frequent pattern feature f in time frame tf_i, and the sequence of time frames {tf_1, tf_2, ..., tf_T} is a mapping of the segmentation Seg. The measure is

  y_f(tf) = (N_p(s(tf)) / N(s(tf))) × log(N / N_p),

where N_p(s(tf)) is the number of occurrences of pattern p in segment s(tf), N(s(tf)) is the total number of occurrences of all frequent patterns in segment s(tf), and N and N_p are the numbers of occurrences of all frequent patterns and of pattern p over all segments, respectively.

Definition 8 (Event). An event in a time series is a set of frequent patterns P = {p_1, p_2, ..., p_n} in which any two patterns p_i and p_j satisfy the following necessary conditions:
1. Their feature flows y_fi and y_fj must be identically distributed.
2. p_i and p_j must occur in the same time segments with high probability.

3. FREQUENT PATTERN BASED FRAMEWORK FOR EVENT DETECTION

The presented frequent pattern based event detection framework consists of frequent pattern discovery, frequent pattern selection, and modeling. In the frequent pattern discovery phase, any existing algorithm can be used to discover the various types of frequent patterns (sequential patterns and episodes); Section 3.1 presents the proposed MNOE algorithm, which discovers non-overlapping episodes. For different data mining tasks, many methods have been proposed to select interesting frequent patterns from the closed set of frequent patterns.
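The measure in Def. 7 can be sketched as follows, assuming the per-segment occurrence counts have already been collected (function and variable names are ours, not the paper's):

```python
from math import log

def feature_flow(pattern_counts, total_counts):
    """Feature flow y_f of one pattern p (Def. 7):
    y_f(tf) = (N_p(s(tf)) / N(s(tf))) * log(N / N_p),
    a TF-IDF-like weight per time frame."""
    N_p = sum(pattern_counts)   # occurrences of p over all segments
    N = sum(total_counts)       # occurrences of all frequent patterns
    return [
        (np_s / n_s) * log(N / N_p) if n_s > 0 and N_p > 0 else 0.0
        for np_s, n_s in zip(pattern_counts, total_counts)
    ]
```

Segments where p never occurs get weight 0, while a pattern concentrated in a few segments gets a large weight there.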
In Section 3.2, we present a Markov chain based method to distinguish significant from non-significant episodes; it can be used as a frequent pattern selection method. Finally, in the modeling phase, different models can be trained from the selected frequent patterns for different tasks (e.g., clustering, classification and association rules). In Section 3.3, we present the EDPA algorithm, which clusters frequent patterns for event detection in stream data.

3.1 Finding Non-overlapping Episodes

The concept of non-overlapping episodes was first proposed by S. Laxman et al. [15] (see Def. 6). Unfortunately, Laxman et al. [15, 16] only proposed an algorithm that finds non-overlapping episodes from the complete set of episodes, which is clearly inefficient. Based on the same definition of non-overlapping episodes and their frequency, we present an iterative algorithm that finds non-overlapping episodes directly from the given time series.

Algorithm MNOE
Input:
  1. a set S of time-annotated time series projections P = <head, body>
  2. the length of the episodes to be found in the current iteration: l
  3. the maximum time interval permitted between every two successive events in an episode: maxGap
  4. the minimum support: minSpt
Output: the set of non-overlapping episodes
1.  sort the P_i in S in increasing order of the time stamp annotated to the last element of P_i.head
2.  for every projection P_i in S {
3.    for every element e in P_i.body {
4.      if the difference between e.time_stamp and the time stamp of the last element of P_i.head is larger than maxGap then
5.        break;
6.      if e is the element of type |e| with the smallest time stamp in P_i.body, and the time stamp of the first element of P_i.head is larger than the time stamp of the last element of HashtableFE(|e|), then
7.        e contributes 1 to the support of |e|;
8.      project e on P_i.body: P_i' = projection(e);
9.      S'.add(P_i')  } }
10. for every key value (i.e., |e|) in HashtableFE
11.
    if HashtableFE(|e|).size is larger than minSpt
12.     call MNOE(S', l+1, maxGap, minSpt)

The operator |.| on a subsequence x (i.e., |x|) discards the time stamps annotated on x. For example, |e| represents an event type, and <|e|, e.time_stamp> represents an instance of |e| appearing in the time series at e.time_stamp. Thus |P_i.head| represents the prefix of the episodes in the current iteration, while P_i.head represents its instances. P_i.body represents the set of all events occurring after P_i.head in the time series. The sorting of S = {P_1, P_2, ..., P_n} in line 1 of MNOE guarantees that we obtain the maximum number of non-overlapping occurrences of each episode. HashtableFE is a hash table with the event types |e| as keys, and HashtableFE(|e|) represents the set of event instances following the current prefix |P_i.head|. Finally, projection(e) follows this rule: if P_i' is the new projection, then P_i'.head = P_i.head + e, and P_i'.body is the subset of event instances in P_i.body whose time stamps are larger than that of e.

3.2 Significant Frequent Episodes

3.2.1 EGMAMC Model

EGMAMC is a combination of a memory Markov chain and an aggregation Markov chain. Suppose X = {x_1, x_2, ..., x_t, ..., x_T}, where t ∈ {1, ..., T} and x_t ∈ E = {e_1, e_2, ..., e_n} is the state at the current time point, with E the state space. Then

  p(x_t | x_{t-1}, ..., x_{t-L}) = Σ_{l=1}^{L} p(l) p(x_t | x_{t-l}) = Σ_{l=1}^{L} Σ_{k=1}^{K} p(l) p(x_t | k) p(k | x_{t-l})   (1)

where L is the maximum time delay permitted (i.e.
the maximum time interval permitted between every two successive events in an episode), the state space is partitioned into K clusters, and p(l) represents the probability that the state at the current time point is determined by the state l time points before. p(k | x_{t-l}) denotes the probability that the state l time points before determines the current state to belong to cluster k, and p(x_t | k) denotes the probability that the state at the current time point is x_t given that the current state belongs to cluster k.

The states in EGMAMC's state space are categorized into episode events K_e and noise K_n. K_e denotes the events associated with the episode α = {e_1^α, e_2^α, ..., e_N^α}, where e_i^α represents the event appearing in the i-th slot of α and N is the length of α. K_n denotes the events not associated with α. If we set the maximum interval between every two successive events to L, we obtain the α-based EGMAMC model Λ_α:

  P(x_t | x_{t-1}, ..., x_{t-L}) = Σ_{l=1}^{L} P(l | x_{t-1}, ..., x_{t-L}) P(x_t | x_{t-l}) = Σ_{l=1}^{L} Σ_{k=1}^{K} P(l | x_{t-1}, ..., x_{t-L}) P(x_t | k, x_{t-l}) P(k | x_{t-l})   (2)

When x_{t-l} ∈ K_e and x_{t-l'} ∈ K_n for all l' < l, P(l | x_{t-1}, ..., x_{t-L}) = 1; otherwise, P(1 | x_{t-1} ∈ K_n, ..., x_{t-L} ∈ K_n) = 1.

Set λ = P(x_{t-1} ∈ K_n, ..., x_{t-L} ∈ K_n), which indicates the probability that the time interval between two successive instances of episode α is larger than L. A larger λ means that α is distributed more sparsely in the time series, and vice versa.

Set η = P(k = K_n | x_{t-l} ∈ K_e) as the probability that the current state belongs to noise (so 1 - η is the probability that the current state belongs to an episode event). When k = K_e, if x_{t-l} = e_{n-1}^α, then P(x_t = e_n^α | K_e, x_{t-l}) = 1 and P(x_t ≠ e_n^α | K_e, x_{t-l}) = 0. When k = K_n, P(x_t | K_n, x_{t-l}) = 1/M, where M is the number of event types that can appear in the given time series. We suppose the noise obeys an independent identical distribution.
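Under our reading of the model above, one generative step can be sketched as follows (a toy illustration, not the authors' code, and all names are ours): with probability (1-λ)(1-η) the chain emits the next episode symbol; otherwise it emits an i.i.d. noise symbol uniform over the M event types.

```python
import random

def egmamc_step(prev, alpha, lam, eta, alphabet, rng=random):
    """One step of the α-based EGMAMC sketch: emit the episode symbol
    following `prev` in alpha with prob (1-lam)*(1-eta); otherwise emit
    a noise symbol drawn uniformly from the M-symbol alphabet."""
    if rng.random() < (1 - lam) * (1 - eta):
        if prev in alpha:
            i = alpha.index(prev)
            return alpha[(i + 1) % len(alpha)]  # next slot of the episode
        return alpha[0]  # from a noise state the episode restarts at e_1
    return rng.choice(alphabet)
```

With lam = eta = 0 the chain emits the episode deterministically; as either parameter grows, noise symbols dominate.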
Based on the definitions above, a given observed time series o is generated by Λ_α with probability

  P(o | Λ_α) = ((1-λ)(1-η))^{q_en(o) + q_ee(o)} × ((1 - (1-λ)(1-η)) / M)^{q_nn(o) + q_ne(o)}   (3)

where T = |o| denotes the length of the observed time series and q_nn(o) is the number of times noise states transit to noise states in o. Similarly, q_en(o) represents the number of times noise states transit to the first event of α (i.e., e_1^α), q_ne(o) denotes the number of times episode states transit to noise states, q_ee'(o) represents the number of times episode states transit to the episode states except e_1^α, and q_ee(o) represents the number of times episode states transit to any episode state.

We can approximately take q_ee(o) ≈ N f_α, where f_α is the non-overlapping frequency of α, so from equation (3) we obtain equation (4):

  P(o | Λ_α) ≈ ((1-λ)(1-η))^{N f_α} × ((1 - (1-λ)(1-η)) / M)^{T - N f_α}   (4)

Informally speaking, a larger η means that α appears less frequently in o, and a larger λ means that α is distributed more sparsely in o. The next section introduces the method for testing whether Λ_α is a proper model for describing a given time series o.

3.2.2 Significance Testing of Frequent Episodes

This study compares the EGMAMC model Λ_α against an i.i.d. model by likelihood ratio testing. If the hypothesis that Λ_α is more proper than the i.i.d. model as the generative model of the observed time series o is accepted, then α is called a significant episode. We test the alternative hypothesis H_1, that the time series o is generated by an EGMAMC model Λ_α, against the null hypothesis H_0, that o is generated by an independent identical distribution model Λ_iid. If inequality (5) holds, then H_0 is rejected (i.e., α is significant):

  L(o) = P(o | H_1) / P(o | H_0) > γ   (γ is determined by fixing the probability of Type I error)   (5)

According to equation (4), when η ≥ M/(M+1) or λ ≥ 1/2, there is too much noise in the time series o, so that H_1 is equal to H_0.
So it is only necessary to test the two hypotheses under the conditions η < M/(M+1) and λ < 1/2. Since P(o | H_0) = (1/M)^T is a constant, inequality (5) is equivalent to P(o | H_1) > γ', and according to equation (3), P(o | H_1) > γ' is equivalent to q_ee(o) > Γ. So inequality (5) is equivalent to

  L'(o) = q_ee(o) > Γ   (Γ is determined by fixing the probability of Type I error)   (6)

The Type I error is the probability that L'(o) > Γ holds although o is generated under H_0:

  P_f = P(L'(o) > Γ ; H_0) = Q(Γ) (1/M)^T   (7)

where Q(Γ) = |{o ; q_ee(o) > Γ}| represents the number of T-length time series for which L'(o) > Γ holds. Q(Γ) has the upper bound

  Q(Γ) ≤ Σ_{k > Γ} C_T^k M^{λT} (M-1)^{(1-λ)T - k}   (8)

where C_T^k is the number of ways of choosing k positions in a T-length time series to place episode states. There are λT positions filled by states determined by the noise at the preceding time point; the number of combinations of these states is M^{λT}. The remaining positions of the T-length time series are filled by states determined by episode states, and each of these positions has (M-1) choices of state, because it cannot be the next episode state; the number of such combinations is (M-1)^{(1-λ)T - k}. Altogether we get

  P_f ≈ 1 - Φ((Γ - cT) / sqrt(T c (1-c)))   (9)

where Φ(.) is the cumulative distribution function of a standard normal random variable and the approximation holds for large T due to the central limit theorem. Here c = M^{λ_min(k ≤ Γ)} / (M-1) is a constant, where λ_min(k ≤ Γ) represents the minimum λ appearing in the time series for which k ≤ Γ. So, given an upper bound ε on the Type I error (e.g.
ε = 0.5), then

  Γ = cT + Φ^{-1}(1 - ε) sqrt(T c (1-c))   (10)

When T is large, Γ ≈ T/M and q_ee(o) ≈ N f_α; then, according to inequality (6), H_0 is rejected when f_α > T/(MN). Since in a time series o we have η = (T - q_ee(o))/T = (T - N f_α)/T, in order to satisfy η < M/(M+1), f_α > T/(N(M+1)) must hold. Summarizing the above: when f_α > T/(MN) and λ < 1/2, α is a significant episode in the time series o.

3.3 Frequent Pattern Clustering for Event Detection

3.3.1 Frequent Pattern Spectral Analysis

Intuitively, an event can be represented by a set of correlated frequent patterns. Instances of different types of frequent patterns are distributed differently over a time series segmentation. For example, if a subsequence of objects is an episode in a time series segment, its instances must be densely distributed in that segment; otherwise the distribution is sparse. If a subsequence of objects is a sequential pattern in a time series segmentation, it must be periodically distributed in the original time series. Obviously, a subsequence of objects can be both an episode and a sequential pattern.

Feature flows can be used to describe the distribution of frequent patterns (see Def. 7). Given a feature flow y_f, we can decompose it into frequency components by the discrete Fourier transform (DFT):

  X_k = Σ_{t=1}^{T} y_f(t) e^{-2πi(k-1)t/T},  k = 1, 2, ..., T

X_k denotes the amplitude of the sinusoid with frequency k. The dominant period of the feature flow is given by the frequency component with the maximum signal power. We call this maximum frequency component the dominant power spectrum, i.e., S_f = max ||X_k||^2 for k = 1, 2, ..., ⌈T/2⌉, where ||X_k||^2 indicates the signal power at frequency k/T. The corresponding dominant period is P_f = T / argmax_k ||X_k||^2.

As shown in Table 1, according to the values of S_f and P_f, we categorize frequent patterns into four categories.

Table 1. Categories of frequent patterns and corresponding event types.
| Pattern category | Feature flow description | Event category | Examples in offices |
| Burst pattern (episode but not sequential pattern) | High S_f; aperiodic or long-term periodic (P_f > ⌈T/2⌉) | Burst event: happens only in a short time frame, with large volume | A fire alarm or an annual conference |
| Regular pattern (episode and sequential pattern) | High S_f; short-term periodic (P_f ≤ ⌈T/2⌉) | Regular event: happens daily | Work day meetings |
| Rare pattern (neither episode nor sequential pattern, i.e., noise) | Low S_f; aperiodic or long-term periodic | Rare event: happens only in a few short time frames; sometimes very important, sometimes only noise | Somebody fell down |
| Routine pattern (not episode but sequential pattern) | Low S_f; short-term periodic (P_f ≤ ⌈T/2⌉) | Routine event: happens periodically but not as often as regular events | Everyday cleaning work or biweekly meetings |

As defined in Table 1, different kinds of events can be represented by the corresponding categories of frequent patterns, and each category of frequent pattern can be interpreted as a different combination of episode and sequential pattern. We should note that the detection of rare events is a very important and challenging topic [17], but rare events are not properly represented by frequent patterns. So in this paper we do not discriminate between rare events and noise, and the clustering algorithm for event detection considers only the other three categories of events.

According to the definitions of sequential pattern and episode (see Defs. 4 and 5), a frequent pattern being an episode approximately indicates its high-S_f feature, and a frequent pattern being a sequential pattern approximately indicates its short-term periodic feature. So we can map the categories of frequent patterns to combinations of episode and sequential pattern. In this way, we need not perform a DFT on the feature flows once we have obtained all the frequent patterns.
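The spectral quantities S_f and P_f used above can be computed with a standard FFT. A NumPy sketch follows (skipping the DC component, which would otherwise always dominate; that handling, and the naming, are our assumptions):

```python
import numpy as np

def dominant_spectrum(flow):
    """Dominant power spectrum S_f and dominant period P_f of a
    feature flow (Section 3.3.1), via the DFT."""
    T = len(flow)
    X = np.fft.fft(flow)
    half = int(np.ceil(T / 2))
    power = np.abs(X[1:half + 1]) ** 2   # frequencies k = 1 .. ceil(T/2)
    k = int(np.argmax(power)) + 1        # dominant frequency index
    return float(power[k - 1]), T / k    # S_f, P_f = T / argmax_k
```

A flow with a strong 4-frame cycle over 16 frames yields P_f = 4, i.e., a short-term periodic (regular or routine) pattern.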
Skipping the DFT in this way improves computational efficiency in practice.

3.3.2 EDPA Algorithm

If two patterns p_i and p_j are representative of the same event, they must satisfy the following necessary conditions:
1. Their feature flows y_fi and y_fj must be identically distributed.
2. p_i and p_j must occur in the same time segments with high probability.

To measure the distributional similarity of feature flows, we introduce mutual information (MI). In information theory, MI is commonly used to measure the distance between two distributions (e.g., D(y_fi) and D(y_fj)). MI(D(y_fi), D(y_fj)) takes a value in [0, 1]; if MI(D(y_fi), D(y_fj)) = 1, then D(y_fi) and D(y_fj) are identical, and the value of MI(D(y_fi), D(y_fj)) gets smaller as the distance between D(y_fi) and D(y_fj) grows.

Let TS_i denote the set of time segments containing pattern p_i. To measure the probability that patterns p_i and p_j occur in the same time segments, we define the time segment occurrence overlap as follows:

  d(f_i, f_j) = |TS_i ∩ TS_j| / min(|TS_i|, |TS_j|)

For a set of pattern features A_i, we define its mutual information distance and time segment occurrence overlap as follows:

  MI(A_i) = min_{f_i, f_j ∈ A_i} MI(f_i, f_j),  d(A_i) = min_{f_i, f_j ∈ A_i} d(f_i, f_j)

To detect the events occurring in a time series is to find sets of pattern features A_i that minimize the following cost function:

  C(A_i) = 1 / (d(A_i) × MI(A_i) × Σ_{f_j ∈ A_i} S_fj)   (11)

The underlying event e associated with the frequent patterns in A_i can be represented as:

  y_e = Σ_{f_j ∈ A_i} S_fj y_fj / Σ_{f_u ∈ A_i} S_fu   (12)

The proposed EDPA (Event Detection by Pattern Aggregation) is a greedy algorithm. It initializes the pattern feature sets as A_i = {f_i}; in each iteration, it merges two feature sets if the merged result yields a smaller value of the cost function, i.e., C(A_i ∪ A_j) < C(A_i) and C(A_i ∪ A_j) < C(A_j). Intuitively, frequent pattern features in the same category are more strongly correlated than those in different categories, which can also be verified through the cost function (equation (11)).
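A sketch of d(., .) and the cost C(A) of equation (11) follows (our own naming throughout; MI values are assumed to be precomputed and supplied, and a singleton set is assigned d = MI = 1 so that it has a well-defined cost, which is our convention, not the paper's):

```python
from itertools import combinations

def overlap(ts_i, ts_j):
    """Time segment occurrence overlap: |TS_i ∩ TS_j| / min(|TS_i|, |TS_j|)."""
    return len(ts_i & ts_j) / min(len(ts_i), len(ts_j))

def cost(A, segments, mi, power):
    """C(A) = 1 / (d(A) * MI(A) * sum of dominant powers S_f), cf. eq. (11).
    d(A) and MI(A) are minima over all pairs of features in A."""
    if len(A) < 2:
        d_a = mi_a = 1.0                      # convention for singletons
    else:
        d_a = min(overlap(segments[i], segments[j]) for i, j in combinations(A, 2))
        mi_a = min(mi[frozenset((i, j))] for i, j in combinations(A, 2))
    s = sum(power[i] for i in A)
    return float("inf") if d_a == 0 or mi_a == 0 or s == 0 else 1.0 / (d_a * mi_a * s)
```

EDPA would then merge two sets only when the merged cost is below both individual costs.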
So each time, we only input the feature flows of the frequent patterns in the same category to find one particular kind of event as defined in Table 1; namely, we use the feature flows of burst patterns to detect burst events, of regular patterns to detect regular events, and so on.

Algorithm EDPA
Input: feature flows F = {f_1, f_2, ..., f_n} representing the patterns in P = {p_1, p_2, ..., p_n}
Output: events E = {e_1, e_2, ..., e_m}
1.  for every f_i in F
2.    initialize A_i = {f_i}
3.  sort A = {A_1, A_2, ..., A_i, ..., A_n} in ascending order
4.  while A is not empty
5.    A_i = A.next
6.    // aggregate pattern sets until aggregation increases C(A_i)
7.    for every A_j in A with j ≠ i
8.      if C(A_i ∪ A_j) < C(A_i) and C(A_i ∪ A_j) < C(A_j)
9.        A_i = A_i ∪ A_j and A = A - A_j
10.       C_min = C(A_i)
11.     else
12.       output A_i into E as e_k  // e_k is represented as in equation (12)
13.       A = A - A_i
14.       break the for loop
15.     end if
16.   end for
17. end while
18. output E

4. RELATED WORK

S. Laxman et al. [15] defined non-overlapping episodes and proposed the first algorithm to discover them. Unfortunately, their algorithm can only pick non-overlapping episodes out of the set of all episodes (overlapping and non-overlapping alike), which means it must be run on the output of an episode discovery algorithm such as WinMiner [3]. The MNOE algorithm proposed in this paper does not traverse the whole episode space; it directly finds the episodes of interest, namely the non-overlapping ones.

Meger and Rigotti [3] propose a complete algorithm (WinMiner) to find frequent episode patterns in a single long sequence. WinMiner only constrains the maximum time interval between every two successive events in an episode, so the pattern length is unbounded. However, WinMiner does not consider the time intervals and positions of the objects in pattern instances. J. Pei and J. Han [2] presented the first algorithm for discovering sequential patterns. Giannella et al.
[5] proposed a frequent itemset mining algorithm over data streams that uses tilted windows to calculate the frequent patterns for the most recent transactions and maintains a tree data structure (FP-stream) to represent the frequent itemsets. Manku and Motwani [7] proposed approximate frequency counts over data streams. Cormode and Muthukrishnan [4] proposed an algorithm that finds the hottest k items by group testing; it works in the turnstile data stream model, which allows deletion of data items as well as addition. Gaber et al. [6] proposed an AOG-based algorithm, Lightweight Frequency counting (LWF), which finds an approximate solution to the most-frequent-items problem online by adaptation, regularly releasing the least frequent items in order to count the more frequent ones.

There is also a series of research projects on event detection and event correlation, especially in the database community. TelegraphCQ [21] focuses on the challenges of handling large numbers of continuous queries over high-volume, highly variable data streams. Aurora [22, 23] manages continuous data streams for monitoring applications, focusing on DBMS-like support for applications. Borealis [24] provides advanced capabilities commonly required by newly emerging stream processing applications. The STREAM project [25] views stream processing as the running of continuous queries expressed in a query language (CQL) that includes sliding windows and sampling over the data stream. Unfortunately, none of these projects consider frequent patterns as an important structure in data streams. In this paper, we argue that frequent patterns form events in data streams, and that different types of frequent patterns have different features, which in turn represent different features of events.
The EDPA algorithm presented in this paper clusters frequent patterns into events based on the feature flows of the frequent patterns.

5. EXPERIMENTAL EVALUATION

We evaluated the presented framework on two public datasets (the MERL and IL datasets) and on a dataset from a smart building project (the Smart-Building dataset). All the datasets are generated by