Multivariate Gaussianization for Data Processing
- 格式:pdf
- 大小:35.60 MB
- 文档页数:70
多元高斯分布最大后验多元高斯分布(Multivariate Gaussian Distribution)是统计学中常用的概率分布模型之一,它可以描述多个变量之间的关联关系。
在统计学和机器学习领域中,最大后验估计(Maximum a Posteriori Estimation)是一种常用的参数估计方法,它结合了先验知识和观测数据,寻找最可能解释观测数据的参数值。
在现实生活中,我们经常遇到需要估计某些未知参数的情况。
例如,在医学研究中,我们可能希望估计某种药物对患者疾病的治疗效果;在金融领域,我们可能需要估计某个投资组合的收益率等。
而最大后验估计方法可以帮助我们利用已有的观测数据和先验知识,来得到对未知参数的最优估计。
最大后验估计的核心思想是找到使得后验概率密度函数最大化的参数值。
换句话说,我们通过最大化后验概率来估计参数。
其中,后验概率是指在给定观测数据的情况下,参数的概率分布。
它由贝叶斯定理得到,将观测数据和先验知识结合起来。
多元高斯分布是最大后验估计中常用的概率分布模型之一。
它假设各个变量之间服从高斯分布,并且变量之间存在线性关系。
这使得我们可以通过最大后验估计来估计多元高斯分布的参数,从而得到对观测数据的更准确的描述。
在实际应用中,最大后验估计方法被广泛应用于各个领域。
例如,在图像处理中,我们可以利用最大后验估计来还原受噪声影响的图像;在自然语言处理中,我们可以利用最大后验估计来估计语言模型的参数,从而提高文本生成的准确性。
总的来说,最大后验估计是一种重要的参数估计方法,它结合了先验知识和观测数据,能够帮助我们更准确地估计未知参数。
而多元高斯分布作为一种常用的概率分布模型,可以用于描述多个变量之间的关联关系。
通过最大后验估计,我们可以得到对观测数据更准确的描述,从而提高模型的性能和应用的效果。
A review on time series data miningTak-chung FuDepartment of Computing,Hong Kong Polytechnic University,Hunghom,Kowloon,Hong Konga r t i c l e i n f oArticle history:Received19February2008Received in revised form14March2010Accepted4September2010Keywords:Time series data miningRepresentationSimilarity measureSegmentationVisualizationa b s t r a c tTime series is an important class of temporal data objects and it can be easily obtained from scientificandfinancial applications.A time series is a collection of observations made chronologically.The natureof time series data includes:large in data size,high dimensionality and necessary to updatecontinuously.Moreover time series data,which is characterized by its numerical and continuousnature,is always considered as a whole instead of individual numericalfield.The increasing use of timeseries data has initiated a great deal of research and development attempts in thefield of data mining.The abundant research on time series data mining in the last decade could hamper the entry ofinterested researchers,due to its complexity.In this paper,a comprehensive revision on the existingtime series data mining researchis given.They are generally categorized into representation andindexing,similarity measure,segmentation,visualization and mining.Moreover state-of-the-artresearch issues are also highlighted.The primary objective of this paper is to serve as a glossary forinterested researchers to have an overall picture on the current time series data mining developmentand identify their potential research direction to further investigation.&2010Elsevier Ltd.All rights reserved.1.IntroductionRecently,the increasing use of temporal data,in particulartime series data,has initiated various research and developmentattempts in thefield of data mining.Time series is an importantclass of temporal data objects,and it can be easily obtained fromscientific andfinancial applications(e.g.electrocardiogram(ECG),daily temperature,weekly sales totals,and prices of mutual fundsand stocks).A time series is a collection of observations madechronologically.The nature of time series data includes:large indata size,high dimensionality and update continuously.Moreovertime series data,which is characterized by its numerical andcontinuous nature,is always considered as a whole instead ofindividual numericalfield.Therefore,unlike traditional databaseswhere similarity search is exact match based,similarity search intime series data is typically carried out in an approximatemanner.There are various kinds of time series data related research,forexample,finding similar time series(Agrawal et al.,1993a;Berndtand Clifford,1996;Chan and Fu,1999),subsequence searching intime series(Faloutsos et al.,1994),dimensionality reduction(Keogh,1997b;Keogh et al.,2000)and segmentation(Abonyiet al.,2005).Those researches have been studied in considerabledetail by both database and pattern recognition communities fordifferent domains of time series data(Keogh and Kasetty,2002).In the context of time series data mining,the fundamentalproblem is how to represent the time series data.One of thecommon approaches is transforming the time series to anotherdomain for dimensionality reduction followed by an indexingmechanism.Moreover similarity measure between time series ortime series subsequences and segmentation are two core tasksfor various time series mining tasks.Based on the time seriesrepresentation,different mining tasks can be found in theliterature and they can be roughly classified into fourfields:pattern discovery and clustering,classification,rule discovery andsummarization.Some of the research concentrates on one of thesefields,while the others may focus on more than one of the aboveprocesses.In this paper,a comprehensive review on the existingtime series data mining research is given.Three state-of-the-arttime series data mining issues,streaming,multi-attribute timeseries data and privacy are also briefly introduced.The remaining part of this paper is organized as follows:Section2contains a discussion of time series representation andindexing.The concept of similarity measure,which includes bothwhole time series and subsequence matching,based on the rawtime series data or the transformed domain will be reviewed inSection3.The research work on time series segmentation andvisualization will be discussed in Sections4and5,respectively.InSection6,vary time series data mining tasks and recent timeseries data mining directions will be reviewed,whereas theconclusion will be made in Section7.2.Time series representation and indexingOne of the major reasons for time series representation is toreduce the dimension(i.e.the number of data point)of theContents lists available at ScienceDirectjournal homepage:/locate/engappaiEngineering Applications of Artificial Intelligence0952-1976/$-see front matter&2010Elsevier Ltd.All rights reserved.doi:10.1016/j.engappai.2010.09.007E-mail addresses:cstcfu@.hk,cstcfu@Engineering Applications of Artificial Intelligence24(2011)164–181original data.The simplest method perhaps is sampling(Astrom, 1969).In this method,a rate of m/n is used,where m is the length of a time series P and n is the dimension after dimensionality reduction(Fig.1).However,the sampling method has the drawback of distorting the shape of sampled/compressed time series,if the sampling rate is too low.An enhanced method is to use the average(mean)value of each segment to represent the corresponding set of data points. Again,with time series P¼ðp1,...,p mÞand n is the dimension after dimensionality reduction,the‘‘compressed’’time series ^P¼ð^p1,...,^p nÞcan be obtained by^p k ¼1k kX e ki¼s kp ið1Þwhere s k and e k denote the starting and ending data points of the k th segment in the time series P,respectively(Fig.2).That is, using the segmented means to represent the time series(Yi and Faloutsos,2000).This method is also called piecewise aggregate approximation(PAA)by Keogh et al.(2000).1Keogh et al.(2001a) propose an extended version called an adaptive piecewise constant approximation(APCA),in which the length of each segment is notfixed,but adaptive to the shape of the series.A signature technique is proposed by Faloutsos et al.(1997)with similar ideas.Besides using the mean to represent each segment, other methods are proposed.For example,Lee et al.(2003) propose to use the segmented sum of variation(SSV)to represent each segment of the time series.Furthermore,a bit level approximation is proposed by Ratanamahatana et al.(2005)and Bagnall et al.(2006),which uses a bit to represent each data point.To reduce the dimension of time series data,another approach is to approximate a time series with straight lines.Two major categories are involved.Thefirst one is linear interpolation.A common method is using piecewise linear representation(PLR)2 (Keogh,1997b;Keogh and Smyth,1997;Smyth and Keogh,1997). The approximating line for the subsequence P(p i,y,p j)is simply the line connecting the data points p i and p j.It tends to closely align the endpoint of consecutive segments,giving the piecewise approximation with connected lines.PLR is a bottom-up algo-rithm.It begins with creating afine approximation of the time series,so that m/2segments are used to approximate the m length time series and iteratively merges the lowest cost pair of segments,until it meets the required number of segment.When the pair of adjacent segments S i and S i+1are merged,the cost of merging the new segment with its right neighbor and the cost of merging the S i+1segment with its new larger neighbor is calculated.Ge(1998)extends PLR to hierarchical structure. Furthermore,Keogh and Pazzani enhance PLR by considering weights of the segments(Keogh and Pazzani,1998)and relevance feedback from the user(Keogh and Pazzani,1999).The second approach is linear regression,which represents the subsequences with the bestfitting lines(Shatkay and Zdonik,1996).Furthermore,reducing the dimension by preserving the salient points is a promising method.These points are called as perceptually important points(PIP).The PIP identification process isfirst introduced by Chung et al.(2001)and used for pattern matching of technical(analysis)patterns infinancial applications. With the time series P,there are n data points:P1,P2y,P n.All the data points in P can be reordered by its importance by going through the PIP identification process.Thefirst data point P1and the last data point P n in the time series are thefirst and two PIPs, respectively.The next PIP that is found will be the point in P with maximum distance to thefirst two PIPs.The fourth PIP that is found will be the point in P with maximum vertical distance to the line joining its two adjacent PIPs,either in between thefirst and second PIPs or in between the second and the last PIPs.The PIP location process continues until all the points in P are attached to a reordered list L or the required number of PIPs is reached(i.e. reduced to the required dimension).Seven PIPs are identified in from the sample time series in Fig.3.Detailed treatment can be found in Fu et al.(2008c).The idea is similar to a technique proposed about30years ago for reducing the number of points required to represent a line by Douglas and Peucker(1973)(see also Hershberger and Snoeyink, 1992).Perng et al.(2000)use a landmark model to identify the important points in the time series for similarity measure.Man and Wong(2001)propose a lattice structure to represent the identified peaks and troughs(called control points)in the time series.Pratt and Fink(2002)and Fink et al.(2003)define extrema as minima and maxima in a time series and compress thetime Fig.1.Time series dimensionality reduction by sampling.The time series on the left is sampled regularly(denoted by dotted lines)and displayed on the right with a largedistortion.Fig.2.Time series dimensionality reduction by PAA.The horizontal dotted lines show the mean of each segment.1This method is called piecewise constant approximation originally(Keoghand Pazzani,2000a).2It is also called piecewise linear approximation(PLA).Tak-chung Fu/Engineering Applications of Artificial Intelligence24(2011)164–181165series by selecting only certain important extrema and dropping the other points.The idea is to discard minor fluctuations and keep major minima and maxima.The compression is controlled by the compression ratio with parameter R ,which is always greater than one;an increase of R leads to the selection of fewer points.That is,given indices i and j ,where i r x r j ,a point p x of a series P is an important minimum if p x is the minimum among p i ,y ,p j ,and p i /p x Z R and p j /p x Z R .Similarly,p x is an important maximum if p x is the maximum among p i ,y ,p j and p x /p i Z R and p x /p j Z R .This algorithm takes linear time and constant memory.It outputs the values and indices of all important points,as well as the first and last point of the series.This algorithm can also process new points as they arrive,without storing the original series.It identifies important points based on local information of each segment (subsequence)of time series.Recently,a critical point model (CPM)(Bao,2008)and a high-level representation based on a sequence of critical points (Bao and Yang,2008)are proposed for financial data analysis.On the other hand,special points are introduced to restrict the error on PLR (Jia et al.,2008).Key points are suggested to represent time series in (Leng et al.,2009)for an anomaly detection.Another common family of time series representation approaches converts the numeric time series to symbolic form.That is,first discretizing the time series into segments,then converting each segment into a symbol (Yang and Zhao,1998;Yang et al.,1999;Motoyoshi et al.,2002;Aref et al.,2004).Lin et al.(2003;2007)propose a method called symbolic aggregate approximation (SAX)to convert the result from PAA to symbol string.The distribution space (y -axis)is divided into equiprobable regions.Each region is represented by a symbol and each segment can then be mapped into a symbol corresponding to the region inwhich it resides.The transformed time series ^Pusing PAA is finally converted to a symbol string SS (s 1,y ,s W ).In between,two parameters must be specified for the conversion.They are the length of subsequence w and alphabet size A (number of symbols used).Besides using the means of the segments to build the alphabets,another method uses the volatility change to build the alphabets.Jonsson and Badal (1997)use the ‘‘Shape Description Alphabet (SDA)’’.Example symbols like highly increasing transi-tion,stable transition,and slightly decreasing transition are adopted.Qu et al.(1998)use gradient alphabets like upward,flat and download as symbols.Huang and Yu (1999)suggest transforming the time series to symbol string,using change ratio between contiguous data points.Megalooikonomou et al.(2004)propose to represent each segment by a codeword from a codebook of key-sequences.This work has extended to multi-resolution consideration (Megalooi-konomou et al.,2005).Morchen and Ultsch (2005)propose an unsupervised discretization process based on quality score and persisting states.Instead of ignoring the temporal order of values like many other methods,the Persist algorithm incorporates temporal information.Furthermore,subsequence clustering is a common method to generate the symbols (Das et al.,1998;Li et al.,2000a;Hugueney and Meunier,2001;Hebrail and Hugueney,2001).A multiple abstraction level mining (MALM)approach is proposed by Li et al.(1998),which is based on the symbolic form of the time series.The symbols in this paper are determined by clustering the features of each segment,such as regression coefficients,mean square error and higher order statistics based on the histogram of the regression residuals.Most of the methods described so far are representing time series in time domain directly.Representing time series in the transformation domain is another large family of approaches.One of the popular transformation techniques in time series data mining is the discrete Fourier transforms (DFT),since first being proposed for use in this context by Agrawal et al.(1993a).Rafiei and Mendelzon (2000)develop similarity-based queries,using DFT.Janacek et al.(2005)propose to use likelihood ratio statistics to test the hypothesis of difference between series instead of an Euclidean distance in the transformed domain.Recent research uses wavelet transform to represent time series (Struzik and Siebes,1998).In between,the discrete wavelet transform (DWT)has been found to be effective in replacing DFT (Chan and Fu,1999)and the Haar transform is always selected (Struzik and Siebes,1999;Wang and Wang,2000).The Haar transform is a series of averaging and differencing operations on a time series (Chan and Fu,1999).The average and difference between every two adjacent data points are computed.For example,given a time series P ¼(1,3,7,5),dimension of 4data points is the full resolution (i.e.original time series);in dimension of two coefficients,the averages are (26)with the coefficients (À11)and in dimension of 1coefficient,the average is 4with coefficient (À2).A multi-level representation of the wavelet transform is proposed by Shahabi et al.(2000).Popivanov and Miller (2002)show that a large class of wavelet transformations can be used for time series representation.Dasha et al.(2007)compare different wavelet feature vectors.On the other hand,comparison between DFT and DWT can be found in Wu et al.(2000b)and Morchen (2003)and a combination use of Fourier and wavelet transforms are presented in Kawagoe and Ueda (2002).An ensemble-index,is proposed by Keogh et al.(2001b)and Vlachos et al.(2006),which ensembles two or more representations for indexing.Principal component analysis (PCA)is a popular multivariate technique used for developing multivariate statistical process monitoring methods (Yang and Shahabi,2005b;Yoon et al.,2005)and it is applied to analyze financial time series by Lesch et al.(1999).In most of the related works,PCA is used to eliminate the less significant components or sensors and reduce the data representation only to the most significant ones and to plot the data in two dimensions.The PCA model defines linear hyperplane,it can be considered as the multivariate extension of the PLR.PCA maps the multivariate data into a lower dimensional space,which is useful in the analysis and visualization of correlated high-dimensional data.Singular value decomposition (SVD)(Korn et al.,1997)is another transformation-based approach.Other time series representation methods include modeling time series using hidden markov models (HMMs)(Azzouzi and Nabney,1998)and a compression technique for multiple stream is proposed by Deligiannakis et al.(2004).It is based onbaseFig.3.Time series compression by data point importance.The time series on the left is represented by seven PIPs on the right.Tak-chung Fu /Engineering Applications of Artificial Intelligence 24(2011)164–181166signal,which encodes piecewise linear correlations among the collected data values.In addition,a recent biased dimension reduction technique is proposed by Zhao and Zhang(2006)and Zhao et al.(2006).Moreover many of the representation schemes described above are incorporated with different indexing methods.A common approach is adopted to an existing multidimensional indexing structure(e.g.R-tree proposed by Guttman(1984))for the representation.Agrawal et al.(1993a)propose an F-index, which adopts the R*-tree(Beckmann et al.,1990)to index thefirst few DFT coefficients.An ST-index is further proposed by (Faloutsos et al.(1994),which extends the previous work for subsequence handling.Agrawal et al.(1995a)adopt both the R*-and R+-tree(Sellis et al.,1987)as the indexing structures.A multi-level distance based index structure is proposed(Yang and Shahabi,2005a),which for indexing time series represented by PCA.Vlachos et al.(2005a)propose a Multi-Metric(MM)tree, which is a hybrid indexing structure on Euclidean and periodic spaces.Minimum bounding rectangle(MBR)is also a common technique for time series indexing(Chu and Wong,1999;Vlachos et al.,2003).An MBR is adopted in(Rafiei,1999)which an MT-index is developed based on the Fourier transform and in(Kahveci and Singh,2004)which a multi-resolution index is proposed based on the wavelet transform.Chen et al.(2007a)propose an indexing mechanism for PLR representation.On the other hand, Kim et al.(1996)propose an index structure called TIP-index (TIme series Pattern index)for manipulating time series pattern databases.The TIP-index is developed by improving the extended multidimensional dynamic indexfile(EMDF)(Kim et al.,1994). An iSAX(Shieh and Keogh,2009)is proposed to index massive time series,which is developed based on an SAX.A multi-resolution indexing structure is proposed by Li et al.(2004),which can be adapted to different representations.To sum up,for a given index structure,the efficiency of indexing depends only on the precision of the approximation in the reduced dimensionality space.However in choosing a dimensionality reduction technique,we cannot simply choose an arbitrary compression algorithm.It requires a technique that produces an indexable representation.For example,many time series can be efficiently compressed by delta encoding,but this representation does not lend itself to indexing.In contrast,SVD, DFT,DWT and PAA all lend themselves naturally to indexing,with each eigenwave,Fourier coefficient,wavelet coefficient or aggregate segment map onto one dimension of an index tree. Post-processing is then performed by computing the actual distance between sequences in the time domain and discarding any false matches.3.Similarity measureSimilarity measure is of fundamental importance for a variety of time series analysis and data mining tasks.Most of the representation approaches discussed in Section2also propose the similarity measure method on the transformed representation scheme.In traditional databases,similarity measure is exact match based.However in time series data,which is characterized by its numerical and continuous nature,similarity measure is typically carried out in an approximate manner.Consider the stock time series,one may expect having queries like: Query1:find all stocks which behave‘‘similar’’to stock A.Query2:find all‘‘head and shoulders’’patterns last for a month in the closing prices of all high-tech stocks.The query results are expected to provide useful information for different stock analysis activities.Queries like Query2in fact is tightly coupled with the patterns frequently used in technical analysis, e.g.double top/bottom,ascending triangle,flag and rounded top/bottom.In time series domain,devising an appropriate similarity function is by no means trivial.There are essentially two ways the data that might be organized and processed(Agrawal et al., 1993a).In whole sequence matching,the whole length of all time series is considered during the similarity search.It requires comparing the query sequence to each candidate series by evaluating the distance function and keeping track of the sequence with the smallest distance.In subsequence matching, where a query sequence Q and a longer sequence P are given,the task is tofind the subsequences in P,which matches Q. Subsequence matching requires that the query sequence Q be placed at every possible offset within the longer sequence P.With respect to Query1and Query2above,they can be considered as a whole sequence matching and a subsequence matching,respec-tively.Gavrilov et al.(2000)study the usefulness of different similarity measures for clustering similar stock time series.3.1.Whole sequence matchingTo measure the similarity/dissimilarity between two time series,the most popular approach is to evaluate the Euclidean distance on the transformed representation like the DFT coeffi-cients(Agrawal et al.,1993a)and the DWT coefficients(Chan and Fu,1999).Although most of these approaches guarantee that a lower bound of the Euclidean distance to the original data, Euclidean distance is not always being the suitable distance function in specified domains(Keogh,1997a;Perng et al.,2000; Megalooikonomou et al.,2005).For example,stock time series has its own characteristics over other time series data(e.g.data from scientific areas like ECG),in which the salient points are important.Besides Euclidean-based distance measures,other distance measures can easily be found in the literature.A constraint-based similarity query is proposed by Goldin and Kanellakis(1995), which extended the work of(Agrawal et al.,1993a).Das et al. (1997)apply computational geometry methods for similarity measure.Bozkaya et al.(1997)use a modified edit distance function for time series matching and retrieval.Chu et al.(1998) propose to measure the distance based on the slopes of the segments for handling amplitude and time scaling problems.A projection algorithm is proposed by Lam and Wong(1998).A pattern recognition method is proposed by Morrill(1998),which is based on the building blocks of the primitives of the time series. Ruspini and Zwir(1999)devote an automated identification of significant qualitative features of complex objects.They propose the process of discovery and representation of interesting relations between those features,the generation of structured indexes and textual annotations describing features and their relations.The discovery of knowledge by an analysis of collections of qualitative descriptions is then achieved.They focus on methods for the succinct description of interesting features lying in an effective frontier.Generalized clustering is used for extracting features,which interest domain experts.The general-ized Markov models are adopted for waveform matching in Ge and Smyth(2000).A content-based query-by-example retrieval model called FALCON is proposed by Wu et al.(2000a),which incorporates a feedback mechanism.Indeed,one of the most popular andfield-tested similarity measures is called the‘‘time warping’’distance measure.Based on the dynamic time warping(DTW)technique,the proposed method in(Berndt and Clifford,1994)predefines some patterns to serve as templates for the purpose of pattern detection.To align two time series,P and Q,using DTW,an n-by-m matrix M isfirstTak-chung Fu/Engineering Applications of Artificial Intelligence24(2011)164–181167constructed.The(i th,j th)element of the matrix,m ij,contains the distance d(q i,p j)between the two points q i and p j and an Euclidean distance is typically used,i.e.d(q i,p j)¼(q iÀp j)2.It corresponds to the alignment between the points q i and p j.A warping path,W,is a contiguous set of matrix elements that defines a mapping between Q and P.Its k th element is defined as w k¼(i k,j k)andW¼w1,w2,...,w k,...,w Kð2Þwhere maxðm,nÞr K o mþnÀ1.The warping path is typically subjected to the following constraints.They are boundary conditions,continuity and mono-tonicity.Boundary conditions are w1¼(1,1)and w K¼(m,n).This requires the warping path to start andfinish diagonally.Next constraint is continuity.Given w k¼(a,b),then w kÀ1¼(a0,b0), where aÀa u r1and bÀb u r1.This restricts the allowable steps in the warping path being the adjacent cells,including the diagonally adjacent cell.Also,the constraints aÀa uZ0and bÀb uZ0force the points in W to be monotonically spaced in time.There is an exponential number of warping paths satisfying the above conditions.However,only the path that minimizes the warping cost is of interest.This path can be efficiently found by using dynamic programming(Berndt and Clifford,1996)to evaluate the following recurrence equation that defines the cumulative distance gði,jÞas the distance dði,jÞfound in the current cell and the minimum of the cumulative distances of the adjacent elements,i.e.gði,jÞ¼dðq i,p jÞþmin f gðiÀ1,jÀ1Þ,gðiÀ1,jÞ,gði,jÀ1Þgð3ÞA warping path,W,such that‘‘distance’’between them is minimized,can be calculated by a simple methodDTWðQ,PÞ¼minWX Kk¼1dðw kÞ"#ð4Þwhere dðw kÞcan be defined asdðw kÞ¼dðq ik ,p ikÞ¼ðq ikÀp ikÞ2ð5ÞDetailed treatment can be found in Kruskall and Liberman (1983).As DTW is computationally expensive,different methods are proposed to speedup the DTW matching process.Different constraint(banding)methods,which control the subset of matrix that the warping path is allowed to visit,are reviewed in Ratanamahatana and Keogh(2004).Yi et al.(1998)introduce a technique for an approximate indexing of DTW that utilizes a FastMap technique,whichfilters the non-qualifying series.Kim et al.(2001)propose an indexing approach under DTW similarity measure.Keogh and Pazzani(2000b)introduce a modification of DTW,which integrates with PAA and operates on a higher level abstraction of the time series.An exact indexing approach,which is based on representing the time series by PAA for DTW similarity measure is further proposed by Keogh(2002).An iterative deepening dynamic time warping(IDDTW)is suggested by Chu et al.(2002),which is based on a probabilistic model of the approximate errors for all levels of approximation prior to the query process.Chan et al.(2003)propose afiltering process based on the Haar wavelet transformation from low resolution approx-imation of the real-time warping distance.Shou et al.(2005)use an APCA approximation to compute the lower bounds for DTW distance.They improve the global bound proposed by Kim et al. (2001),which can be used to index the segments and propose a multi-step query processing technique.A FastDTW is proposed by Salvador and Chan(2004).This method uses a multi-level approach that recursively projects a solution from a coarse resolution and refines the projected solution.Similarly,a fast DTW search method,an FTW is proposed by Sakurai et al.(2005) for efficiently pruning a significant number of search candidates. Ratanamahatana and Keogh(2005)clarified some points about DTW where are related to lower bound and speed.Euachongprasit and Ratanamahatana(2008)also focus on this problem.A sequentially indexed structure(SIS)is proposed by Ruengron-ghirunya et al.(2009)to balance the tradeoff between indexing efficiency and I/O cost during DTW similarity measure.A lower bounding function for group of time series,LBG,is adopted.On the other hand,Keogh and Pazzani(2001)point out the potential problems of DTW that it can lead to unintuitive alignments,where a single point on one time series maps onto a large subsection of another time series.Also,DTW may fail to find obvious and natural alignments in two time series,because of a single feature(i.e.peak,valley,inflection point,plateau,etc.). One of the causes is due to the great difference between the lengths of the comparing series.Therefore,besides improving the performance of DTW,methods are also proposed to improve an accuracy of DTW.Keogh and Pazzani(2001)propose a modifica-tion of DTW that considers the higher level feature of shape for better alignment.Ratanamahatana and Keogh(2004)propose to learn arbitrary constraints on the warping path.Regression time warping(RTW)is proposed by Lei and Govindaraju(2004)to address the challenges of shifting,scaling,robustness and tecki et al.(2005)propose a method called the minimal variance matching(MVM)for elastic matching.It determines a subsequence of the time series that best matches a query series byfinding the cheapest path in a directed acyclic graph.A segment-wise time warping distance(STW)is proposed by Zhou and Wong(2005)for time scaling search.Fu et al.(2008a) propose a scaled and warped matching(SWM)approach for handling both DTW and uniform scaling simultaneously.Different customized DTW techniques are applied to thefield of music research for query by humming(Zhu and Shasha,2003;Arentz et al.,2005).Focusing on similar problems as DTW,the Longest Common Subsequence(LCSS)model(Vlachos et al.,2002)is proposed.The LCSS is a variation of the edit distance and the basic idea is to match two sequences by allowing them to stretch,without rearranging the sequence of the elements,but allowing some elements to be unmatched.One of the important advantages of an LCSS over DTW is the consideration on the outliers.Chen et al.(2005a)further introduce a distance function based on an edit distance on real sequence(EDR),which is robust against the data imperfection.Morse and Patel(2007)propose a Fast Time Series Evaluation(FTSE)method which can be used to evaluate the threshold value of these kinds of techniques in a faster way.Threshold-based distance functions are proposed by ABfalg et al. (2006).The proposed function considers intervals,during which the time series exceeds a certain threshold for comparing time series rather than using the exact time series values.A T-Time application is developed(ABfalg et al.,2008)to demonstrate the usage of it.Fu et al.(2007)further suggest to introduce rules to govern the pattern matching process,if a priori knowledge exists in the given domain.A parameter-light distance measure method based on Kolmo-gorov complexity theory is suggested in Keogh et al.(2007b). Compression-based dissimilarity measure(CDM)3is adopted in this paper.Chen et al.(2005b)present a histogram-based representation for similarity measure.Similarly,a histogram-based similarity measure,bag-of-patterns(BOP)is proposed by Lin and Li(2009).The frequency of occurrences of each pattern in 3CDM is proposed by Keogh et al.(2004),which is used to compare the co-compressibility between data sets.Tak-chung Fu/Engineering Applications of Artificial Intelligence24(2011)164–181 168。
多元⾼斯分布(TheMultivariatenormaldistribution)在数据建模时,经常会⽤到多元⾼斯分布模型,下⾯就这个模型的公式并结合它的⼏何意义,来做⼀个直观上的讲解。
1,标准⾼斯函数⾼斯函数标准型:f(x) = \frac{1}{\sqrt{2π}}e^{-\frac{x^2}{2}}这个函数描述了变量 x 的⼀种分布特性,变量x的分布有如下特点:Ⅰ,均值 = 0Ⅱ,⽅差为1Ⅲ,概率密度和为12,⼀元⾼斯函数⼀般形式⼀元⾼斯函数⼀般形式:f(x) = \frac{1}{\sqrt{2π}σ}e^{-\frac{(x-µ)^2}{2σ^{2}}}我们可以令:z = \frac{x - µ}{σ}称这个过程为标准化,不难理解,z ∼ N(0, 1),从z -> x的过程如下:Ⅰ,将 x 向右移动 µ 个单位Ⅱ,将密度函数伸展σ倍⽽标准化(x -> z)所做的事情就是上述步骤的逆向唯⼀不太好理解的是前⾯\frac{1}{\sqrt{2π}σ}中的σ,为什么这⾥多了⼀个σ,不是 2σ或其他?当然,这⾥可以拿着概率密度函数的性质,使⽤微积分进⾏积分,为了保证最终的积分等于1,这⾥必须是σ这⾥我想说⼀下⾃⼰的直观感受:实线代表的函数是标准⾼斯函数:f(x) = \frac{1}{\sqrt{2π}}e^{-\frac{x^2}{2×2^{2}}}虚线代表的是标准⾼斯函数在 x 轴⽅向2倍延展,效果如下:A(x = 1) -> D(x = 2)E(x = 1.5) -> F(x = 3)G(x = 2) -> H(x = 4)横向拓宽了,纵向还是保持不变,可以想象,最后的函数积分肯定不等于1采⽤极限的思想,将 x 轴切分成⽆穷个细⼩的⽚段,每个⽚段可以与函数围城⼀个区域,因为我的切分⾜够⼩,这个区域的⾯积可以近似采⽤公式:⾯积 =底 × ⾼求得:从 AQRS -> DTUV,底乘以2倍,⾼维持不变,所以,要保持变化前后⾯积不变,函数的⾼度应该变为原来的 1/2所以⾼斯函数在 x 轴⽅向做2倍延展的同时,纵向应该压缩为原来的⼀半,才能重新形成新的⾼斯分布函数扩展到⼀般情形,x 轴⽅向做σ倍延拓的同时, y 轴应该压缩σ倍(乘以 1/σ)3, 独⽴多元正态分布先假设n个变量x = \left[ \begin{matrix} x_{1}, x_{2},\cdots,x_{n}\end{matrix}\right]^\mathrm{T}互不相关,且服从正态分布(维度不相关多元正态分布),各个维度的均值E(x) = \left[ \begin{matrix} µ_{1}, µ_{2},\cdots,µ_{n}\end{matrix}\right]^\mathrm{T},⽅差σ(x) = \left[ \begin{matrix} σ_{1},σ_{2},\cdots,σ_{n}\end{matrix}\right]^\mathrm{T}根据联合概率密度公式:f(x) = p(x_{1},x_{2}....x_{n}) = p(x_{1})p(x_{2})....p(x_{n}) = \frac{1}{(\sqrt{2π})^nσ_{1}σ_{2}\cdotsσ_{n}}e^{-\frac{(x_{1}-µ_{1})^2}{2σ_{1}^2}-\frac{(x_{2}-µ_{2})^2}{2σ_{2}^2}\cdots-\frac{(x_{n}-µ_{n})^2}{2σ_{n}^2}}令z^{2} = \frac{(x_{1}-µ_{1})^2}{σ_{1}^2}+\frac{(x_{2}-µ_{2})^2}{σ_{2}^2}\cdots+\frac{(x_{n}-µ_{n})^2}{σ_{n}^2},σ_{z}= σ_{1}σ_{2}\cdotsσ_{n}这样多元正态分布⼜可以写成⼀元那种漂亮的形式了(注意⼀元与多元的差别):f(z) = \frac{1}{(\sqrt{2π})^nσ_{z}}e^{-\frac{z^2}{2}}因为多元正态分布有着很强的⼏何思想,单纯从代数的⾓度看待z很难看出z的概率分布规律,这⾥需要转换成矩阵形式:z^2 = z^\mathrm{T}z = \left[ \begin{matrix} x_{1} - µ_{1}, x_{2} - µ_{2}, \cdots,x_{n} - µ_{n}\end{matrix}\right] \left[ \begin{matrix} \frac{1}{σ_{1}^2}&0&\cdots&0\\ 0&\frac{1}{σ_{2}^2}&\cdots&0\\ \vdots&\cdots&\cdots&\vdots\\ 0&0&\cdots&\frac{1}{σ_{n}^2} \end{matrix}\right]\left[ \begin{matrix} x_{1} - µ_{1}, x_{2} - µ_{2}, \cdots,x_{n} - µ_{n}\end{matrix}\right]^\mathrm{T}等式⽐较长,让我们要做⼀下变量替换:x - µ_{x} = \left[ \begin{matrix} x_{1} - µ_{1}, x_{2} - µ_{2}, \cdots,x_{n} - µ_{n}\end{matrix}\right]^\mathrm{T}定义⼀个符号∑_{}^{} = \left[ \begin{matrix} σ_{1}^2&0&\cdots&0\\ 0&σ_{2}^2&\cdots&0\\ \vdots&\cdots&\cdots&\vdots\\ 0&0&\cdots&σ_{n}^2 \end{matrix}\right]∑_{}^{}代表变量 X 的协⽅差矩阵, i⾏j列的元素值表⽰x_{i}与x_{j}的协⽅差因为现在变量之间是相互独⽴的,所以只有对⾓线上 (i = j)存在元素,其他地⽅都等于0,且x_{i}与它本⾝的协⽅差就等于⽅差∑_{}^{}是⼀个对⾓阵,根据对⾓矩阵的性质,它的逆矩阵:( (∑_{}^{})^{-1} = \left[ \begin{matrix} \frac{1}{σ_{1}^2}&0&\cdots&0\\ 0&\frac{1}{σ_{2}^2}&\cdots&0\\ \vdots&\cdots&\cdots&\vdots\\ 0&0&\cdots&\frac{1}{σ_{n}^2} \end{matrix}\right]对⾓矩阵的⾏列式 = 对⾓元素的乘积σ_{z}= \left|∑_{}^{}\right|^\frac{1}{2} =σ_{1}σ_{2}.....σ_{n}替换变量之后,等式可以简化为:z^\mathrm{T}z = (x - µ_{x})^\mathrm{T} \sum_{}{}^{-1} (x - µ_{x})代⼊以z为⾃变量的标准⾼斯分布函数中:f(z) = \frac{1}{(\sqrt{2π})^nσ_{z}}e^{-\frac{z^2}{2}} = \frac{1}{(\sqrt{2π})^{n}\left|∑_{}^{}\right|^\frac{1}{2}}e^{-\frac{ (x\ -\ µ_{x})^\mathrm{T}\ (\sum_{}{})^{-1}\ (x\ -\ µ_{x})}{2}}注意前⾯的系数变化:从⾮标准正态分布->标准正态分布需要将概率密度函数的⾼度压缩|∑_{}^{}|^\frac{1}{2}倍,从⼀维 -> n维的过程中,每增加⼀维,⾼度将压缩\sqrt{2π}倍维度不相关正太分布函数图像类似这样(以⼆元分布函数为例):4, 相关多元正态分布前⾯也说了,我们讨论多元正态分布的前提是多元变量之间是相互独⽴的,实际上,有很多应⽤场合,变量与变量之间是有关联的。
多元高斯分布python代码多元高斯分布是一种概率分布,可以用于建模多维数据的分布情况。
在机器学习中,多元高斯分布用于建模连续型变量的概率分布。
本文将介绍多元高斯分布的实现方法及Python代码。
多元高斯分布的概率密度函数如下:$$f(x) = \frac{1}{(2\pi)^{d/2}|\Sigma|^{1/2}}exp(-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu))$$其中,$x$表示一个$d$维向量,$\mu$表示均值向量,$\Sigma$表示协方差矩阵。
可以看到,多元高斯分布的核心部分是指数函数,通过协方差矩阵的逆矩阵来控制指数函数的形状。
在Python中,可以使用scipy.stats库中的multivariate_normal函数来实现多元高斯分布。
代码如下:```pythonimport numpy as npfrom scipy.stats import multivariate_normal# 定义均值向量和协方差矩阵mu = np.array([0, 0])cov = np.array([[1, 0], [0, 1]])# 创建多元高斯分布对象rv = multivariate_normal(mean=mu, cov=cov)# 计算概率密度函数的值x = np.array([1, 1])print(rv.pdf(x)) # 输出0.05854983152431917```首先,通过numpy库中的array函数定义了均值向量和协方差矩阵。
然后,使用multivariate_normal函数创建了多元高斯分布对象。
最后,使用pdf方法计算了点$(1,1)$的概率密度函数的值。
需要注意的是,当协方差矩阵为奇异矩阵时,multivariate_normal函数会报错。
此时,可以通过增加一个很小的正数来让协方差矩阵变为非奇异矩阵。
另外,在高维情况下,计算协方差矩阵的逆矩阵会非常耗时,可以使用Cholesky分解等技术来加速计算。
多元高斯分布复数-概述说明以及解释1.引言1.1 概述多元高斯分布是一种重要的概率分布模型,它在统计学和机器学习领域得到广泛应用。
它是高斯分布(也称为正态分布)在多维空间中的推广,适用于描述具有多个随机变量的数据集。
多元高斯分布具有许多重要的特性,例如其形状是一个椭球形,在各个方向上的变化与协方差矩阵相关。
这使得多元高斯分布能够有效地捕捉数据的统计特征,并用于数据建模和推断。
与此同时,在本文中我们还将介绍复数的知识,并探讨多元高斯分布与复数的关系。
复数是由实部和虚部构成的数,具有重要的数学性质和广泛的应用。
我们将讨论复数的定义和运算规则,为后续多元高斯分布与复数的关系的探讨打下基础。
本文旨在通过对多元高斯分布和复数的介绍,深入探讨二者的关系及其在实际应用中的重要性。
我们将讨论多元高斯分布与复数的联系,并介绍它们在各种领域中的应用,包括信号处理、图像处理和通信系统等。
通过深入理解多元高斯分布和复数的知识,我们能够更好地应用它们于实际问题中,从而为数据建模和分析提供有力工具。
接下来的章节将逐步展开对多元高斯分布和复数的讨论,希望读者能够从中获得有益的知识和启发。
文章结构部分的内容可以写成以下方式:1.2 文章结构本文将按照以下结构展开讨论多元高斯分布与复数的相关内容:2. 正文2.1 多元高斯分布2.1.1 定义2.1.2 特点2.2 复数2.2.1 复数的定义2.2.2 复数的运算3. 结论3.1 多元高斯分布与复数的关系3.2 应用领域通过以上结构,本文将首先介绍多元高斯分布的定义和特点,然后深入探讨复数的定义和运算规则。
最后,我们将讨论多元高斯分布与复数之间的关系,并探究它们在实际应用领域中的应用。
这个结构将帮助读者逐步理解多元高斯分布与复数的概念,并了解它们之间的联系和应用。
1.3 目的本文旨在探讨多元高斯分布与复数之间的关系,并介绍其应用领域。
目的如下:1. 深入理解多元高斯分布和复数的定义与特性:通过对多元高斯分布和复数的定义和特性进行详细说明,希望读者能够对这两个概念有一个全面的认识。
多元函数微分法的英文英文回答:Multivariate Differential Calculus.Multivariate differential calculus is a branch of calculus that deals with functions of several variables. It has applications in a wide range of fields, including physics, engineering, economics, and finance.The basic concepts of multivariate differential calculus are similar to those of single-variable calculus. However, there are some important differences. For example, in multivariate calculus, the derivative of a function is a vector, rather than a scalar. This is because thederivative of a function of several variables gives the rate of change of the function in each direction.Another important difference between multivariate and single-variable calculus is the concept of the partialderivative. The partial derivative of a function with respect to a particular variable is the derivative of the function with respect to that variable, holding all other variables constant. Partial derivatives are used to findthe slope of a function at a particular point.中文回答:多元函数微分法。
multivariate statistical analysisMultivariate Statistical Analysis: An OverviewMultivariate statistical analysis is a method used to analyze data with more than one variable. It is a statistical technique that enables researchers to explore, understand,and extract meaningful insights from complex data sets. Multivariate statistical analysis can be applied to various domains, such as healthcare, finance, market research, and social sciences. In this article, we will provide an overview of multivariate statistical analysis, its types, and some ofits applications.Types of Multivariate Statistical AnalysisMultivariate analysis can be divided into two broad categories: descriptive and inferential. Descriptive multivariate analysis is used to summarize the data and to identify patterns or relationships among the variables. This analysis is mainly concerned with understanding the structure of the data. Inferential multivariate analysis, on the other hand, is used to make predictions, test hypotheses, and estimate parameters in a population based on a sample.Another way to classify multivariate statisticalanalysis is based on the nature of the variables. There are two types of variables in multivariate analysis: continuous and categorical. Continuous variables are measured on a scale, and their values can be any number within a range. For example, age, weight, and height. Categorical variables, meanwhile, do not have a numeric value, and they are usually nominal or ordinal. Examples of categorical variables includegender, marital status, and education level.Applications of Multivariate Statistical AnalysisMultivariate statistical analysis is a valuable tool in various fields. Here are some examples:1. Healthcare: Multivariate analysis is used to identify risk factors associated with disease and to analyze the effectiveness of treatments. For example, researchers can use multivariate analysis to investigate the relationship between smoking, age, gender, and lung cancer incidence.2. Market research: Multivariate analysis is used to segment customers, to identify their preferences, and to understand their behavior. For example, multivariate analysis is used to cluster customers based on their purchasing habits, demographic characteristics, and lifestyle.3. Finance: Multivariate analysis is used to analyze financial data and to make investment decisions. For example, analysts can use multivariate analysis to identify thefactors that affect stock prices or to evaluate the performance of a portfolio.4. Social sciences: Multivariate analysis is used to investigate the relationships among variables in social sciences. For example, researchers can use multivariate analysis to study the relationship between income, education level, and job satisfaction.In conclusion, multivariate statistical analysis is a powerful method for analyzing complex data sets. It is usedto identify relationships among variables, to understand the structure of the data, and to make predictions. Multivariate analysis is applicable to various domains and is a valuable tool for researchers, professionals, and decision-makers. Ifyou are dealing with complex data sets, multivariate analysis might be the right approach for you.。