Abstract Visualization of Scalar Topology for Structural Enhancement
- 格式:pdf
- 大小:124.95 KB
- 文档页数:8
Semi-supervised and unsupervised extreme learningmachinesGao Huang,Shiji Song,Jatinder N.D.Gupta,and Cheng WuAbstract—Extreme learning machines(ELMs)have proven to be an efficient and effective learning paradigm for pattern classification and regression.However,ELMs are primarily applied to supervised learning problems.Only a few existing research studies have used ELMs to explore unlabeled data. In this paper,we extend ELMs for both semi-supervised and unsupervised tasks based on the manifold regularization,thus greatly expanding the applicability of ELMs.The key advantages of the proposed algorithms are1)both the semi-supervised ELM (SS-ELM)and the unsupervised ELM(US-ELM)exhibit the learning capability and computational efficiency of ELMs;2) both algorithms naturally handle multi-class classification or multi-cluster clustering;and3)both algorithms are inductive and can handle unseen data at test time directly.Moreover,it is shown in this paper that all the supervised,semi-supervised and unsupervised ELMs can actually be put into a unified framework. This provides new perspectives for understanding the mechanism of random feature mapping,which is the key concept in ELM theory.Empirical study on a wide range of data sets demonstrates that the proposed algorithms are competitive with state-of-the-art semi-supervised or unsupervised learning algorithms in terms of accuracy and efficiency.Index Terms—Clustering,embedding,extreme learning ma-chine,manifold regularization,semi-supervised learning,unsu-pervised learning.I.I NTRODUCTIONS INGLE layer feedforward networks(SLFNs)have been intensively studied during the past several decades.Most of the existing learning algorithms for training SLFNs,such as the famous back-propagation algorithm[1]and the Levenberg-Marquardt algorithm[2],adopt gradient methods to optimize the weights in the network.Some existing works also use forward selection or backward elimination approaches to con-struct network dynamically during the training process[3]–[7].However,neither the gradient based methods nor the grow/prune methods guarantee a global optimal solution.Al-though various methods,such as the generic and evolutionary algorithms,have been proposed to handle the local minimum This work was supported by the National Natural Science Foundation of China under Grant61273233,the Research Fund for the Doctoral Program of Higher Education under Grant20120002110035and20130002130010, the National Key Technology R&D Program under Grant2012BAF01B03, the Project of China Ocean Association under Grant DY125-25-02,and Tsinghua University Initiative Scientific Research Program under Grants 2011THZ07132.Gao Huang,Shiji Song,and Cheng Wu are with the Department of Automation,Tsinghua University,Beijing100084,China(e-mail:huang-g09@;shijis@; wuc@).Jatinder N.D.Gupta is with the College of Business Administration,The University of Alabama in Huntsville,Huntsville,AL35899,USA.(e-mail: guptaj@).problem,they basically introduce high computational cost. One of the most successful algorithms for training SLFNs is the support vector machines(SVMs)[8],[9],which is a maximal margin classifier derived under the framework of structural risk minimization(SRM).The dual problem of SVMs is a quadratic programming and can be solved conveniently.Due to its simplicity and stable generalization performance,SVMs have been widely studied and applied to various domains[10]–[14].Recently,Huang et al.[15],[16]proposed the extreme learning machines(ELMs)for training SLFNs.In contrast to most of the existing approaches,ELMs only update the output weights between the hidden layer and the output layer, while the parameters,i.e.,the input weights and biases,of the hidden layer are randomly generated.By adopting squared loss on the prediction error,the training of output weights turns into a regularized least squares(or ridge regression)problem which can be solved efficiently in closed form.It has been shown that even without updating the parameters of the hidden layer,the SLFN with randomly generated hidden neurons and tunable output weights maintains its universal approximation capability[17]–[19].Compared to gradient based algorithms, ELMs are much more efficient and usually lead to better generalization performance[20]–[22].Compared to SVMs, solving the regularized least squares problem in ELMs is also faster than solving the quadratic programming problem in standard SVMs.Moreover,ELMs can be used for multi-class classification problems directly.The predicting accuracy achieved by ELMs is comparable with or even higher than that of SVMs[16],[22]–[24].The differences and similarities between ELMs and SVMs are discussed in[25]and[26], and new algorithms are proposed by combining the advan-tages of both models.In[25],an extreme SVM(ESVM) model is proposed by combining ELMs and the proximal SVM(PSVM).The ESVM algorithm is shown to be more accurate than the basic ELMs model due to the introduced regularization technique,and much more efficient than SVMs since there is no kernel matrix multiplication in ESVM.In [26],the traditional RBF kernel are replaced by ELM kernel, leading to an efficient algorithm with matched accuracy of SVMs.In the past years,researchers from variesfields have made substantial contribution to ELM theories and applications.For example,the universal approximation ability of ELMs has been further studied in a classification context[23].The gen-eralization error bound of ELMs has been investigated from the perspective of the Vapnik-Chervonenkis(VC)dimension theory and the initial localized generalization error model(LGEM)[27],[28].Varies extensions have been made to the basic ELMs to make it more efficient and more suitable for specific problems,such as ELMs for online sequential data [29]–[31],ELMs for noisy/missing data[32]–[34],ELMs for imbalanced data[35],etc.From the implementation aspect, ELMs has recently been implemented using parallel tech-niques[36],[37],and realized on hardware[38],which made ELMs feasible for large data sets and real time reasoning. Though ELMs have become popular in a wide range of domains,they are primarily used for supervised learning tasks such as classification and regression,which greatly limits their applicability.In some cases,such as text classification, information retrieval and fault diagnosis,obtaining labels for fully supervised learning is time consuming and expensive, while a multitude of unlabeled data are easy and cheap to collect.To overcome the disadvantage of supervised learning al-gorithms that they cannot make use of unlabeled data,semi-supervised learning(SSL)has been proposed to leverage both labeled and unlabeled data[39],[40].The SSL algorithms assume that the input patterns from both labeled and unlabeled data are drawn from the same marginal distribution.Therefore, the unlabeled data naturally provide useful information for exploring the data structure in the input space.By assuming that the input data follows some cluster structure or manifold in the input space,SSL algorithms can incorporate both la-beled and unlabeled data into the learning process.Since SSL requires less effort to collect labeled data and can offer higher accuracy,it has been applied to various domains[41]–[43].In some other cases where no labeled data are available,people may be interested in exploring the underlying structure of the data.To this end,unsupervised learning(USL)techniques, such as clustering,dimension reduction or data representation, are widely used to fulfill these tasks.In this paper,we extend ELMs to handle both semi-supervised and unsupervised learning problems by introducing the manifold regularization framework.Both the proposed semi-supervised ELM(SS-ELM)and unsupervised ELM(US-ELM)inherit the computational efficiency and the learn-ing capability of traditional pared with existing algorithms,SS-ELM and US-ELM are not only inductive (straightforward extension for out-of-sample examples at test time),but also can be used for multi-class classification or multi-cluster clustering directly.We test our algorithms on a variety of data sets,and make comparisons with other related algorithms.The results show that the proposed algorithms are competitive with state-of-the-art algorithms in terms of accuracy and efficiency.It is worth to mention that all the supervised,semi-supervised and unsupervised ELMs can actually be put into a unified framework,that is all the algorithms consist of two stages:1)random feature mapping;and2)output weights solving.Thefirst stage is to construct the hidden layer using randomly generated hidden neurons.This is the key concept in the ELM theory,which differs it from many existing feature learning methods.Generating feature mapping randomly en-ables ELMs for fast nonlinear feature learning and alleviates the problem of over-fitting.The second stage is to solve the weights between the hidden layer and the output layer, and this is where the main difference of supervised,semi-supervised and unsupervised ELMs lies.We believe that the unified framework for the three types of ELMs might provide us a new perspective to understand the underlying behavior of the random feature mapping in ELMs.The rest of the paper is organized as follows.In Section II,we give a brief review of related existing literature on semi-supervised and unsupervised learning.Section III and IV introduce the basic formulation of ELMs and the man-ifold regularization framework,respectively.We present the proposed SS-ELM and US-ELM algorithms in Sections V and VI.Experiment results are given in Section VII,and Section VIII concludes the paper.II.R ELATED WORKSOnly a few existing research studies on ELMs have dealt with the problem of semi-supervised learning or unsupervised learning.In[44]and[45],the manifold regularization frame-work was introduce into the ELMs model to leverage both labeled and unlabeled data,thus extended ELMs for semi-supervised learning.However,both of these two works are limited to binary classification problems,thus they haven’t explore the full power of ELMs.Moreover,both algorithms are only effective when the number of training patterns is more than the number of hidden neurons.Unfortunately,this condition is usually violated in semi-supervised learning since the training data is relatively scarce compared to the hidden neurons,whose number is commonly set to several hundreds or several thousands.Recently,a co-training approach have been proposed to train ELMs in a semi-supervised setting [46].In this algorithm,the labeled training sets are augmented gradually by moving a small set of most confidently predicted unlabeled data to the labeled set at each loop,and ELMs are trained repeatedly on the pseudo-labeled set.Since the algo-rithm need to train ELMs repeatedly,it introduces considerable extra computational cost.The proposed SS-ELM is related to a few other mani-fold assumption based semi-supervised learning algorithms, such as the Laplacian support vector machines(LapSVMs) [47],the Laplacian regularized least squares(LapRLS)[47], semi-supervised neural networks(SSNNs)[48],and semi-supervised deep embedding[49].It has been shown in these works that manifold regularization is effective in a wide range of domains and often leads to a state-of-the-art performance in terms of accuracy and efficiency.The US-ELM proposed in this paper are related to the Laplacian Eigenmaps(LE)[50]and spectral clustering(SC) [51]in that they both use spectral techniques for embedding and clustering.In all these algorithms,an affinity matrix is first built from the input patterns.The SC performs eigen-decomposition on the normalized affinity matrix,and then embeds the original data into a d-dimensional space using the first d eigenvectors(each row is normalized to have unit length and represents a point in the embedded space)corresponding to the d largest eigenvalues.The LE algorithm performs generalized eigen-decomposition on the graph Laplacian,anduses the d eigenvectors corresponding to the second through the(d+1)th smallest eigenvalues for embedding.When LE and SC are used for clustering,then k-means is adopted to cluster the data in the embedded space.Similar to LE and SC,the US-ELM are also based on the affinity matrix,and it is converted to solving a generalized eigen-decomposition problem.However,the eigenvectors obtained in US-ELM are not used for data representation directly,but are used as the parameters of the network,i.e.,the output weights.Note that once the US-ELM model is trained,it can be applied to any presented data in the original input space.In this way,US-ELM provide a straightforward way for handling new patterns without recomputing eigenvectors as in LE and SC.III.E XTREME LEARNING MACHINES Consider a supervised learning problem where we have a training set with N samples,{X,Y}={x i,y i}N i=1.Herex i∈R n i,y i is a n o-dimensional binary vector with only one entry(correspond to the class that x i belongs to)equal to one for multi-classification tasks,or y i∈R n o for regression tasks,where n i and n o are the dimensions of input and output respectively.ELMs aim to learn a decision rule or an approximation function based on the training data. Generally,the training of ELMs consists of two stages.The first stage is to construct the hidden layer using afixed number of randomly generated mapping neurons,which can be any nonlinear piecewise continuous functions,such as the Sigmoid function and Gaussian function given below.1)Sigmoid functiong(x;θ)=11+exp(−(a T x+b));(1)2)Gaussian functiong(x;θ)=exp(−b∥x−a∥);(2) whereθ={a,b}are the parameters of the mapping function and∥·∥denotes the Euclidean norm.A notable feature of ELMs is that the parameters of the hidden mapping functions can be randomly generated ac-cording to any continuous probability distribution,e.g.,the uniform distribution on(-1,1).This makes ELMs distinct from the traditional feedforward neural networks and SVMs. The only free parameters that need to be optimized in the training process are the output weights between the hidden neurons and the output nodes.By doing so,training ELMs is equivalent to solving a regularized least squares problem which is considerately more efficient than the training of SVMs or backpropagation algorithms.In thefirst stage,a number of hidden neurons which map the data from the input space into a n h-dimensional feature space (n h is the number of hidden neurons)are randomly generated. We denote by h(x i)∈R1×n h the output vector of the hidden layer with respect to x i,andβ∈R n h×n o the output weights that connect the hidden layer with the output layer.Then,the outputs of the network are given byf(x i)=h(x i)β,i=1,...,N.(3)In the second stage,ELMs aim to solve the output weights by minimizing the sum of the squared losses of the prediction errors,which leads to the following formulationminβ∈R n h×n o12∥β∥2+C2N∑i=1∥e i∥2s.t.h(x i)β=y T i−e T i,i=1,...,N,(4)where thefirst term in the objective function is a regularization term which controls the complexity of the model,e i∈R n o is the error vector with respect to the i th training pattern,and C is a penalty coefficient on the training errors.By substituting the constraints into the objective function, we obtain the following equivalent unconstrained optimization problem:minβ∈R n h×n oL ELM=12∥β∥2+C2∥Y−Hβ∥2(5)where H=[h(x1)T,...,h(x N)T]T∈R N×n h.The above problem is widely known as the ridge regression or regularized least squares.By setting the gradient of L ELM with respect toβto zero,we have∇L ELM=β+CH H T(Y−Hβ)=0(6) If H has more rows than columns and is of full column rank,which is usually the case where the number of training patterns are more than the number of the hidden neurons,the above equation is overdetermined,and we have the following closed form solution for(5):β∗=(H T H+I nhC)−1H T Y,(7)where I nhis an identity matrix of dimension n h.Note that in practice,rather than explicitly inverting the n h×n h matrix in the above expression,we can use Gaussian elimination to directly solve a set of linear equations in a more efficient and numerically stable manner.If the number of training patterns are less than the number of hidden neurons,then H will have more columns than rows, which often leads to an underdetermined least squares prob-lem.In this case,βmay have infinite number of solutions.To handle this problem,we restrictβto be a linear combination of the rows of H:β=H Tα(α∈R N×n o).Notice that when H has more columns than rows and is of full row rank,then H H T is invertible.Multiplying both side of(6) by(H H T)−1H,we getα+C(Y−H H Tα)=0,(8) This yieldsβ∗=H Tα∗=H T(H H T+I NC)−1Y(9)where I N is an identity matrix of dimension N. Therefore,in the case where training patterns are plentiful compared to the hidden neurons,we use(7)to compute the output weights,otherwise we use(9).IV.T HE MANIFOLD REGULARIZATION FRAMEWORK Semi-supervised learning is built on the following two assumptions:(1)both the label data X l and the unlabeled data X u are drawn from the same marginal distribution P X ;and (2)if two points x 1and x 2are close to each other,then the conditional probabilities P (y |x 1)and P (y |x 2)should be similar as well.The latter assumption is widely known as the smoothness assumption in machine learning.To enforce this assumption on the data,the manifold regularization framework proposes to minimize the following cost functionL m=12∑i,jw ij ∥P (y |x i )−P (y |x j )∥2,(10)where w ij is the pair-wise similarity between two patterns x iand x j .Note that the similarity matrix W =[w ij ]is usually sparse,since we only place a nonzero weight between two patterns x i and x j if they are close,e.g.,x i is among the k nearest neighbors of x j or x j is among the k nearest neighbors of x i .The nonzero weights are usually computed using Gaussian function exp (−∥x i −x j ∥2/2σ2),or simply fixed to 1.Intuitively,the formulation (10)penalizes large variation in the conditional probability P (y |x )when x has a small change.This requires that P (y |x )vary smoothly along the geodesics of P (x ).Since it is difficult to compute the conditional probability,we can approximate (10)with the following expression:ˆLm =12∑i,jw ij ∥ˆyi −ˆy j ∥2,(11)where ˆyi and ˆy j are the predictions with respect to pattern x i and x j ,respectively.It is straightforward to simplify the above expression in a matrix form:ˆL m =Tr (ˆY T L ˆY ),(12)where Tr (·)denotes the trace of a matrix,L =D −W isknown as the graph Laplacian ,and D is a diagonal matrixwith its diagonal elements D ii =l +u∑j =1w i,j .As discussed in [52],instead of using L directly,we can normalize it byD −12L D −12or replace it by L p (p is an integer),based on some prior knowledge.V.S EMI -SUPERVISED ELMIn the semi-supervised setting,we have few labeled data and plenty of unlabeled data.We denote the labeled data in the training set as {X l ,Y l }={x i ,y i }l i =1,and unlabeled dataas X u ={x i }ui =1,where l and u are the number of labeled and unlabeled data,respectively.The proposed SS-ELM incorporates the manifold regular-ization to leverage unlabeled data to improve the classification accuracy when labeled data are scarce.By modifying the ordinary ELM formulation (4),we give the formulation ofSS-ELM as:minβ∈R n h ×n o12∥β∥2+12l∑i =1C i ∥e i ∥2+λ2Tr (F T L F )s.t.h (x i )β=y T i −e T i ,i =1,...,l,f i =h (x i )β,i =1,...,l +u(13)where L ∈R (l +u )×(l +u )is the graph Laplacian built fromboth labeled and unlabeled data,and F ∈R (l +u )×n o is the output matrix of the network with its i th row equal to f (x i ),λis a tradeoff parameter.Note that similar to the weighted ELM algorithm (W-ELM)introduced in [35],here we associate different penalty coeffi-cient C i on the prediction errors with respect to patterns from different classes.This is because we found that when the data is skewed,i.e.,some classes have significantly more training patterns than other classes,traditional ELMs tend to fit the classes that having the majority of patterns quite well but fits other classes poorly.This usually leads to poor generalization performance on the testing set (while the prediction accuracy may be high,but the some classes are neglected).Therefore,we propose to alleviate this problem by re-weighting instances from different classes.Suppose that x i belongs to class t i ,which has N t i training patterns,then we associate e i with a penalty ofC i =C 0N t i.(14)where C 0is a user defined parameter as in traditional ELMs.In this way,the patterns from the dominant classes will not be over fitted by the algorithm,and the patterns from a class with less samples will not be neglected.We substitute the constraints into the objective function,and rewrite the above formulation in a matrix form:min β∈R n h×n o 12∥β∥2+12∥C 12( Y −Hβ)∥2+λ2Tr (βT H TL Hβ)(15)where Y∈R (l +u )×n o is the training target with its first l rows equal to Y l and the rest equal to 0,C is a (l +u )×(l +u )diagonal matrix with its first l diagonal elements [C ]ii =C i ,i =1,...,l and the rest equal to 0.Again,we compute the gradient of the objective function with respect to β:∇L SS −ELM =β+H T C ( Y−H β)+λH H T L H β.(16)By setting the gradient to zero,we obtain the solution tothe SS-ELM:β∗=(I n h +H T C H +λH H T L H )−1H TC Y .(17)As in Section III,if the number of labeled data is fewer thanthe number of hidden neurons,which is common in SSL,we have the following alternative solution:β∗=H T (I l +u +C H H T +λL L H H T )−1C Y .(18)where I l +u is an identity matrix of dimension l +u .Note that by settingλto be zero and the diagonal elements of C i(i=1,...,l)to be the same constant,(17)and (18)reduce to the solutions of traditional ELMs(7)and(9), respectively.Based on the above discussion,the SS-ELM algorithm is summarized as Algorithm1.Algorithm1The SS-ELM algorithmInput:The labeled patterns,{X l,Y l}={x i,y i}l i=1;The unlabeled patterns,X u={x i}u i=1;Output:The mapping function of SS-ELM:f:R n i→R n oStep1:Construct the graph Laplacian L from both X l and X u.Step2:Initiate an ELM network of n h hidden neurons with random input weights and biases,and calculate the output matrix of the hidden neurons H∈R(l+u)×n h.Step3:Choose the tradeoff parameter C0andλ.Step4:•If n h≤NCompute the output weightsβusing(17)•ElseCompute the output weightsβusing(18)return The mapping function f(x)=h(x)β.VI.U NSUPERVISED ELMIn this section,we introduce the US-ELM algorithm for unsupervised learning.In an unsupervised setting,the entire training data X={x i}N i=1are unlabeled(N is the number of training patterns)and our target is tofind the underlying structure of the original data.The formulation of US-ELM follows from the formulation of SS-ELM.When there is no labeled data,(15)is reduced tomin β∈R n h×n o ∥β∥2+λTr(βT H T L Hβ)(19)Notice that the above formulation always attains its mini-mum atβ=0.As suggested in[50],we have to introduce addtional constraints to avoid a degenerated solution.Specifi-cally,the formulation of US-ELM is given bymin β∈R n h×n o ∥β∥2+λTr(βT H T L Hβ)s.t.(Hβ)T Hβ=I no(20)Theorem1:An optimal solution to problem(20)is given by choosingβas the matrix whose columns are the eigenvectors (normalized to satisfy the constraint)corresponding to thefirst n o smallest eigenvalues of the generalized eigenvalue problem:(I nh +λH H T L H)v=γH H T H v.(21)Proof:We can rewrite the problem(20)asminβ∈R n h×n o,ββT Bβ=I no Tr(βT Aβ),(22)Algorithm2The US-ELM algorithmInput:The training data:X∈R N×n i;Output:•For embedding task:The embedding in a n o-dimensional space:E∈R N×n o;•For clustering task:The label vector of cluster index:y∈N N×1+.Step1:Construct the graph Laplacian L from X.Step2:Initiate an ELM network of n h hidden neurons withrandom input weights,and calculate the output matrix of thehidden neurons H∈R N×n h.Step3:•If n h≤NFind the generalized eigenvectors v2,v3,...,v no+1of(21)corresponding to the second through the n o+1smallest eigenvalues.Letβ=[ v2, v3,..., v no+1],where v i=v i/∥H v i∥,i=2,...,n o+1.•ElseFind the generalized eigenvectors u2,u3,...,u no+1of(24)corresponding to the second through the n o+1smallest eigenvalues.Letβ=H T[ u2, u3,..., u no+1],where u i=u i/∥H H T u i∥,i=2,...,n o+1.Step4:Calculate the embedding matrix:E=Hβ.Step5(For clustering only):Treat each row of E as a point,and cluster the N points into K clusters using the k-meansalgorithm.Let y be the label vector of cluster index for allthe points.return E(for embedding task)or y(for clustering task);where A=I nh+λH H T L H and B=H T H.It is easy to verify that both A and B are Hermitianmatrices.Thus,according to the Rayleigh-Ritz theorem[53],the above trace minimization problem attains its optimum ifand only if the column span ofβis the minimum span ofthe eigenspace corresponding to the smallest n o eigenvaluesof(21).Therefore,by stacking the normalized eigenvectors of(21)corresponding to the smallest n o generalized eigenvalues,we obtain an optimal solution to(20).In the algorithm of Laplacian eigenmaps,thefirst eigenvec-tor is discarded since it is always a constant vector proportionalto1(corresponding to the smallest eigenvalue0)[50].In theUS-ELM algorithm,thefirst eigenvector of(21)also leadsto small variations in embedding and is not useful for datarepresentation.Therefore,we suggest to discard this trivialsolution as well.Letγ1,γ2,...,γno+1(γ1≤γ2≤...≤γn o+1)be the(n o+1)smallest eigenvalues of(21)and v1,v2,...,v no+1be their corresponding eigenvectors.Then,the solution to theoutput weightsβis given byβ∗=[ v2, v3,..., v no+1],(23)where v i=v i/∥H v i∥,i=2,...,n o+1are the normalizedeigenvectors.If the number of labeled data is fewer than the numberTABLE ID ETAILS OF THE DATA SETS USED FOR SEMI-SUPERVISED LEARNINGData set Class Dimension|L||U||V||T|G50C2505031450136COIL20(B)2102440100040360USPST(B)225650140950498COIL2020102440100040360USPST1025650140950498of hidden neurons,problem(21)is underdetermined.In this case,we have the following alternative formulation by using the same trick as in previous sections:(I u+λL L H H T )u=γH H H T u.(24)Again,let u1,u2,...,u no +1be generalized eigenvectorscorresponding to the(n o+1)smallest eigenvalues of(24), then thefinal solution is given byβ∗=H T[ u2, u3,..., u no +1],(25)where u i=u i/∥H H T u i∥,i=2,...,n o+1are the normal-ized eigenvectors.If our task is clustering,then we can adopt the k-means algorithm to perform clustering in the embedded space.We summarize the proposed US-ELM in Algorithm2. Remark:Comparing the supervised ELM,the semi-supervised ELM and the unsupervised ELM,we can observe that all the algorithms have two similar stages in the training process,that is the random feature learning stage and the out-put weights learning stage.Under this two-stage framework,it is easy tofind the differences and similarities between the three algorithms.Actually,all the algorithms share the same stage of random feature learning,and this is the essence of the ELM theory.This also means that no matter the task is a supervised, semi-supervised or unsupervised learning problem,we can always follow the same step to generate the hidden layer. The differences of the three types of ELMs lie in the second stage on how the output weights are computed.In supervised ELM and SS-ELM,the output weights are trained by solving a regularized least squares problem;while the output weights in the US-ELM are obtained by solving a generalized eigenvalue problem.The unified framework for the three types of ELMs might provide new perspectives to further develop the ELM theory.VII.E XPERIMENTAL RESULTSWe evaluated our algorithms on wide range of semi-supervised and unsupervised parisons were made with related state-of-the-art algorithms, e.g.,Transductive SVM(TSVM)[54],LapSVM[47]and LapRLS[47]for semi-supervised learning;and Laplacian Eigenmap(LE)[50], spectral clustering(SC)[51]and deep autoencoder(DA)[55] for unsupervised learning.All algorithms were implemented using Matlab R2012a on a2.60GHz machine with4GB of memory.TABLE IIIT RAINING TIME(IN SECONDS)COMPARISON OF TSVM,L AP RLS,L AP SVM AND SS-ELMData set TSVM LapRLS LapSVM SS-ELMG50C0.3240.0410.0450.035COIL20(B)16.820.5120.4590.516USPST(B)68.440.9210.947 1.029COIL2018.43 5.841 4.9460.814USPST68.147.1217.259 1.373A.Semi-supervised learning results1)Data sets:We tested the SS-ELM onfive popular semi-supervised learning benchmarks,which have been widely usedfor evaluating semi-supervised algorithms[52],[56],[57].•The G50C is a binary classification data set of which each class is generated by a50-dimensional multivariate Gaus-sian distribution.This classification problem is explicitlydesigned so that the true Bayes error is5%.•The Columbia Object Image Library(COIL20)is a multi-class image classification data set which consists1440 gray-scale images of20objects.Each pattern is a32×32 gray scale image of one object taken from a specific view.The COIL20(B)data set is a binary classification taskobtained from COIL20by grouping thefirst10objectsas Class1,and the last10objects as Class2.•The USPST data set is a subset(the testing set)of the well known handwritten digit recognition data set USPS.The USPST(B)data set is a binary classification task obtained from USPST by grouping thefirst5digits as Class1and the last5digits as Class2.2)Experimental setup:We followed the experimental setup in[57]to evaluate the semi-supervised algorithms.Specifi-cally,each of the data sets is split into4folds,one of which was used for testing(denoted by T)and the rest3folds for training.Each of the folds was used as the testing set once(4-fold cross-validation).As in[57],this random fold generation process were repeated3times,resulted in12different splits in total.Every training set was further partitioned into a labeled set L,a validation set V,and an unlabeled set U.When we train a semi-supervised learning algorithm,the labeled data from L and the unlabeled data from U were used.The validation set which consists of labeled data was only used for model selection,i.e.,finding the optimal hyperparameters C0andλin the SS-ELM algorithm.The characteristics of the data sets used in our experiment are summarized in Table I. The training of SS-ELM consists of two stages:1)generat-ing the random hidden layer;and2)training the output weights using(17)or(18).In thefirst stage,we adopted the Sigmoid function for nonlinear mapping,and the input weights and biases were generated according to the uniform distribution on(-1,1).The number of hidden neurons n h wasfixed to 1000for G50C,and2000for the rest four data sets.In the second stage,wefirst need to build the graph Laplacian L.We followed the methods discussed in[52]and[57]to compute L,and the hyperparameter settings can be found in[47],[52] and[57].The trade off parameters C andλwere selected from。
A Tutorial on Uppaal4.0Updated November28,2006Gerd Behrmann,Alexandre David,and Kim rsenDepartment of Computer Science,Aalborg University,Denmark{behrmann,adavid,kgl}@cs.auc.dk.Abstract.This is a tutorial paper on the tool Uppaal.Its goal is to bea short introduction on theflavour of timed automata implemented inthe tool,to present its interface,and to explain how to use the tool.Thecontribution of the paper is to provide reference examples and modellingpatterns.1IntroductionUppaal is a toolbox for verification of real-time systems jointly developed by Uppsala University and Aalborg University.It has been applied successfully in case studies ranging from communication protocols to multimedia applications [35,55,24,23,34,43,54,44,30].The tool is designed to verify systems that can be modelled as networks of timed automata extended with integer variables,struc-tured data types,user defined functions,and channel synchronisation.Thefirst version of Uppaal was released in1995[52].Since then it has been in constant development[21,5,13,10,26,27].Experiments and improvements in-clude data structures[53],partial order reduction[20],a distributed version of Uppaal[17,9],guided and minimal cost reachability[15,51,16],work on UML Statecharts[29],acceleration techniques[38],and new data structures and memory reductions[18,14].Version4.0[12]brings symmetry reduction[36], the generalised sweep-line method[49],new abstraction techniques[11],priori-ties[28],and user defined functions to the mainstream.Uppaal has also gen-erated related Ph.D.theses[50,57,45,56,19,25,32,8,31].It features a Java user interface and a verification engine written in C++.It is freely available at /.This tutorial covers networks of timed automata and theflavour of timed automata used in Uppaal in section2.The tool itself is described in section3, and three extensive examples are covered in sections4,5,and6.Finally,section7 introduces common modelling patterns often used with Uppaal.2Timed Automata in UppaalThe model-checker Uppaal is based on the theory of timed automata[4](see[42] for automata theory)and its modelling language offers additional features such as bounded integer variables and urgency.The query language of Uppaal,usedto specify properties to be checked,is a subset of TCTL (timed computation tree logic)[39,3].In this section we present the modelling and the query languages of Uppaal and we give an intuitive explanation of time in timed automata.2.1The Modelling LanguageNetworks of Timed Automata A timed automaton is a finite-state machine extended with clock variables.It uses a dense-time model where a clock variable evaluates to a real number.All the clocks progress synchronously.In Uppaal ,a system is modelled as a network of several such timed automata in parallel.The model is further extended with bounded discrete variables that are part of the state.These variables are used as in programming languages:They are read,written,and are subject to common arithmetic operations.A state of the system is defined by the locations of all automata,the clock values,and the values of the discrete variables.Every automaton may fire an edge (sometimes misleadingly called a transition)separately or synchronise with another automaton 1,which leads to a new state.Figure 1(a)shows a timed automaton modelling a simple lamp.The lamp has three locations:off ,low ,and bright .If the user presses a button,i.e.,synchronises with press?,then the lamp is turned on.If the user presses the button again,the lamp is turned off.However,if the user is fast and rapidly presses the button twice,the lamp is turned on and becomes bright.The user model is shown in Fig.1(b).The user can press the button randomly at any time or even not press the button at all.The clock y of the lamp is used to detect if the user was fast (y <5)or slow (y >=5).press?‚‚‚‚‚press!(a)Lamp.(b)User.Fig.1.The simple lamp example.We give the basic definitions of the syntax and semantics for the basic timed automata.In the following we will skip the richer flavour of timed automata supported in Uppaal ,i.e.,with integer variables and the extensions of urgent and committed locations.For additional information,please refer to the helpmenu inside the tool.We use the following notations:C is a set of clocks and B (C )is the set of conjunctions over simple conditions of the form x ⊲⊳c or x −y ⊲⊳c ,where x,y ∈C ,c ∈N and ⊲⊳∈{<,≤,=,≥,>}.A timed automaton is a finite directed graph annotated with conditions over and resets of non-negative real valued clocks.Definition 1(Timed Automaton (TA)).A timed automaton is a tuple (L,l 0,C,A,E,I ),where L is a set of locations,l 0∈L is the initial location,C is the set of clocks,A is a set of actions,co-actions and the internal τ-action,E ⊆L ×A ×B (C )×2C ×L is a set of edges between locations with an action,a guard and a set of clocks to be reset,and I :L →B (C )assigns invariants to locations. In the previous example on Fig.1,y:=0is the reset of the clock y ,and the labels press?and press!denote action–co-action (channel synchronisations here).We now define the semantics of a timed automaton.A clock valuation is a function u :C →R ≥0from the set of clocks to the non-negative reals.Let R C be the set of all clock valuations.Let u 0(x )=0for all x ∈C .We will abuse the notation by considering guards and invariants as sets of clock valuations,writing u ∈I (l )to mean that u satisfies I (l ).0000000001111111110001110000000000000000000000000000000000000000000000001111111111111111111111111111111111111111111111110000000000000000000000000000000000000000000000000000000011111111111111111111111111111111111111111111111111111111000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000011111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111110000111100001111<B,x=1><A,x=2><A,x=3><A,x=3>action transition delay(+1) transition delay(+2) transition state: <A,x=1>actiontransitionOK invalid action transition invalid state: invariant x<3 violatedFig.2.Semantics of TA:different transitions from a given initial state.Definition 2(Semantics of TA).Let (L,l 0,C,A,E,I )be a timed automaton.The semantics is defined as a labelled transition system S,s 0,→ ,where S ⊆L ×R C is the set of states,s 0=(l 0,u 0)is the initial state,and →⊆S ×(R ≥0∪A )×S is the transition relation such that:–(l,u )d−→(l,u +d )if ∀d ′:0≤d ′≤d =⇒u +d ′∈I (l ),and –(l,u )a −→(l ′,u ′)if there exists e =(l,a,g,r,l ′)∈E s.t.u ∈g ,u ′=[r →0]u ,and u ′∈I (l ′),3where for d∈R≥0,u+d maps each clock x in C to the value u(x)+d,and [r→0]u denotes the clock valuation which maps each clock in r to0and agrees with u over C\r. Figure2illustrates the semantics of TA.From a given initial state,we can choose to take an action or a delay transition(different values here).Depending of the chosen delay,further actions may be forbidden.Timed automata are often composed into a network of timed automata over a common set of clocks and actions,consisting of n timed automata A i= (L i,l0i,C,A,E i,I i),1≤i≤n.A location vector is a vector¯l=(l1,...,l n). We compose the invariant functions into a common function over location vec-tors I(¯l)=∧i I i(l i).We write¯l[l′i/l i]to denote the vector where the i th element l i of¯l is replaced by l′i.In the following we define the semantics of a network of timed automata.Definition3(Semantics of a network of Timed Automata).Let A i= (L i,l0i,C,A,E i,I i)be a network of n timed automata.Let¯l0=(l01,...,l0n)be the initial location vector.The semantics is defined as a transition system S,s0,→ , where S=(L1×···×L n)×R C is the set of states,s0=(¯l0,u0)is the initial state,and→⊆S×S is the transition relation defined by:–(¯l,u)d−→(¯l,u+d)if∀d′:0≤d′≤d=⇒u+d′∈I(¯l).−−→l′i s.t.u∈g,–(¯l,u)a−→(¯l[l′i/l i],u′)if there exists l iτgru′=[r→0]u and u′∈I(¯l[l′i/l i]).–(¯l,u)a−→(¯l[l′j/l j,l′i/l i],u′)if there exist l i c?g i r i−−−−→l′i and−−−−→l′j s.t.u∈(g i∧g j),u′=[r i∪r j→0]u and u′∈I(¯l[l′j/l j,l′i/l i]).l j c!g j r jAs an example of the semantics,the lamp in Fig.1may have the follow-ing states(we skip the user):(Lamp.off,y=0)→(Lamp.off,y=3)→(Lamp.low,y=0)→(Lamp.low,y=0.5)→(Lamp.bright,y=0.5)→(Lamp.bright,y=1000)...Timed Automata in Uppaal The Uppaal modelling language extends timed automata with the following additional features(see Fig.3:Templates automata are defined with a set of parameters that can be of any type(e.g.,int,chan).These parameters are substituted for a given argument in the process declaration.Constants are declared as const name value.Constants by definition cannot be modified and must have an integer value.Bounded integer variables are declared as int[min,max]name,where min and max are the lower and upper bound,respectively.Guards,invariants,and assignments may contain expressions ranging over bounded integer variables.The bounds are checked upon verification and violating a bound leads to an invalid state that is discarded(at run-time).If the bounds are omitted,the default range of-32768to32768is used.4Fig.3.Declarations of a constant and a variable,and illustration of some of the channel synchronisations between two templates of the train gate example of Section4,and some committed locations.5Binary synchronisation channels are declared as chan c.An edge labelled with c!synchronises with another labelled c?.A synchronisation pair is chosen non-deterministically if several combinations are enabled. Broadcast channels are declared as broadcast chan c.In a broadcast syn-chronisation one sender c!can synchronise with an arbitrary number of receivers c?.Any receiver than can synchronise in the current state must do so.If there are no receivers,then the sender can still execute the c!action,i.e.broadcast sending is never blocking.Urgent synchronisation channels are declared by prefixing the channel decla-ration with the keyword urgent.Delays must not occur if a synchronisation transition on an urgent channel is enabled.Edges using urgent channels for synchronisation cannot have time constraints,i.e.,no clock guards. Urgent locations are semantically equivalent to adding an extra clock x,that is reset on all incoming edges,and having an invariant x<=0on the location.Hence,time is not allowed to pass when the system is in an urgent location. Committed locations are even more restrictive on the execution than urgent locations.A state is committed if any of the locations in the state is commit-ted.A committed state cannot delay and the next transition must involve an outgoing edge of at least one of the committed locations.Arrays are allowed for clocks,channels,constants and integer variables.They are defined by appending a size to the variable name,e.g.chan c[4];clock a[2];int[3,5]u[7];.Initialisers are used to initialise integer variables and arrays of integer vari-ables.For instance,int i=2;or int i[3]={1,2,3};.Record types are declared with the struct construct like in C.Custom types are defined with the C-like typedef construct.You can define any custom-type from other basic types such as records.User functions are defined either globally or locally to templates.Template parameters are accessible from local functions.The syntax is similar to C except that there is no pointer.C++syntax for references is supported for the arguments only.Expressions in Uppaal Expressions in Uppaal range over clocks and integer variables.The BNF is given in Fig.33in the appendix.Expressions are used with the following labels:Select A select label contains a comma separated list of name:type expressions where name is a variable name and type is a defined type(built-in or custom).These variables are accessible on the associated edge only and they will takea non-deterministic value in the range of their respective types.Guard A guard is a particular expression satisfying the following conditions: it is side-effect free;it evaluates to a boolean;only clocks,integer variables, and constants are referenced(or arrays of these types);clocks and clock differences are only compared to integer expressions;guards over clocks are essentially conjunctions(disjunctions are allowed over integer conditions).A guard may call a side-effect free function that returns a bool,although clock constraints are not supported in such functions.6Synchronisation A synchronisation label is either on the form Expression!or Expression?or is an empty label.The expression must be side-effect free, evaluate to a channel,and only refer to integers,constants and channels. Update An update label is a comma separated list of expressions with a side-effect;expressions must only refer to clocks,integer variables,and constants and only assign integer values to clocks.They may also call functions. Invariant An invariant is an expression that satisfies the following conditions:it is side-effect free;only clock,integer variables,and constants are referenced;it is a conjunction of conditions of the form x<e or x<=e where x is a clock reference and e evaluates to an integer.An invariant may call a side-effect free function that returns a bool,although clock constraints are not supported in such functions.2.2The Query LanguageThe main purpose of a model-checker is verify the model w.r.t.a requirement specification.Like the model,the requirement specification must be expressed in a formally well-defined and machine readable language.Several such logics exist in the scientific literature,and Uppaal uses a simplified version of TCTL. Like in TCTL,the query language consists of path formulae and state formulae.2 State formulae describe individual states,whereas path formulae quantify over paths or traces of the model.Path formulae can be classified into reachability, safety and liveness.Figure4illustrates the different path formulae supported by Uppaal.Each type is described below.State Formulae A state formula is an expression(see Fig.33)that can be evaluated for a state without looking at the behaviour of the model.For instance, this could be a simple expression,like i==7,that is true in a state whenever i equals7.The syntax of state formulae is a superset of that of guards,i.e.,a state formula is a side-effect free expression,but in contrast to guards,the use of disjunctions is not restricted.It is also possible to test whether a particular process is in a given location using an expression on the form P.l,where P is a process and l is a location.In Uppaal,deadlock is expressed using a special state formula(although this is not strictly a state formula).The formula simply consists of the keyword deadlock and is satisfied for all deadlock states.A state is a deadlock state if there are no outgoing action transitions neither from the state itself or any of its delay successors.Due to current limitations in Uppaal,the deadlock state formula can only be used with reachability and invariantly path formulae(see below).Reachability Properties Reachability properties are the simplest form of properties.They ask whether a given state formula,ϕ,possibly can be satisfied3Notice that A ϕ=¬E3¬ϕ8there should exist a maximal path such thatϕis always true.4In Uppaal we write A[]ϕand E[]ϕ,respectively.Liveness Properties Liveness properties are of the form:something will even-tually happen,e.g.when pressing the on button of the remote control of the television,then eventually the television should turn on.Or in a model of a communication protocol,any message that has been sent should eventually be received.In its simple form,liveness is expressed with the path formula A3ϕ,mean-ingϕis eventually satisfied.5The more useful form is the leads to or response property,writtenϕ ψwhich is read as wheneverϕis satisfied,then eventu-allyψwill be satisfied,e.g.whenever a message is sent,then eventually it will be received.6In Uppaal these properties are written as A<>ϕandϕ-->ψ, respectively.2.3Understanding TimeInvariants and Guards Uppaal uses a continuous time model.We illustrate the concept of time with a simple example that makes use of an observer.Nor-mally an observer is an add-on automaton in charge of detecting events without changing the observed system.In our case the clock reset(x:=0)is delegated to the observer for illustration purposes.Figure5shows thefirst model with its observer.We have two automata in parallel.Thefirst automaton has a self-loop guarded by x>=2,x being a clock,that synchronises on the channel reset with the second automaton.The second automaton,the observer,detects when the self loop edge is taken with the location taken and then has an edge going back to idle that resets the clock x.We moved the reset of x from the self loop to the observer only to test what happens on the transition before the reset.Notice that the location taken is committed(marked c)to avoid delay in that location.The following properties can be verified in Uppaal(see section3for an overview of the interface).Assuming we name the observer automaton Obs,we have:–A[]Obs.taken imply x>=2:all resets offx will happen when x is above2.This query means that for all reachable states,being in the locationObs.taken implies that x>=2.–E<>Obs.idle and x>3:this property requires,that it is possible to reach-able state where Obs is in the location idle and x is bigger than3.Essentially we check that we may delay at least3time units between resets.The result would have been the same for larger values like30000,since there are no invariants in this model.x>=2reset!‚‚‚‚‚246824"time"c l o c k x (a)Test.(b)Observer.(c)Behaviour:one possible run.Fig.5.First example with anobserver.x>=2reset!246824"time"c l o c k x(a)Test.(b)Updated behaviour with an invariant.Fig.6.Updated example with an invariant.The observer is the same as in Fig.5and is not shown here.We update the first model and add an invariant to the location loop ,as shown in Fig.6.The invariant is a progress condition:the system is not allowed to stay in the state more than 3time units,so that the transition has to be taken and the clock reset in our example.Now the clock x has 3as an upper bound.The following properties hold:–A[]Obs.taken imply (x>=2and x<=3)shows that the transition is takenwhen x is between 2and 3,i.e.,after a delay between 2and 3.–E<>Obs.idle and x>2:it is possible to take the transition when x is be-tween 2and 3.The upper bound 3is checked with the next property.–A[]Obs.idle imply x<=3:to show that the upper bound is respected.The former property E<>Obs.idle and x>3no longer holds.Now,if we remove the invariant and change the guard to x>=2and x<=3,you may think that it is the same as before,but it is not!The system has no progress condition,just a new condition on the guard.Figure 7shows what happens:the system may take the same transitions as before,but deadlock may also occur.The system may be stuck if it does not take the transition after 3time units.In fact,the system fails the property A[]not deadlock .The property A[]Obs.idle imply x<=3does not hold any longer and the deadlock can also be illustrated by the property A[]x>3imply not Obs.taken ,i.e.,after 3time units,the transition is not taken any more.10x>=2 && x<=3reset!246824"time"c l o c k x(a)Test.(b)Updated behaviour with a guard and no invariant.Fig.7.Updated example with a guard and no invariant.P0P1P2Fig.8.Automata in parallel with normal,urgent and commit states.The clocks are local,i.e.,P0.x and P1.x are two different clocks.Committed and Urgent Locations There are three different types of loca-tions in Uppaal :normal locations with or without invariants (e.g.,x<=3in the previous example),urgent locations,and committed locations.Figure 8shows 3automata to illustrate the difference.The location marked u is urgent and the one marked c is committed.The clocks are local to the automata,i.e.,x in P0is different from x in P1.To understand the difference between normal locations and urgent locations,we can observe that the following properties hold:–E<>P0.S1and P0.x>0:it is possible to wait in S1of P0.–A[]P1.S1imply P1.x==0:it is not possible to wait in S1of P1.An urgent location is equivalent to a location with incoming edges reseting a designated clock y and labelled with the invariant y<=0.Time may not progress in an urgent state,but interleavings with normal states are allowed.A committed location is more restrictive:in all the states where P2.S1is active (in our example),the only possible transition is the one that fires the edge outgoing from P2.S1.A state having a committed location active is said to11be committed:delay is not allowed and the committed location must be left in the successor state(or one of the committed locations if there are several ones). 3Overview of the Uppaal ToolkitUppaal uses a client-server architecture,splitting the tool into a graphical user interface and a model checking engine.The user interface,or client,is imple-mented in Java and the engine,or server,is compiled for different platforms (Linux,Windows,Solaris).7As the names suggest,these two components may be run on different machines as they communicate with each other via TCP/IP. There is also a stand-alone version of the engine that can be used on the com-mand line.3.1The Java ClientThe idea behind the tool is to model a system with timed automata using a graphical editor,simulate it to validate that it behaves as intended,andfinally to verify that it is correct with respect to a set of properties.The graphical interface(GUI)of the Java client reflects this idea and is divided into three main parts:the editor,the simulator,and the verifier,accessible via three“tabs”. The Editor A system is defined as a network of timed automata,called pro-cesses in the tool,put in parallel.A process is instantiated from a parameterised template.The editor is divided into two parts:a tree pane to access the different templates and declarations and a drawing canvas/text editor.Figure9shows the editor with the train gate example of section4.Locations are labelled with names and invariants and edges are labelled with guard conditions(e.g.,e==id), synchronisations(e.g.,go?),and assignments(e.g.,x:=0).The tree on the left hand side gives access to different parts of the system description:Global declaration Contains global integer variables,clocks,synchronisation channels,and constants.Templates Train,Gate,and IntQueue are different parameterised timed au-tomata.A template may have local declarations of variables,channels,and constants.Process assignments Templates are instantiated into processes.The process assignment section contains declarations for these instances.System definition The list of processes in the system.The syntax used in the labels and the declarations is described in the help system of the tool.The local and global declarations are shown in Fig.10.The graphical syntax is directly inspired from the description of timed automata in section2.12Fig.9.The train automaton of the train gate example.The select button is activated in the tool-bar.In this mode the user can move locations and edges or edit labels. The other modes are for adding locations,edges,and vertices on edges(called nails).A new location has no name by default.Two textfields allow the user to define the template name and its eful trick:The middle mouse button is a shortcut for adding new elements,i.e.pressing it on the canvas,a location,or edge adds a new location,edge,or nail,respectively.The Simulator The simulator can be used in three ways:the user can run the system manually and choose which transitions to take,the random mode can be toggled to let the system run on its own,or the user can go through a trace (saved or imported from the verifier)to see how certain states are reachable. Figure11shows the simulator.It is divided into four parts:The control part is used to choose andfire enabled transitions,go through a trace,and toggle the random simulation.The variable view shows the values of the integer variables and the clock con-straints.Uppaal does not show concrete states with actual values for the clocks.Since there are infinitely many of such states,Uppaal instead shows sets of concrete states known as symbolic states.All concrete states in a sym-bolic state share the same location vector and the same values for discretevariables.The possible values of the clocks is described by a set of con-Fig.10.The different local and global declarations of the train gate example.We superpose several screen-shots of the tool to show the declarations in a compact manner.straints.The clock validation in the symbolic state are exactly those that satisfy all constraints.The system view shows all instantiated automata and active locations of the current state.The message sequence chart shows the synchronisations between the differ-ent processes as well as the active locations at every step.The Verifier The verifier“tab”is shown in Fig.12.Properties are selectable in the Overview list.The user may model-check one or several properties,8insert or remove properties,and toggle the view to see the properties or the comments in the list.When a property is selected,it is possible to edit its definition(e.g., E<>Train1.Cross and Train2.Stop...)or comments to document what the property means informally.The Status panel at the bottom shows the commu-nication with the server.When trace generation is enabled and the model-checkerfinds a trace,the user is asked if she wants to import it into the simulator.Satisfied properties are marked green and violated ones red.In case either an over approximation or an under approximation has been selected in the options menu,then it may happen that the verification is inconclusive with the approximation used.In that casethe properties are marked yellow.Fig.11.View of the simulator tab for the train gate example.The interpretation of the constraint system in the variable panel depends on whether a transition in the transition panel is selected or not.If no transition is selected,then the constrain system shows all possible clock valuations that can be reached along the path.If a transition is selected,then only those clock valuations from which the transition can be taken are shown.Keyboard bindings for navigating the simulator without the mouse can be found in the integrated help system.3.2The Stand-alone VerifierWhen running large verification tasks,it is often cumbersome to execute these from inside the GUI.For such situations,the stand-alone command line verifier called verifyta is more appropriate.It also makes it easy to run the verification on a remote UNIX machine with memory to spare.It accepts command line arguments for all options available in the GUI,see Table3in the appendix.4Example1:The Train Gate4.1DescriptionThe train gate example is distributed with Uppaal.It is a railway control system which controls access to a bridge for several trains.The bridge is a critical shared resource that may be accessed only by one train at a time.The system is defined as a number of trains(assume4for this example)and a controller.A train can not be stopped instantly and restarting also takes time.Therefor,there are timing constraints on the trains before entering the bridge.When approaching,15。
A Discriminatively Trained,Multiscale,Deformable Part ModelPedro Felzenszwalb University of Chicago pff@David McAllesterToyota Technological Institute at Chicagomcallester@Deva RamananUC Irvinedramanan@AbstractThis paper describes a discriminatively trained,multi-scale,deformable part model for object detection.Our sys-tem achieves a two-fold improvement in average precision over the best performance in the2006PASCAL person de-tection challenge.It also outperforms the best results in the 2007challenge in ten out of twenty categories.The system relies heavily on deformable parts.While deformable part models have become quite popular,their value had not been demonstrated on difficult benchmarks such as the PASCAL challenge.Our system also relies heavily on new methods for discriminative training.We combine a margin-sensitive approach for data mining hard negative examples with a formalism we call latent SVM.A latent SVM,like a hid-den CRF,leads to a non-convex training problem.How-ever,a latent SVM is semi-convex and the training prob-lem becomes convex once latent information is specified for the positive examples.We believe that our training meth-ods will eventually make possible the effective use of more latent information such as hierarchical(grammar)models and models involving latent three dimensional pose.1.IntroductionWe consider the problem of detecting and localizing ob-jects of a generic category,such as people or cars,in static images.We have developed a new multiscale deformable part model for solving this problem.The models are trained using a discriminative procedure that only requires bound-ing box labels for the positive ing these mod-els we implemented a detection system that is both highly efficient and accurate,processing an image in about2sec-onds and achieving recognition rates that are significantly better than previous systems.Our system achieves a two-fold improvement in average precision over the winning system[5]in the2006PASCAL person detection challenge.The system also outperforms the best results in the2007challenge in ten out of twenty This material is based upon work supported by the National Science Foundation under Grant No.0534820and0535174.Figure1.Example detection obtained with the person model.The model is defined by a coarse template,several higher resolution part templates and a spatial model for the location of each part. object categories.Figure1shows an example detection ob-tained with our person model.The notion that objects can be modeled by parts in a de-formable configuration provides an elegant framework for representing object categories[1–3,6,10,12,13,15,16,22]. While these models are appealing from a conceptual point of view,it has been difficult to establish their value in prac-tice.On difficult datasets,deformable models are often out-performed by“conceptually weaker”models such as rigid templates[5]or bag-of-features[23].One of our main goals is to address this performance gap.Our models include both a coarse global template cov-ering an entire object and higher resolution part templates. The templates represent histogram of gradient features[5]. As in[14,19,21],we train models discriminatively.How-ever,our system is semi-supervised,trained with a max-margin framework,and does not rely on feature detection. We also describe a simple and effective strategy for learn-ing parts from weakly-labeled data.In contrast to computa-tionally demanding approaches such as[4],we can learn a model in3hours on a single CPU.Another contribution of our work is a new methodology for discriminative training.We generalize SVMs for han-dling latent variables such as part positions,and introduce a new method for data mining“hard negative”examples dur-ing training.We believe that handling partially labeled data is a significant issue in machine learning for computer vi-sion.For example,the PASCAL dataset only specifies abounding box for each positive example of an object.We treat the position of each object part as a latent variable.We also treat the exact location of the object as a latent vari-able,requiring only that our classifier select a window that has large overlap with the labeled bounding box.A latent SVM,like a hidden CRF[19],leads to a non-convex training problem.However,unlike a hidden CRF, a latent SVM is semi-convex and the training problem be-comes convex once latent information is specified for thepositive training examples.This leads to a general coordi-nate descent algorithm for latent SVMs.System Overview Our system uses a scanning window approach.A model for an object consists of a global“root”filter and several part models.Each part model specifies a spatial model and a partfilter.The spatial model defines a set of allowed placements for a part relative to a detection window,and a deformation cost for each placement.The score of a detection window is the score of the root filter on the window plus the sum over parts,of the maxi-mum over placements of that part,of the partfilter score on the resulting subwindow minus the deformation cost.This is similar to classical part-based models[10,13].Both root and partfilters are scored by computing the dot product be-tween a set of weights and histogram of gradient(HOG) features within a window.The rootfilter is equivalent to a Dalal-Triggs model[5].The features for the partfilters are computed at twice the spatial resolution of the rootfilter. Our model is defined at afixed scale,and we detect objects by searching over an image pyramid.In training we are given a set of images annotated with bounding boxes around each instance of an object.We re-duce the detection problem to a binary classification prob-lem.Each example x is scored by a function of the form, fβ(x)=max zβ·Φ(x,z).Hereβis a vector of model pa-rameters and z are latent values(e.g.the part placements). To learn a model we define a generalization of SVMs that we call latent variable SVM(LSVM).An important prop-erty of LSVMs is that the training problem becomes convex if wefix the latent values for positive examples.This can be used in a coordinate descent algorithm.In practice we iteratively apply classical SVM training to triples( x1,z1,y1 ,..., x n,z n,y n )where z i is selected to be the best scoring latent label for x i under the model learned in the previous iteration.An initial rootfilter is generated from the bounding boxes in the PASCAL dataset. The parts are initialized from this rootfilter.2.ModelThe underlying building blocks for our models are the Histogram of Oriented Gradient(HOG)features from[5]. We represent HOG features at two different scales.Coarse features are captured by a rigid template covering anentireImage pyramidFigure2.The HOG feature pyramid and an object hypothesis de-fined in terms of a placement of the rootfilter(near the top of the pyramid)and the partfilters(near the bottom of the pyramid). detection window.Finer scale features are captured by part templates that can be moved with respect to the detection window.The spatial model for the part locations is equiv-alent to a star graph or1-fan[3]where the coarse template serves as a reference position.2.1.HOG RepresentationWe follow the construction in[5]to define a dense repre-sentation of an image at a particular resolution.The image isfirst divided into8x8non-overlapping pixel regions,or cells.For each cell we accumulate a1D histogram of gra-dient orientations over pixels in that cell.These histograms capture local shape properties but are also somewhat invari-ant to small deformations.The gradient at each pixel is discretized into one of nine orientation bins,and each pixel“votes”for the orientation of its gradient,with a strength that depends on the gradient magnitude.For color images,we compute the gradient of each color channel and pick the channel with highest gradi-ent magnitude at each pixel.Finally,the histogram of each cell is normalized with respect to the gradient energy in a neighborhood around it.We look at the four2×2blocks of cells that contain a particular cell and normalize the his-togram of the given cell with respect to the total energy in each of these blocks.This leads to a vector of length9×4 representing the local gradient information inside a cell.We define a HOG feature pyramid by computing HOG features of each level of a standard image pyramid(see Fig-ure2).Features at the top of this pyramid capture coarse gradients histogrammed over fairly large areas of the input image while features at the bottom of the pyramid capture finer gradients histogrammed over small areas.2.2.FiltersFilters are rectangular templates specifying weights for subwindows of a HOG pyramid.A w by hfilter F is a vector with w×h×9×4weights.The score of afilter is defined by taking the dot product of the weight vector and the features in a w×h subwindow of a HOG pyramid.The system in[5]uses a singlefilter to define an object model.That system detects objects from a particular class by scoring every w×h subwindow of a HOG pyramid and thresholding the scores.Let H be a HOG pyramid and p=(x,y,l)be a cell in the l-th level of the pyramid.Letφ(H,p,w,h)denote the vector obtained by concatenating the HOG features in the w×h subwindow of H with top-left corner at p.The score of F on this detection window is F·φ(H,p,w,h).Below we useφ(H,p)to denoteφ(H,p,w,h)when the dimensions are clear from context.2.3.Deformable PartsHere we consider models defined by a coarse rootfilter that covers the entire object and higher resolution partfilters covering smaller parts of the object.Figure2illustrates a placement of such a model in a HOG pyramid.The rootfil-ter location defines the detection window(the pixels inside the cells covered by thefilter).The partfilters are placed several levels down in the pyramid,so the HOG cells at that level have half the size of cells in the rootfilter level.We have found that using higher resolution features for defining partfilters is essential for obtaining high recogni-tion performance.With this approach the partfilters repre-sentfiner resolution edges that are localized to greater ac-curacy when compared to the edges represented in the root filter.For example,consider building a model for a face. The rootfilter could capture coarse resolution edges such as the face boundary while the partfilters could capture details such as eyes,nose and mouth.The model for an object with n parts is formally defined by a rootfilter F0and a set of part models(P1,...,P n) where P i=(F i,v i,s i,a i,b i).Here F i is afilter for the i-th part,v i is a two-dimensional vector specifying the center for a box of possible positions for part i relative to the root po-sition,s i gives the size of this box,while a i and b i are two-dimensional vectors specifying coefficients of a quadratic function measuring a score for each possible placement of the i-th part.Figure1illustrates a person model.A placement of a model in a HOG pyramid is given by z=(p0,...,p n),where p i=(x i,y i,l i)is the location of the rootfilter when i=0and the location of the i-th part when i>0.We assume the level of each part is such that a HOG cell at that level has half the size of a HOG cell at the root level.The score of a placement is given by the scores of eachfilter(the data term)plus a score of the placement of each part relative to the root(the spatial term), ni=0F i·φ(H,p i)+ni=1a i·(˜x i,˜y i)+b i·(˜x2i,˜y2i),(1)where(˜x i,˜y i)=((x i,y i)−2(x,y)+v i)/s i gives the lo-cation of the i-th part relative to the root location.Both˜x i and˜y i should be between−1and1.There is a large(exponential)number of placements for a model in a HOG pyramid.We use dynamic programming and distance transforms techniques[9,10]to compute the best location for the parts of a model as a function of the root location.This takes O(nk)time,where n is the number of parts in the model and k is the number of cells in the HOG pyramid.To detect objects in an image we score root locations according to the best possible placement of the parts and threshold this score.The score of a placement z can be expressed in terms of the dot product,β·ψ(H,z),between a vector of model parametersβand a vectorψ(H,z),β=(F0,...,F n,a1,b1...,a n,b n).ψ(H,z)=(φ(H,p0),φ(H,p1),...φ(H,p n),˜x1,˜y1,˜x21,˜y21,...,˜x n,˜y n,˜x2n,˜y2n,). We use this representation for learning the model parame-ters as it makes a connection between our deformable mod-els and linear classifiers.On interesting aspect of the spatial models defined here is that we allow for the coefficients(a i,b i)to be negative. This is more general than the quadratic“spring”cost that has been used in previous work.3.LearningThe PASCAL training data consists of a large set of im-ages with bounding boxes around each instance of an ob-ject.We reduce the problem of learning a deformable part model with this data to a binary classification problem.Let D=( x1,y1 ,..., x n,y n )be a set of labeled exam-ples where y i∈{−1,1}and x i specifies a HOG pyramid, H(x i),together with a range,Z(x i),of valid placements for the root and partfilters.We construct a positive exam-ple from each bounding box in the training set.For these ex-amples we define Z(x i)so the rootfilter must be placed to overlap the bounding box by at least50%.Negative exam-ples come from images that do not contain the target object. Each placement of the rootfilter in such an image yields a negative training example.Note that for the positive examples we treat both the part locations and the exact location of the rootfilter as latent variables.We have found that allowing uncertainty in the root location during training significantly improves the per-formance of the system(see Section4).tent SVMsA latent SVM is defined as follows.We assume that each example x is scored by a function of the form,fβ(x)=maxz∈Z(x)β·Φ(x,z),(2)whereβis a vector of model parameters and z is a set of latent values.For our deformable models we define Φ(x,z)=ψ(H(x),z)so thatβ·Φ(x,z)is the score of placing the model according to z.In analogy to classical SVMs we would like to trainβfrom labeled examples D=( x1,y1 ,..., x n,y n )by optimizing the following objective function,β∗(D)=argminβλ||β||2+ni=1max(0,1−y i fβ(x i)).(3)By restricting the latent domains Z(x i)to a single choice, fβbecomes linear inβ,and we obtain linear SVMs as a special case of latent tent SVMs are instances of the general class of energy-based models[18].3.2.Semi-ConvexityNote that fβ(x)as defined in(2)is a maximum of func-tions each of which is linear inβ.Hence fβ(x)is convex inβ.This implies that the hinge loss max(0,1−y i fβ(x i)) is convex inβwhen y i=−1.That is,the loss function is convex inβfor negative examples.We call this property of the loss function semi-convexity.Consider an LSVM where the latent domains Z(x i)for the positive examples are restricted to a single choice.The loss due to each positive example is now bined with the semi-convexity property,(3)becomes convex inβ.If the labels for the positive examples are notfixed we can compute a local optimum of(3)using a coordinate de-scent algorithm:1.Holdingβfixed,optimize the latent values for the pos-itive examples z i=argmax z∈Z(xi )β·Φ(x,z).2.Holding{z i}fixed for positive examples,optimizeβby solving the convex problem defined above.It can be shown that both steps always improve or maintain the value of the objective function in(3).If both steps main-tain the value we have a strong local optimum of(3),in the sense that Step1searches over an exponentially large space of latent labels for positive examples while Step2simulta-neously searches over weight vectors and an exponentially large space of latent labels for negative examples.3.3.Data Mining Hard NegativesIn object detection the vast majority of training exam-ples are negative.This makes it infeasible to consider all negative examples at a time.Instead,it is common to con-struct training data consisting of the positive instances and “hard negative”instances,where the hard negatives are data mined from the very large set of possible negative examples.Here we describe a general method for data mining ex-amples for SVMs and latent SVMs.The method iteratively solves subproblems using only hard instances.The innova-tion of our approach is a theoretical guarantee that it leads to the exact solution of the training problem defined using the complete training set.Our results require the use of a margin-sensitive definition of hard examples.The results described here apply both to classical SVMs and to the problem defined by Step2of the coordinate de-scent algorithm for latent SVMs.We omit the proofs of the theorems due to lack of space.These results are related to working set methods[17].We define the hard instances of D relative toβas,M(β,D)={ x,y ∈D|yfβ(x)≤1}.(4)That is,M(β,D)are training examples that are incorrectly classified or near the margin of the classifier defined byβ. We can show thatβ∗(D)only depends on hard instances. Theorem1.Let C be a subset of the examples in D.If M(β∗(D),D)⊆C thenβ∗(C)=β∗(D).This implies that in principle we could train a model us-ing a small set of examples.However,this set is defined in terms of the optimal modelβ∗(D).Given afixedβwe can use M(β,D)to approximate M(β∗(D),D).This suggests an iterative algorithm where we repeatedly compute a model from the hard instances de-fined by the model from the last iteration.This is further justified by the followingfixed-point theorem.Theorem2.Ifβ∗(M(β,D))=βthenβ=β∗(D).Let C be an initial“cache”of examples.In practice we can take the positive examples together with random nega-tive examples.Consider the following iterative algorithm: 1.Letβ:=β∗(C).2.Shrink C by letting C:=M(β,C).3.Grow C by adding examples from M(β,D)up to amemory limit L.Theorem3.If|C|<L after each iteration of Step2,the algorithm will converge toβ=β∗(D)infinite time.3.4.Implementation detailsMany of the ideas discussed here are only approximately implemented in our current system.In practice,when train-ing a latent SVM we iteratively apply classical SVM train-ing to triples x1,z1,y1 ,..., x n,z n,y n where z i is se-lected to be the best scoring latent label for x i under themodel trained in the previous iteration.Each of these triples leads to an example Φ(x i,z i),y i for training a linear clas-sifier.This allows us to use a highly optimized SVM pack-age(SVMLight[17]).On a single CPU,the entire training process takes3to4hours per object class in the PASCAL datasets,including initialization of the parts.Root Filter Initialization:For each category,we auto-matically select the dimensions of the rootfilter by looking at statistics of the bounding boxes in the training data.1We train an initial rootfilter F0using an SVM with no latent variables.The positive examples are constructed from the unoccluded training examples(as labeled in the PASCAL data).These examples are anisotropically scaled to the size and aspect ratio of thefilter.We use random subwindows from negative images to generate negative examples.Root Filter Update:Given the initial rootfilter trained as above,for each bounding box in the training set wefind the best-scoring placement for thefilter that significantly overlaps with the bounding box.We do this using the orig-inal,un-scaled images.We retrain F0with the new positive set and the original random negative set,iterating twice.Part Initialization:We employ a simple heuristic to ini-tialize six parts from the rootfilter trained above.First,we select an area a such that6a equals80%of the area of the rootfilter.We greedily select the rectangular region of area a from the rootfilter that has the most positive energy.We zero out the weights in this region and repeat until six parts are selected.The partfilters are initialized from the rootfil-ter values in the subwindow selected for the part,butfilled in to handle the higher spatial resolution of the part.The initial deformation costs measure the squared norm of a dis-placement with a i=(0,0)and b i=−(1,1).Model Update:To update a model we construct new training data triples.For each positive bounding box in the training data,we apply the existing detector at all positions and scales with at least a50%overlap with the given bound-ing box.Among these we select the highest scoring place-ment as the positive example corresponding to this training bounding box(Figure3).Negative examples are selected byfinding high scoring detections in images not containing the target object.We add negative examples to a cache un-til we encounterfile size limits.A new model is trained by running SVMLight on the positive and negative examples, each labeled with part placements.We update the model10 times using the cache scheme described above.In each it-eration we keep the hard instances from the previous cache and add as many new hard instances as possible within the memory limit.Toward thefinal iterations,we are able to include all hard instances,M(β,D),in the cache.1We picked a simple heuristic by cross-validating over5object classes. We set the model aspect to be the most common(mode)aspect in the data. We set the model size to be the largest size not larger than80%of thedata.Figure3.The image on the left shows the optimization of the la-tent variables for a positive example.The dotted box is the bound-ing box label provided in the PASCAL training set.The large solid box shows the placement of the detection window while the smaller solid boxes show the placements of the parts.The image on the right shows a hard-negative example.4.ResultsWe evaluated our system using the PASCAL VOC2006 and2007comp3challenge datasets and protocol.We refer to[7,8]for details,but emphasize that both challenges are widely acknowledged as difficult testbeds for object detec-tion.Each dataset contains several thousand images of real-world scenes.The datasets specify ground-truth bounding boxes for several object classes,and a detection is consid-ered correct when it overlaps more than50%with a ground-truth bounding box.One scores a system by the average precision(AP)of its precision-recall curve across a testset.Recent work in pedestrian detection has tended to report detection rates versus false positives per window,measured with cropped positive examples and negative images with-out objects of interest.These scores are tied to the reso-lution of the scanning window search and ignore effects of non-maximum suppression,making it difficult to compare different systems.We believe the PASCAL scoring method gives a more reliable measure of performance.The2007challenge has20object categories.We entered a preliminary version of our system in the official competi-tion,and obtained the best score in6categories.Our current system obtains the highest score in10categories,and the second highest score in6categories.Table1summarizes the results.Our system performs well on rigid objects such as cars and sofas as well as highly deformable objects such as per-sons and horses.We also note that our system is successful when given a large or small amount of training data.There are roughly4700positive training examples in the person category but only250in the sofa category.Figure4shows some of the models we learned.Figure5shows some ex-ample detections.We evaluated different components of our system on the longer-established2006person dataset.The top AP scoreaero bike bird boat bottle bus car cat chair cow table dog horse mbike person plant sheep sofa train tvOur rank 31211224111422112141Our score .180.411.092.098.249.349.396.110.155.165.110.062.301.337.267.140.141.156.206.336Darmstadt .301INRIA Normal .092.246.012.002.068.197.265.018.097.039.017.016.225.153.121.093.002.102.157.242INRIA Plus.136.287.041.025.077.279.294.132.106.127.067.071.335.249.092.072.011.092.242.275IRISA .281.318.026.097.119.289.227.221.175.253MPI Center .060.110.028.031.000.164.172.208.002.044.049.141.198.170.091.004.091.034.237.051MPI ESSOL.152.157.098.016.001.186.120.240.007.061.098.162.034.208.117.002.046.147.110.054Oxford .262.409.393.432.375.334TKK .186.078.043.072.002.116.184.050.028.100.086.126.186.135.061.019.036.058.067.090Table 1.PASCAL VOC 2007results.Average precision scores of our system and other systems that entered the competition [7].Empty boxes indicate that a method was not tested in the corresponding class.The best score in each class is shown in bold.Our current system ranks first in 10out of 20classes.A preliminary version of our system ranked first in 6classes in the official competition.BottleCarBicycleSofaFigure 4.Some models learned from the PASCAL VOC 2007dataset.We show the total energy in each orientation of the HOG cells in the root and part filters,with the part filters placed at the center of the allowable displacements.We also show the spatial model for each part,where bright values represent “cheap”placements,and dark values represent “expensive”placements.in the PASCAL competition was .16,obtained using a rigid template model of HOG features [5].The best previous re-sult of.19adds a segmentation-based verification step [20].Figure 6summarizes the performance of several models we trained.Our root-only model is equivalent to the model from [5]and it scores slightly higher at .18.Performance jumps to .24when the model is trained with a LSVM that selects a latent position and scale for each positive example.This suggests LSVMs are useful even for rigid templates because they allow for self-adjustment of the detection win-dow in the training examples.Adding deformable parts in-creases performance to .34AP —a factor of two above the best previous score.Finally,we trained a model with partsbut no root filter and obtained .29AP.This illustrates the advantage of using a multiscale representation.We also investigated the effect of the spatial model and allowable deformations on the 2006person dataset.Recall that s i is the allowable displacement of a part,measured in HOG cells.We trained a rigid model with high-resolution parts by setting s i to 0.This model outperforms the root-only system by .27to .24.If we increase the amount of allowable displacements without using a deformation cost,we start to approach a bag-of-features.Performance peaks at s i =1,suggesting it is useful to constrain the part dis-placements.The optimal strategy allows for larger displace-ments while using an explicit deformation cost.The follow-Figure 5.Some results from the PASCAL 2007dataset.Each row shows detections using a model for a specific class (Person,Bottle,Car,Sofa,Bicycle,Horse).The first three columns show correct detections while the last column shows false positives.Our system is able to detect objects over a wide range of scales (such as the cars)and poses (such as the horses).The system can also detect partially occluded objects such as a person behind a bush.Note how the false detections are often quite reasonable,for example detecting a bus with the car model,a bicycle sign with the bicycle model,or a dog with the horse model.In general the part filters represent meaningful object parts that are well localized in each detection such as the head in the person model.Figure6.Evaluation of our system on the PASCAL VOC2006 person dataset.Root uses only a rootfilter and no latent place-ment of the detection windows on positive examples.Root+Latent uses a rootfilter with latent placement of the detection windows. Parts+Latent is a part-based system with latent detection windows but no rootfilter.Root+Parts+Latent includes both root and part filters,and latent placement of the detection windows.ing table shows AP as a function of freely allowable defor-mation in thefirst three columns.The last column gives the performance when using a quadratic deformation cost and an allowable displacement of2HOG cells.s i01232+quadratic costAP.27.33.31.31.345.DiscussionWe introduced a general framework for training SVMs with latent structure.We used it to build a recognition sys-tem based on multiscale,deformable models.Experimental results on difficult benchmark data suggests our system is the current state-of-the-art in object detection.LSVMs allow for exploration of additional latent struc-ture for recognition.One can consider deeper part hierar-chies(parts with parts),mixture models(frontal vs.side cars),and three-dimensional pose.We would like to train and detect multiple classes together using a shared vocab-ulary of parts(perhaps visual words).We also plan to use A*search[11]to efficiently search over latent parameters during detection.References[1]Y.Amit and A.Trouve.POP:Patchwork of parts models forobject recognition.IJCV,75(2):267–282,November2007.[2]M.Burl,M.Weber,and P.Perona.A probabilistic approachto object recognition using local photometry and global ge-ometry.In ECCV,pages II:628–641,1998.[3] D.Crandall,P.Felzenszwalb,and D.Huttenlocher.Spatialpriors for part-based recognition using statistical models.In CVPR,pages10–17,2005.[4] D.Crandall and D.Huttenlocher.Weakly supervised learn-ing of part-based spatial models for visual object recognition.In ECCV,pages I:16–29,2006.[5]N.Dalal and B.Triggs.Histograms of oriented gradients forhuman detection.In CVPR,pages I:886–893,2005.[6] B.Epshtein and S.Ullman.Semantic hierarchies for recog-nizing objects and parts.In CVPR,2007.[7]M.Everingham,L.Van Gool,C.K.I.Williams,J.Winn,and A.Zisserman.The PASCAL Visual Object Classes Challenge2007(VOC2007)Results./challenges/VOC/voc2007/workshop.[8]M.Everingham, A.Zisserman, C.K.I.Williams,andL.Van Gool.The PASCAL Visual Object Classes Challenge2006(VOC2006)Results./challenges/VOC/voc2006/results.pdf.[9]P.Felzenszwalb and D.Huttenlocher.Distance transformsof sampled functions.Cornell Computing and Information Science Technical Report TR2004-1963,September2004.[10]P.Felzenszwalb and D.Huttenlocher.Pictorial structures forobject recognition.IJCV,61(1),2005.[11]P.Felzenszwalb and D.McAllester.The generalized A*ar-chitecture.JAIR,29:153–190,2007.[12]R.Fergus,P.Perona,and A.Zisserman.Object class recog-nition by unsupervised scale-invariant learning.In CVPR, 2003.[13]M.Fischler and R.Elschlager.The representation andmatching of pictorial structures.IEEE Transactions on Com-puter,22(1):67–92,January1973.[14] A.Holub and P.Perona.A discriminative framework formodelling object classes.In CVPR,pages I:664–671,2005.[15]S.Ioffe and D.Forsyth.Probabilistic methods forfindingpeople.IJCV,43(1):45–68,June2001.[16]Y.Jin and S.Geman.Context and hierarchy in a probabilisticimage model.In CVPR,pages II:2145–2152,2006.[17]T.Joachims.Making large-scale svm learning practical.InB.Sch¨o lkopf,C.Burges,and A.Smola,editors,Advances inKernel Methods-Support Vector Learning.MIT Press,1999.[18]Y.LeCun,S.Chopra,R.Hadsell,R.Marc’Aurelio,andF.Huang.A tutorial on energy-based learning.InG.Bakir,T.Hofman,B.Sch¨o lkopf,A.Smola,and B.Taskar,editors, Predicting Structured Data.MIT Press,2006.[19] A.Quattoni,S.Wang,L.Morency,M.Collins,and T.Dar-rell.Hidden conditional randomfields.PAMI,29(10):1848–1852,October2007.[20] ing segmentation to verify object hypothe-ses.In CVPR,pages1–8,2007.[21] D.Ramanan and C.Sminchisescu.Training deformablemodels for localization.In CVPR,pages I:206–213,2006.[22]H.Schneiderman and T.Kanade.Object detection using thestatistics of parts.IJCV,56(3):151–177,February2004. [23]J.Zhang,M.Marszalek,zebnik,and C.Schmid.Localfeatures and kernels for classification of texture and object categories:A comprehensive study.IJCV,73(2):213–238, June2007.。
CCF推荐的国际学术会议和期刊目录修订版发布CCF(China Computer Federation中国计算机学会)于2010年8月发布了第一版推荐的国际学术会议和期刊目录,一年来,经过业内专家的反馈和修订,于日前推出了修订版,现将修订版予以发布。
本次修订对上一版内容进行了充实,一些会议和期刊的分类排行进行了调整,目录包括:计算机科学理论、计算机体系结构与高性能计算、计算机图形学与多媒体、计算机网络、交叉学科、人工智能与模式识别、软件工程/系统软件/程序设计语言、数据库/数据挖掘/内容检索、网络与信息安全、综合刊物等方向的国际学术会议及期刊目录,供国内高校和科研单位作为学术评价的参考依据。
目录中,刊物和会议分为A、B、C三档。
A类表示国际上极少数的顶级刊物和会议,鼓励我国学者去突破;B类是指国际上著名和非常重要的会议、刊物,代表该领域的较高水平,鼓励国内同行投稿;C类指国际上重要、为国际学术界所认可的会议和刊物。
这些分类目录每年将学术界的反馈和意见,进行修订,并逐步增加研究方向。
中国计算机学会推荐国际学术刊物(网络/信息安全)一、 A类序号刊物简称刊物全称出版社网址1. TIFS IEEE Transactions on Information Forensics andSecurity IEEE /organizations/society/sp/tifs.html2. TDSC IEEE Transactions on Dependable and Secure ComputingIEEE /tdsc/3. TISSEC ACM Transactions on Information and SystemSecurity ACM /二、 B类序号刊物简称刊物全称出版社网址1. Journal of Cryptology Springer /jofc/jofc.html2. Journal of Computer SecurityIOS Press /jcs/3. IEEE Security & Privacy IEEE/security/4. Computers &Security Elsevier http://www.elsevier.nl/inca/publications/store/4/0/5/8/7/7/5. JISecJournal of Internet Security NahumGoldmann. /JiSec/index.asp6. Designs, Codes andCryptography Springer /east/home/math/numbers?SGWID=5 -10048-70-35730330-07. IET Information Security IET /IET-IFS8. EURASIP Journal on InformationSecurity Hindawi /journals/is三、C类序号刊物简称刊物全称出版社网址1. CISDA Computational Intelligence for Security and DefenseApplications IEEE /2. CLSR Computer Law and SecurityReports Elsevier /science/journal/026736493. Information Management & Computer Security MCB UniversityPress /info/journals/imcs/imcs.jsp4. Information Security TechnicalReport Elsevier /locate/istr中国计算机学会推荐国际学术会议(网络/信息安全方向)一、A类序号会议简称会议全称出版社网址1. S&PIEEE Symposium on Security and Privacy IEEE /TC/SP-Index.html2. CCSACM Conference on Computer and Communications Security ACM /sigs/sigsac/ccs/3. CRYPTO International Cryptology Conference Springer-Verlag /conferences/二、B类序号会议简称会议全称出版社网址1. SecurityUSENIX Security Symposium USENIX /events/2. NDSSISOC Network and Distributed System Security Symposium Internet Society /isoc/conferences/ndss/3. EurocryptAnnual International Conference on the Theory and Applications of Cryptographic Techniques Springer /conferences/eurocrypt2009/4. IH Workshop on Information Hiding Springer-Verlag /~rja14/ihws.html5. ESORICSEuropean Symposium on Research in Computer Security Springer-Verlag as.fr/%7Eesorics/6. RAIDInternational Symposium on Recent Advances in Intrusion Detection Springer-Verlag /7. ACSACAnnual Computer Security Applications ConferenceIEEE /8. DSNThe International Conference on Dependable Systems and Networks IEEE/IFIP /9. CSFWIEEE Computer Security Foundations Workshop /CSFWweb/10. TCC Theory of Cryptography Conference Springer-Verlag /~tcc08/11. ASIACRYPT Annual International Conference on the Theory and Application of Cryptology and Information Security Springer-Verlag /conferences/ 12. PKC International Workshop on Practice and Theory in Public Key Cryptography Springer-Verlag /workshops/pkc2008/三、 C类序号会议简称会议全称出版社网址1. SecureCommInternational Conference on Security and Privacy in Communication Networks ACM /2. ASIACCSACM Symposium on Information, Computer and Communications Security ACM .tw/asiaccs/3. ACNSApplied Cryptography and Network Security Springer-Verlag /acns_home/4. NSPWNew Security Paradigms Workshop ACM /current/5. FC Financial Cryptography Springer-Verlag http://fc08.ifca.ai/6. SACACM Symposium on Applied Computing ACM /conferences/sac/ 7. ICICS International Conference on Information and Communications Security Springer /ICICS06/8. ISC Information Security Conference Springer /9. ICISCInternational Conference on Information Security and Cryptology Springer /10. FSE Fast Software Encryption Springer http://fse2008.epfl.ch/11. WiSe ACM Workshop on Wireless Security ACM /~adrian/wise2004/12. SASN ACM Workshop on Security of Ad-Hoc and Sensor Networks ACM /~szhu/SASN2006/13. WORM ACM Workshop on Rapid Malcode ACM /~farnam/worm2006.html14. DRM ACM Workshop on Digital Rights Management ACM /~drm2007/15. SEC IFIP International Information Security Conference Springer http://sec2008.dti.unimi.it/16. IWIAIEEE International Information Assurance Workshop IEEE /17. IAWIEEE SMC Information Assurance Workshop IEEE /workshop18. SACMATACM Symposium on Access Control Models and Technologies ACM /19. CHESWorkshop on Cryptographic Hardware and Embedded Systems Springer /20. CT-RSA RSA Conference, Cryptographers' Track Springer /21. DIMVA SIG SIDAR Conference on Detection of Intrusions and Malware and Vulnerability Assessment IEEE /dimva200622. SRUTI Steps to Reducing Unwanted Traffic on the Internet USENIX /events/23. HotSecUSENIX Workshop on Hot Topics in Security USENIX /events/ 24. HotBots USENIX Workshop on Hot Topics in Understanding Botnets USENIX /event/hotbots07/tech/25. ACM MM&SEC ACM Multimedia and Security Workshop ACM。
软件学报ISSN 1000-9825, CODEN RUXUEW E-mail:************.cn Journal of Software,2021,32(3):818−830 [doi: 10.13328/ki.jos.006185] ©中国科学院软件研究所版权所有. Tel: +86-10-62562563 LFKT:学习与遗忘融合的深度知识追踪模型∗李晓光1, 魏思齐1, 张昕1, 杜岳峰1, 于戈21(辽宁大学信息学院,辽宁沈阳 110036)2(东北大学计算机科学与工程学院,辽宁沈阳 110163)通讯作者: 张昕,E-mail:****************.cn摘要: 知识追踪任务旨在根据学生历史学习行为实时追踪学生知识水平变化,并且预测学生在未来学习表现.在学生学习过程中,学习行为与遗忘行为相互交织,学生的遗忘行为对知识追踪影响很大.为了准确建模知识追踪中学习与遗忘行为,提出一种兼顾学习与遗忘行为的深度知识追踪模型LFKT(learning and forgetting behavior modeling for knowledge tracing).LFKT模型综合考虑了4个影响知识遗忘因素,包括学生重复学习知识点的间隔时间、重复学习知识点的次数、顺序学习间隔时间以及学生对于知识点的掌握程度.结合遗忘因素,LFKT采用深度神经网络,利用学生答题结果作为知识追踪过程中知识掌握程度的间接反馈,建模融合学习与遗忘行为的知识追踪模型.通过在真实在线教育数据集上的实验,与当前知识追踪模型相比, LFKT可以更好地追踪学生知识掌握状态,并具有较好的预测性能.关键词: 智慧教育;知识追踪;深度神经网络;学习行为;遗忘行为中图法分类号: TP181中文引用格式: 李晓光,魏思齐,张昕,杜岳峰,于戈.LFKT:学习与遗忘融合的深度知识追踪模型.软件学报,2021,32(3): 818−830./1000-9825/6185.htm英文引用格式: Li XG, Wei SQ, Zhang X, Du YF, Yu G. LFKT: Deep knowledge tracing model with learning and forgetting behavior merging. Ruan Jian Xue Bao/Journal of Software, 2021,32(3):818−830 (in Chinese)./1000-9825/ 6185.htmLFKT: Deep Knowledge Tracing Model with Learning and Forgetting Behavior MergingLI Xiao-Guang1, WEI Si-Qi1, ZHANG Xin1, DU Yue-Feng1, YU Ge21(College of Information, Liaoning University, Shenyang 110036, China)2(School of Computer Science and Engineering, Northeastern University, Shenyang 110163, China)Abstract: The knowledge tracing task is designed to track changes of students’ knowledge in real time based on their historical learning behaviors and to predict their future performance in learning. In the learning process, learning behaviors are intertwined with forgetting behaviors, and students’ forgetting behaviors have a great impact on knowledge tracing. In order to accurately model the learning and forgetting behaviors in knowledge tracing, a deep knowledge tracing model LFKT (learning and forgetting behavior modeling for knowledge tracing) that combines learning and forgetting behaviors is proposed in this study. To model such two behaviors, the LFKT model takes into account four factors that affect knowledge forgetting, including the interval between students’ repeated learning of knowledge points, the number of repeated learning of knowledge points, the interval between sequential learning, and the understanding degree of knowledge points. The model uses a deep neural network to predict knowledge status with indirect feedbacks on students’ understanding of knowledge according to students’ answers. With the experiments on the real datasets of online education, LFKT shows better performance of knowledge tracing and prediction in comparison with the traditional approaches.∗基金项目: 国家自然科学基金(U1811261)Foundation item: National Natural Science Foundation of China (U1811261)本文由“支撑人工智能的数据管理与分析技术”专刊特约编辑陈雷教授、王宏志教授、童咏昕教授、高宏教授推荐.收稿时间: 2020-08-23; 修改时间: 2020-09-03; 采用时间: 2020-11-06; jos在线出版时间: 2021-01-21李晓光等:LFKT:学习与遗忘融合的深度知识追踪模型819Key words: intelligent education; knowledge tracing; deep neural network; learning behavior; forgetting behavior在线教育系统提倡因材施教,即根据学生的知识水平为其推荐合适的学习资源[1].学生的知识水平受其学习阶段和认知能力的影响,在学习过程中不断变化.实时追踪学生知识水平,对于个性化在线教育至关重要[2,3].知识追踪(knowledge tracing,简称KT)任务主要包括:(1) 通过学生的学习历史实时追踪其知识水平变化;(2) 根据学生的知识水平预测其在未来学习中的表现.传统的知识追踪技术主要有基于隐马尔可夫模型(HMM)[4]的贝叶斯知识追踪(BKT)[5].近年来,深度学习在自然语言处理[6]及模式识别[7]等任务上的表现优于传统模型.在知识追踪研究领域,亦提出了大量基于深度学习的知识追踪模型.基于循环神经网络的深度知识追踪(DKT)模型[8]采用循环神经网络的隐藏向量表示学生的知识状态,并且据此预测学生的表现.动态键-值对记忆网络(DKVMN)[9]借鉴了记忆网络(memory network)[10]的思想,用值矩阵表示学生对于各个知识点的掌握程度,并以此预测学生表现,提升预测准确度.但是以上知识追踪方法都忽略了学生在学习过程中的遗忘行为对知识水平的影响.在教育心理学领域已经有很多学者认识到人类的遗忘行为,并且探索影响人类遗忘的因素.教育心理学理论艾宾浩斯遗忘曲线理论[11,12]提出:学生会遗忘所学知识,遗忘带来的影响是知识掌握程度的下降.学生重复学习知识点的次数与距离上次学习知识点的时间间隔会影响学生的遗忘程度.教育心理学理论记忆痕迹衰退说[13]认为:遗忘是记忆痕迹的衰退引起的,消退随时间的推移自动发生,原始的痕迹越深,则遗忘的程度越低.目前,在知识追踪领域,只有少数的研究考虑了学生的遗忘行为.BKT的扩展[14,15]考虑了艾宾浩斯遗忘曲线提到的影响遗忘程度的两个因素:重复学习知识点的次数和间隔时间.DKT的扩展[16]考虑了学生顺序学习的间隔时间,该研究认为:学科中的各个知识点是相通的,持续的学习会降低学生对于知识点的遗忘程度.然而,以上的研究仅仅考虑了部分影响学生遗忘的信息,忽略了学生原本知识水平对于遗忘程度的影响,学生对于其掌握的知识点和没有掌握的知识点的遗忘程度不同.与此同时,由于BKT算法与DKT算法的局限性,以往涉及遗忘的研究或者仅考虑了学生对于单个知识点的遗忘程度,或者仅考虑了学生对于整个知识空间的遗忘程度,没有建模学生对知识空间中的各个知识点的遗忘情况.针对于此,本文提出一种兼顾学习和遗忘的深度知识追踪模型LFKT(learning and forgetting behavior modeling for knowledge tracing).LFKT拟合了学生的学习与遗忘行为,实时更新、输出知识点掌握程度,并以此预测其未来表现.本文主要创新和贡献有:(1)结合教育心理学,知识追踪模型LFKT考虑了4个影响知识遗忘的因素:学生重复学习知识点的间隔时间、重复学习知识点的次数、顺序学习间隔时间以及学生对于知识点的掌握程度.LFKT根据以上信息建模遗忘行为,拟合学生因遗忘行为导致的知识掌握程度变化;(2)基于深度神经网络技术,LFKT设计了一个基于RNN和记忆神经网络的知识追踪神经网络.该网络包括注意力层、遗忘层、学习层、预测层和知识水平输出层.其中:遗忘层采用全连接网络计算记忆擦除向量和记忆更新向量,用以拟合遗忘行为;学习层采用LSTM网络,并利用学生在学习结束时的答题结果作为知识掌握程度的间接反馈.根据记忆擦除向量和记忆更新向量,获得经过遗忘后的知识掌握程度的中间嵌入,并以此作为学习层的输入,进而获得遗忘与学习相融合的知识掌握程度嵌入;(3)通过在两个真实在线教育数据集上的实验验证结果表明:LFKT可以有效地建模学生的学习行为与遗忘行为,实时追踪学生的知识水平,并且LFKT模型的预测性能优于现有模型.本文第1节介绍知识追踪相关工作.第2节给出相关概念和符号定义,并提出问题.第3节详细介绍LFKT 模型结构以及训练方法.第4节给出LFKT对比实验结果和分析.最后总结全文.1 相关工作知识追踪大致可以分为基于隐马尔可夫模型的知识追踪与基于深度学习的知识追踪.贝叶斯知识追踪是典型的基于隐马尔可夫模型实现知识追踪目标的模型之一.BKT将学生对某个知识点的潜在知识状态建模为一组二进制变量,每一个变量代表学生对于特定知识点“掌握”或“没掌握”,根据隐马尔可夫模型更新学生的知识状态,进而更新其在未来学习中的表现.在BKT的基础上,通过考虑其他因素,提出了820 Journal of Software 软件学报 V ol.32, No.3, March 2021 很多扩展方案,例如:Pardos 等人的研究[17]引入了习题难度对预测学生表现的影响;Baker 等人的研究[18]引入了学生猜测和失误对预测学生表现的影响;Khajah 等人的研究[19]将人类的认知因素扩展到BKT 中,从而提高预测精度.基于隐马尔可夫的知识追踪模型将学生对各个知识点的掌握程度分别建模,忽略各个知识点之间的关系.BKT 模型简单地将学生对于各个知识点的掌握程度分为“掌握”与“未掌握”,忽略了中间情况.深度知识追踪是深度学习模型循环神经网络在知识追踪任务的首次尝试,将学生的学习历史作为输入,使用RNN 隐藏状态向量来表示学生的知识水平,并且基于此预测学生未来学习表现.DKT 模型无法建模学生对于各个知识点的掌握程度,仅仅可以建模学生的整个知识水平.Chen 等人的研究[20]考虑了知识点之间的先验关系,提升了DKT 的预测性能.Su 等人的研究[21]将习题文本信息和学生的学习历史作为循环神经网络的输入,利用RNN 的隐向量建模学生的知识水平,取得了较好的预测效果,但是依旧无法建模学生对于各个知识点的掌握程度.DKVMN 模型借鉴了记忆网络的思想,用值矩阵建模学生对于各个知识点的掌握程度,考察习题与各个知识点之间的关系,追踪学生对于各个知识点的掌握程度.Abdelrahman 等人的研究[22]利用注意力机制,着重考察学生作答相似习题时的答题历史,改进了DKVMN.Sun 等人的研究[23]将学生答题时的行为特征扩展到DKVMN,从而取得更好的预测效果.Liu 等人的研究[24]利用习题的文本信息与学生的学习历史进行知识追踪.Nakagawa 等人的研究[25]用知识图建模各个知识点之间的关系,取得了良好的预测效果.Minn 等人的研究[26]根据学生的知识水平对学生进行分类,再对每一类学生进行知识追踪.Cheng 等人的研究[27]通过深度学习扩展了传统方法项目反映理论.Shen 等人的研究[28]考虑了每个学生的个性化差异,取得了很好的效果.Ai 等人的研究[29]考虑了知识点之间的包含关系,重新设计了记忆矩阵,改进了DKVMN,取得了良好的效果.关于学生遗忘行为对知识掌握状态的影响,Qiu 等人[15]考虑了学生距离上一次重复学习知识点的间隔时间,将新一天的标记加入到BKT 中,建模先前学习之后一天的遗忘行为,但是无法对较短时间的遗忘行为进行建模.Khajah 等人的研究[14]应用学生重复学习知识点的次数估计遗忘的概率,改进了BKT,提升了预测准度.Nagatani 等人的研究[16]考虑了重复学习知识点的次数、距离上一次学习知识点的间隔时间和距离上一次学习的间隔时间,改进了DKT 模型,但忽略了学生原本对于知识点掌握程度对学生遗忘程度影响.总的来说,当前的研究可以一定程度上追踪学生对于各个知识点的掌握程度,但或者忽略了学生遗忘行为,默认学生在学习间隔前后知识水平一致;或者对于遗忘行为的建模不够全面.本文所提出的LFKT 模型综合考察了影响学生遗忘程度的因素,将学生的练习结果作为学生知识掌握程度的间接反馈,建模学生的学习与遗忘行为,实时追踪学生对于各个知识点的掌握程度,预测学生未来学习表现.2 问题定义这里,我们定义S 为学生集合,K 为知识点集合,E 为习题集合.每个学生单独进行学习,互不影响.学生s 的答题历史H s =[(e 1,r 1),(e 2,r 2),...,(e t ,r t )],其中:e t 为学生t 时刻所做的习题;r t 为答题结果,r t =1表示答题正确,r t =0表示答题错误.k t ⊆K 为习题e t 所涉及的知识点集合.矩阵M K (d k ×|K |)为整个知识空间中|K |个知识点嵌入表示,其中,d k 维列向量为一个知识点的嵌入表示.矩阵1(||)V t v d K −×M 为学生在t −1时刻的知识空间中知识点嵌入.用|K |维向 量value t −1表示t −1时刻学习结束时学生对于各个知识点的掌握程度,其中,向量每一维的值介于(0,1)之间:值为0,表示学生对于该知识点没有掌握;值为1,表示学生对于该知识点已经完全掌握.学生在学习过程中会遗忘未复习的知识点,同时,学生在两次学习的间隔时间也会遗忘所学知识,因此,t −1时刻学习结束时的知识水平与t 时刻学习开始时的知识水平不尽相同.本文用矩阵(||)FV t v d K ×M 建模学生在t 时刻开始学习时的知识掌握状态.矩阵FV t M 与1V t −M 形状相同,FV t M 是由1V t −M 经过遗忘处理后得到.t 时刻学习结束,系统获得学生答题结果, LFKT 模型根据答题结果将开始学习时知识掌握嵌入矩阵FV t M 更新为学习结束时知识掌握嵌入矩阵V t M ,以 此建模学习行为.预测学生在下一个候选习题e t +1上的表现时,由于t 时刻学习结束到t +1时刻开始答题之间存 在时间间隔,学生在时间间隔内的遗忘行为会影响知识掌握状态,因此需要根据影响知识遗忘的因素将V t M 更新为1FV t +M ,从而预测学生答题表现.李晓光 等:LFKT:学习与遗忘融合的深度知识追踪模型821基于以上描述,本文将问题定义为:给定每个学生的学习历史记录,实现以下两个目标. •跟踪学生的知识状态变化; • 预测学生在下一个候选习题e t +1上的表现.3 方法描述3.1 影响知识遗忘的因素学生如果没有及时复习所学知识,就会产生遗忘,其对知识的掌握程度便会衰减.教育学理论艾宾浩斯曲线理论[11,12]表明,学生对于所学知识点的保留率受以下两方面影响:学生重复学习次数与时间间隔.时间间隔可以分为重复学习知识点的时间间隔和顺序学习的时间间隔.除此之外,教育心理学理论记忆痕迹衰退说[13]认为,学生对于知识点的掌握状态也影响着学生的遗忘程度.因此,本文考虑了以下4个影响知识遗忘的因素.•RT(repeated time interval):距离上次学习相同知识点的时间间隔; •ST(sequence time interval):距离上次学习的时间间隔; •LT(repeated learn times):重复学习知识点的次数; • KL(past knowledge level):学生原本对于该知识点的掌握程度.RT,ST 和LT 这3个标量组合在一起得到向量C t (i )=[RT t (i ),ST t (i ),LT t (i )],表示影响学生对于知识点i 遗忘程 度的前3个因素.每个知识点所对应的向量C t (i )组成矩阵C t (d c ×|K |).本文用向量1()V t i −M 度量学生原本对于知识点i 的掌握程度,并且将其作为影响学生对于知识点遗忘程度的第4个因素(KL).将矩阵C t 与1V t −M 组合在一起,得到矩阵1[,]V t t t −=F M C ,表示4个影响知识遗忘的因素.3.2 LFKT 模型本文提出了一种融合学习与遗忘的深度知识追踪模型LFKT,如图1所示.Fig.1 LFKT model图1 LFKT 模型LFKT 模型分为注意力层(attention layer)、遗忘层(forget layer)、预测层(prediction layer)、学习层(learn layer)822 Journal of Software 软件学报 V ol.32, No.3, March 2021 以及知识水平输出层(knowledge level output layer):注意力层以习题e t 以及习题所涵盖的知识点集合k t 作为输入,计算习题与各个知识点的知识相关权重;遗忘层根据第3.1节中提出的影响知识遗忘的4个因素将学生上一 时刻学习结束时知识掌握嵌入矩阵1V t −M 更新为当前时刻学习开始时知识掌握嵌入矩阵,FV t M 有效地对学生遗忘行为进行建模;预测层根据t +1时刻开始答题时的1FV t +M 预测学生的答题表现;学习层通过LSTM 网络将本次学习开始时的FV t M 更新为本次学习结束时的V t M ,对学生学习行为进行建模;知识水平输出层以学生本次学习结束时的V t M 作为输入,输出学生知识水平向量value t ,实时输出学生对于各个知识点的掌握程度.3.2.1 注意力层注意力层的作用是计算习题与知识点之间的知识相关权重.注意力层的输入是学生当前练习题目e t 以及题目所涉及的知识点集合k t .为了将e t 映射到连续的向量空间,用e t 乘以习题嵌入矩阵A (d k ×|E |),生成一个d k 维度的习题嵌入向量v t .其中,矩阵A 中每一个d k 维列向量为一道习题的嵌入表示.专家标注了每道习题所涵盖的知识点,存放于集合k t 中.本文通过知识点过滤器(K -Filter)滤掉不相关知识点,保留习题涵盖知识点.矩阵R t 存储习题涵盖知识点的嵌入向量,其中,矩阵R t 中每一个d k 维列向量为习题涵盖的一个知识点的嵌入向量.计算习题嵌入向量v t 与涵盖知识点嵌入向量R t (i )之间的内积,再计算内积的Softmax 值并存入向量RS t 中.向量RS t 表示习题v t 与习题涵盖知识点之间的知识相关权重,如公式(1)所示:()(())T t t t i Softmax i =RS v R (1) |K |维向量w t 为习题与全部知识点之间的知识相关权重向量.由于习题涵盖知识点被知识点过滤器滤出,模型需要将习题涵盖知识点的知识相关权重放入到w t 对应的位置上.首先,初始化|K |维全零向量w t ,即w t ←[0,…,0];之后,将RS t 每一维的权重放入到w t 的相应位置上,即w t [k t [i ]]←RS t [i ],得到习题与各个知识点的知识相关权重.3.2.2 遗忘层遗忘层根据第3.1节中介绍的影响知识遗忘的因素F t 对上一次学习结束时学生的知识掌握嵌入矩阵1V t −M 进行遗忘处理,得到学生本次学习开始时的知识掌握嵌入矩阵.FV t M 受LSTM 中遗忘门与输入门的启发,根据影响知识遗忘的因素F t 更新1V t −M 时,首先要擦除1V t −M 中原有的信息,再写入信息.对于遗忘行为建模来说,擦除 过程控制学生知识点掌握程度的衰退,写入过程控制学生知识点掌握程度的更新.遗忘层如图2所示.Fig.2 Forget layer图2 遗忘层利用一个带有Sigmoid 激活函数的全连接层将影响学生对知识点i 遗忘程度的因素F t (i )转换为知识点i 对应的记忆擦除向量fe t (i ):fe t (i )=Sigmoid (FE T F t (i )+b fe ) (2)全连接层的权重矩阵FE 的形状是(d v +d c )×d v ,全连接层的偏置向量b fe 为d v 维.记忆擦除向量fe t (i )是一个维李晓光 等:LFKT:学习与遗忘融合的深度知识追踪模型 823 度为d v 的列向量,向量中的所有值都是在(0,1)范围之内.利用一个带有Tanh 激活函数的全连接层将影响学生对知识点i 遗忘程度的因素F t (i )转换为知识点i 对应的记忆更新向量fa t (i ):fa t (i )=Tanh(FA T F t (i )+b fa ) (3)全连接层的权重矩阵FA 的形状是(d v +d c )×d v ,全连接层的偏置向量b fa 为d v 维,记忆更新向量fa t (i )是一个维 度为d v 的列向量.根据记忆擦除向量与记忆更新向量对1V t −M 更新得到FV t M :1()()(1())(1())FV V t t t t i i i i −=−+M M fe fa (4)遗忘层输出本次学习开始时知识掌握嵌入矩阵FV t M .遗忘层可以为分别建模学生对不同知识点的遗忘过 程,根据学生对于不同知识点学习历史的不同,计算学生对于不同知识点的遗忘程度.3.2.3 学习层学习层根据学生答题结果追踪学习过程中的知识掌握变化,将学生开始学习时知识掌握嵌入矩阵FV t M 更新为学生学习结束时知识掌握嵌入矩阵V t M ,建模学习行为.元组(e t ,r t )表示学生在时间t 的答题结果,为了将元 组(e t ,r t )映射到连续的向量空间,元组(e t ,r t )与大小为d v ×2|E |的答题结果嵌入矩阵B 相乘,得到d v 维答题结果嵌入向量s t .学习层将答题结果嵌入向量s t 和习题对应的知识相关权重向量w t 作为输入,利用学生在在线教育系统中的练习结果作为学生学习效果的间接反馈,通过LSTM 网络更新学生学习过程中的知识掌握状态,建模学习行为:()(,()();)V FV t t t t i LSTM i i =M s w M θ (5)3.2.4 预测层预测层的目的是预测学生在下一个候选习题e t +1上的表现.由于学生在两次答题间隔内的遗忘行为会影响 其知识掌握状态,预测层根据当前时刻开始答题时的知识掌握嵌入矩阵1FV t +M 预测学生正确回答习题e t +1的概率.将知识相关权重w t +1与当前时刻开始答题时学生知识掌握嵌入1FV t +M 加权求和,得到的向量d t +1为学生对于 习题涵盖知识点的加权掌握程度嵌入向量:1111()()K FV t t t i i i +++==∑d w M (6) 学生成功解答习题,不仅与学生对于习题涵盖知识点的综合掌握程度有关,还与习题本身有关,所以将向量d t +1和v t +1连接得到的组合向量[d t +1,v t +1]输入到带有Tanh 激活函数的全连接层,输出向量h t +1包含了学生对习题涵盖知识点的综合掌握程度和习题本身的特点.其中,矩阵W 1和向量b 1分别表示全连接层的权重与偏置: 11111([,])T t t t Tanh +++=+h W d v b (7)最后,将向量h t +1输入到一个带有Sigmoid 激活函数的全连接层中,得到表示学生答对习题的概率p t +1.其中,矩阵W 2和向量b 2分别表示全连接层的权重与偏置:1212()T t t p Sigmoid ++=+W h b (8)3.2.5 知识水平输出层 知识水平输出层以学生学习结束时知识掌握嵌入矩阵V t M 为输入,输出表示学生本次学习结束时知识掌 握程度的|K |维向量value t ,向量的每一维介于(0,1)之间,表示学生对于该知识点的掌握程度.知识水平输出层如图3所示.在预测层中,每一个时间节点t ,公式(7)、公式(8)根据两种输入预测学生在特定习题e t 上的表现:学生对于该习题涵盖知识点的综合掌握嵌入向量d t 和习题嵌入向量v t .因此,如果只是想估计在没有任何习题输入情况 下学生对于第i 个知识点的掌握情况,可以省略习题嵌入v t ,同时,让学生知识掌握嵌入矩阵V t M 的第i 列()V t i M 作为等式的输入.图3展示了知识水平输出层的详细过程.具体来说:学习层输出矩阵V t M 后,为了估计对于第i 个知识点的掌握程度,构造了权重向量βi =(0,…,1,…,0),其中,第i 维的值等于1,并用公式(9)提取第i 个知识点的 知识掌握嵌入向量()V t i M ,之后,使用公式(10)、公式(11)估计知识掌握水平:824Journal of Software 软件学报 V ol.32, No.3, March 2021()V V t i t i =M M β (9)11()([(),])T V t t i Tanh i =+0y W M b (10)22()(())T t t i Sigmoid i =+value W y b (11) 向量0=(0,0,…,0)与习题嵌入v t 维度相同,用于补齐向量维度.参数W 1,W 2,b 1,b 2与公式(7)和公式(8)中的完全相同.依次计算知识空间中每一个知识的掌握程度,得到学生知识掌握程度向量value t.Fig.3 Knowledge level output layer图3 知识水平输出层3.2.6 模型优化目标模型需要训练的参数主要是习题嵌入矩阵A 、答题结果嵌入矩阵B 、神经网络权重与偏置以及知识点矩阵M K .本文通过最小化模型对于学生答题结果的预测值p t 和学生答题的真实结果r t 之间的交叉熵损失函数来优化各个参数.损失函数如公式(12)所示,本文使用Adam 方法优化参数:(log (1)log(1))t t t t t L r p r p =−+−−∑ (12) 4 实 验本文在两个真实在线教育数据集上进行实验,通过对比LFKT 模型与其他知识追踪模型的预测性能以及知识追踪表现,证明LFKT 模型的有效性.4.1 数据集首先介绍实验所使用的两个真实在线教育数据集.• ASSISTments2012:该数据集来自于ASSISTments 在线教育平台.本文删除了学习记录数过少(<3)的学生信息,经过预处理后,数据集包含45 675个学生、266个知识点以及总计5 818 868条学习记录.数据集中,user_id 字段表示学生编号,skill_id 字段表示知识点编号,problem_id 字段表示题目编号, correct 字段表示学生真实答题结果,start_time 字段表示学生本次开始学习的时间,end_time 字段表示学生本次学习结束的时间;• slepemapy.cz:该数据集来自于地理在线教育系统.本文同样删除学习记录数过少(<3)的学生信息,经过预处理后,数据集包含87 952个学生、1 458个知识点以及总计10 087 305条学习记录.数据集中, user 字段表示学生编号,place_answered 字段表示学生真实答题结果,response_time 字段表示学生学习时间,place_asked 字段表示问题编号.其中,由于该数据集中每个问题仅仅考察一个知识点,因此place_asked 也是知识点编号.4.2 评测指标本文使用平均AUC(area under the curve)、平均ACC(accurary)和平均RMSE(root mean squared error)作为评估预测性能的指标.AUC 被定义为ROC 曲线与下坐标轴围成的面积,50%的AUC 值表示随机猜测获得的预。
论文中文摘要毕业设计说明书(论文)外文摘要1 绪论图像的细化是数字图像预处理中的重要的一个核心环节。
图像细化在图像分析和图像识别中如汉子识别、笔迹鉴别。
指纹分析中有着广泛的应用。
图像的细化主要有以下几个基本要求:(1)骨架图像必须保持原图像的曲线连通性。
(2)细化结果尽量是原图像的中心线。
(3)骨架保持原来图形的拓扑结构。
(4)保留曲线的端点。
(5)细化处理速度快。
(6)交叉部分中心线不畸变。
虽然现在人们对细化有着广泛的研究,细化算法的方法有很多。
然而大多数算法如Hilditch算法和Rosenfield算法存在着细化后图形畸变,断点过多,在复杂的情况下关键信息的缺失等问题。
基于上诉考虑,传统的细化算法已经无法满足如今的数字图像处理要求。
因此,需要提出新的一种算法,来解决相关问题。
1.1 相关定义1.1.1串行与并行从处理的过程来看,主要可以分为串行和并行两类,前者对图像中当前象素的处理依据其邻域内象素的即时化结果,即对某一元素进行检测,如果该点是可删除点,则立即删除,且不同的细化阶段采用不同的处理方法;后者对当前的象素处理依据该象素及其邻域内各象素的前一轮迭代处理的结果,至始至终采用相同的细化准则。
即全部检测完汉子图像边缘上的点后再删除可删除点。
1.1.2骨架对图像细化的过程实际上是求一图像骨架的过程。
骨架是二维二值目标的重要拓扑描述,它是指图像中央的骨架部分,是描述图像几何及拓扑性质的重要特征之一。
骨架形状描述的方法是Blum最先提出来的,他使用的是中轴的概念。
如果用一个形象的比喻来说明骨架的含义,那就是设想在t=0的时刻,讲目标的边界各处同时点燃,火焰以匀速向目标内部蔓延,当火焰的前沿相交时火焰熄灭,那么火焰熄灭点的集合就构成了中轴,也就是图像的骨架。
例如一个长方形的骨架是它的长方向上的中轴线,正方形的骨架是它的中心点,圆的骨架是它的圆心,直线的骨架是它自身,孤立点的骨架也是自身。
细化的目的就是在将图像的骨架提取出来的同时保持图像细小部分的连通性,特别是在文字识别,地质识别,工业零件识别或图像理解中,先对被处理的图像进行细化有助于突出形状特点和减少冗余信息量。
降维低维空间映射条件Dimensionality reduction is a crucial technique in data analysis and machine learning, where high-dimensional data is transformed into a lower-dimensional space for easier visualization and analysis. 降维是数据分析和机器学习中的一项重要技术,它将高维数据转换为低维空间,以便更容易进行可视化和分析。
One common method of dimensionality reduction is through mapping conditions, where algorithms map the original high-dimensional data into a lower-dimensional space while preserving as much of the relevant information as possible. 降维的一个常见方法是通过映射条件,即算法将原始高维数据映射到低维空间,同时尽可能保留尽可能多的相关信息。
The purpose of dimensionality reduction is to remove redundant or irrelevant features, reduce noise, and improve computational efficiency. 降维的目的是去掉冗余或无关的特征,减少噪音,并提高计算效率。
One challenge in dimensionality reduction is to find an appropriate mapping that preserves the meaningful structure of the data while reducing its dimensionality. 在降维中的一个挑战是找到一个适当的映射,既能保留数据的有意义结构,又能减少其维数。
收稿日期:2019-11-03 修回日期:2020-03-06基金项目:国家自然科学基金(61671352)作者简介:朱周华(1976-),女,副教授,研究方向为信号处理及应用;高 凡(1994-),女,硕士研究生,研究方向为图像处理㊂基于深度学习的局部实例搜索朱周华,高 凡(西安科技大学通信与信息工程学院,陕西西安710054)摘 要:针对传统实例搜索方法准确率和视觉相似度低下的问题,提出一种利用卷积神经网络提取图像全局特征和区域特征的实例搜索方法㊂该方法经过初步搜索㊁重排和查询扩展三个阶段实现实例搜索任务,通过微调策略和在重排阶段对特征匹配方法的改进进一步提高检索性能,并将其应用到局部实例搜索任务,即利用残缺图像检索得到整幅图像,在此基础之上,加入在线检索功能㊂在Oxford 5k 和Paris 6k 两个公开数据集上进行实验验证,结果表明,整幅图像的检索mAP 值和视觉相似度都得到了很大提升,局部实例检索的mAP 值均高于其他文献中整幅图像的检索,仅比文中整幅图像的检索低0.032㊂因此,提出的实例搜索方法不仅提高了实例搜索的准确率,也增强了目标定位的准确性,同时很好地解决了局部实例搜索问题㊂关键词:深度学习;局部实例搜索;区域特征;微调;特征匹配中图分类号:TP 391 文献标识码:A 文章编号:1673-629X (2020)09-0036-07doi :10.3969/j.issn.1673-629X.2020.09.007Local Instance Search Based on Deep LearningZHU Zhou -hua ,GAO Fan(School of Communication and Information Engineering ,Xi ’an University of Science and Technology ,Xi ’an 710054,China )Abstract :In order to solve the problem of low accuracy and visual similarity of traditional instance search methods ,an instance search method for extracting global and regional features of images using convolutional neural networks is proposed.The method realizes instance search task through three stages :filtering stage ,spatial re -ranking and query expansion.The retrieval performance is further improved by fine -tuning strategy and the improvement of the feature matching method in the re -ranking stage.It is applied to the local instance search task ,that is ,using the incomplete image retrieval to obtain the whole image.On this basis ,the online search function is added.Experiments are carried out on two public datasets of Oxford building 5k and Paris building 6k that the mAP value and visual sim⁃ilarity of the whole image are greatly improved.The mAP value of the local instance retrieval is higher than that of other literatures.The retrieval of images is only 0.032lower than the retrieval of the entire image.Therefore ,the proposed method not only improves the accuracy of instance search ,but also enhances the accuracy of target location ,and solves the problem of local instance search.Key words :deep learning ;local instance search ;regional features ;fine -tuning ;feature matching0 引 言信息时代,数字图像和视频数据日益增多,人们对于图像检索的需求也随之增大㊂传统的基于内容的图像检索(CBIR )都是定义在图像级别的检索,查询图片背景都比较单一,没有干扰信息,因此可以提取整个图片的特征进行检索㊂但是,现实生活中的查询图片都是带有场景的,查询目标仅占了整幅图的一部分,直接将查询图与数据库中的整幅图像进行匹配,准确率必然会很低㊂因此,考虑使用局部特征进行实例搜索㊂实例搜索是指给定一个样例,在视频或图像库中找到包含这个样例的视频片段或者图片,即找到任意场景下的目标对象㊂早期,实例搜索大多采用词袋模型(bag -of -words ,BoW )对图像的特征进行编码,其中大部分都采用尺度不变特征变换(SIFT )[1]来描述局部特征㊂Zhu 等人[2]首先使用SIFT 提取查询图片和视频关键帧的视觉特征,接着采用词袋算法对特征进行编码得到一维向量,最后根据向量之间的相似性,返回一个排好序的视频列表,文中借鉴了传统视觉检索的一些方法,但是没有很好地结合实例搜索的特点㊂2014年有学者[3]提出采用比BoW 性能更好的空间第30卷 第9期2020年9月 计算机技术与发展COMPUTER TECHNOLOGY AND DEVELOPMENT Vol.30 No.9Sep. 2020Fisher向量[4]和局部特征聚合描述符(VLAD)[5]来描述SIFT特征的空间关系,从而进行实例检索㊂随着深度学习的发展,深度卷积神经网络特征被广泛应用于计算机视觉的各个领域,如图像分类[6-7]㊁语音识别[8]等,均取得了不错的效果,因此有学者也将其引入到图像检索领域㊂起初,研究者们利用神经网络的全连接层特征进行图像检索[9],后来很多研究者开始转向卷积层特征的研究[10],并且证明卷积层特征的性能更好㊂Eva等人[11]采用词袋模型对卷积神经网络(CNNs)提取的特征进行编码,然后分别进行初次搜索,局部重排,扩展查询,从而实现实例检索㊂实例检索需采用局部特征实现,因此许多生成区域信息的方法相继出现,最简单的是滑动窗口,之后有学者提出使用Selective Search生成物体候选框[12-13],但是这些方法将生成候选区域和特征提取分开进行㊂Faster R-CNN[14]是一个端到端的网络,它可以同时提取卷积层特征和生成候选区域㊂文献[15]提出将微调之后的目标检测网络Faster R-CNN应用到实例检索中,使用区域提议网络(region proposal network, RPN)生成候选区域,从而得到查询图区域特征与数据库图像区域特征,特征匹配之后排序得到检索结果,在两个建筑物数据集上取得了不错的效果㊂何涛在其论文中[16]针对Faster R-CNN网络效率较低的问题提出了端到端的深度区域哈希网络(DRH),使用VGG16作为特征提取器,滑动窗口和RPN网络得到候选区域,并将两种方法进行对比,整个网络最后阶段对特征进行哈希编码并计算汉明距离进行排序,从而得到检索结果,文中为了排除不同场景㊁不同光照和拍照角度产生的干扰,使用局部信息进行检索㊂以上两篇文献尽管均使用局部信息进行检索,但都是为了排除干扰信息将查询图中的目标标记出来,查询图依然是整幅图像㊂实际搜索图像时某些图片会有残缺,此时就无法通过标记目标进行检索,因此实例检索除了常见的利用局部特征进行整幅图像的检索之外,局部图像的检索亦有着非常重要的现实意义㊂现有的检索方法的检索效果不是很理想,因此文中针对以上两个问题首先改进整幅图像的检索并提高其检索性能,之后利用图像的部分信息(例如建筑物的顶部㊁嫌疑人的部分特征等)检索得到整幅图像,实现局部图像的检索㊂同时,考虑到实际检索时,输入均为一幅图像,输出为一组图像,因此,文中在局部实例检索的基础之上加入在线检索功能,可以实现局部图像的实时搜索㊂因此,主要有以下两大方面的贡献和创新:(1)基于深度学习的实例搜索㊂一方面,通过微调策略提高了实例搜索的精确度;另一方面,针对候选框得分(scores)和余弦距(cosine)两种相似性度量方法存在的不足,提出将两种方法相结合,以获得更好的检索效果㊂(2)基于深度学习的局部实例搜索㊂由于局部图像的检索具有重大的现实意义,将全局实例搜索算法应用在局部实例检索任务中,即利用残缺图片信息搜索得到整幅图像,并加入在线检索功能,输入局部查询图,便可以得到查询结果和所属建筑物的名字㊂1 相关理论1.1 基于Faster R-CNN的区域特征提取如图1所示,Faster R-CNN由卷积层(Conv layers),RPN网络,RoI pooling,分类和回归四部分构成㊂图1 Faster R-CNN结构 卷积层用于提取图像特征,可以选择不同结构的网络,输入为整张图片,输出为提取的特征称为feature maps㊂文中采用VGG16[7]的中间卷积层作为特征提取器㊂RPN网络用于推荐候选区域,这个网络是用来代替之前的selective search[13]的,输入为图片,输出为多个矩形区域以及对应每个矩形区域含有物体的概率㊂首先将一个3*3的滑窗在feature maps上滑动,每个窗口中心点映射到原图并生成9种矩形框(9anchor boxes),之后进入两个同级的1*1卷积,一个分支通过softmax进行二分类,判断anchor boxes属于foreground还是background㊂另一分支计算anchor㊃73㊃ 第9期 朱周华等:基于深度学习的局部实例搜索boxes的bounding box regression的偏移量㊂Proposal 层结合两个分支的输出信息去冗余后保留N个候选框,称为proposals㊂传统的CNN网络,输入图像尺寸必须是固定大小的,但是Faster R-CNN的输入是任意大小的,RoI pooling作用是根据候选区域坐标在特征图上映射得到区域特征并将其pooling成固定大小的输出,即对每个proposal提取固定尺寸的特征图㊂分类和回归模块,一方面通过全连接层和softmax 层确定每个proposal的类别,另一方面回归更加精确的目标检测框,输出候选区域在图像中的精确坐标㊂1.2 微调Faster R-CNN微调,即用预训练模型重新训练自己的数据,现有的VGG16FasterR-CNN预训练模型主要是基于VOC2007数据集中20类常见的物体预训练得到的㊂文中的数据集是建筑物,与VOC2007图片相似度和特征都相差较大,如果依然采用预训练模型,效果必然不好,再加上从头开始训练,需要大量的数据㊁时间和计算资源㊂而文中所选用的数据集较小,因此需要进行微调,这样不仅可以节省大量时间和计算资源,同时还可以得到一个较好的模型㊂微调主要分为以下三个步骤:(1)数据预处理㊂首先需要对数据集进行数据清洗,主要去除无效值和纠正错误数据,提高数据质量㊂其次,由于文中的数据集较小且每类样本分布不均衡,因此选择性地对每类图片作图像增强处理,其中图像增强方法包括水平翻转,增加高斯噪声,模糊处理,改变图像对比度㊂最后将每幅图片中的目标样例标记出来㊂(2)建立训练集和测试集㊂一般是按一定的比例进行分配,但是文中选用的数据集存在样本不均衡的问题,有些类别特别多,有些类别特别少㊂由于是将所有图片全部放入同一个文件夹,然后依次读取样本分配训练集和测试集,如果按比例分配,小类样本参与训练的机会就会比大类少,训练出来的模型将会偏向于大类,使得大类性能好,小类性能差㊂平衡采样策略就是把样本按类别分组,每个类别生成一个样本列表,制作训练集时从各个类别所对应的样本列表中随机选择样本,这样可以保证每个类别参与训练的机会比较均衡㊂(3)修改相关网络参数,进行训练㊂主要修改网络文件中的类别数和类名,然后不断调节超参数,使性能达到最好㊂1.3 实例搜索实例检索一般经过初次搜索,局部重排,扩展查询三部分完成㊂初次搜索首先提取查询图和数据库所有图像的全局特征,然后计算特征之间的相似度,最后经过排序得到初步的检索结果㊂文中提取VGG16网络的最后一个卷积层(Conv5_3)特征㊂局部重排是将初次搜索得到的前K幅图片作为新的数据库进行重排,基本思路是提取查询图和数据库的区域特征并进行匹配,根据匹配结果进行排序从而得到查询结果㊂这里查询图和数据库的区域特征提取方法不同,其中,查询图的区域特征是用groundtruth 给定的边界框对整幅图像特征进行裁剪得到的,而数据库的区域特征是经过RPN网络和RoI pooling池化得到的特征(pool5)㊂扩展查询(query expansion,QE)是取出局部重排返回的前K个结果,对其特征求和取平均作为新的查询图,再做一次检索,属于重排的一种㊂2 实 验2.1 数据集和评价指标(1)实验环境㊂文中所有实验均在NVIDIA GeForce RTX2080上进行,所用系统为Ubuntu18.04,使用的深度学习框架为Caffe,编程语言为Python㊂(2)数据集介绍㊂在两个公开的建筑物数据集Oxford[17]和Paris[18]上进行实验㊂其中Oxford包含5063张图片,Paris包含6412张图片,但是有20张被损坏,因此有6392张图片可用㊂两个数据集都来自Flickr,共有11类建筑物,同一种建筑物有5张查询图,因此每个数据集总共有55张查询图,除此之外,两个数据集相同类别建筑物的场景㊁拍照角度和光照都有所不同,而且有很多本来不是同一种建筑物但是从表面看上去却非常相似的图片㊂(3)评价指标㊂平均精度均值(mean average precision,mAP)是一个反映了图像检索整体性能的指标,如式(1)和(2)所示㊂MAP=∑N k=1AP查询图的个数(1) AP=∑N k=1(P(k)㊃rel(k))相关图片个数(2)其中,N表示返回结果总个数,P(k)表示返回结果中第k个位置的查准率,rel(k)表示返回结果中第k个位置的图片是否与查询图相关,相关为1,不相关为0㊂MAP是多次查询后,对每次检索的平均精度AP值求㊃83㊃ 计算机技术与发展 第30卷和取平均㊂这里对是否相关做进一步解释:两个数据集的groundtruth有三类,分别是good,ok和junk,如果检索结果在good和ok中,则判为与查询图相关,如果在junk中,则判为不相关㊂2.2 基于深度学习的实例检索2.2.1 方 法先尝试使用VGG16Faster R-CNN预训练模型进行检索,在两个数据集上MAP值仅0.5左右,接着文中使用微调策略,只冻结了前两个卷积层,更新了之后所有的网络层权值,通过不断调参,训练一个精度尽可能高的模型㊂其中,微调过程中分别采用数据增强和平衡采样技术对数据进行处理㊂具体地,对于Oxford 数据集,建筑物radcliffe_camera的数量高达221张,而建筑物pitt_rivers仅有6张,其他9类样本数量在7至78之间不等,数量差距相当大,因此,选择性地对数量小的6类样本通过上面提到的方法进行数据增强,使小类样本数量增大㊂对其进行数据增强之后,虽然样本数量差距缩小,但是依然存在不均衡的问题,如果将所有样本放入一个文件夹中,按比例分配训练集和测试集,则依然会导致训练出来的模型小类性能差的问题㊂因此,将每类样本生成一个列表,再从每个列表中随机选取一定数量的样本作为该类的训练样本㊂除此之外,文献[15]在局部重排部分使用了两种特征匹配方法,一种是直接利用候选框对应得分(scores)进行排序㊂数据库中每幅图片经过Proposal layer会得到300个区域提议(proposal)的坐标和对应得分,找到查询图对应类的最高得分作为查询图和数据库每幅图片的相似度,再从高到低进行排序就可以得到检索结果㊂另一种是利用余弦距(cosine)进行排序㊂数据库中的每幅图片经过RoI pooling可以得到300个特征向量,计算查询图与数据库中每幅图片的300个区域特征的余弦距,最小距离对应的候选框即就是和查询图最相似的区域提议,然后把所有的最小距离再从小到大进行排序,就可以得到相似度排序㊂第一种方法虽然得到的边界框(bounding box regression)定位较准,mAP值也很高,但是视觉相似度并不是很高㊂而且根据候选框得分进行排序,每类建筑物的得分和排序都是一定的,因此每次相同类别的不同查询返回结果都是相同的,不会根据查询图片的不同而返回不同的排序㊂第二种方法得到的检索结果,图像相似度很高,但是边界框定位不是很准确㊂文中将两种方法结合(scores+cosine),利用余弦距计算相似度并排序,选择得分最高的候选框进行目标定位,这样既解决了视觉相似度不够的问题,也解决了相同类别的不同查询返回结果相同的问题,同时又解决了目标定位不准的问题㊂2.2.2 参数设置本节主要讨论扩展查询中的参数k对实验结果的影响,并选取一个最优值作为本实验的默认值㊂在特征匹配方法选用余弦距的情况下,在两个数据集上分别测试了k等于3㊁4㊁5㊁6㊁7时的mAP值,实验结果如表1所示㊂表1 不同k值对mAP值的影响k Oxford Paris30.9090.88940.9100.88850.9100.88560.9120.88970.9120.888 从表1综合来看,文中选用6作为k的默认值㊂2.2.3 结果与分析表2是文中与其他文献mAP值的对比,最后两项是文中三种方法的实验结果㊂可以看出,相比于其他文献,文中方法的检索性能得到很大的提升,比目前最好的方法分别高出6.1%和4%,说明文中方法优于其他检索方法㊂文中方法与文献[15]所采用方法类似,但结果却得到了很大的改善,通过分析认为,虽然所用数据集都相同,但是生成的训练集和测试集完全不同,而且文中采用数据增强方法,使得样本的数量增加了3~4倍,使用平衡采样的方法保证小类样本可以得到和大类样本同样的训练机会㊂除此之外,网络超参数对于模型的影响非常大,因此文中对参数进行调优,使得训练出来的模型更好㊂表2 文中方法与其他方法的mAP值对比方法Oxford ParisR.Tao等人[3]0.765-Razavian等人[10]0.5560.697Eva等人[11]0.7390.819CA-SR[15]0.7720.824CS-SR[15]0.7860.842DRH[16]0.8510.849Ours(scores)0.9030.890 Ours(cosine/cosine+scores)0.9120.890 图2是两组对比图,图(a)和图(b)是对建筑物all_ souls和louvre分别用候选框得分和余弦距进行排序的结果,其中,左边1列是查询图,右边5列是查询返回结果,表示与查询相关(下同)㊂图3是all_souls的两个不同查询用得分进行排序得到的检索结果㊂图4是将两种匹配方法结合得到的检索结果㊂从图2可以看出利用候选框得分排序得到的结果目标定位较准确,但是返回结果的背景㊁光照㊁拍照角㊃93㊃ 第9期 朱周华等:基于深度学习的局部实例搜索度㊁颜色㊁对比度和样例大小与查询图相差很大,而利用余弦距排序得到的结果候选框定位不是很准,但从背景㊁样例大小等视觉角度来看相似度较高㊂图3中all _souls 的两个不同查询图返回结果中不仅图片一样,而且顺序也相同㊂因此可以看出两种方法各有缺陷㊂图2 不同建筑物的同一个查询分别利用得分和余弦距得到的排序图3 建筑物all _souls 的两个不同查询根据得分得到的排序图4是将两种方法结合得到的返回结果,与图2相比视觉相似度提高了,目标定位也更准确了,与图3相比,同一个建筑物的两个不同查询返回结果也会根据查询图片的不同而改变,且从表2最后一行可以看出该方法比使用候选框得分在Oxford 上得到的mAP 值高0.009,与使用余弦距得到的mAP 值相同㊂因此认为,以上提出的特征匹配方法得到的返回结果在不降低mAP 值的基础上提高了检索的准确率㊂图4 综合得分和余弦距两种方法得到的检索结果2.3 基于深度学习的局部实例检索2.3.1 方 法本节局部图像的检索是建立在2.2节基础之上的,正是由于整幅图像检索采用候选区域特征实现,局部图像的检索才得以实现㊂较之于整幅图像的检索,局部图像的检索具有同样重大的现实意义㊂生活中常会因为某个原因使图片变得残缺,且难以识别,那么此时就需要使用局部图像检索得到原始图像的完整信息㊂比如,通过某建筑物的顶部搜索得到整幅图像从而识别该建筑物㊂或者可以应用在刑侦工作中,当摄像机捕获到的是某犯罪嫌疑人的部分特征时,可以通过已有的部分特征在图像库或者其他摄像头下搜索得到该嫌疑人的完整信息㊂由于目前没有一个现成的残缺图像库,因此本节利用截图工具对整幅图像作裁剪处理以模拟残缺图像,即从图像库选取不同实例通过裁剪得到不同大小,不同背景,不同角度,不同颜色的局部查询图㊂由于图像库图像都是整幅图像,在尺寸和包含的信息方面与局部查询图相差很大,因此局部检索最大的难点在于如何处理局部图像㊂2.2节会对输入图片统一进行缩放,那么局部查询图片输入后,先进行放大,则会导致原始输入图像失真,提取特征后再对其进行裁剪又会进一步丢失大部分图像信息,因此根本无法得到正确的检索结果㊂文中对其进行以下处理:即输入查询图后,先将局部查询填充至与数据库图像相同大小(图像库的图像基本都是1024*768或者768*1024大小的),这样对图像进行统一缩放,提取特征,按比例㊃04㊃ 计算机技术与发展 第30卷裁剪之后,得到的正是局部图像的特征,再与图像库匹配,则会输出正确的排序结果㊂本节输入为建筑物的局部图像,输出为局部查询所属的建筑物图像,并且会标记出局部查询在所属建筑物中的位置㊂目前,很多文献(如[15])都比较注重算法的研究,基本都采用离线的形式实现图像检索,不仅离线建立特征库,查询图也是成批输入到网络中进行离线检索,得到的结果也是成批保存起来,可是实际应用中,一般都是将查询图逐幅输入进行实时检索,因此文中在前文基础之上,加入了在线检索功能,最后实现在线局部实例检索㊂2.3.2 结果与分析如图5是通过裁剪建筑物radcliffe _camera 和triomphe 的原图,得到的5个不同查询图,分别选取一幅进行检索,得到了完整的建筑物,且标记出了局部查询图像在整个建筑物中的位置,如图6所示㊂最终mAP 值分别为0.880和0.857㊂从检索结果可以看出返回结果的视觉相似度极高,目标定位准确,且mAP 值高于其他文献中整幅图像的检索准确率㊂因此,可以证明文中提出的全局搜索算法在局部图像检索任务中亦能取得很好的效果㊂图5 两种建筑物的五个局部查询图图6 两组局部查询的检索结果在此之前,只有文献[19]为了证明行人重识别系统的普适性,使用CaffeNet 和VGG 16两个网络模型在Oxford 数据集上对局部建筑物图像进行了测试,得到的mAP 值分别为0.662和0.764,远低于文中的准确率㊂因此提出的局部实例搜索的性能良好㊂在线检索功能按照输入图片,可得到查询结果和查询建筑物的名字,且文中在没有使用任何编码算法的情况下,在两个数据集上检索一幅图的平均耗时分别为5.7s 和7s ,经检测,90%的时间都耗费在利用特征向量计算相似度部分㊂如图7所示,分别是bodleian 和eiffel 的检索结果和总耗时㊂图7 在线检索结果3摇结束语为了进一步提高实例检索性能,针对以往的利用候选框得分和余弦距进行特征匹配的不足,提出将两种方法结合,即利用余弦距计算相似度并排序,选择得分最高的候选框进行目标定位㊂并使用微调策略重新训练预训练模型从而使其适用于文中的实例检索㊂相比于其他方法,文中采用的方法在性能方面有明显的提升㊂在此基础之上,利用残缺图像搜索得到整幅图像,性能高于其他文献整幅图像的检索,且仅比文中整幅图像检索低0.032,实验结果证明提出的全局搜索算法同样适用于局部图像检索任务㊂之后加入在线检索功能,在没有任何编码的情况下检索一幅图像平均耗时仅需5.7s ~7s ㊂在未来的工作中,可以进一步加入编码模块,以提高检索速度,并且可以在更大的数据集上进行测试㊂参考文献:[1] LOWE D G.Object recognition from local scale -invariant fea⁃tures [C ]//International conference on computer vision.㊃14㊃ 第9期 朱周华等:基于深度学习的局部实例搜索Kerkyra,Greece:IEEE,1999:1150-1157.[2] ZHU C,SATOH rge vocabulary quantization for searchinginstances from videos[C]//International conference on multime⁃dia retrieval.New York,NY,United States:ACM,2012:1-8.[3] TAO R,GAVVES E,SNOEK C G M,et al.Locality in genericinstance search from one example[C]//Conference on computer vision and pattern recognition.Columbus,OH,USA:IEEE, 2014:2099-2106.[4] PERRONNIN F,LIU Y,SÁNCHEZ J,et rge-scale imageretrieval with compressed Fisher vectors[C]//Conference on computer vision and pattern recognition.San-Francisco,CA, USA:IEEE,2010:3384-3391.[5] GONG Y,WANG L,GUO R,et al.Multi-scale orderless poolingof deep convolutional activation features[C]//European confer⁃ence on computer vision.Zürich,Switzerland:Springer,2014: 392-407.[6] KRIZHEVSKY A,SUTSKEVER I,HINTON G E.Imagenetclassification with deep convolutional neural networks[C]//Neural information processing systems.New York,NY,United States:ACM,2012:1097-1105.[7] SIMONYAN K,ZISSERMAN A.Very deep convolutional net⁃works for large-scale image recognition[C]//Computer vision and pattern recognition.Boston,MA,USA:IEEE,2015:1-14.[8] HINTON G,DENG L,YU D,et al.Deep neural networks for a⁃coustic modeling in speech recognition[J].IEEE Signal Process⁃ing Magazine,2012,29(6):82-97.[9] WAN J,WANG D,HOI S C H,et al.Deep learning for content-based image retrieval:a comprehensive study[C]//International conference on multimedia.New York,NY,United States:ACM, 2014:157-166.[10]RAZAVIAN A S,SULLIVAN J,CARLSSON S,et al.Visual in⁃stance retrieval with deep convolutional networks[J].ITE Trans⁃actions on Media Technology and Applications,2016,4(3):251-258.[11]MOHEDANO E,MCGUINNESS K,O’CONNOR N E,et al.Bags of local convolutional features for scalable instance search[C]//International conference on multimedia retrieval.NewYork,NY,United States:ACM,2016:327-331. [12]UIJLINGS J R R,VAN DE SANDE K E A,GEVERS T,et al.Selective search for object recognition[J].International Journal of Computer Vision,2013,104(2):154-171.[13]GIRSHICK R,DONAHUE J,DARRELL T,et al.Rich feature hi⁃erarchies for accurate object detection and semantic segmentation[C]//Computer vision and pattern recognition.Columbus,OH,USA:IEEE,2014:580-587.[14]REN S,HE K,GIRSHICK R,et al.Faster R-CNN:towards real-time object detection with region proposal networks[C]//Neural information processing systems.New York,NY,United States: ACM,2015:91-99.[15]SALVADOR A,GIRÓ-I-NIETO X,MARQUÉS F,et al.FasterR-CNN features for instance search[C]//Conference on com⁃puter vision and pattern recognition s Vegas,NV, USA:IEEE,2016:9-16.[16]何 涛.基于深度学习和哈希的图像检索的方法研究[D].成都:电子科技大学,2018.[17]PHILBIN J,CHUM O,ISARD M,et al.Object retrieval withlarge vocabularies and fast spatial matching[C]//Conference on computer vision and pattern recognition.Minneapolis,MN,USA: IEEE,2007:1-8.[18]PHILBIN J,CHUM O,ISARD M,et al.Lost in quantization:im⁃proving particular object retrieval in large scale image databases[C]//Conference on computer vision and pattern recognition.Anchorage,AK,USA:IEEE,2008:1-8.[19]ZHENG Z,ZHENG L,YANG Y.A discriminatively learnedCNN embedding for person re-identification[J].Multimedia Computing,Communications,and Applications,2017,14(1):1-20.㊃24㊃ 计算机技术与发展 第30卷。
Discriminatively Trained Sparse Code Gradientsfor Contour DetectionXiaofeng Ren and Liefeng BoIntel Science and Technology Center for Pervasive Computing,Intel LabsSeattle,W A98195,USA{xiaofeng.ren,liefeng.bo}@AbstractFinding contours in natural images is a fundamental problem that serves as thebasis of many tasks such as image segmentation and object recognition.At thecore of contour detection technologies are a set of hand-designed gradient fea-tures,used by most approaches including the state-of-the-art Global Pb(gPb)operator.In this work,we show that contour detection accuracy can be signif-icantly improved by computing Sparse Code Gradients(SCG),which measurecontrast using patch representations automatically learned through sparse coding.We use K-SVD for dictionary learning and Orthogonal Matching Pursuit for com-puting sparse codes on oriented local neighborhoods,and apply multi-scale pool-ing and power transforms before classifying them with linear SVMs.By extract-ing rich representations from pixels and avoiding collapsing them prematurely,Sparse Code Gradients effectively learn how to measure local contrasts andfindcontours.We improve the F-measure metric on the BSDS500benchmark to0.74(up from0.71of gPb contours).Moreover,our learning approach can easily adaptto novel sensor data such as Kinect-style RGB-D cameras:Sparse Code Gradi-ents on depth maps and surface normals lead to promising contour detection usingdepth and depth+color,as verified on the NYU Depth Dataset.1IntroductionContour detection is a fundamental problem in vision.Accuratelyfinding both object boundaries and interior contours has far reaching implications for many vision tasks including segmentation,recog-nition and scene understanding.High-quality image segmentation has increasingly been relying on contour analysis,such as in the widely used system of Global Pb[2].Contours and segmentations have also seen extensive uses in shape matching and object recognition[8,9].Accuratelyfinding contours in natural images is a challenging problem and has been extensively studied.With the availability of datasets with human-marked groundtruth contours,a variety of approaches have been proposed and evaluated(see a summary in[2]),such as learning to clas-sify[17,20,16],contour grouping[23,31,12],multi-scale features[21,2],and hierarchical region analysis[2].Most of these approaches have one thing in common[17,23,31,21,12,2]:they are built on top of a set of gradient features[17]measuring local contrast of oriented discs,using chi-square distances of histograms of color and textons.Despite various efforts to use generic image features[5]or learn them[16],these hand-designed gradients are still widely used after a decade and support top-ranking algorithms on the Berkeley benchmarks[2].In this work,we demonstrate that contour detection can be vastly improved by replacing the hand-designed Pb gradients of[17]with rich representations that are automatically learned from data. We use sparse coding,in particularly Orthogonal Matching Pursuit[18]and K-SVD[1],to learn such representations on patches.Instead of a direct classification of patches[16],the sparse codes on the pixels are pooled over multi-scale half-discs for each orientation,in the spirit of the Pbimage patch: gray, abdepth patch (optional):depth, surface normal…local sparse coding multi-scale pooling oriented gradients power transformslinear SVM+ - …per-pixelsparse codes SVMSVMSVM … SVM RGB-(D) contoursFigure 1:We combine sparse coding and oriented gradients for contour analysis on color as well as depth images.Sparse coding automatically learns a rich representation of patches from data.With multi-scale pooling,oriented gradients efficiently capture local contrast and lead to much more accurate contour detection than those using hand-designed features including Global Pb (gPb)[2].gradients,before being classified with a linear SVM.The SVM outputs are then smoothed and non-max suppressed over orientations,as commonly done,to produce the final contours (see Fig.1).Our sparse code gradients (SCG)are much more effective in capturing local contour contrast than existing features.By only changing local features and keeping the smoothing and globalization parts fixed,we improve the F-measure on the BSDS500benchmark to 0.74(up from 0.71of gPb),a sub-stantial step toward human-level accuracy (see the precision-recall curves in Fig.4).Large improve-ments in accuracy are also observed on other datasets including MSRC2and PASCAL2008.More-over,our approach is built on unsupervised feature learning and can directly apply to novel sensor data such as RGB-D images from Kinect-style depth ing the NYU Depth dataset [27],we verify that our SCG approach combines the strengths of color and depth contour detection and outperforms an adaptation of gPb to RGB-D by a large margin.2Related WorkContour detection has a long history in computer vision as a fundamental building block.Modern approaches to contour detection are evaluated on datasets of natural images against human-marked groundtruth.The Pb work of Martin et.al.[17]combined a set of gradient features,using bright-ness,color and textons,to outperform the Canny edge detector on the Berkeley Benchmark (BSDS).Multi-scale versions of Pb were developed and found beneficial [21,2].Building on top of the Pb gradients,many approaches studied the globalization aspects,i.e.moving beyond local classifica-tion and enforcing consistency and continuity of contours.Ren et.al.developed CRF models on superpixels to learn junction types [23].Zhu ed circular embedding to enforce orderings of edgels [31].The gPb work of Arbelaez puted gradients on eigenvectors of the affinity graph and combined them with local cues [2].In addition to Pb gradients,Dollar et.al.[5]learned boosted trees on generic features such as gradients and Haar wavelets,Kokkinos used SIFT features on edgels [12],and Prasad et.al.[20]used raw pixels in class-specific settings.One closely related work was the discriminative sparse models of Mairal et al [16],which used K-SVD to represent multi-scale patches and had moderate success on the BSDS.A major difference of our work is the use of oriented gradients:comparing to directly classifying a patch,measuring contrast between oriented half-discs is a much easier problem and can be effectively learned.Sparse coding represents a signal by reconstructing it using a small set of basis functions.It has seen wide uses in vision,for example for faces [28]and recognition [29].Similar to deep network approaches [11,14],recent works tried to avoid feature engineering and employed sparse coding of image patches to learn features from “scratch”,for texture analysis [15]and object recognition [30,3].In particular,Orthogonal Matching Pursuit [18]is a greedy algorithm that incrementally finds sparse codes,and K-SVD is also efficient and popular for dictionary learning.Closely related to our work but on the different problem of recognition,Bo ed matching pursuit and K-SVD to learn features in a coding hierarchy [3]and are extending their approach to RGB-D data [4].Thanks to the mass production of Kinect,active RGB-D cameras became affordable and were quickly adopted in vision research and applications.The Kinect pose estimation of Shotton et. ed random forests to learn from a huge amount of data[25].Henry ed RGB-D cam-eras to scan large environments into3D models[10].RGB-D data were also studied in the context of object recognition[13]and scene labeling[27,22].In-depth studies of contour and segmentation problems for depth data are much in need given the fast growing interests in RGB-D perception.3Contour Detection using Sparse Code GradientsWe start by examining the processing pipeline of Global Pb(gPb)[2],a highly influential and widely used system for contour detection.The gPb contour detection has two stages:local contrast estimation at multiple scales,and globalization of the local cues using spectral grouping.The core of the approach lies within its use of local cues in oriented gradients.Originally developed in [17],this set of features use relatively simple pixel representations(histograms of brightness,color and textons)and similarity functions(chi-square distance,manually chosen),comparing to recent advances in using rich representations for high-level recognition(e.g.[11,29,30,3]).We set out to show that both the pixel representation and the aggregation of pixel information in local neighborhoods can be much improved and,to a large extent,learned from and adapted to input data. For pixel representation,in Section3.1we show how to use Orthogonal Matching Pursuit[18]and K-SVD[1],efficient sparse coding and dictionary learning algorithms that readily apply to low-level vision,to extract sparse codes at every pixel.This sparse coding approach can be viewed similar in spirit to the use offilterbanks but avoids manual choices and thus directly applies to the RGB-D data from Kinect.We show learned dictionaries for a number of channels that exhibit different characteristics:grayscale/luminance,chromaticity(ab),depth,and surface normal.In Section3.2we show how the pixel-level sparse codes can be integrated through multi-scale pool-ing into a rich representation of oriented local neighborhoods.By computing oriented gradients on this high dimensional representation and using a double power transform to code the features for linear classification,we show a linear SVM can be efficiently and effectively trained for each orientation to classify contour vs non-contour,yielding local contrast estimates that are much more accurate than the hand-designed features in gPb.3.1Local Sparse Representation of RGB-(D)PatchesK-SVD and Orthogonal Matching Pursuit.K-SVD[1]is a popular dictionary learning algorithm that generalizes K-Means and learns dictionaries of codewords from unsupervised data.Given a set of image patches Y=[y1,···,y n],K-SVD jointlyfinds a dictionary D=[d1,···,d m]and an associated sparse code matrix X=[x1,···,x n]by minimizing the reconstruction errorminY−DX 2F s.t.∀i, x i 0≤K;∀j, d j 2=1(1) D,Xwhere · F denotes the Frobenius norm,x i are the columns of X,the zero-norm · 0counts the non-zero entries in the sparse code x i,and K is a predefined sparsity level(number of non-zero en-tries).This optimization can be solved in an alternating manner.Given the dictionary D,optimizing the sparse code matrix X can be decoupled to sub-problems,each solved with Orthogonal Matching Pursuit(OMP)[18],a greedy algorithm forfinding sparse codes.Given the codes X,the dictionary D and its associated sparse coefficients are updated sequentially by singular value decomposition. For our purpose of representing local patches,the dictionary D has a small size(we use75for5x5 patches)and does not require a lot of sample patches,and it can be learned in a matter of minutes. Once the dictionary D is learned,we again use the Orthogonal Matching Pursuit(OMP)algorithm to compute sparse codes at every pixel.This can be efficiently done with convolution and a batch version of the OMP algorithm[24].For a typical BSDS image of resolution321x481,the sparse code extraction is efficient and takes1∼2seconds.Sparse Representation of RGB-D Data.One advantage of unsupervised dictionary learning is that it readily applies to novel sensor data,such as the color and depth frames from a Kinect-style RGB-D camera.We learn K-SVD dictionaries up to four channels of color and depth:grayscale for luminance,chromaticity ab for color in the Lab space,depth(distance to camera)and surface normal(3-dim).The learned dictionaries are visualized in Fig.2.These dictionaries are interesting(a)Grayscale (b)Chromaticity (ab)(c)Depth (d)Surface normal Figure 2:K-SVD dictionaries learned for four different channels:grayscale and chromaticity (in ab )for an RGB image (a,b),and depth and surface normal for a depth image (c,d).We use a fixed dictionary size of 75on 5x 5patches.The ab channel is visualized using a constant luminance of 50.The 3-dimensional surface normal (xyz)is visualized in RGB (i.e.blue for frontal-parallel surfaces).to look at and qualitatively distinctive:for example,the surface normal codewords tend to be more smooth due to flat surfaces,the depth codewords are also more smooth but with speckles,and the chromaticity codewords respect the opponent color pairs.The channels are coded separately.3.2Coding Multi-Scale Neighborhoods for Measuring ContrastMulti-Scale Pooling over Oriented Half-Discs.Over decades of research on contour detection and related topics,a number of fundamental observations have been made,repeatedly:(1)contrast is the key to differentiate contour vs non-contour;(2)orientation is important for respecting contour continuity;and (3)multi-scale is useful.We do not wish to throw out these principles.Instead,we seek to adopt these principles for our case of high dimensional representations with sparse codes.Each pixel is presented with sparse codes extracted from a small patch (5-by-5)around it.To aggre-gate pixel information,we use oriented half-discs as used in gPb (see an illustration in Fig.1).Each orientation is processed separately.For each orientation,at each pixel p and scale s ,we define two half-discs (rectangles)N a and N b of size s -by-(2s +1),on both sides of p ,rotated to that orienta-tion.For each half-disc N ,we use average pooling on non-zero entries (i.e.a hybrid of average and max pooling)to generate its representationF (N )= i ∈N |x i 1| i ∈N I |x i 1|>0,···, i ∈N |x im | i ∈NI |x im |>0 (2)where x ij is the j -th entry of the sparse code x i ,and I is the indicator function whether x ij is non-zero.We rotate the image (after sparse coding)and use integral images for fast computations (on both |x ij |and |x ij |>0,whose costs are independent of the size of N .For two oriented half-dics N a and N b at a scale s ,we compute a difference (gradient)vector DD (N a s ,N b s )= F (N a s )−F (N b s ) (3)where |·|is an element-wise absolute value operation.We divide D (N a s ,N b s )by their norms F (N a s ) + F (N b s ) + ,where is a positive number.Since the magnitude of sparse codes variesover a wide range due to local variations in illumination as well as occlusion,this step makes the appearance features robust to such variations and increases their discriminative power,as commonly done in both contour detection and object recognition.This value is not hard to set,and we find a value of =0.5is better than,for instance, =0.At this stage,one could train a classifier on D for each scale to convert it to a scalar value of contrast,which would resemble the chi-square distance function in gPb.Instead,we find that it is much better to avoid doing so separately at each scale,but combining multi-scale features in a joint representation,so as to allow interactions both between codewords and between scales.That is,our final representation of the contrast at a pixel p is the concatenation of sparse codes pooled at all thescales s ∈{1,···,S }(we use S =4):D p = D (N a 1,N b 1),···,D (N a S ,N b S );F (N a 1∪N b 1),···,F (N a S ∪N b S ) (4)In addition to difference D ,we also include a union term F (N a s ∪N b s ),which captures the appear-ance of the whole disc (union of the two half discs)and is normalized by F (N a s ) + F (N b s ) + .Double Power Transform and Linear Classifiers.The concatenated feature D p (non-negative)provides multi-scale contrast information for classifying whether p is a contour location for a partic-ular orientation.As D p is high dimensional (1200and above in our experiments)and we need to do it at every pixel and every orientation,we prefer using linear SVMs for both efficient testing as well as training.Directly learning a linear function on D p ,however,does not work very well.Instead,we apply a double power transformation to make the features more suitable for linear SVMs D p = D α1p ,D α2p (5)where 0<α1<α2<1.Empirically,we find that the double power transform works much better than either no transform or a single power transform α,as sometimes done in other classification contexts.Perronnin et.al.[19]provided an intuition why a power transform helps classification,which “re-normalizes”the distribution of the features into a more Gaussian form.One plausible intuition for a double power transform is that the optimal exponent αmay be different across feature dimensions.By putting two power transforms of D p together,we allow the classifier to pick its linear combination,different for each dimension,during the stage of supervised training.From Local Contrast to Global Contours.We intentionally only change the local contrast es-timation in gPb and keep the other steps fixed.These steps include:(1)the Savitzky-Goley filter to smooth responses and find peak locations;(2)non-max suppression over orientations;and (3)optionally,we apply the globalization step in gPb that computes a spectral gradient from the local gradients and then linearly combines the spectral gradient with the local ones.A sigmoid transform step is needed to convert the SVM outputs on D p before computing spectral gradients.4ExperimentsWe use the evaluation framework of,and extensively compare to,the publicly available Global Pb (gPb)system [2],widely used as the state of the art for contour detection 1.All the results reported on gPb are from running the gPb contour detection and evaluation codes (with default parameters),and accuracies are verified against the published results in [2].The gPb evaluation includes a number of criteria,including precision-recall (P/R)curves from contour matching (Fig.4),F-measures computed from P/R (Table 1,2,3)with a fixed contour threshold (ODS)or per-image thresholds (OIS),as well as average precisions (AP)from the P/R curves.Benchmark Datasets.The main dataset we use is the BSDS500benchmark [2],an extension of the original BSDS300benchmark and commonly used for contour evaluation.It includes 500natural images of roughly resolution 321x 481,including 200for training,100for validation,and 200for testing.We conduct both color and grayscale experiments (where we convert the BSDS500images to grayscale and retain the groundtruth).In addition,we also use the MSRC2and PASCAL2008segmentation datasets [26,6],as done in the gPb work [2].The MSRC2dataset has 591images of resolution 200x 300;we randomly choose half for training and half for testing.The PASCAL2008dataset includes 1023images in its training and validation sets,roughly of resolution 350x 500.We randomly choose half for training and half for testing.For RGB-D contour detection,we use the NYU Depth dataset (v2)[27],which includes 1449pairs of color and depth frames of resolution 480x 640,with groundtruth semantic regions.We choose 60%images for training and 40%for testing,as in its scene labeling setup.The Kinect images are of lower quality than BSDS,and we resize the frames to 240x 320in our experiments.Training Sparse Code Gradients.Given sparse codes from K-SVD and Orthogonal Matching Pur-suit,we train the Sparse Code Gradients classifiers,one linear SVM per orientation,from sampled locations.For positive data,we sample groundtruth contour locations and estimate the orientations at these locations using groundtruth.For negative data,locations and orientations are random.We subtract the mean from the patches in each data channel.For BSDS500,we typically have 1.5to 21In this work we focus on contour detection and do not address how to derive segmentations from contours.pooling disc size (pixel)a v e r a g e p r e c i s i o na v e r a g e p r e c i s i o nsparsity level a v e r a g e p r e c i s i o n (a)(b)(c)Figure 3:Analysis of our sparse code gradients,using average precision of classification on sampled boundaries.(a)The effect of single-scale vs multi-scale pooling (accumulated from the smallest).(b)Accuracy increasing with dictionary size,for four orientation channels.(c)The effect of the sparsity level K,which exhibits different behavior for grayscale and chromaticity.BSDS500ODS OIS AP l o c a l gPb (gray).67.69.68SCG (gray).69.71.71gPb (color).70.72.71SCG (color).72.74.75g l o b a l gPb (gray).69.71.67SCG (gray).71.73.74gPb (color).71.74.72SCG (color).74.76.77Table 1:F-measure evaluation on the BSDS500benchmark [2],comparing to gPb on grayscaleand color images,both for local contour detec-tion as well as for global detection (-bined with the spectral gradient analysis in [2]).Recall P r e c i s i o n Figure 4:Precision-recall curves of SCG vs gPb on BSDS500,for grayscale and color images.We make a substantial step beyondthe current state of the art toward reachinghuman-level accuracy (green dot).million data points.We use 4spatial scales,at half-disc sizes 2,4,7,25.For a dictionary size of 75and 4scales,the feature length for one data channel is 1200.For full RGB-D data,the dimension is 4800.For BSDS500,we train only using the 200training images.We modify liblinear [7]to take dense matrices (features are dense after pooling)and single-precision floats.Looking under the Hood.We empirically analyze a number of settings in our Sparse Code Gradi-ents.In particular,we want to understand how the choices in the local sparse coding affect contour classification.Fig.3shows the effects of multi-scale pooling,dictionary size,and sparsity level (K).The numbers reported are intermediate results,namely the mean of average precision of four oriented gradient classifier (0,45,90,135degrees)on sampled locations (grayscale unless otherwise noted,on validation).As a reference,the average precision of gPb on this task is 0.878.For multi-scale pooling,the single best scale for the half-disc filter is about 4x 8,consistent with the settings in gPb.For accumulated scales (using all the scales from the smallest up to the current level),the accuracy continues to increase and does not seem to be saturated,suggesting the use of larger scales.The dictionary size has a minor impact,and there is a small (yet observable)benefit to use dictionaries larger than 75,particularly for diagonal orientations (45-and 135-deg).The sparsity level K is a more intriguing issue.In Fig.3(c),we see that for grayscale only,K =1(normalized nearest neighbor)does quite well;on the other hand,color needs a larger K ,possibly because ab is a nonlinear space.When combining grayscale and color,it seems that we want K to be at least 3.It also varies with orientation:horizontal and vertical edges require a smaller K than diagonal edges.(If using K =1,our final F-measure on BSDS500is 0.730.)We also empirically evaluate the double power transform vs single power transform vs no transform.With no transform,the average precision is 0.865.With a single power transform,the best choice of the exponent is around 0.4,with average precision 0.884.A double power transform (with exponentsMSRC2ODS OIS APgPb.37.39.22SCG.43.43.33PASCAL2008ODS OIS APgPb.34.38.20SCG.37.41.27Table2:F-measure evaluation comparing our SCG approach to gPb on two addi-tional image datasets with contour groundtruth: MSRC2[26]and PASCAL2008[6].RGB-D(NYU v2)ODS OIS AP gPb(color).51.52.37 SCG(color).55.57.46gPb(depth).44.46.28SCG(depth).53.54.45gPb(RGB-D).53.54.40SCG(RGB-D).62.63.54Table3:F-measure evaluation on RGB-D con-tour detection using the NYU dataset(v2)[27].We compare to gPb on using color image only,depth only,as well as color+depth.Figure5:Examples from the BSDS500dataset[2].(Top)Image;(Middle)gPb output;(Bottom) SCG output(this work).Our SCG operator learns to preservefine details(e.g.windmills,faces,fish fins)while at the same time achieving higher precision on large-scale contours(e.g.back of zebras). (Contours are shown in double width for the sake of visualization.)0.25and0.75,which can be computed through sqrt)improves the average precision to0.900,which translates to a large improvement in contour detection accuracy.Image Benchmarking Results.In Table1and Fig.4we show the precision-recall of our Sparse Code Gradients vs gPb on the BSDS500benchmark.We conduct four sets of experiments,using color or grayscale images,with or without the globalization component(for which we use exactly the same setup as in gPb).Using Sparse Code Gradients leads to a significant improvement in accuracy in all four cases.The local version of our SCG operator,i.e.only using local contrast,is already better(F=0.72)than gPb with globalization(F=0.71).The full version,local SCG plus spectral gradient(computed from local SCG),reaches an F-measure of0.739,a large step forward from gPb,as seen in the precision-recall curves in Fig.4.On BSDS300,our F-measure is0.715. We observe that SCG seems to pick upfine-scale details much better than gPb,hence the much higher recall rate,while maintaining higher precision over the entire range.This can be seen in the examples shown in Fig.5.While our scale range is similar to that of gPb,the multi-scale pooling scheme allows theflexibility of learning the balance of scales separately for each code word,which may help detecting the details.The supplemental material contains more comparison examples.In Table2we show the benchmarking results for two additional datasets,MSRC2and PAS-CAL2008.Again we observe large improvements in accuracy,in spite of the somewhat different natures of the scenes in these datasets.The improvement on MSRC2is much larger,partly because the images are smaller,hence the contours are smaller in scale and may be over-smoothed in gPb. As for computational cost,using integral images,local SCG takes∼100seconds to compute on a single-thread Intel Core i5-2500CPU on a BSDS image.It is slower than but comparable to the highly optimized multi-thread C++implementation of gPb(∼60seconds).Figure6:Examples of RGB-D contour detection on the NYU dataset(v2)[27].Thefive panels are:input image,input depth,image-only contours,depth-only contours,and color+depth contours. Color is good picking up details such as photos on the wall,and depth is useful where color is uniform(e.g.corner of a room,row1)or illumination is poor(e.g.chair,row2).RGB-D Contour Detection.We use the second version of the NYU Depth Dataset[27],which has higher quality groundtruth than thefirst version.A medianfiltering is applied to remove double contours(boundaries from two adjacent regions)within3pixels.For RGB-D baseline,we use a simple adaptation of gPb:the depth values are in meters and used directly as a grayscale image in gPb gradient computation.We use a linear combination to put(soft)color and depth gradients together in gPb before non-max suppression,with the weight set from validation.Table3lists the precision-recall evaluations of SCG vs gPb for RGB-D contour detection.All the SCG settings(such as scales and dictionary sizes)are kept the same as for BSDS.SCG again outperforms gPb in all the cases.In particular,we are much better for depth-only contours,for which gPb is not designed.Our approach learns the low-level representations of depth data fully automatically and does not require any manual tweaking.We also achieve a much larger boost by combining color and depth,demonstrating that color and depth channels contain complementary information and are both critical for RGB-D contour detection.Qualitatively,it is easy to see that RGB-D combines the strengths of color and depth and is a promising direction for contour and segmentation tasks and indoor scene analysis in general[22].Fig.6shows a few examples of RGB-D contours from our SCG operator.There are plenty of such cases where color alone or depth alone would fail to extract contours for meaningful parts of the scenes,and color+depth would succeed. 5DiscussionsIn this work we successfully showed how to learn and code local representations to extract contours in natural images.Our approach combined the proven concept of oriented gradients with powerful representations that are automatically learned through sparse coding.Sparse Code Gradients(SCG) performed significantly better than hand-designed features that were in use for a decade,and pushed contour detection much closer to human-level accuracy as illustrated on the BSDS500benchmark. Comparing to hand-designed features(e.g.Global Pb[2]),we maintain the high dimensional rep-resentation from pooling oriented neighborhoods and do not collapse them prematurely(such as computing chi-square distance at each scale).This passes a richer set of information into learn-ing contour classification,where a double power transform effectively codes the features for linear paring to previous learning approaches(e.g.discriminative dictionaries in[16]),our uses of multi-scale pooling and oriented gradients lead to much higher classification accuracies. Our work opens up future possibilities for learning contour detection and segmentation.As we il-lustrated,there is a lot of information locally that is waiting to be extracted,and a learning approach such as sparse coding provides a principled way to do so,where rich representations can be automat-ically constructed and adapted.This is particularly important for novel sensor data such as RGB-D, for which we have less understanding but increasingly more need.。
Visualization of Scalar Topology for Structural EnhancementC.L.Bajaj V.PascucciDepartment of Computer Sciences and TICAMUniversity of Texas,Austin,TX78733D.R.SchikoreCenter for Applied Scientific ComputingLawrence Livermore National Laboratory,Livermore,CA94550AbstractScalarfields arise in every scientific application.Existing scalar visualization techniques require that the user infer the global scalar structure from what is frequently an insufficient display of information.We present a visualization technique which nu-merically detects the structure at all scales,removing from the user the responsibility of extracting information implicit in the data,and presenting the structure explicitly for analysis.We further demonstrate how scalar topology detection proves use-ful for correct visualization and image processing applications such as image co-registration,isocontouring,and mesh com-pression.Keywords:Scientific Visualization,Scalar Fields,Curves and Surfaces,V ector Topology1IntroductionVisualization of scalarfields is common across all scientific dis-ciplines,including geographic data such as altitude and temper-ature,medical applications with CT and MRI values,and pres-sure and vorticity magnitude in computationalfluid dynamics. The purpose of the visualization is to aid the user in understand-ing the structure of the data[29].Common methods for visualizing scalar fields can be grouped into two broad classes.First are methods whose aim is to detect structure and present a display to the user which communicates this structure.Critical to these methods is the definition of structure,and how well the definition matches the visualization users’need.Second are those methods which attempt to display the entire scalarfield simultaneously,leav-ing interpretation of the display to the binations of the two methods serve to reinforce the information provided by each visualization.We will use for comparison one technique from each of these categories,isocontouring and colormapping. Isocontours,or constant valued curves and surfaces from continuous2D and3D scalarfields,are a common visualiza-tion technique for displaying scalarfield structure[21].By their definition,isocontours represent the data only at discrete lev-els,and as such are an effective technique for determining the“shape”of objects in the scalarfield.Shape extraction as de-fined by isocontoursis well understood and appreciated in many applications,such as Medical Imaging,as isocontours in a den-sityfield may result in realistic models of skeletal structure,skin surface,or various organs[22].Also implicit in their definition is the fact that isocontours are an incomplete representation of the scalarfield,as one can only infer from an isocontour that the data to one side is above the isovalue,and the data to the other side is below the isovalue.With multiple isocontours,the scalarfield effectively becomes segmented into afinite number of ranges,within which the structure remains unknown.The same claim of incompleteness can be made of any technique which only displays a portion of thefield.Moreover it is not obvious which isovalues one should select and how namy of them[4].Colormapping of scalar data defines a discrete or continuous range of colors onto which the scalar values are e of color,though proven to be useful in many visualization tech-niques,introduces complications due of perceptual issues,such as colorblindness.Colormaps may also mislead the user,for ex-ample when small-scale structure in the data is washed out due to the large range of values taken on by the variable.Scientific data which is time-varying in nature intensifies the problems with the methods described above.In the typical case,a scalar variable may take on a wide range of values over thecourse of a simulation,however at certain times during the sim-ulation the range may be much smaller.With both isocontours and colormapped display,it is desirable to use the same isoval-ues and colormap for each time-step being displayed in order to reduce the possibility of introducing artifacts which may be misinterpreted as features.This requirement complicates the task of choosing a good colormap or selection of isovalues fora time-varying visualization.In this paper we present a complementary scalar structure visualization technique which does not depend on the user to determine structure from the graphical display,but instead de-fines,computes,and displays the structure of a scalarfield di-rectly.Through detection of all critical points(saddles,max-ima,and minima),we construct an embedded graph by com-puting integral curves in the gradientfield from saddle points to an attached critical point,as illustrated infigure1.Curves in 1Figure1:Isocontours(dotted)of part of a scalarfield along with the critical points and integral curvesthis topological graph are always perpendicular to isocontours of the scalarfield[23],and we will demonstrate that these curves contain complementary information to that provided by display of isocontours or colormapped scalarfields,providing a method which is both useful in its own right and which also enhances the commonly used techniques for visualizing scalarfields.We further indicate that the definition of structure which is provided by the scalar topology proves useful in several additional visu-alization and image processing applications.2Related WorkMuch of the work in enhancing colormapped visualization of scalarfields has dealt with determining“good”colormaps which effectively display the data.Bergman,et.al.,define rules based on perception,user goals,and data characteristics to automatically select a colormap which will meet the user requirements[6].Histogram equalization is a technique which spreads the data evenly over the range of colors,using the avail-able color space to it’s fullest[28].The result is that each color in the colormap is used an equal number of times.Gershon[14] uses“Generalized Animation”to display otherwise static scalar data in a dynamic way,taking advantage of the ability of the visual system to detect dynamic changes.Animation draws at-tention to fuzzy details in the data which may not be detected in the static representation.There has been several papers in detecting isocontours in2d and3d scalar data[21,30].Additional work concentrates on handling problems in regions containing saddle points which cause difficulty in determining the topological structure of the surface contained in the region[25,31,26].The problem of de-tecting ridges and valleys in digital terrain has been treated in several papers[12].McCormack,et.al.consider the problem of detecting drainage patterns in geographic terrain[24].Inter-rante,et.al.have used ridge and valley detection on3d surfaces to enhance the shape of transparently rendered surfaces[19]. Extrema graphs were used by Itoh and Koyamada to speed iso-contour extraction[20].A graph containing extreme points and boundary points of a scalarfield can be guaranteed to intersect every isocontour at least once,allowing seed points to be gener-ated by searching only the cells contained in the extrema graph. Helman and Hesselink detect vectorfield topology by clas-sifying the zeros of a vectorfield and performing particle trac-ing from saddle points[17].The resulting partitioning consists of regions which are topologically equivalent to uniformflow. Globus,et.al.describe a software system for3d vector topol-ogy and briefly note that the technique may also be applied to the gradient of a scalarfield in order to identify maxima and minima[15].Bader et.al.and Collard et.al.examine the gra-dientfield of the charge density in a molecular system[2,1,10]. The topology of this scalarfield represents the bonds linking to-gether the atoms of the molecule.Bader goes on to show how features higher level structures in the topology represent chains, rings,an cages in the molecule.Bader’s example is a defin-ing motivation for developing the automatic extraction and vi-sualization of topology from a scalarfield.In many situations, topology provides a more intuitive and physically meaningful visualization.Grosse[16]also presents methods of approxi-mating the scalar topology of the electron density function of proteins.One of his methods uses tensor product B-splinefits while the other scales Fourier coefficients of the electron den-sity function.3Scalar TopologyPrevious techniques for enhancing scalarfield visualization at-tempt to address the inability of colormapping and isocontour-ing to capture and directly represent features in the data.We ad-dress this problem not through feature enhancement using ex-isting visualization techniques,but through direct feature detec-tion and display.For our purpose of detection and display,we define the topology of a scalarfield defined with domainto consist of the following:1.The local maxima of2.The local minima of3.The saddle points of4.Selected integral curves joining each of the above Integral curves are defined as curves which are everywhere tangent to the gradientfield of.Intuitively,these curves rep-resent the path followed by a heat-seeking particle in a temper-aturefield,or the path followed by a ball rolling down a hill in a field of elevation values.In vectorfield topology,the curves ad-vected in theflowfield segment thefield into regions which are topologically equivalent to uniformflow.In the case of scalar topology,integral curves segment thefield into regions in which the gradientflow is uniform,or in other words,the scalar func-tion is monotonic.Such a segmentation of the scalarfield into regions of simple behavior reveals the structure of the scalar field for the visualization user.We outline the procedure for visualization of scalar topology as follows:1.Detect stationary(critical)points in.2.Classify stationary points.3.Integrate selected integral curves in gradientfield.In the following subsections,we will define our model of a continuous scalarfield and look at each of the steps defined above.3.1Scalar Field ModelIn typical scientific applications,data is represented at the nodes of a mesh of elements and interpolated linearly across the inte-rior of the elements.Such a data model is continuous and has a discontinuous gradientfield,making it unsuitable for our purpose of tracing integral curves in the gradientfield.We seek to construct a data model such that:1.The original nodal data is interpolated.2.The gradient at the boundaries is continuous.3.Critical points in the scalarfield are not removed,and thenumber introduced is keptsmall.Figure2:Artificial extreme points introduced by central differ-encingWe could satisfy thefirst two properties by computing derivatives by a method such as central differencing,which would uniquely define a continuous bi-cubic scalar inter-polant[3].However,such a choice of interpolant is likely to violate our third requirement by introducing critical points,as illustrated for the1-D case infigure2.To address this problem,we use a“damped”central differ-encing scheme as described in the following sections.The re-sulting scalarfield will remain a piecewise continuous bi-cubic function,which we represent in Bernstein-B´e zier form as:whereAs a result,the derivatives of the scalarfield can be repre-sented as:Figure3:Damped central differences maintain critical points Otherwise,the point data at,,and are monotonic,and we dampen the central difference as follows:Original vertex weight Weight determined by fist order partial derivativesWeight determined by second order partial derivatives in two variables Weight determined by third order partial derivative in three variables===Figure 6:Constraints on the mixed partial derivatives for 3Dever,due to the special construction of our interpolant,we have knowledge about where the critical points will occur,and can compute them quite efficiently.Critical points which occur at the vertices of the mesh will be preserved,and can be computed from the bilinear or trilin-ear field respectively,with the guarantee that they exist as well in the higher order shape preserving interpolant.Critical points interior to a cell will occur in locations at which the monotonic-ity constraint could not be met.In smooth parts of the field,there will be no problem computing a monotone field,which will guarantee the absence of critical points.In cells at which constraints were violated,we perform subdivision of the cell in order to locate the critical points,followed by Newton-Rhapson iteration to refine the positions of the zeroes.Saddles from the initial bilinear or trilinear mesh can be approximated by com-puting the position of the bilinear or trilinear saddle analyti-cally,followed by iteration in the bi-cubic or tri-cubic field,re-spectively.3.3Classification of Critical PointsQualitative information about the behavior of the gradient field near a critical point is obtained by analysis of the Hessian of ,given for 2D:The eigenvalues and eigenvectors of the above matrix de-termine the behavior of the gradient field and hence the scalar field near the critical point,much the same as for the behav-ior of a general vector field[7,17].One difference to note is that for a gradient field,the matrix of derivatives is symmetric (),and therefore the eigenvalues will all be real.This is intuitively expected,as imaginary eigenval-ues indicate rotation about the critical point,and a gradient field is an irrotational vector field.This observation allows us to sim-plify the classification of critical points as depicted in figure 7.A positive eigenvalue corresponds to gradient flow away from the critical point,while a negative eigenvalue indicatesMaximaMinimaRegular SaddleConstantDegenerate SaddleFigure 7:Some of the scalar critical pointsgradient flow toward the critical point.In the case of a sad-dle point,there is gradient flow toward and away from the crit-ical point,distinguishing it from the field behavior near other critical points.In this case,the eigenvectors corresponding to the positive and negative eigenvalues define the principal direc-tions of the flow toward and away from the saddle,respectively.It is this property that will be used in the next section to compute critical curves in the gradient field.3.4Tracing Integral CurvesHaving computed and classified the critical points,the final step for computing the scalar topology is the tracing of selected crit-ical curves between the detected points.Even for three and higher dimensional scalar fields we restrict our focus to only computing critical curves,and ignore critical surfaces and hy-persurfaces and other degeneracies in the field,Saddle points have the property that the eigenvectors of the Hessian are the separatrices of the saddle.A particle following the gradient field along the these directions will come to rest at the saddle point,while particles slightlyto either side of the sep-aratrices will diverge rapidly near the point.It is for this reason that saddle points and the critical curves associated with their separatrices are useful in determining the structure of a scalar field.The number of critical curves emanating from saddles along separatrices is twice the field dimension.In 2D,four crit-ical curves are computed for each saddle point,two in the di-rection corresponding to the positive eigenvalue,and two in the direction corresponding to the negative eigenvalue.In 3D,the number is six,and so on.Integral curves are computed using the following 4th order adaptive step Runge Kutta integration in the gradient field[27],where is the time step which adapts per iteration,and is a field point :1.2.4.5.The initial position for the iterative stepping is placed a small distance from the saddle point along the appropriate eigenvec-tor.The steps are bounded such that we take no less than5steps per cell,maintaining a high level of putation of the critical curve ends when we reach the vicinity of another critical point within a certain,in which case the curve termi-nates at that point.Other curves may end at the boundaries of the mesh.4Quality ComparisonHere we compare the qualities of scalar topology visualization with those of isocontours and colormapping.Integral curves are everywhere orthogonal to isocontours. The two techniques arise from an orthogonal definition of “structure”for a scalar variable.Contours are an attempt to compute and display the exact shape of an object in a scalar field,while the topology graph attempts to show the relations among all such objects in thefield,without giving the details of shapes of particular objects.Note that scalarfield topology is invariant under translation and uniform scaling.This quality is very similar to colormapping of scalar variables,in which the entire range of variables is mapped into a color space.Transla-tion and scaling of the scalar variables changes only the map-ping function,not the result.5ExamplesFigures8,9,10and the top two pictures of the Color Plate, demonstrate the use of scalar topology along with both isocon-tours and colormapped visualizations of density in an off-axis pion collision.Figure8uses a simple greyscale colormap,and it is clear that much of the area of interest in the center is washed out.Figure9uses a hue-based colormap and adds isocontours of three isovalues to reveal more of the structure and aid the per-ception.Infigure10,we show the scalar topology of density. This image clearly brings out the detail of the structure of the variable.The topfigures of the Color Plate show a closeup of the interesting topological regions,as well as shows a combi-nation of all three visualization techniques.While small scale structure is important in many scientific applications,in some circumstances the visualization user is in-terested only in large scale structure.For this situation,we ap-ply afilter to smooth the data before applying the topology de-tection algorithm.Figures11,12show two visualizations of topology in a scalarfield representing wind speed.Infigure 11,the unfiltered scalarfield topology reveals some noise in the data.Figure12shows the topology for the same data after a Gaussianfilter has been applied.The middlefigures of the color plate shows an example of scalar topology applied to a mathematically defined surface.In the leftfigure the scalar topology is displayed.In the rightfig-ure both topology and four isocontours are displayed.Notice that even with four isolevels displayed,there are critical points within contour regions which are not revealed like the two max-ima on the bottom left that are not separated by any isocontour. The bottomfigures of the color plate show an example of scalar topology applied to a3D scalarfields(the wave function computed for a high potential iron protein).6Other ApplicationsComputation of scalar topology has the potential to serve many other visualization and image processing applications.We mention only a few here:Data Correlation-Due in part to the invariance under trans-lation and scaling,scalar topology is useful in visually de-termining linear correlation between multiple scalar vari-ables.Image Co-registration-Scalar topology in adjacent planes provides a“1D skeleton”which may be used to align the planes.Warping/Morphing-Editing of the scalar backbone may be used to apply a warping effect to an image,or to warp be-tween the backbones of two similar images.Mesh Reduction-The scalar topology may serve as a guide to aid in computation of reduced resolution meshes. Surface Triangulation-Adaptive triangulation of arbitrary mathematical surfaces by decomposition into monotonic patches which may be subdivided to an arbitrary precision.7ConclusionsExisting scalar visualization techniques lack the ability to ex-plicitly present the structure of a scalarfield to the user.We have presented a definition of scalar structure and a straight-forward algorithm for computing and displaying the structure. For typical scientific data,the scalar data model remains true to the original linear data,minimizing introduction of false critical points,and also simplifying the detection of critical points. The resulting topology visualization serves to both provide information which is not available in commonly used scalar visualization techniques,as well as reinforcing or enhancing the information provided by common visualization techniques. Furthermore,computation of scalar topology offers promise to-ward improving several visualization and image processing ap-plications.AcknowledgementsWe are grateful to Lawrence Livermore National Lab for ac-cess to the pion collision data set.The Earth Science dataset iscourtesy the Space Science and Engineering Center at the Uni-versity of Wisconsin.The Wave function data set is courtesy the Visualization lab,SUNY-Stony Brook.This research was supported in part by AFOSR grant F49620-97-1-0278and ONR grant N00014-97-1-0398.References[1]R.Bader.Atoms in Molecules.Clarendon Press,Oxford,1990.[2]R.Bader,T.Tung Nguyen-Dang,and Y.Tal.Quantumtopology of molecular charge distributions ii.molecular structure and its charge.Journal of Chemical Physics, 70:4316–4329,1979.[3]C.Bajaj.Modelling physicalfields for interrogative visu-alization.In Tim Goodman and Ralph Martin,editor,The Mathematics of Surfaces VII,pages241–rmation Geometers Ltd.,1997.[4]C.L.Bajaj,V.Pascucci,and D.R.Schikore.The contourspectrum.In R.Yagel and H.Hagen,editors,Proceeding Visualization’97,pages167–173,Phoenix,AZ,October 1997.IEEE Computer Society&ACM SIGGRAPH. [5]R.Beatson and Z.Ziegler.Monotonicity preserving sur-face interpolation.In SIAM J.Numer.Anal.,volume22, pages401–411,April1985.[6]L.Bergman,B.Rogowitz,and L.Treinish.A rule-basedtool for assisting colormap selection.In G.M.Nielson andD.Silver,editors,Visualization’95Proceedings,pages118–125,October1995.[7]W.Boyce and R.DiPrima.Elementary Differential Equa-tions and Boundary V alue Problems.John Wiley and Sons,Inc.,New York,fifth edition,1992.[8]R.Carlson and F.Fritsch.Monotone piecewise bicubicinterpolation.In SIAM J.Numer.Anal.,volume22,pages 386–400,April1985.[9]R.Carlson and F.Fritsch.An algorithm for monotonepiecewise bicubic interpolation.In SIAM J.Numer.Anal., volume26,pages230–238,February1989.[10]K.Collard and G.Hall.Orthogonal trajectories ofthe electron density.International Journal of Quantum Chemistry XII,0:623–637,1977.[11]S.Dodd,D.McAllister,and J.Roulier.Shape-preservingspline interpolation for specifying bivariate functions on grids.In IEEE Computer Graphics and Applications, pages70–79,September1983.[12]R.J.Fowler and J.J.Little.Automatic extraction of irreg-ular network digital terrain models.In Computer Graph-ics(SIGGRAPH’79Proceedings),volume13(3),pages 199–207,August1979.[13]F.Fritsch and R.Carlson.Monotone piecewise cubic in-terpolation.In SIAM J.Numer.Anal.,volume17,pages 238–246,April1980.[14]N.D.Gershon.Visualization of fuzzy data using general-ized animation.In A.E.Kaufman and G.M.Nielson,ed-itors,Visualization’92Proceedings,pages268–273,Oc-tober1992.[15]A.Globus,C.Levit,and sinski.A tool for visual-izing the topology of three-dimensional vectorfields.InG.M.Nielson and L.Rosenblum,editors,Visualization’91Proceedings,pages33–40,October1991.[16]E.Grosse.Approximation and Optimization of ElectronDensity Maps.PhD thesis,Stanford University,1980. [17]J.Helman and L.Hesselink.Visualizing vectorfieldtopology influidflows.IEEE Computer Graphics and Ap-plications,11(3),1991.[18]J.Hoscheck and sser.Fundamentals of ComputerAided Geometric Design.A K Peters,Wellesley,Mas-sachusetts,1993.[19]V.Interrante,H.Fuchs,and S.Pizer.Enhancing trans-parent skin surfaces with ridge and valley lines.In G.M.Nielson and D.Silver,editors,Visualization’95Proceed-ings,pages52–59,October1995.[20]T.Itoh and K.Koyamada.Isosurface extraction by usingextrema graphs.In R.D.Bergeron and A.E.Kaufman, editors,Visualization’94Proceedings,pages77–83,Oc-tober1994.[21]William E.Lorensen and Harvey E.Cline.Marchingcubes:A high resolution3D surface construction algo-rithm.In Maureen C.Stone,editor,Computer Graphics (SIGGRAPH’87Proceedings),volume21,pages163–169,July1987.[22]W.Lorenson.Marching through the visible man.In G.M.Nielson and D.Silver,editors,Visualization’95Proceed-ings,pages368–373,October1995.[23]J.Marsden and A.Tromba.V ector Calculus.W.H.Free-man and Company,New York,third edition,1988. [24]J.E.McCormack,M.N.Gahegan,S.A.Roberts,J.Hogg,and B.S.Hoyle.Feature-based derivation of drainage net-works.Int.Journal of Geographical Information Systems, 7(3):263–279,1993.[25]B.K.Natarajan.On generating topologically consis-tent isosurfaces from uniform samples.Technical Report HPL-91-76,Hewlett-Packard,June1991.[26]G.M.Nielson and B.Hamaan.The asymptotic decider:Resolving the ambiguity of marching cubes.In G.M.Nielson and L.Rosenblum,editors,Visualization’91Pro-ceedings,pages83–91,October1991.Figure8:Visualization of density in a pion collision simulation: GreyscalecolormappingFigure9:Visualization of density in a pion collision simulation: Isocontours overlayed on hue based colormapping[27]W.Press,S.Teukolsky,W.V etterling,and B.Flannery.Numerical Recipes in C,Second Edition.Cambridge Uni-versity Press,1992.[28]A.Rosenfeld and A.Kak.Digital Picture Processing.Academic Press,San Diego,1982.[29]E.Tufte.The Visual Display of Quantitative Information.Graphics Press,1983.[30]Jane Wilhelms and Allen V an Gelder.Octrees for fasterisosurface generation extended abstract.In Computer Graphics(San Diego Workshop on V olume Visualization), volume24,pages57–62,November1990.[31]Jane Wilhelms and Allen V an Gelder.Topological con-siderations in isosurface generation extended abstract.In Computer Graphics(San Diego Workshop on V olume Vi-sualization),volume24,pages79–86,November1990.Figure10:Visualization of density in a pion collision simula-tion:Topology overlayed on hue basedcolormappingFigure11:Visualization of wind speed from a climate model: Scalar topology for noisydataFigure12:Visualization of wind speed from a climate model: After applying a Gaussianfilter。