Abstract Visualization of Scalar Topology for Structural Enhancement
- 格式:pdf
- 大小:124.95 KB
- 文档页数:8
Semi-supervised and unsupervised extreme learningmachinesGao Huang,Shiji Song,Jatinder N.D.Gupta,and Cheng WuAbstract—Extreme learning machines(ELMs)have proven to be an efficient and effective learning paradigm for pattern classification and regression.However,ELMs are primarily applied to supervised learning problems.Only a few existing research studies have used ELMs to explore unlabeled data. In this paper,we extend ELMs for both semi-supervised and unsupervised tasks based on the manifold regularization,thus greatly expanding the applicability of ELMs.The key advantages of the proposed algorithms are1)both the semi-supervised ELM (SS-ELM)and the unsupervised ELM(US-ELM)exhibit the learning capability and computational efficiency of ELMs;2) both algorithms naturally handle multi-class classification or multi-cluster clustering;and3)both algorithms are inductive and can handle unseen data at test time directly.Moreover,it is shown in this paper that all the supervised,semi-supervised and unsupervised ELMs can actually be put into a unified framework. This provides new perspectives for understanding the mechanism of random feature mapping,which is the key concept in ELM theory.Empirical study on a wide range of data sets demonstrates that the proposed algorithms are competitive with state-of-the-art semi-supervised or unsupervised learning algorithms in terms of accuracy and efficiency.Index Terms—Clustering,embedding,extreme learning ma-chine,manifold regularization,semi-supervised learning,unsu-pervised learning.I.I NTRODUCTIONS INGLE layer feedforward networks(SLFNs)have been intensively studied during the past several decades.Most of the existing learning algorithms for training SLFNs,such as the famous back-propagation algorithm[1]and the Levenberg-Marquardt algorithm[2],adopt gradient methods to optimize the weights in the network.Some existing works also use forward selection or backward elimination approaches to con-struct network dynamically during the training process[3]–[7].However,neither the gradient based methods nor the grow/prune methods guarantee a global optimal solution.Al-though various methods,such as the generic and evolutionary algorithms,have been proposed to handle the local minimum This work was supported by the National Natural Science Foundation of China under Grant61273233,the Research Fund for the Doctoral Program of Higher Education under Grant20120002110035and20130002130010, the National Key Technology R&D Program under Grant2012BAF01B03, the Project of China Ocean Association under Grant DY125-25-02,and Tsinghua University Initiative Scientific Research Program under Grants 2011THZ07132.Gao Huang,Shiji Song,and Cheng Wu are with the Department of Automation,Tsinghua University,Beijing100084,China(e-mail:huang-g09@;shijis@; wuc@).Jatinder N.D.Gupta is with the College of Business Administration,The University of Alabama in Huntsville,Huntsville,AL35899,USA.(e-mail: guptaj@).problem,they basically introduce high computational cost. One of the most successful algorithms for training SLFNs is the support vector machines(SVMs)[8],[9],which is a maximal margin classifier derived under the framework of structural risk minimization(SRM).The dual problem of SVMs is a quadratic programming and can be solved conveniently.Due to its simplicity and stable generalization performance,SVMs have been widely studied and applied to various domains[10]–[14].Recently,Huang et al.[15],[16]proposed the extreme learning machines(ELMs)for training SLFNs.In contrast to most of the existing approaches,ELMs only update the output weights between the hidden layer and the output layer, while the parameters,i.e.,the input weights and biases,of the hidden layer are randomly generated.By adopting squared loss on the prediction error,the training of output weights turns into a regularized least squares(or ridge regression)problem which can be solved efficiently in closed form.It has been shown that even without updating the parameters of the hidden layer,the SLFN with randomly generated hidden neurons and tunable output weights maintains its universal approximation capability[17]–[19].Compared to gradient based algorithms, ELMs are much more efficient and usually lead to better generalization performance[20]–[22].Compared to SVMs, solving the regularized least squares problem in ELMs is also faster than solving the quadratic programming problem in standard SVMs.Moreover,ELMs can be used for multi-class classification problems directly.The predicting accuracy achieved by ELMs is comparable with or even higher than that of SVMs[16],[22]–[24].The differences and similarities between ELMs and SVMs are discussed in[25]and[26], and new algorithms are proposed by combining the advan-tages of both models.In[25],an extreme SVM(ESVM) model is proposed by combining ELMs and the proximal SVM(PSVM).The ESVM algorithm is shown to be more accurate than the basic ELMs model due to the introduced regularization technique,and much more efficient than SVMs since there is no kernel matrix multiplication in ESVM.In [26],the traditional RBF kernel are replaced by ELM kernel, leading to an efficient algorithm with matched accuracy of SVMs.In the past years,researchers from variesfields have made substantial contribution to ELM theories and applications.For example,the universal approximation ability of ELMs has been further studied in a classification context[23].The gen-eralization error bound of ELMs has been investigated from the perspective of the Vapnik-Chervonenkis(VC)dimension theory and the initial localized generalization error model(LGEM)[27],[28].Varies extensions have been made to the basic ELMs to make it more efficient and more suitable for specific problems,such as ELMs for online sequential data [29]–[31],ELMs for noisy/missing data[32]–[34],ELMs for imbalanced data[35],etc.From the implementation aspect, ELMs has recently been implemented using parallel tech-niques[36],[37],and realized on hardware[38],which made ELMs feasible for large data sets and real time reasoning. Though ELMs have become popular in a wide range of domains,they are primarily used for supervised learning tasks such as classification and regression,which greatly limits their applicability.In some cases,such as text classification, information retrieval and fault diagnosis,obtaining labels for fully supervised learning is time consuming and expensive, while a multitude of unlabeled data are easy and cheap to collect.To overcome the disadvantage of supervised learning al-gorithms that they cannot make use of unlabeled data,semi-supervised learning(SSL)has been proposed to leverage both labeled and unlabeled data[39],[40].The SSL algorithms assume that the input patterns from both labeled and unlabeled data are drawn from the same marginal distribution.Therefore, the unlabeled data naturally provide useful information for exploring the data structure in the input space.By assuming that the input data follows some cluster structure or manifold in the input space,SSL algorithms can incorporate both la-beled and unlabeled data into the learning process.Since SSL requires less effort to collect labeled data and can offer higher accuracy,it has been applied to various domains[41]–[43].In some other cases where no labeled data are available,people may be interested in exploring the underlying structure of the data.To this end,unsupervised learning(USL)techniques, such as clustering,dimension reduction or data representation, are widely used to fulfill these tasks.In this paper,we extend ELMs to handle both semi-supervised and unsupervised learning problems by introducing the manifold regularization framework.Both the proposed semi-supervised ELM(SS-ELM)and unsupervised ELM(US-ELM)inherit the computational efficiency and the learn-ing capability of traditional pared with existing algorithms,SS-ELM and US-ELM are not only inductive (straightforward extension for out-of-sample examples at test time),but also can be used for multi-class classification or multi-cluster clustering directly.We test our algorithms on a variety of data sets,and make comparisons with other related algorithms.The results show that the proposed algorithms are competitive with state-of-the-art algorithms in terms of accuracy and efficiency.It is worth to mention that all the supervised,semi-supervised and unsupervised ELMs can actually be put into a unified framework,that is all the algorithms consist of two stages:1)random feature mapping;and2)output weights solving.Thefirst stage is to construct the hidden layer using randomly generated hidden neurons.This is the key concept in the ELM theory,which differs it from many existing feature learning methods.Generating feature mapping randomly en-ables ELMs for fast nonlinear feature learning and alleviates the problem of over-fitting.The second stage is to solve the weights between the hidden layer and the output layer, and this is where the main difference of supervised,semi-supervised and unsupervised ELMs lies.We believe that the unified framework for the three types of ELMs might provide us a new perspective to understand the underlying behavior of the random feature mapping in ELMs.The rest of the paper is organized as follows.In Section II,we give a brief review of related existing literature on semi-supervised and unsupervised learning.Section III and IV introduce the basic formulation of ELMs and the man-ifold regularization framework,respectively.We present the proposed SS-ELM and US-ELM algorithms in Sections V and VI.Experiment results are given in Section VII,and Section VIII concludes the paper.II.R ELATED WORKSOnly a few existing research studies on ELMs have dealt with the problem of semi-supervised learning or unsupervised learning.In[44]and[45],the manifold regularization frame-work was introduce into the ELMs model to leverage both labeled and unlabeled data,thus extended ELMs for semi-supervised learning.However,both of these two works are limited to binary classification problems,thus they haven’t explore the full power of ELMs.Moreover,both algorithms are only effective when the number of training patterns is more than the number of hidden neurons.Unfortunately,this condition is usually violated in semi-supervised learning since the training data is relatively scarce compared to the hidden neurons,whose number is commonly set to several hundreds or several thousands.Recently,a co-training approach have been proposed to train ELMs in a semi-supervised setting [46].In this algorithm,the labeled training sets are augmented gradually by moving a small set of most confidently predicted unlabeled data to the labeled set at each loop,and ELMs are trained repeatedly on the pseudo-labeled set.Since the algo-rithm need to train ELMs repeatedly,it introduces considerable extra computational cost.The proposed SS-ELM is related to a few other mani-fold assumption based semi-supervised learning algorithms, such as the Laplacian support vector machines(LapSVMs) [47],the Laplacian regularized least squares(LapRLS)[47], semi-supervised neural networks(SSNNs)[48],and semi-supervised deep embedding[49].It has been shown in these works that manifold regularization is effective in a wide range of domains and often leads to a state-of-the-art performance in terms of accuracy and efficiency.The US-ELM proposed in this paper are related to the Laplacian Eigenmaps(LE)[50]and spectral clustering(SC) [51]in that they both use spectral techniques for embedding and clustering.In all these algorithms,an affinity matrix is first built from the input patterns.The SC performs eigen-decomposition on the normalized affinity matrix,and then embeds the original data into a d-dimensional space using the first d eigenvectors(each row is normalized to have unit length and represents a point in the embedded space)corresponding to the d largest eigenvalues.The LE algorithm performs generalized eigen-decomposition on the graph Laplacian,anduses the d eigenvectors corresponding to the second through the(d+1)th smallest eigenvalues for embedding.When LE and SC are used for clustering,then k-means is adopted to cluster the data in the embedded space.Similar to LE and SC,the US-ELM are also based on the affinity matrix,and it is converted to solving a generalized eigen-decomposition problem.However,the eigenvectors obtained in US-ELM are not used for data representation directly,but are used as the parameters of the network,i.e.,the output weights.Note that once the US-ELM model is trained,it can be applied to any presented data in the original input space.In this way,US-ELM provide a straightforward way for handling new patterns without recomputing eigenvectors as in LE and SC.III.E XTREME LEARNING MACHINES Consider a supervised learning problem where we have a training set with N samples,{X,Y}={x i,y i}N i=1.Herex i∈R n i,y i is a n o-dimensional binary vector with only one entry(correspond to the class that x i belongs to)equal to one for multi-classification tasks,or y i∈R n o for regression tasks,where n i and n o are the dimensions of input and output respectively.ELMs aim to learn a decision rule or an approximation function based on the training data. Generally,the training of ELMs consists of two stages.The first stage is to construct the hidden layer using afixed number of randomly generated mapping neurons,which can be any nonlinear piecewise continuous functions,such as the Sigmoid function and Gaussian function given below.1)Sigmoid functiong(x;θ)=11+exp(−(a T x+b));(1)2)Gaussian functiong(x;θ)=exp(−b∥x−a∥);(2) whereθ={a,b}are the parameters of the mapping function and∥·∥denotes the Euclidean norm.A notable feature of ELMs is that the parameters of the hidden mapping functions can be randomly generated ac-cording to any continuous probability distribution,e.g.,the uniform distribution on(-1,1).This makes ELMs distinct from the traditional feedforward neural networks and SVMs. The only free parameters that need to be optimized in the training process are the output weights between the hidden neurons and the output nodes.By doing so,training ELMs is equivalent to solving a regularized least squares problem which is considerately more efficient than the training of SVMs or backpropagation algorithms.In thefirst stage,a number of hidden neurons which map the data from the input space into a n h-dimensional feature space (n h is the number of hidden neurons)are randomly generated. We denote by h(x i)∈R1×n h the output vector of the hidden layer with respect to x i,andβ∈R n h×n o the output weights that connect the hidden layer with the output layer.Then,the outputs of the network are given byf(x i)=h(x i)β,i=1,...,N.(3)In the second stage,ELMs aim to solve the output weights by minimizing the sum of the squared losses of the prediction errors,which leads to the following formulationminβ∈R n h×n o12∥β∥2+C2N∑i=1∥e i∥2s.t.h(x i)β=y T i−e T i,i=1,...,N,(4)where thefirst term in the objective function is a regularization term which controls the complexity of the model,e i∈R n o is the error vector with respect to the i th training pattern,and C is a penalty coefficient on the training errors.By substituting the constraints into the objective function, we obtain the following equivalent unconstrained optimization problem:minβ∈R n h×n oL ELM=12∥β∥2+C2∥Y−Hβ∥2(5)where H=[h(x1)T,...,h(x N)T]T∈R N×n h.The above problem is widely known as the ridge regression or regularized least squares.By setting the gradient of L ELM with respect toβto zero,we have∇L ELM=β+CH H T(Y−Hβ)=0(6) If H has more rows than columns and is of full column rank,which is usually the case where the number of training patterns are more than the number of the hidden neurons,the above equation is overdetermined,and we have the following closed form solution for(5):β∗=(H T H+I nhC)−1H T Y,(7)where I nhis an identity matrix of dimension n h.Note that in practice,rather than explicitly inverting the n h×n h matrix in the above expression,we can use Gaussian elimination to directly solve a set of linear equations in a more efficient and numerically stable manner.If the number of training patterns are less than the number of hidden neurons,then H will have more columns than rows, which often leads to an underdetermined least squares prob-lem.In this case,βmay have infinite number of solutions.To handle this problem,we restrictβto be a linear combination of the rows of H:β=H Tα(α∈R N×n o).Notice that when H has more columns than rows and is of full row rank,then H H T is invertible.Multiplying both side of(6) by(H H T)−1H,we getα+C(Y−H H Tα)=0,(8) This yieldsβ∗=H Tα∗=H T(H H T+I NC)−1Y(9)where I N is an identity matrix of dimension N. Therefore,in the case where training patterns are plentiful compared to the hidden neurons,we use(7)to compute the output weights,otherwise we use(9).IV.T HE MANIFOLD REGULARIZATION FRAMEWORK Semi-supervised learning is built on the following two assumptions:(1)both the label data X l and the unlabeled data X u are drawn from the same marginal distribution P X ;and (2)if two points x 1and x 2are close to each other,then the conditional probabilities P (y |x 1)and P (y |x 2)should be similar as well.The latter assumption is widely known as the smoothness assumption in machine learning.To enforce this assumption on the data,the manifold regularization framework proposes to minimize the following cost functionL m=12∑i,jw ij ∥P (y |x i )−P (y |x j )∥2,(10)where w ij is the pair-wise similarity between two patterns x iand x j .Note that the similarity matrix W =[w ij ]is usually sparse,since we only place a nonzero weight between two patterns x i and x j if they are close,e.g.,x i is among the k nearest neighbors of x j or x j is among the k nearest neighbors of x i .The nonzero weights are usually computed using Gaussian function exp (−∥x i −x j ∥2/2σ2),or simply fixed to 1.Intuitively,the formulation (10)penalizes large variation in the conditional probability P (y |x )when x has a small change.This requires that P (y |x )vary smoothly along the geodesics of P (x ).Since it is difficult to compute the conditional probability,we can approximate (10)with the following expression:ˆLm =12∑i,jw ij ∥ˆyi −ˆy j ∥2,(11)where ˆyi and ˆy j are the predictions with respect to pattern x i and x j ,respectively.It is straightforward to simplify the above expression in a matrix form:ˆL m =Tr (ˆY T L ˆY ),(12)where Tr (·)denotes the trace of a matrix,L =D −W isknown as the graph Laplacian ,and D is a diagonal matrixwith its diagonal elements D ii =l +u∑j =1w i,j .As discussed in [52],instead of using L directly,we can normalize it byD −12L D −12or replace it by L p (p is an integer),based on some prior knowledge.V.S EMI -SUPERVISED ELMIn the semi-supervised setting,we have few labeled data and plenty of unlabeled data.We denote the labeled data in the training set as {X l ,Y l }={x i ,y i }l i =1,and unlabeled dataas X u ={x i }ui =1,where l and u are the number of labeled and unlabeled data,respectively.The proposed SS-ELM incorporates the manifold regular-ization to leverage unlabeled data to improve the classification accuracy when labeled data are scarce.By modifying the ordinary ELM formulation (4),we give the formulation ofSS-ELM as:minβ∈R n h ×n o12∥β∥2+12l∑i =1C i ∥e i ∥2+λ2Tr (F T L F )s.t.h (x i )β=y T i −e T i ,i =1,...,l,f i =h (x i )β,i =1,...,l +u(13)where L ∈R (l +u )×(l +u )is the graph Laplacian built fromboth labeled and unlabeled data,and F ∈R (l +u )×n o is the output matrix of the network with its i th row equal to f (x i ),λis a tradeoff parameter.Note that similar to the weighted ELM algorithm (W-ELM)introduced in [35],here we associate different penalty coeffi-cient C i on the prediction errors with respect to patterns from different classes.This is because we found that when the data is skewed,i.e.,some classes have significantly more training patterns than other classes,traditional ELMs tend to fit the classes that having the majority of patterns quite well but fits other classes poorly.This usually leads to poor generalization performance on the testing set (while the prediction accuracy may be high,but the some classes are neglected).Therefore,we propose to alleviate this problem by re-weighting instances from different classes.Suppose that x i belongs to class t i ,which has N t i training patterns,then we associate e i with a penalty ofC i =C 0N t i.(14)where C 0is a user defined parameter as in traditional ELMs.In this way,the patterns from the dominant classes will not be over fitted by the algorithm,and the patterns from a class with less samples will not be neglected.We substitute the constraints into the objective function,and rewrite the above formulation in a matrix form:min β∈R n h×n o 12∥β∥2+12∥C 12( Y −Hβ)∥2+λ2Tr (βT H TL Hβ)(15)where Y∈R (l +u )×n o is the training target with its first l rows equal to Y l and the rest equal to 0,C is a (l +u )×(l +u )diagonal matrix with its first l diagonal elements [C ]ii =C i ,i =1,...,l and the rest equal to 0.Again,we compute the gradient of the objective function with respect to β:∇L SS −ELM =β+H T C ( Y−H β)+λH H T L H β.(16)By setting the gradient to zero,we obtain the solution tothe SS-ELM:β∗=(I n h +H T C H +λH H T L H )−1H TC Y .(17)As in Section III,if the number of labeled data is fewer thanthe number of hidden neurons,which is common in SSL,we have the following alternative solution:β∗=H T (I l +u +C H H T +λL L H H T )−1C Y .(18)where I l +u is an identity matrix of dimension l +u .Note that by settingλto be zero and the diagonal elements of C i(i=1,...,l)to be the same constant,(17)and (18)reduce to the solutions of traditional ELMs(7)and(9), respectively.Based on the above discussion,the SS-ELM algorithm is summarized as Algorithm1.Algorithm1The SS-ELM algorithmInput:The labeled patterns,{X l,Y l}={x i,y i}l i=1;The unlabeled patterns,X u={x i}u i=1;Output:The mapping function of SS-ELM:f:R n i→R n oStep1:Construct the graph Laplacian L from both X l and X u.Step2:Initiate an ELM network of n h hidden neurons with random input weights and biases,and calculate the output matrix of the hidden neurons H∈R(l+u)×n h.Step3:Choose the tradeoff parameter C0andλ.Step4:•If n h≤NCompute the output weightsβusing(17)•ElseCompute the output weightsβusing(18)return The mapping function f(x)=h(x)β.VI.U NSUPERVISED ELMIn this section,we introduce the US-ELM algorithm for unsupervised learning.In an unsupervised setting,the entire training data X={x i}N i=1are unlabeled(N is the number of training patterns)and our target is tofind the underlying structure of the original data.The formulation of US-ELM follows from the formulation of SS-ELM.When there is no labeled data,(15)is reduced tomin β∈R n h×n o ∥β∥2+λTr(βT H T L Hβ)(19)Notice that the above formulation always attains its mini-mum atβ=0.As suggested in[50],we have to introduce addtional constraints to avoid a degenerated solution.Specifi-cally,the formulation of US-ELM is given bymin β∈R n h×n o ∥β∥2+λTr(βT H T L Hβ)s.t.(Hβ)T Hβ=I no(20)Theorem1:An optimal solution to problem(20)is given by choosingβas the matrix whose columns are the eigenvectors (normalized to satisfy the constraint)corresponding to thefirst n o smallest eigenvalues of the generalized eigenvalue problem:(I nh +λH H T L H)v=γH H T H v.(21)Proof:We can rewrite the problem(20)asminβ∈R n h×n o,ββT Bβ=I no Tr(βT Aβ),(22)Algorithm2The US-ELM algorithmInput:The training data:X∈R N×n i;Output:•For embedding task:The embedding in a n o-dimensional space:E∈R N×n o;•For clustering task:The label vector of cluster index:y∈N N×1+.Step1:Construct the graph Laplacian L from X.Step2:Initiate an ELM network of n h hidden neurons withrandom input weights,and calculate the output matrix of thehidden neurons H∈R N×n h.Step3:•If n h≤NFind the generalized eigenvectors v2,v3,...,v no+1of(21)corresponding to the second through the n o+1smallest eigenvalues.Letβ=[ v2, v3,..., v no+1],where v i=v i/∥H v i∥,i=2,...,n o+1.•ElseFind the generalized eigenvectors u2,u3,...,u no+1of(24)corresponding to the second through the n o+1smallest eigenvalues.Letβ=H T[ u2, u3,..., u no+1],where u i=u i/∥H H T u i∥,i=2,...,n o+1.Step4:Calculate the embedding matrix:E=Hβ.Step5(For clustering only):Treat each row of E as a point,and cluster the N points into K clusters using the k-meansalgorithm.Let y be the label vector of cluster index for allthe points.return E(for embedding task)or y(for clustering task);where A=I nh+λH H T L H and B=H T H.It is easy to verify that both A and B are Hermitianmatrices.Thus,according to the Rayleigh-Ritz theorem[53],the above trace minimization problem attains its optimum ifand only if the column span ofβis the minimum span ofthe eigenspace corresponding to the smallest n o eigenvaluesof(21).Therefore,by stacking the normalized eigenvectors of(21)corresponding to the smallest n o generalized eigenvalues,we obtain an optimal solution to(20).In the algorithm of Laplacian eigenmaps,thefirst eigenvec-tor is discarded since it is always a constant vector proportionalto1(corresponding to the smallest eigenvalue0)[50].In theUS-ELM algorithm,thefirst eigenvector of(21)also leadsto small variations in embedding and is not useful for datarepresentation.Therefore,we suggest to discard this trivialsolution as well.Letγ1,γ2,...,γno+1(γ1≤γ2≤...≤γn o+1)be the(n o+1)smallest eigenvalues of(21)and v1,v2,...,v no+1be their corresponding eigenvectors.Then,the solution to theoutput weightsβis given byβ∗=[ v2, v3,..., v no+1],(23)where v i=v i/∥H v i∥,i=2,...,n o+1are the normalizedeigenvectors.If the number of labeled data is fewer than the numberTABLE ID ETAILS OF THE DATA SETS USED FOR SEMI-SUPERVISED LEARNINGData set Class Dimension|L||U||V||T|G50C2505031450136COIL20(B)2102440100040360USPST(B)225650140950498COIL2020102440100040360USPST1025650140950498of hidden neurons,problem(21)is underdetermined.In this case,we have the following alternative formulation by using the same trick as in previous sections:(I u+λL L H H T )u=γH H H T u.(24)Again,let u1,u2,...,u no +1be generalized eigenvectorscorresponding to the(n o+1)smallest eigenvalues of(24), then thefinal solution is given byβ∗=H T[ u2, u3,..., u no +1],(25)where u i=u i/∥H H T u i∥,i=2,...,n o+1are the normal-ized eigenvectors.If our task is clustering,then we can adopt the k-means algorithm to perform clustering in the embedded space.We summarize the proposed US-ELM in Algorithm2. Remark:Comparing the supervised ELM,the semi-supervised ELM and the unsupervised ELM,we can observe that all the algorithms have two similar stages in the training process,that is the random feature learning stage and the out-put weights learning stage.Under this two-stage framework,it is easy tofind the differences and similarities between the three algorithms.Actually,all the algorithms share the same stage of random feature learning,and this is the essence of the ELM theory.This also means that no matter the task is a supervised, semi-supervised or unsupervised learning problem,we can always follow the same step to generate the hidden layer. The differences of the three types of ELMs lie in the second stage on how the output weights are computed.In supervised ELM and SS-ELM,the output weights are trained by solving a regularized least squares problem;while the output weights in the US-ELM are obtained by solving a generalized eigenvalue problem.The unified framework for the three types of ELMs might provide new perspectives to further develop the ELM theory.VII.E XPERIMENTAL RESULTSWe evaluated our algorithms on wide range of semi-supervised and unsupervised parisons were made with related state-of-the-art algorithms, e.g.,Transductive SVM(TSVM)[54],LapSVM[47]and LapRLS[47]for semi-supervised learning;and Laplacian Eigenmap(LE)[50], spectral clustering(SC)[51]and deep autoencoder(DA)[55] for unsupervised learning.All algorithms were implemented using Matlab R2012a on a2.60GHz machine with4GB of memory.TABLE IIIT RAINING TIME(IN SECONDS)COMPARISON OF TSVM,L AP RLS,L AP SVM AND SS-ELMData set TSVM LapRLS LapSVM SS-ELMG50C0.3240.0410.0450.035COIL20(B)16.820.5120.4590.516USPST(B)68.440.9210.947 1.029COIL2018.43 5.841 4.9460.814USPST68.147.1217.259 1.373A.Semi-supervised learning results1)Data sets:We tested the SS-ELM onfive popular semi-supervised learning benchmarks,which have been widely usedfor evaluating semi-supervised algorithms[52],[56],[57].•The G50C is a binary classification data set of which each class is generated by a50-dimensional multivariate Gaus-sian distribution.This classification problem is explicitlydesigned so that the true Bayes error is5%.•The Columbia Object Image Library(COIL20)is a multi-class image classification data set which consists1440 gray-scale images of20objects.Each pattern is a32×32 gray scale image of one object taken from a specific view.The COIL20(B)data set is a binary classification taskobtained from COIL20by grouping thefirst10objectsas Class1,and the last10objects as Class2.•The USPST data set is a subset(the testing set)of the well known handwritten digit recognition data set USPS.The USPST(B)data set is a binary classification task obtained from USPST by grouping thefirst5digits as Class1and the last5digits as Class2.2)Experimental setup:We followed the experimental setup in[57]to evaluate the semi-supervised algorithms.Specifi-cally,each of the data sets is split into4folds,one of which was used for testing(denoted by T)and the rest3folds for training.Each of the folds was used as the testing set once(4-fold cross-validation).As in[57],this random fold generation process were repeated3times,resulted in12different splits in total.Every training set was further partitioned into a labeled set L,a validation set V,and an unlabeled set U.When we train a semi-supervised learning algorithm,the labeled data from L and the unlabeled data from U were used.The validation set which consists of labeled data was only used for model selection,i.e.,finding the optimal hyperparameters C0andλin the SS-ELM algorithm.The characteristics of the data sets used in our experiment are summarized in Table I. The training of SS-ELM consists of two stages:1)generat-ing the random hidden layer;and2)training the output weights using(17)or(18).In thefirst stage,we adopted the Sigmoid function for nonlinear mapping,and the input weights and biases were generated according to the uniform distribution on(-1,1).The number of hidden neurons n h wasfixed to 1000for G50C,and2000for the rest four data sets.In the second stage,wefirst need to build the graph Laplacian L.We followed the methods discussed in[52]and[57]to compute L,and the hyperparameter settings can be found in[47],[52] and[57].The trade off parameters C andλwere selected from。
A Tutorial on Uppaal4.0Updated November28,2006Gerd Behrmann,Alexandre David,and Kim rsenDepartment of Computer Science,Aalborg University,Denmark{behrmann,adavid,kgl}@cs.auc.dk.Abstract.This is a tutorial paper on the tool Uppaal.Its goal is to bea short introduction on theflavour of timed automata implemented inthe tool,to present its interface,and to explain how to use the tool.Thecontribution of the paper is to provide reference examples and modellingpatterns.1IntroductionUppaal is a toolbox for verification of real-time systems jointly developed by Uppsala University and Aalborg University.It has been applied successfully in case studies ranging from communication protocols to multimedia applications [35,55,24,23,34,43,54,44,30].The tool is designed to verify systems that can be modelled as networks of timed automata extended with integer variables,struc-tured data types,user defined functions,and channel synchronisation.Thefirst version of Uppaal was released in1995[52].Since then it has been in constant development[21,5,13,10,26,27].Experiments and improvements in-clude data structures[53],partial order reduction[20],a distributed version of Uppaal[17,9],guided and minimal cost reachability[15,51,16],work on UML Statecharts[29],acceleration techniques[38],and new data structures and memory reductions[18,14].Version4.0[12]brings symmetry reduction[36], the generalised sweep-line method[49],new abstraction techniques[11],priori-ties[28],and user defined functions to the mainstream.Uppaal has also gen-erated related Ph.D.theses[50,57,45,56,19,25,32,8,31].It features a Java user interface and a verification engine written in C++.It is freely available at /.This tutorial covers networks of timed automata and theflavour of timed automata used in Uppaal in section2.The tool itself is described in section3, and three extensive examples are covered in sections4,5,and6.Finally,section7 introduces common modelling patterns often used with Uppaal.2Timed Automata in UppaalThe model-checker Uppaal is based on the theory of timed automata[4](see[42] for automata theory)and its modelling language offers additional features such as bounded integer variables and urgency.The query language of Uppaal,usedto specify properties to be checked,is a subset of TCTL (timed computation tree logic)[39,3].In this section we present the modelling and the query languages of Uppaal and we give an intuitive explanation of time in timed automata.2.1The Modelling LanguageNetworks of Timed Automata A timed automaton is a finite-state machine extended with clock variables.It uses a dense-time model where a clock variable evaluates to a real number.All the clocks progress synchronously.In Uppaal ,a system is modelled as a network of several such timed automata in parallel.The model is further extended with bounded discrete variables that are part of the state.These variables are used as in programming languages:They are read,written,and are subject to common arithmetic operations.A state of the system is defined by the locations of all automata,the clock values,and the values of the discrete variables.Every automaton may fire an edge (sometimes misleadingly called a transition)separately or synchronise with another automaton 1,which leads to a new state.Figure 1(a)shows a timed automaton modelling a simple lamp.The lamp has three locations:off ,low ,and bright .If the user presses a button,i.e.,synchronises with press?,then the lamp is turned on.If the user presses the button again,the lamp is turned off.However,if the user is fast and rapidly presses the button twice,the lamp is turned on and becomes bright.The user model is shown in Fig.1(b).The user can press the button randomly at any time or even not press the button at all.The clock y of the lamp is used to detect if the user was fast (y <5)or slow (y >=5).press?‚‚‚‚‚press!(a)Lamp.(b)User.Fig.1.The simple lamp example.We give the basic definitions of the syntax and semantics for the basic timed automata.In the following we will skip the richer flavour of timed automata supported in Uppaal ,i.e.,with integer variables and the extensions of urgent and committed locations.For additional information,please refer to the helpmenu inside the tool.We use the following notations:C is a set of clocks and B (C )is the set of conjunctions over simple conditions of the form x ⊲⊳c or x −y ⊲⊳c ,where x,y ∈C ,c ∈N and ⊲⊳∈{<,≤,=,≥,>}.A timed automaton is a finite directed graph annotated with conditions over and resets of non-negative real valued clocks.Definition 1(Timed Automaton (TA)).A timed automaton is a tuple (L,l 0,C,A,E,I ),where L is a set of locations,l 0∈L is the initial location,C is the set of clocks,A is a set of actions,co-actions and the internal τ-action,E ⊆L ×A ×B (C )×2C ×L is a set of edges between locations with an action,a guard and a set of clocks to be reset,and I :L →B (C )assigns invariants to locations. In the previous example on Fig.1,y:=0is the reset of the clock y ,and the labels press?and press!denote action–co-action (channel synchronisations here).We now define the semantics of a timed automaton.A clock valuation is a function u :C →R ≥0from the set of clocks to the non-negative reals.Let R C be the set of all clock valuations.Let u 0(x )=0for all x ∈C .We will abuse the notation by considering guards and invariants as sets of clock valuations,writing u ∈I (l )to mean that u satisfies I (l ).0000000001111111110001110000000000000000000000000000000000000000000000001111111111111111111111111111111111111111111111110000000000000000000000000000000000000000000000000000000011111111111111111111111111111111111111111111111111111111000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000011111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111110000111100001111<B,x=1><A,x=2><A,x=3><A,x=3>action transition delay(+1) transition delay(+2) transition state: <A,x=1>actiontransitionOK invalid action transition invalid state: invariant x<3 violatedFig.2.Semantics of TA:different transitions from a given initial state.Definition 2(Semantics of TA).Let (L,l 0,C,A,E,I )be a timed automaton.The semantics is defined as a labelled transition system S,s 0,→ ,where S ⊆L ×R C is the set of states,s 0=(l 0,u 0)is the initial state,and →⊆S ×(R ≥0∪A )×S is the transition relation such that:–(l,u )d−→(l,u +d )if ∀d ′:0≤d ′≤d =⇒u +d ′∈I (l ),and –(l,u )a −→(l ′,u ′)if there exists e =(l,a,g,r,l ′)∈E s.t.u ∈g ,u ′=[r →0]u ,and u ′∈I (l ′),3where for d∈R≥0,u+d maps each clock x in C to the value u(x)+d,and [r→0]u denotes the clock valuation which maps each clock in r to0and agrees with u over C\r. Figure2illustrates the semantics of TA.From a given initial state,we can choose to take an action or a delay transition(different values here).Depending of the chosen delay,further actions may be forbidden.Timed automata are often composed into a network of timed automata over a common set of clocks and actions,consisting of n timed automata A i= (L i,l0i,C,A,E i,I i),1≤i≤n.A location vector is a vector¯l=(l1,...,l n). We compose the invariant functions into a common function over location vec-tors I(¯l)=∧i I i(l i).We write¯l[l′i/l i]to denote the vector where the i th element l i of¯l is replaced by l′i.In the following we define the semantics of a network of timed automata.Definition3(Semantics of a network of Timed Automata).Let A i= (L i,l0i,C,A,E i,I i)be a network of n timed automata.Let¯l0=(l01,...,l0n)be the initial location vector.The semantics is defined as a transition system S,s0,→ , where S=(L1×···×L n)×R C is the set of states,s0=(¯l0,u0)is the initial state,and→⊆S×S is the transition relation defined by:–(¯l,u)d−→(¯l,u+d)if∀d′:0≤d′≤d=⇒u+d′∈I(¯l).−−→l′i s.t.u∈g,–(¯l,u)a−→(¯l[l′i/l i],u′)if there exists l iτgru′=[r→0]u and u′∈I(¯l[l′i/l i]).–(¯l,u)a−→(¯l[l′j/l j,l′i/l i],u′)if there exist l i c?g i r i−−−−→l′i and−−−−→l′j s.t.u∈(g i∧g j),u′=[r i∪r j→0]u and u′∈I(¯l[l′j/l j,l′i/l i]).l j c!g j r jAs an example of the semantics,the lamp in Fig.1may have the follow-ing states(we skip the user):(Lamp.off,y=0)→(Lamp.off,y=3)→(Lamp.low,y=0)→(Lamp.low,y=0.5)→(Lamp.bright,y=0.5)→(Lamp.bright,y=1000)...Timed Automata in Uppaal The Uppaal modelling language extends timed automata with the following additional features(see Fig.3:Templates automata are defined with a set of parameters that can be of any type(e.g.,int,chan).These parameters are substituted for a given argument in the process declaration.Constants are declared as const name value.Constants by definition cannot be modified and must have an integer value.Bounded integer variables are declared as int[min,max]name,where min and max are the lower and upper bound,respectively.Guards,invariants,and assignments may contain expressions ranging over bounded integer variables.The bounds are checked upon verification and violating a bound leads to an invalid state that is discarded(at run-time).If the bounds are omitted,the default range of-32768to32768is used.4Fig.3.Declarations of a constant and a variable,and illustration of some of the channel synchronisations between two templates of the train gate example of Section4,and some committed locations.5Binary synchronisation channels are declared as chan c.An edge labelled with c!synchronises with another labelled c?.A synchronisation pair is chosen non-deterministically if several combinations are enabled. Broadcast channels are declared as broadcast chan c.In a broadcast syn-chronisation one sender c!can synchronise with an arbitrary number of receivers c?.Any receiver than can synchronise in the current state must do so.If there are no receivers,then the sender can still execute the c!action,i.e.broadcast sending is never blocking.Urgent synchronisation channels are declared by prefixing the channel decla-ration with the keyword urgent.Delays must not occur if a synchronisation transition on an urgent channel is enabled.Edges using urgent channels for synchronisation cannot have time constraints,i.e.,no clock guards. Urgent locations are semantically equivalent to adding an extra clock x,that is reset on all incoming edges,and having an invariant x<=0on the location.Hence,time is not allowed to pass when the system is in an urgent location. Committed locations are even more restrictive on the execution than urgent locations.A state is committed if any of the locations in the state is commit-ted.A committed state cannot delay and the next transition must involve an outgoing edge of at least one of the committed locations.Arrays are allowed for clocks,channels,constants and integer variables.They are defined by appending a size to the variable name,e.g.chan c[4];clock a[2];int[3,5]u[7];.Initialisers are used to initialise integer variables and arrays of integer vari-ables.For instance,int i=2;or int i[3]={1,2,3};.Record types are declared with the struct construct like in C.Custom types are defined with the C-like typedef construct.You can define any custom-type from other basic types such as records.User functions are defined either globally or locally to templates.Template parameters are accessible from local functions.The syntax is similar to C except that there is no pointer.C++syntax for references is supported for the arguments only.Expressions in Uppaal Expressions in Uppaal range over clocks and integer variables.The BNF is given in Fig.33in the appendix.Expressions are used with the following labels:Select A select label contains a comma separated list of name:type expressions where name is a variable name and type is a defined type(built-in or custom).These variables are accessible on the associated edge only and they will takea non-deterministic value in the range of their respective types.Guard A guard is a particular expression satisfying the following conditions: it is side-effect free;it evaluates to a boolean;only clocks,integer variables, and constants are referenced(or arrays of these types);clocks and clock differences are only compared to integer expressions;guards over clocks are essentially conjunctions(disjunctions are allowed over integer conditions).A guard may call a side-effect free function that returns a bool,although clock constraints are not supported in such functions.6Synchronisation A synchronisation label is either on the form Expression!or Expression?or is an empty label.The expression must be side-effect free, evaluate to a channel,and only refer to integers,constants and channels. Update An update label is a comma separated list of expressions with a side-effect;expressions must only refer to clocks,integer variables,and constants and only assign integer values to clocks.They may also call functions. Invariant An invariant is an expression that satisfies the following conditions:it is side-effect free;only clock,integer variables,and constants are referenced;it is a conjunction of conditions of the form x<e or x<=e where x is a clock reference and e evaluates to an integer.An invariant may call a side-effect free function that returns a bool,although clock constraints are not supported in such functions.2.2The Query LanguageThe main purpose of a model-checker is verify the model w.r.t.a requirement specification.Like the model,the requirement specification must be expressed in a formally well-defined and machine readable language.Several such logics exist in the scientific literature,and Uppaal uses a simplified version of TCTL. Like in TCTL,the query language consists of path formulae and state formulae.2 State formulae describe individual states,whereas path formulae quantify over paths or traces of the model.Path formulae can be classified into reachability, safety and liveness.Figure4illustrates the different path formulae supported by Uppaal.Each type is described below.State Formulae A state formula is an expression(see Fig.33)that can be evaluated for a state without looking at the behaviour of the model.For instance, this could be a simple expression,like i==7,that is true in a state whenever i equals7.The syntax of state formulae is a superset of that of guards,i.e.,a state formula is a side-effect free expression,but in contrast to guards,the use of disjunctions is not restricted.It is also possible to test whether a particular process is in a given location using an expression on the form P.l,where P is a process and l is a location.In Uppaal,deadlock is expressed using a special state formula(although this is not strictly a state formula).The formula simply consists of the keyword deadlock and is satisfied for all deadlock states.A state is a deadlock state if there are no outgoing action transitions neither from the state itself or any of its delay successors.Due to current limitations in Uppaal,the deadlock state formula can only be used with reachability and invariantly path formulae(see below).Reachability Properties Reachability properties are the simplest form of properties.They ask whether a given state formula,ϕ,possibly can be satisfied3Notice that A ϕ=¬E3¬ϕ8there should exist a maximal path such thatϕis always true.4In Uppaal we write A[]ϕand E[]ϕ,respectively.Liveness Properties Liveness properties are of the form:something will even-tually happen,e.g.when pressing the on button of the remote control of the television,then eventually the television should turn on.Or in a model of a communication protocol,any message that has been sent should eventually be received.In its simple form,liveness is expressed with the path formula A3ϕ,mean-ingϕis eventually satisfied.5The more useful form is the leads to or response property,writtenϕ ψwhich is read as wheneverϕis satisfied,then eventu-allyψwill be satisfied,e.g.whenever a message is sent,then eventually it will be received.6In Uppaal these properties are written as A<>ϕandϕ-->ψ, respectively.2.3Understanding TimeInvariants and Guards Uppaal uses a continuous time model.We illustrate the concept of time with a simple example that makes use of an observer.Nor-mally an observer is an add-on automaton in charge of detecting events without changing the observed system.In our case the clock reset(x:=0)is delegated to the observer for illustration purposes.Figure5shows thefirst model with its observer.We have two automata in parallel.Thefirst automaton has a self-loop guarded by x>=2,x being a clock,that synchronises on the channel reset with the second automaton.The second automaton,the observer,detects when the self loop edge is taken with the location taken and then has an edge going back to idle that resets the clock x.We moved the reset of x from the self loop to the observer only to test what happens on the transition before the reset.Notice that the location taken is committed(marked c)to avoid delay in that location.The following properties can be verified in Uppaal(see section3for an overview of the interface).Assuming we name the observer automaton Obs,we have:–A[]Obs.taken imply x>=2:all resets offx will happen when x is above2.This query means that for all reachable states,being in the locationObs.taken implies that x>=2.–E<>Obs.idle and x>3:this property requires,that it is possible to reach-able state where Obs is in the location idle and x is bigger than3.Essentially we check that we may delay at least3time units between resets.The result would have been the same for larger values like30000,since there are no invariants in this model.x>=2reset!‚‚‚‚‚246824"time"c l o c k x (a)Test.(b)Observer.(c)Behaviour:one possible run.Fig.5.First example with anobserver.x>=2reset!246824"time"c l o c k x(a)Test.(b)Updated behaviour with an invariant.Fig.6.Updated example with an invariant.The observer is the same as in Fig.5and is not shown here.We update the first model and add an invariant to the location loop ,as shown in Fig.6.The invariant is a progress condition:the system is not allowed to stay in the state more than 3time units,so that the transition has to be taken and the clock reset in our example.Now the clock x has 3as an upper bound.The following properties hold:–A[]Obs.taken imply (x>=2and x<=3)shows that the transition is takenwhen x is between 2and 3,i.e.,after a delay between 2and 3.–E<>Obs.idle and x>2:it is possible to take the transition when x is be-tween 2and 3.The upper bound 3is checked with the next property.–A[]Obs.idle imply x<=3:to show that the upper bound is respected.The former property E<>Obs.idle and x>3no longer holds.Now,if we remove the invariant and change the guard to x>=2and x<=3,you may think that it is the same as before,but it is not!The system has no progress condition,just a new condition on the guard.Figure 7shows what happens:the system may take the same transitions as before,but deadlock may also occur.The system may be stuck if it does not take the transition after 3time units.In fact,the system fails the property A[]not deadlock .The property A[]Obs.idle imply x<=3does not hold any longer and the deadlock can also be illustrated by the property A[]x>3imply not Obs.taken ,i.e.,after 3time units,the transition is not taken any more.10x>=2 && x<=3reset!246824"time"c l o c k x(a)Test.(b)Updated behaviour with a guard and no invariant.Fig.7.Updated example with a guard and no invariant.P0P1P2Fig.8.Automata in parallel with normal,urgent and commit states.The clocks are local,i.e.,P0.x and P1.x are two different clocks.Committed and Urgent Locations There are three different types of loca-tions in Uppaal :normal locations with or without invariants (e.g.,x<=3in the previous example),urgent locations,and committed locations.Figure 8shows 3automata to illustrate the difference.The location marked u is urgent and the one marked c is committed.The clocks are local to the automata,i.e.,x in P0is different from x in P1.To understand the difference between normal locations and urgent locations,we can observe that the following properties hold:–E<>P0.S1and P0.x>0:it is possible to wait in S1of P0.–A[]P1.S1imply P1.x==0:it is not possible to wait in S1of P1.An urgent location is equivalent to a location with incoming edges reseting a designated clock y and labelled with the invariant y<=0.Time may not progress in an urgent state,but interleavings with normal states are allowed.A committed location is more restrictive:in all the states where P2.S1is active (in our example),the only possible transition is the one that fires the edge outgoing from P2.S1.A state having a committed location active is said to11be committed:delay is not allowed and the committed location must be left in the successor state(or one of the committed locations if there are several ones). 3Overview of the Uppaal ToolkitUppaal uses a client-server architecture,splitting the tool into a graphical user interface and a model checking engine.The user interface,or client,is imple-mented in Java and the engine,or server,is compiled for different platforms (Linux,Windows,Solaris).7As the names suggest,these two components may be run on different machines as they communicate with each other via TCP/IP. There is also a stand-alone version of the engine that can be used on the com-mand line.3.1The Java ClientThe idea behind the tool is to model a system with timed automata using a graphical editor,simulate it to validate that it behaves as intended,andfinally to verify that it is correct with respect to a set of properties.The graphical interface(GUI)of the Java client reflects this idea and is divided into three main parts:the editor,the simulator,and the verifier,accessible via three“tabs”. The Editor A system is defined as a network of timed automata,called pro-cesses in the tool,put in parallel.A process is instantiated from a parameterised template.The editor is divided into two parts:a tree pane to access the different templates and declarations and a drawing canvas/text editor.Figure9shows the editor with the train gate example of section4.Locations are labelled with names and invariants and edges are labelled with guard conditions(e.g.,e==id), synchronisations(e.g.,go?),and assignments(e.g.,x:=0).The tree on the left hand side gives access to different parts of the system description:Global declaration Contains global integer variables,clocks,synchronisation channels,and constants.Templates Train,Gate,and IntQueue are different parameterised timed au-tomata.A template may have local declarations of variables,channels,and constants.Process assignments Templates are instantiated into processes.The process assignment section contains declarations for these instances.System definition The list of processes in the system.The syntax used in the labels and the declarations is described in the help system of the tool.The local and global declarations are shown in Fig.10.The graphical syntax is directly inspired from the description of timed automata in section2.12Fig.9.The train automaton of the train gate example.The select button is activated in the tool-bar.In this mode the user can move locations and edges or edit labels. The other modes are for adding locations,edges,and vertices on edges(called nails).A new location has no name by default.Two textfields allow the user to define the template name and its eful trick:The middle mouse button is a shortcut for adding new elements,i.e.pressing it on the canvas,a location,or edge adds a new location,edge,or nail,respectively.The Simulator The simulator can be used in three ways:the user can run the system manually and choose which transitions to take,the random mode can be toggled to let the system run on its own,or the user can go through a trace (saved or imported from the verifier)to see how certain states are reachable. Figure11shows the simulator.It is divided into four parts:The control part is used to choose andfire enabled transitions,go through a trace,and toggle the random simulation.The variable view shows the values of the integer variables and the clock con-straints.Uppaal does not show concrete states with actual values for the clocks.Since there are infinitely many of such states,Uppaal instead shows sets of concrete states known as symbolic states.All concrete states in a sym-bolic state share the same location vector and the same values for discretevariables.The possible values of the clocks is described by a set of con-Fig.10.The different local and global declarations of the train gate example.We superpose several screen-shots of the tool to show the declarations in a compact manner.straints.The clock validation in the symbolic state are exactly those that satisfy all constraints.The system view shows all instantiated automata and active locations of the current state.The message sequence chart shows the synchronisations between the differ-ent processes as well as the active locations at every step.The Verifier The verifier“tab”is shown in Fig.12.Properties are selectable in the Overview list.The user may model-check one or several properties,8insert or remove properties,and toggle the view to see the properties or the comments in the list.When a property is selected,it is possible to edit its definition(e.g., E<>Train1.Cross and Train2.Stop...)or comments to document what the property means informally.The Status panel at the bottom shows the commu-nication with the server.When trace generation is enabled and the model-checkerfinds a trace,the user is asked if she wants to import it into the simulator.Satisfied properties are marked green and violated ones red.In case either an over approximation or an under approximation has been selected in the options menu,then it may happen that the verification is inconclusive with the approximation used.In that casethe properties are marked yellow.Fig.11.View of the simulator tab for the train gate example.The interpretation of the constraint system in the variable panel depends on whether a transition in the transition panel is selected or not.If no transition is selected,then the constrain system shows all possible clock valuations that can be reached along the path.If a transition is selected,then only those clock valuations from which the transition can be taken are shown.Keyboard bindings for navigating the simulator without the mouse can be found in the integrated help system.3.2The Stand-alone VerifierWhen running large verification tasks,it is often cumbersome to execute these from inside the GUI.For such situations,the stand-alone command line verifier called verifyta is more appropriate.It also makes it easy to run the verification on a remote UNIX machine with memory to spare.It accepts command line arguments for all options available in the GUI,see Table3in the appendix.4Example1:The Train Gate4.1DescriptionThe train gate example is distributed with Uppaal.It is a railway control system which controls access to a bridge for several trains.The bridge is a critical shared resource that may be accessed only by one train at a time.The system is defined as a number of trains(assume4for this example)and a controller.A train can not be stopped instantly and restarting also takes time.Therefor,there are timing constraints on the trains before entering the bridge.When approaching,15。
A Discriminatively Trained,Multiscale,Deformable Part ModelPedro Felzenszwalb University of Chicago pff@David McAllesterToyota Technological Institute at Chicagomcallester@Deva RamananUC Irvinedramanan@AbstractThis paper describes a discriminatively trained,multi-scale,deformable part model for object detection.Our sys-tem achieves a two-fold improvement in average precision over the best performance in the2006PASCAL person de-tection challenge.It also outperforms the best results in the 2007challenge in ten out of twenty categories.The system relies heavily on deformable parts.While deformable part models have become quite popular,their value had not been demonstrated on difficult benchmarks such as the PASCAL challenge.Our system also relies heavily on new methods for discriminative training.We combine a margin-sensitive approach for data mining hard negative examples with a formalism we call latent SVM.A latent SVM,like a hid-den CRF,leads to a non-convex training problem.How-ever,a latent SVM is semi-convex and the training prob-lem becomes convex once latent information is specified for the positive examples.We believe that our training meth-ods will eventually make possible the effective use of more latent information such as hierarchical(grammar)models and models involving latent three dimensional pose.1.IntroductionWe consider the problem of detecting and localizing ob-jects of a generic category,such as people or cars,in static images.We have developed a new multiscale deformable part model for solving this problem.The models are trained using a discriminative procedure that only requires bound-ing box labels for the positive ing these mod-els we implemented a detection system that is both highly efficient and accurate,processing an image in about2sec-onds and achieving recognition rates that are significantly better than previous systems.Our system achieves a two-fold improvement in average precision over the winning system[5]in the2006PASCAL person detection challenge.The system also outperforms the best results in the2007challenge in ten out of twenty This material is based upon work supported by the National Science Foundation under Grant No.0534820and0535174.Figure1.Example detection obtained with the person model.The model is defined by a coarse template,several higher resolution part templates and a spatial model for the location of each part. object categories.Figure1shows an example detection ob-tained with our person model.The notion that objects can be modeled by parts in a de-formable configuration provides an elegant framework for representing object categories[1–3,6,10,12,13,15,16,22]. While these models are appealing from a conceptual point of view,it has been difficult to establish their value in prac-tice.On difficult datasets,deformable models are often out-performed by“conceptually weaker”models such as rigid templates[5]or bag-of-features[23].One of our main goals is to address this performance gap.Our models include both a coarse global template cov-ering an entire object and higher resolution part templates. The templates represent histogram of gradient features[5]. As in[14,19,21],we train models discriminatively.How-ever,our system is semi-supervised,trained with a max-margin framework,and does not rely on feature detection. We also describe a simple and effective strategy for learn-ing parts from weakly-labeled data.In contrast to computa-tionally demanding approaches such as[4],we can learn a model in3hours on a single CPU.Another contribution of our work is a new methodology for discriminative training.We generalize SVMs for han-dling latent variables such as part positions,and introduce a new method for data mining“hard negative”examples dur-ing training.We believe that handling partially labeled data is a significant issue in machine learning for computer vi-sion.For example,the PASCAL dataset only specifies abounding box for each positive example of an object.We treat the position of each object part as a latent variable.We also treat the exact location of the object as a latent vari-able,requiring only that our classifier select a window that has large overlap with the labeled bounding box.A latent SVM,like a hidden CRF[19],leads to a non-convex training problem.However,unlike a hidden CRF, a latent SVM is semi-convex and the training problem be-comes convex once latent information is specified for thepositive training examples.This leads to a general coordi-nate descent algorithm for latent SVMs.System Overview Our system uses a scanning window approach.A model for an object consists of a global“root”filter and several part models.Each part model specifies a spatial model and a partfilter.The spatial model defines a set of allowed placements for a part relative to a detection window,and a deformation cost for each placement.The score of a detection window is the score of the root filter on the window plus the sum over parts,of the maxi-mum over placements of that part,of the partfilter score on the resulting subwindow minus the deformation cost.This is similar to classical part-based models[10,13].Both root and partfilters are scored by computing the dot product be-tween a set of weights and histogram of gradient(HOG) features within a window.The rootfilter is equivalent to a Dalal-Triggs model[5].The features for the partfilters are computed at twice the spatial resolution of the rootfilter. Our model is defined at afixed scale,and we detect objects by searching over an image pyramid.In training we are given a set of images annotated with bounding boxes around each instance of an object.We re-duce the detection problem to a binary classification prob-lem.Each example x is scored by a function of the form, fβ(x)=max zβ·Φ(x,z).Hereβis a vector of model pa-rameters and z are latent values(e.g.the part placements). To learn a model we define a generalization of SVMs that we call latent variable SVM(LSVM).An important prop-erty of LSVMs is that the training problem becomes convex if wefix the latent values for positive examples.This can be used in a coordinate descent algorithm.In practice we iteratively apply classical SVM training to triples( x1,z1,y1 ,..., x n,z n,y n )where z i is selected to be the best scoring latent label for x i under the model learned in the previous iteration.An initial rootfilter is generated from the bounding boxes in the PASCAL dataset. The parts are initialized from this rootfilter.2.ModelThe underlying building blocks for our models are the Histogram of Oriented Gradient(HOG)features from[5]. We represent HOG features at two different scales.Coarse features are captured by a rigid template covering anentireImage pyramidFigure2.The HOG feature pyramid and an object hypothesis de-fined in terms of a placement of the rootfilter(near the top of the pyramid)and the partfilters(near the bottom of the pyramid). detection window.Finer scale features are captured by part templates that can be moved with respect to the detection window.The spatial model for the part locations is equiv-alent to a star graph or1-fan[3]where the coarse template serves as a reference position.2.1.HOG RepresentationWe follow the construction in[5]to define a dense repre-sentation of an image at a particular resolution.The image isfirst divided into8x8non-overlapping pixel regions,or cells.For each cell we accumulate a1D histogram of gra-dient orientations over pixels in that cell.These histograms capture local shape properties but are also somewhat invari-ant to small deformations.The gradient at each pixel is discretized into one of nine orientation bins,and each pixel“votes”for the orientation of its gradient,with a strength that depends on the gradient magnitude.For color images,we compute the gradient of each color channel and pick the channel with highest gradi-ent magnitude at each pixel.Finally,the histogram of each cell is normalized with respect to the gradient energy in a neighborhood around it.We look at the four2×2blocks of cells that contain a particular cell and normalize the his-togram of the given cell with respect to the total energy in each of these blocks.This leads to a vector of length9×4 representing the local gradient information inside a cell.We define a HOG feature pyramid by computing HOG features of each level of a standard image pyramid(see Fig-ure2).Features at the top of this pyramid capture coarse gradients histogrammed over fairly large areas of the input image while features at the bottom of the pyramid capture finer gradients histogrammed over small areas.2.2.FiltersFilters are rectangular templates specifying weights for subwindows of a HOG pyramid.A w by hfilter F is a vector with w×h×9×4weights.The score of afilter is defined by taking the dot product of the weight vector and the features in a w×h subwindow of a HOG pyramid.The system in[5]uses a singlefilter to define an object model.That system detects objects from a particular class by scoring every w×h subwindow of a HOG pyramid and thresholding the scores.Let H be a HOG pyramid and p=(x,y,l)be a cell in the l-th level of the pyramid.Letφ(H,p,w,h)denote the vector obtained by concatenating the HOG features in the w×h subwindow of H with top-left corner at p.The score of F on this detection window is F·φ(H,p,w,h).Below we useφ(H,p)to denoteφ(H,p,w,h)when the dimensions are clear from context.2.3.Deformable PartsHere we consider models defined by a coarse rootfilter that covers the entire object and higher resolution partfilters covering smaller parts of the object.Figure2illustrates a placement of such a model in a HOG pyramid.The rootfil-ter location defines the detection window(the pixels inside the cells covered by thefilter).The partfilters are placed several levels down in the pyramid,so the HOG cells at that level have half the size of cells in the rootfilter level.We have found that using higher resolution features for defining partfilters is essential for obtaining high recogni-tion performance.With this approach the partfilters repre-sentfiner resolution edges that are localized to greater ac-curacy when compared to the edges represented in the root filter.For example,consider building a model for a face. The rootfilter could capture coarse resolution edges such as the face boundary while the partfilters could capture details such as eyes,nose and mouth.The model for an object with n parts is formally defined by a rootfilter F0and a set of part models(P1,...,P n) where P i=(F i,v i,s i,a i,b i).Here F i is afilter for the i-th part,v i is a two-dimensional vector specifying the center for a box of possible positions for part i relative to the root po-sition,s i gives the size of this box,while a i and b i are two-dimensional vectors specifying coefficients of a quadratic function measuring a score for each possible placement of the i-th part.Figure1illustrates a person model.A placement of a model in a HOG pyramid is given by z=(p0,...,p n),where p i=(x i,y i,l i)is the location of the rootfilter when i=0and the location of the i-th part when i>0.We assume the level of each part is such that a HOG cell at that level has half the size of a HOG cell at the root level.The score of a placement is given by the scores of eachfilter(the data term)plus a score of the placement of each part relative to the root(the spatial term), ni=0F i·φ(H,p i)+ni=1a i·(˜x i,˜y i)+b i·(˜x2i,˜y2i),(1)where(˜x i,˜y i)=((x i,y i)−2(x,y)+v i)/s i gives the lo-cation of the i-th part relative to the root location.Both˜x i and˜y i should be between−1and1.There is a large(exponential)number of placements for a model in a HOG pyramid.We use dynamic programming and distance transforms techniques[9,10]to compute the best location for the parts of a model as a function of the root location.This takes O(nk)time,where n is the number of parts in the model and k is the number of cells in the HOG pyramid.To detect objects in an image we score root locations according to the best possible placement of the parts and threshold this score.The score of a placement z can be expressed in terms of the dot product,β·ψ(H,z),between a vector of model parametersβand a vectorψ(H,z),β=(F0,...,F n,a1,b1...,a n,b n).ψ(H,z)=(φ(H,p0),φ(H,p1),...φ(H,p n),˜x1,˜y1,˜x21,˜y21,...,˜x n,˜y n,˜x2n,˜y2n,). We use this representation for learning the model parame-ters as it makes a connection between our deformable mod-els and linear classifiers.On interesting aspect of the spatial models defined here is that we allow for the coefficients(a i,b i)to be negative. This is more general than the quadratic“spring”cost that has been used in previous work.3.LearningThe PASCAL training data consists of a large set of im-ages with bounding boxes around each instance of an ob-ject.We reduce the problem of learning a deformable part model with this data to a binary classification problem.Let D=( x1,y1 ,..., x n,y n )be a set of labeled exam-ples where y i∈{−1,1}and x i specifies a HOG pyramid, H(x i),together with a range,Z(x i),of valid placements for the root and partfilters.We construct a positive exam-ple from each bounding box in the training set.For these ex-amples we define Z(x i)so the rootfilter must be placed to overlap the bounding box by at least50%.Negative exam-ples come from images that do not contain the target object. Each placement of the rootfilter in such an image yields a negative training example.Note that for the positive examples we treat both the part locations and the exact location of the rootfilter as latent variables.We have found that allowing uncertainty in the root location during training significantly improves the per-formance of the system(see Section4).tent SVMsA latent SVM is defined as follows.We assume that each example x is scored by a function of the form,fβ(x)=maxz∈Z(x)β·Φ(x,z),(2)whereβis a vector of model parameters and z is a set of latent values.For our deformable models we define Φ(x,z)=ψ(H(x),z)so thatβ·Φ(x,z)is the score of placing the model according to z.In analogy to classical SVMs we would like to trainβfrom labeled examples D=( x1,y1 ,..., x n,y n )by optimizing the following objective function,β∗(D)=argminβλ||β||2+ni=1max(0,1−y i fβ(x i)).(3)By restricting the latent domains Z(x i)to a single choice, fβbecomes linear inβ,and we obtain linear SVMs as a special case of latent tent SVMs are instances of the general class of energy-based models[18].3.2.Semi-ConvexityNote that fβ(x)as defined in(2)is a maximum of func-tions each of which is linear inβ.Hence fβ(x)is convex inβ.This implies that the hinge loss max(0,1−y i fβ(x i)) is convex inβwhen y i=−1.That is,the loss function is convex inβfor negative examples.We call this property of the loss function semi-convexity.Consider an LSVM where the latent domains Z(x i)for the positive examples are restricted to a single choice.The loss due to each positive example is now bined with the semi-convexity property,(3)becomes convex inβ.If the labels for the positive examples are notfixed we can compute a local optimum of(3)using a coordinate de-scent algorithm:1.Holdingβfixed,optimize the latent values for the pos-itive examples z i=argmax z∈Z(xi )β·Φ(x,z).2.Holding{z i}fixed for positive examples,optimizeβby solving the convex problem defined above.It can be shown that both steps always improve or maintain the value of the objective function in(3).If both steps main-tain the value we have a strong local optimum of(3),in the sense that Step1searches over an exponentially large space of latent labels for positive examples while Step2simulta-neously searches over weight vectors and an exponentially large space of latent labels for negative examples.3.3.Data Mining Hard NegativesIn object detection the vast majority of training exam-ples are negative.This makes it infeasible to consider all negative examples at a time.Instead,it is common to con-struct training data consisting of the positive instances and “hard negative”instances,where the hard negatives are data mined from the very large set of possible negative examples.Here we describe a general method for data mining ex-amples for SVMs and latent SVMs.The method iteratively solves subproblems using only hard instances.The innova-tion of our approach is a theoretical guarantee that it leads to the exact solution of the training problem defined using the complete training set.Our results require the use of a margin-sensitive definition of hard examples.The results described here apply both to classical SVMs and to the problem defined by Step2of the coordinate de-scent algorithm for latent SVMs.We omit the proofs of the theorems due to lack of space.These results are related to working set methods[17].We define the hard instances of D relative toβas,M(β,D)={ x,y ∈D|yfβ(x)≤1}.(4)That is,M(β,D)are training examples that are incorrectly classified or near the margin of the classifier defined byβ. We can show thatβ∗(D)only depends on hard instances. Theorem1.Let C be a subset of the examples in D.If M(β∗(D),D)⊆C thenβ∗(C)=β∗(D).This implies that in principle we could train a model us-ing a small set of examples.However,this set is defined in terms of the optimal modelβ∗(D).Given afixedβwe can use M(β,D)to approximate M(β∗(D),D).This suggests an iterative algorithm where we repeatedly compute a model from the hard instances de-fined by the model from the last iteration.This is further justified by the followingfixed-point theorem.Theorem2.Ifβ∗(M(β,D))=βthenβ=β∗(D).Let C be an initial“cache”of examples.In practice we can take the positive examples together with random nega-tive examples.Consider the following iterative algorithm: 1.Letβ:=β∗(C).2.Shrink C by letting C:=M(β,C).3.Grow C by adding examples from M(β,D)up to amemory limit L.Theorem3.If|C|<L after each iteration of Step2,the algorithm will converge toβ=β∗(D)infinite time.3.4.Implementation detailsMany of the ideas discussed here are only approximately implemented in our current system.In practice,when train-ing a latent SVM we iteratively apply classical SVM train-ing to triples x1,z1,y1 ,..., x n,z n,y n where z i is se-lected to be the best scoring latent label for x i under themodel trained in the previous iteration.Each of these triples leads to an example Φ(x i,z i),y i for training a linear clas-sifier.This allows us to use a highly optimized SVM pack-age(SVMLight[17]).On a single CPU,the entire training process takes3to4hours per object class in the PASCAL datasets,including initialization of the parts.Root Filter Initialization:For each category,we auto-matically select the dimensions of the rootfilter by looking at statistics of the bounding boxes in the training data.1We train an initial rootfilter F0using an SVM with no latent variables.The positive examples are constructed from the unoccluded training examples(as labeled in the PASCAL data).These examples are anisotropically scaled to the size and aspect ratio of thefilter.We use random subwindows from negative images to generate negative examples.Root Filter Update:Given the initial rootfilter trained as above,for each bounding box in the training set wefind the best-scoring placement for thefilter that significantly overlaps with the bounding box.We do this using the orig-inal,un-scaled images.We retrain F0with the new positive set and the original random negative set,iterating twice.Part Initialization:We employ a simple heuristic to ini-tialize six parts from the rootfilter trained above.First,we select an area a such that6a equals80%of the area of the rootfilter.We greedily select the rectangular region of area a from the rootfilter that has the most positive energy.We zero out the weights in this region and repeat until six parts are selected.The partfilters are initialized from the rootfil-ter values in the subwindow selected for the part,butfilled in to handle the higher spatial resolution of the part.The initial deformation costs measure the squared norm of a dis-placement with a i=(0,0)and b i=−(1,1).Model Update:To update a model we construct new training data triples.For each positive bounding box in the training data,we apply the existing detector at all positions and scales with at least a50%overlap with the given bound-ing box.Among these we select the highest scoring place-ment as the positive example corresponding to this training bounding box(Figure3).Negative examples are selected byfinding high scoring detections in images not containing the target object.We add negative examples to a cache un-til we encounterfile size limits.A new model is trained by running SVMLight on the positive and negative examples, each labeled with part placements.We update the model10 times using the cache scheme described above.In each it-eration we keep the hard instances from the previous cache and add as many new hard instances as possible within the memory limit.Toward thefinal iterations,we are able to include all hard instances,M(β,D),in the cache.1We picked a simple heuristic by cross-validating over5object classes. We set the model aspect to be the most common(mode)aspect in the data. We set the model size to be the largest size not larger than80%of thedata.Figure3.The image on the left shows the optimization of the la-tent variables for a positive example.The dotted box is the bound-ing box label provided in the PASCAL training set.The large solid box shows the placement of the detection window while the smaller solid boxes show the placements of the parts.The image on the right shows a hard-negative example.4.ResultsWe evaluated our system using the PASCAL VOC2006 and2007comp3challenge datasets and protocol.We refer to[7,8]for details,but emphasize that both challenges are widely acknowledged as difficult testbeds for object detec-tion.Each dataset contains several thousand images of real-world scenes.The datasets specify ground-truth bounding boxes for several object classes,and a detection is consid-ered correct when it overlaps more than50%with a ground-truth bounding box.One scores a system by the average precision(AP)of its precision-recall curve across a testset.Recent work in pedestrian detection has tended to report detection rates versus false positives per window,measured with cropped positive examples and negative images with-out objects of interest.These scores are tied to the reso-lution of the scanning window search and ignore effects of non-maximum suppression,making it difficult to compare different systems.We believe the PASCAL scoring method gives a more reliable measure of performance.The2007challenge has20object categories.We entered a preliminary version of our system in the official competi-tion,and obtained the best score in6categories.Our current system obtains the highest score in10categories,and the second highest score in6categories.Table1summarizes the results.Our system performs well on rigid objects such as cars and sofas as well as highly deformable objects such as per-sons and horses.We also note that our system is successful when given a large or small amount of training data.There are roughly4700positive training examples in the person category but only250in the sofa category.Figure4shows some of the models we learned.Figure5shows some ex-ample detections.We evaluated different components of our system on the longer-established2006person dataset.The top AP scoreaero bike bird boat bottle bus car cat chair cow table dog horse mbike person plant sheep sofa train tvOur rank 31211224111422112141Our score .180.411.092.098.249.349.396.110.155.165.110.062.301.337.267.140.141.156.206.336Darmstadt .301INRIA Normal .092.246.012.002.068.197.265.018.097.039.017.016.225.153.121.093.002.102.157.242INRIA Plus.136.287.041.025.077.279.294.132.106.127.067.071.335.249.092.072.011.092.242.275IRISA .281.318.026.097.119.289.227.221.175.253MPI Center .060.110.028.031.000.164.172.208.002.044.049.141.198.170.091.004.091.034.237.051MPI ESSOL.152.157.098.016.001.186.120.240.007.061.098.162.034.208.117.002.046.147.110.054Oxford .262.409.393.432.375.334TKK .186.078.043.072.002.116.184.050.028.100.086.126.186.135.061.019.036.058.067.090Table 1.PASCAL VOC 2007results.Average precision scores of our system and other systems that entered the competition [7].Empty boxes indicate that a method was not tested in the corresponding class.The best score in each class is shown in bold.Our current system ranks first in 10out of 20classes.A preliminary version of our system ranked first in 6classes in the official competition.BottleCarBicycleSofaFigure 4.Some models learned from the PASCAL VOC 2007dataset.We show the total energy in each orientation of the HOG cells in the root and part filters,with the part filters placed at the center of the allowable displacements.We also show the spatial model for each part,where bright values represent “cheap”placements,and dark values represent “expensive”placements.in the PASCAL competition was .16,obtained using a rigid template model of HOG features [5].The best previous re-sult of.19adds a segmentation-based verification step [20].Figure 6summarizes the performance of several models we trained.Our root-only model is equivalent to the model from [5]and it scores slightly higher at .18.Performance jumps to .24when the model is trained with a LSVM that selects a latent position and scale for each positive example.This suggests LSVMs are useful even for rigid templates because they allow for self-adjustment of the detection win-dow in the training examples.Adding deformable parts in-creases performance to .34AP —a factor of two above the best previous score.Finally,we trained a model with partsbut no root filter and obtained .29AP.This illustrates the advantage of using a multiscale representation.We also investigated the effect of the spatial model and allowable deformations on the 2006person dataset.Recall that s i is the allowable displacement of a part,measured in HOG cells.We trained a rigid model with high-resolution parts by setting s i to 0.This model outperforms the root-only system by .27to .24.If we increase the amount of allowable displacements without using a deformation cost,we start to approach a bag-of-features.Performance peaks at s i =1,suggesting it is useful to constrain the part dis-placements.The optimal strategy allows for larger displace-ments while using an explicit deformation cost.The follow-Figure 5.Some results from the PASCAL 2007dataset.Each row shows detections using a model for a specific class (Person,Bottle,Car,Sofa,Bicycle,Horse).The first three columns show correct detections while the last column shows false positives.Our system is able to detect objects over a wide range of scales (such as the cars)and poses (such as the horses).The system can also detect partially occluded objects such as a person behind a bush.Note how the false detections are often quite reasonable,for example detecting a bus with the car model,a bicycle sign with the bicycle model,or a dog with the horse model.In general the part filters represent meaningful object parts that are well localized in each detection such as the head in the person model.Figure6.Evaluation of our system on the PASCAL VOC2006 person dataset.Root uses only a rootfilter and no latent place-ment of the detection windows on positive examples.Root+Latent uses a rootfilter with latent placement of the detection windows. Parts+Latent is a part-based system with latent detection windows but no rootfilter.Root+Parts+Latent includes both root and part filters,and latent placement of the detection windows.ing table shows AP as a function of freely allowable defor-mation in thefirst three columns.The last column gives the performance when using a quadratic deformation cost and an allowable displacement of2HOG cells.s i01232+quadratic costAP.27.33.31.31.345.DiscussionWe introduced a general framework for training SVMs with latent structure.We used it to build a recognition sys-tem based on multiscale,deformable models.Experimental results on difficult benchmark data suggests our system is the current state-of-the-art in object detection.LSVMs allow for exploration of additional latent struc-ture for recognition.One can consider deeper part hierar-chies(parts with parts),mixture models(frontal vs.side cars),and three-dimensional pose.We would like to train and detect multiple classes together using a shared vocab-ulary of parts(perhaps visual words).We also plan to use A*search[11]to efficiently search over latent parameters during detection.References[1]Y.Amit and A.Trouve.POP:Patchwork of parts models forobject recognition.IJCV,75(2):267–282,November2007.[2]M.Burl,M.Weber,and P.Perona.A probabilistic approachto object recognition using local photometry and global ge-ometry.In ECCV,pages II:628–641,1998.[3] D.Crandall,P.Felzenszwalb,and D.Huttenlocher.Spatialpriors for part-based recognition using statistical models.In CVPR,pages10–17,2005.[4] D.Crandall and D.Huttenlocher.Weakly supervised learn-ing of part-based spatial models for visual object recognition.In ECCV,pages I:16–29,2006.[5]N.Dalal and B.Triggs.Histograms of oriented gradients forhuman detection.In CVPR,pages I:886–893,2005.[6] B.Epshtein and S.Ullman.Semantic hierarchies for recog-nizing objects and parts.In CVPR,2007.[7]M.Everingham,L.Van Gool,C.K.I.Williams,J.Winn,and A.Zisserman.The PASCAL Visual Object Classes Challenge2007(VOC2007)Results./challenges/VOC/voc2007/workshop.[8]M.Everingham, A.Zisserman, C.K.I.Williams,andL.Van Gool.The PASCAL Visual Object Classes Challenge2006(VOC2006)Results./challenges/VOC/voc2006/results.pdf.[9]P.Felzenszwalb and D.Huttenlocher.Distance transformsof sampled functions.Cornell Computing and Information Science Technical Report TR2004-1963,September2004.[10]P.Felzenszwalb and D.Huttenlocher.Pictorial structures forobject recognition.IJCV,61(1),2005.[11]P.Felzenszwalb and D.McAllester.The generalized A*ar-chitecture.JAIR,29:153–190,2007.[12]R.Fergus,P.Perona,and A.Zisserman.Object class recog-nition by unsupervised scale-invariant learning.In CVPR, 2003.[13]M.Fischler and R.Elschlager.The representation andmatching of pictorial structures.IEEE Transactions on Com-puter,22(1):67–92,January1973.[14] A.Holub and P.Perona.A discriminative framework formodelling object classes.In CVPR,pages I:664–671,2005.[15]S.Ioffe and D.Forsyth.Probabilistic methods forfindingpeople.IJCV,43(1):45–68,June2001.[16]Y.Jin and S.Geman.Context and hierarchy in a probabilisticimage model.In CVPR,pages II:2145–2152,2006.[17]T.Joachims.Making large-scale svm learning practical.InB.Sch¨o lkopf,C.Burges,and A.Smola,editors,Advances inKernel Methods-Support Vector Learning.MIT Press,1999.[18]Y.LeCun,S.Chopra,R.Hadsell,R.Marc’Aurelio,andF.Huang.A tutorial on energy-based learning.InG.Bakir,T.Hofman,B.Sch¨o lkopf,A.Smola,and B.Taskar,editors, Predicting Structured Data.MIT Press,2006.[19] A.Quattoni,S.Wang,L.Morency,M.Collins,and T.Dar-rell.Hidden conditional randomfields.PAMI,29(10):1848–1852,October2007.[20] ing segmentation to verify object hypothe-ses.In CVPR,pages1–8,2007.[21] D.Ramanan and C.Sminchisescu.Training deformablemodels for localization.In CVPR,pages I:206–213,2006.[22]H.Schneiderman and T.Kanade.Object detection using thestatistics of parts.IJCV,56(3):151–177,February2004. [23]J.Zhang,M.Marszalek,zebnik,and C.Schmid.Localfeatures and kernels for classification of texture and object categories:A comprehensive study.IJCV,73(2):213–238, June2007.。
CCF推荐的国际学术会议和期刊目录修订版发布CCF(China Computer Federation中国计算机学会)于2010年8月发布了第一版推荐的国际学术会议和期刊目录,一年来,经过业内专家的反馈和修订,于日前推出了修订版,现将修订版予以发布。
本次修订对上一版内容进行了充实,一些会议和期刊的分类排行进行了调整,目录包括:计算机科学理论、计算机体系结构与高性能计算、计算机图形学与多媒体、计算机网络、交叉学科、人工智能与模式识别、软件工程/系统软件/程序设计语言、数据库/数据挖掘/内容检索、网络与信息安全、综合刊物等方向的国际学术会议及期刊目录,供国内高校和科研单位作为学术评价的参考依据。
目录中,刊物和会议分为A、B、C三档。
A类表示国际上极少数的顶级刊物和会议,鼓励我国学者去突破;B类是指国际上著名和非常重要的会议、刊物,代表该领域的较高水平,鼓励国内同行投稿;C类指国际上重要、为国际学术界所认可的会议和刊物。
这些分类目录每年将学术界的反馈和意见,进行修订,并逐步增加研究方向。
中国计算机学会推荐国际学术刊物(网络/信息安全)一、 A类序号刊物简称刊物全称出版社网址1. TIFS IEEE Transactions on Information Forensics andSecurity IEEE /organizations/society/sp/tifs.html2. TDSC IEEE Transactions on Dependable and Secure ComputingIEEE /tdsc/3. TISSEC ACM Transactions on Information and SystemSecurity ACM /二、 B类序号刊物简称刊物全称出版社网址1. Journal of Cryptology Springer /jofc/jofc.html2. Journal of Computer SecurityIOS Press /jcs/3. IEEE Security & Privacy IEEE/security/4. Computers &Security Elsevier http://www.elsevier.nl/inca/publications/store/4/0/5/8/7/7/5. JISecJournal of Internet Security NahumGoldmann. /JiSec/index.asp6. Designs, Codes andCryptography Springer /east/home/math/numbers?SGWID=5 -10048-70-35730330-07. IET Information Security IET /IET-IFS8. EURASIP Journal on InformationSecurity Hindawi /journals/is三、C类序号刊物简称刊物全称出版社网址1. CISDA Computational Intelligence for Security and DefenseApplications IEEE /2. CLSR Computer Law and SecurityReports Elsevier /science/journal/026736493. Information Management & Computer Security MCB UniversityPress /info/journals/imcs/imcs.jsp4. Information Security TechnicalReport Elsevier /locate/istr中国计算机学会推荐国际学术会议(网络/信息安全方向)一、A类序号会议简称会议全称出版社网址1. S&PIEEE Symposium on Security and Privacy IEEE /TC/SP-Index.html2. CCSACM Conference on Computer and Communications Security ACM /sigs/sigsac/ccs/3. CRYPTO International Cryptology Conference Springer-Verlag /conferences/二、B类序号会议简称会议全称出版社网址1. SecurityUSENIX Security Symposium USENIX /events/2. NDSSISOC Network and Distributed System Security Symposium Internet Society /isoc/conferences/ndss/3. EurocryptAnnual International Conference on the Theory and Applications of Cryptographic Techniques Springer /conferences/eurocrypt2009/4. IH Workshop on Information Hiding Springer-Verlag /~rja14/ihws.html5. ESORICSEuropean Symposium on Research in Computer Security Springer-Verlag as.fr/%7Eesorics/6. RAIDInternational Symposium on Recent Advances in Intrusion Detection Springer-Verlag /7. ACSACAnnual Computer Security Applications ConferenceIEEE /8. DSNThe International Conference on Dependable Systems and Networks IEEE/IFIP /9. CSFWIEEE Computer Security Foundations Workshop /CSFWweb/10. TCC Theory of Cryptography Conference Springer-Verlag /~tcc08/11. ASIACRYPT Annual International Conference on the Theory and Application of Cryptology and Information Security Springer-Verlag /conferences/ 12. PKC International Workshop on Practice and Theory in Public Key Cryptography Springer-Verlag /workshops/pkc2008/三、 C类序号会议简称会议全称出版社网址1. SecureCommInternational Conference on Security and Privacy in Communication Networks ACM /2. ASIACCSACM Symposium on Information, Computer and Communications Security ACM .tw/asiaccs/3. ACNSApplied Cryptography and Network Security Springer-Verlag /acns_home/4. NSPWNew Security Paradigms Workshop ACM /current/5. FC Financial Cryptography Springer-Verlag http://fc08.ifca.ai/6. SACACM Symposium on Applied Computing ACM /conferences/sac/ 7. ICICS International Conference on Information and Communications Security Springer /ICICS06/8. ISC Information Security Conference Springer /9. ICISCInternational Conference on Information Security and Cryptology Springer /10. FSE Fast Software Encryption Springer http://fse2008.epfl.ch/11. WiSe ACM Workshop on Wireless Security ACM /~adrian/wise2004/12. SASN ACM Workshop on Security of Ad-Hoc and Sensor Networks ACM /~szhu/SASN2006/13. WORM ACM Workshop on Rapid Malcode ACM /~farnam/worm2006.html14. DRM ACM Workshop on Digital Rights Management ACM /~drm2007/15. SEC IFIP International Information Security Conference Springer http://sec2008.dti.unimi.it/16. IWIAIEEE International Information Assurance Workshop IEEE /17. IAWIEEE SMC Information Assurance Workshop IEEE /workshop18. SACMATACM Symposium on Access Control Models and Technologies ACM /19. CHESWorkshop on Cryptographic Hardware and Embedded Systems Springer /20. CT-RSA RSA Conference, Cryptographers' Track Springer /21. DIMVA SIG SIDAR Conference on Detection of Intrusions and Malware and Vulnerability Assessment IEEE /dimva200622. SRUTI Steps to Reducing Unwanted Traffic on the Internet USENIX /events/23. HotSecUSENIX Workshop on Hot Topics in Security USENIX /events/ 24. HotBots USENIX Workshop on Hot Topics in Understanding Botnets USENIX /event/hotbots07/tech/25. ACM MM&SEC ACM Multimedia and Security Workshop ACM。
软件学报ISSN 1000-9825, CODEN RUXUEW E-mail:************.cn Journal of Software,2021,32(3):818−830 [doi: 10.13328/ki.jos.006185] ©中国科学院软件研究所版权所有. Tel: +86-10-62562563 LFKT:学习与遗忘融合的深度知识追踪模型∗李晓光1, 魏思齐1, 张昕1, 杜岳峰1, 于戈21(辽宁大学信息学院,辽宁沈阳 110036)2(东北大学计算机科学与工程学院,辽宁沈阳 110163)通讯作者: 张昕,E-mail:****************.cn摘要: 知识追踪任务旨在根据学生历史学习行为实时追踪学生知识水平变化,并且预测学生在未来学习表现.在学生学习过程中,学习行为与遗忘行为相互交织,学生的遗忘行为对知识追踪影响很大.为了准确建模知识追踪中学习与遗忘行为,提出一种兼顾学习与遗忘行为的深度知识追踪模型LFKT(learning and forgetting behavior modeling for knowledge tracing).LFKT模型综合考虑了4个影响知识遗忘因素,包括学生重复学习知识点的间隔时间、重复学习知识点的次数、顺序学习间隔时间以及学生对于知识点的掌握程度.结合遗忘因素,LFKT采用深度神经网络,利用学生答题结果作为知识追踪过程中知识掌握程度的间接反馈,建模融合学习与遗忘行为的知识追踪模型.通过在真实在线教育数据集上的实验,与当前知识追踪模型相比, LFKT可以更好地追踪学生知识掌握状态,并具有较好的预测性能.关键词: 智慧教育;知识追踪;深度神经网络;学习行为;遗忘行为中图法分类号: TP181中文引用格式: 李晓光,魏思齐,张昕,杜岳峰,于戈.LFKT:学习与遗忘融合的深度知识追踪模型.软件学报,2021,32(3): 818−830./1000-9825/6185.htm英文引用格式: Li XG, Wei SQ, Zhang X, Du YF, Yu G. LFKT: Deep knowledge tracing model with learning and forgetting behavior merging. Ruan Jian Xue Bao/Journal of Software, 2021,32(3):818−830 (in Chinese)./1000-9825/ 6185.htmLFKT: Deep Knowledge Tracing Model with Learning and Forgetting Behavior MergingLI Xiao-Guang1, WEI Si-Qi1, ZHANG Xin1, DU Yue-Feng1, YU Ge21(College of Information, Liaoning University, Shenyang 110036, China)2(School of Computer Science and Engineering, Northeastern University, Shenyang 110163, China)Abstract: The knowledge tracing task is designed to track changes of students’ knowledge in real time based on their historical learning behaviors and to predict their future performance in learning. In the learning process, learning behaviors are intertwined with forgetting behaviors, and students’ forgetting behaviors have a great impact on knowledge tracing. In order to accurately model the learning and forgetting behaviors in knowledge tracing, a deep knowledge tracing model LFKT (learning and forgetting behavior modeling for knowledge tracing) that combines learning and forgetting behaviors is proposed in this study. To model such two behaviors, the LFKT model takes into account four factors that affect knowledge forgetting, including the interval between students’ repeated learning of knowledge points, the number of repeated learning of knowledge points, the interval between sequential learning, and the understanding degree of knowledge points. The model uses a deep neural network to predict knowledge status with indirect feedbacks on students’ understanding of knowledge according to students’ answers. With the experiments on the real datasets of online education, LFKT shows better performance of knowledge tracing and prediction in comparison with the traditional approaches.∗基金项目: 国家自然科学基金(U1811261)Foundation item: National Natural Science Foundation of China (U1811261)本文由“支撑人工智能的数据管理与分析技术”专刊特约编辑陈雷教授、王宏志教授、童咏昕教授、高宏教授推荐.收稿时间: 2020-08-23; 修改时间: 2020-09-03; 采用时间: 2020-11-06; jos在线出版时间: 2021-01-21李晓光等:LFKT:学习与遗忘融合的深度知识追踪模型819Key words: intelligent education; knowledge tracing; deep neural network; learning behavior; forgetting behavior在线教育系统提倡因材施教,即根据学生的知识水平为其推荐合适的学习资源[1].学生的知识水平受其学习阶段和认知能力的影响,在学习过程中不断变化.实时追踪学生知识水平,对于个性化在线教育至关重要[2,3].知识追踪(knowledge tracing,简称KT)任务主要包括:(1) 通过学生的学习历史实时追踪其知识水平变化;(2) 根据学生的知识水平预测其在未来学习中的表现.传统的知识追踪技术主要有基于隐马尔可夫模型(HMM)[4]的贝叶斯知识追踪(BKT)[5].近年来,深度学习在自然语言处理[6]及模式识别[7]等任务上的表现优于传统模型.在知识追踪研究领域,亦提出了大量基于深度学习的知识追踪模型.基于循环神经网络的深度知识追踪(DKT)模型[8]采用循环神经网络的隐藏向量表示学生的知识状态,并且据此预测学生的表现.动态键-值对记忆网络(DKVMN)[9]借鉴了记忆网络(memory network)[10]的思想,用值矩阵表示学生对于各个知识点的掌握程度,并以此预测学生表现,提升预测准确度.但是以上知识追踪方法都忽略了学生在学习过程中的遗忘行为对知识水平的影响.在教育心理学领域已经有很多学者认识到人类的遗忘行为,并且探索影响人类遗忘的因素.教育心理学理论艾宾浩斯遗忘曲线理论[11,12]提出:学生会遗忘所学知识,遗忘带来的影响是知识掌握程度的下降.学生重复学习知识点的次数与距离上次学习知识点的时间间隔会影响学生的遗忘程度.教育心理学理论记忆痕迹衰退说[13]认为:遗忘是记忆痕迹的衰退引起的,消退随时间的推移自动发生,原始的痕迹越深,则遗忘的程度越低.目前,在知识追踪领域,只有少数的研究考虑了学生的遗忘行为.BKT的扩展[14,15]考虑了艾宾浩斯遗忘曲线提到的影响遗忘程度的两个因素:重复学习知识点的次数和间隔时间.DKT的扩展[16]考虑了学生顺序学习的间隔时间,该研究认为:学科中的各个知识点是相通的,持续的学习会降低学生对于知识点的遗忘程度.然而,以上的研究仅仅考虑了部分影响学生遗忘的信息,忽略了学生原本知识水平对于遗忘程度的影响,学生对于其掌握的知识点和没有掌握的知识点的遗忘程度不同.与此同时,由于BKT算法与DKT算法的局限性,以往涉及遗忘的研究或者仅考虑了学生对于单个知识点的遗忘程度,或者仅考虑了学生对于整个知识空间的遗忘程度,没有建模学生对知识空间中的各个知识点的遗忘情况.针对于此,本文提出一种兼顾学习和遗忘的深度知识追踪模型LFKT(learning and forgetting behavior modeling for knowledge tracing).LFKT拟合了学生的学习与遗忘行为,实时更新、输出知识点掌握程度,并以此预测其未来表现.本文主要创新和贡献有:(1)结合教育心理学,知识追踪模型LFKT考虑了4个影响知识遗忘的因素:学生重复学习知识点的间隔时间、重复学习知识点的次数、顺序学习间隔时间以及学生对于知识点的掌握程度.LFKT根据以上信息建模遗忘行为,拟合学生因遗忘行为导致的知识掌握程度变化;(2)基于深度神经网络技术,LFKT设计了一个基于RNN和记忆神经网络的知识追踪神经网络.该网络包括注意力层、遗忘层、学习层、预测层和知识水平输出层.其中:遗忘层采用全连接网络计算记忆擦除向量和记忆更新向量,用以拟合遗忘行为;学习层采用LSTM网络,并利用学生在学习结束时的答题结果作为知识掌握程度的间接反馈.根据记忆擦除向量和记忆更新向量,获得经过遗忘后的知识掌握程度的中间嵌入,并以此作为学习层的输入,进而获得遗忘与学习相融合的知识掌握程度嵌入;(3)通过在两个真实在线教育数据集上的实验验证结果表明:LFKT可以有效地建模学生的学习行为与遗忘行为,实时追踪学生的知识水平,并且LFKT模型的预测性能优于现有模型.本文第1节介绍知识追踪相关工作.第2节给出相关概念和符号定义,并提出问题.第3节详细介绍LFKT 模型结构以及训练方法.第4节给出LFKT对比实验结果和分析.最后总结全文.1 相关工作知识追踪大致可以分为基于隐马尔可夫模型的知识追踪与基于深度学习的知识追踪.贝叶斯知识追踪是典型的基于隐马尔可夫模型实现知识追踪目标的模型之一.BKT将学生对某个知识点的潜在知识状态建模为一组二进制变量,每一个变量代表学生对于特定知识点“掌握”或“没掌握”,根据隐马尔可夫模型更新学生的知识状态,进而更新其在未来学习中的表现.在BKT的基础上,通过考虑其他因素,提出了820 Journal of Software 软件学报 V ol.32, No.3, March 2021 很多扩展方案,例如:Pardos 等人的研究[17]引入了习题难度对预测学生表现的影响;Baker 等人的研究[18]引入了学生猜测和失误对预测学生表现的影响;Khajah 等人的研究[19]将人类的认知因素扩展到BKT 中,从而提高预测精度.基于隐马尔可夫的知识追踪模型将学生对各个知识点的掌握程度分别建模,忽略各个知识点之间的关系.BKT 模型简单地将学生对于各个知识点的掌握程度分为“掌握”与“未掌握”,忽略了中间情况.深度知识追踪是深度学习模型循环神经网络在知识追踪任务的首次尝试,将学生的学习历史作为输入,使用RNN 隐藏状态向量来表示学生的知识水平,并且基于此预测学生未来学习表现.DKT 模型无法建模学生对于各个知识点的掌握程度,仅仅可以建模学生的整个知识水平.Chen 等人的研究[20]考虑了知识点之间的先验关系,提升了DKT 的预测性能.Su 等人的研究[21]将习题文本信息和学生的学习历史作为循环神经网络的输入,利用RNN 的隐向量建模学生的知识水平,取得了较好的预测效果,但是依旧无法建模学生对于各个知识点的掌握程度.DKVMN 模型借鉴了记忆网络的思想,用值矩阵建模学生对于各个知识点的掌握程度,考察习题与各个知识点之间的关系,追踪学生对于各个知识点的掌握程度.Abdelrahman 等人的研究[22]利用注意力机制,着重考察学生作答相似习题时的答题历史,改进了DKVMN.Sun 等人的研究[23]将学生答题时的行为特征扩展到DKVMN,从而取得更好的预测效果.Liu 等人的研究[24]利用习题的文本信息与学生的学习历史进行知识追踪.Nakagawa 等人的研究[25]用知识图建模各个知识点之间的关系,取得了良好的预测效果.Minn 等人的研究[26]根据学生的知识水平对学生进行分类,再对每一类学生进行知识追踪.Cheng 等人的研究[27]通过深度学习扩展了传统方法项目反映理论.Shen 等人的研究[28]考虑了每个学生的个性化差异,取得了很好的效果.Ai 等人的研究[29]考虑了知识点之间的包含关系,重新设计了记忆矩阵,改进了DKVMN,取得了良好的效果.关于学生遗忘行为对知识掌握状态的影响,Qiu 等人[15]考虑了学生距离上一次重复学习知识点的间隔时间,将新一天的标记加入到BKT 中,建模先前学习之后一天的遗忘行为,但是无法对较短时间的遗忘行为进行建模.Khajah 等人的研究[14]应用学生重复学习知识点的次数估计遗忘的概率,改进了BKT,提升了预测准度.Nagatani 等人的研究[16]考虑了重复学习知识点的次数、距离上一次学习知识点的间隔时间和距离上一次学习的间隔时间,改进了DKT 模型,但忽略了学生原本对于知识点掌握程度对学生遗忘程度影响.总的来说,当前的研究可以一定程度上追踪学生对于各个知识点的掌握程度,但或者忽略了学生遗忘行为,默认学生在学习间隔前后知识水平一致;或者对于遗忘行为的建模不够全面.本文所提出的LFKT 模型综合考察了影响学生遗忘程度的因素,将学生的练习结果作为学生知识掌握程度的间接反馈,建模学生的学习与遗忘行为,实时追踪学生对于各个知识点的掌握程度,预测学生未来学习表现.2 问题定义这里,我们定义S 为学生集合,K 为知识点集合,E 为习题集合.每个学生单独进行学习,互不影响.学生s 的答题历史H s =[(e 1,r 1),(e 2,r 2),...,(e t ,r t )],其中:e t 为学生t 时刻所做的习题;r t 为答题结果,r t =1表示答题正确,r t =0表示答题错误.k t ⊆K 为习题e t 所涉及的知识点集合.矩阵M K (d k ×|K |)为整个知识空间中|K |个知识点嵌入表示,其中,d k 维列向量为一个知识点的嵌入表示.矩阵1(||)V t v d K −×M 为学生在t −1时刻的知识空间中知识点嵌入.用|K |维向 量value t −1表示t −1时刻学习结束时学生对于各个知识点的掌握程度,其中,向量每一维的值介于(0,1)之间:值为0,表示学生对于该知识点没有掌握;值为1,表示学生对于该知识点已经完全掌握.学生在学习过程中会遗忘未复习的知识点,同时,学生在两次学习的间隔时间也会遗忘所学知识,因此,t −1时刻学习结束时的知识水平与t 时刻学习开始时的知识水平不尽相同.本文用矩阵(||)FV t v d K ×M 建模学生在t 时刻开始学习时的知识掌握状态.矩阵FV t M 与1V t −M 形状相同,FV t M 是由1V t −M 经过遗忘处理后得到.t 时刻学习结束,系统获得学生答题结果, LFKT 模型根据答题结果将开始学习时知识掌握嵌入矩阵FV t M 更新为学习结束时知识掌握嵌入矩阵V t M ,以 此建模学习行为.预测学生在下一个候选习题e t +1上的表现时,由于t 时刻学习结束到t +1时刻开始答题之间存 在时间间隔,学生在时间间隔内的遗忘行为会影响知识掌握状态,因此需要根据影响知识遗忘的因素将V t M 更新为1FV t +M ,从而预测学生答题表现.李晓光 等:LFKT:学习与遗忘融合的深度知识追踪模型821基于以上描述,本文将问题定义为:给定每个学生的学习历史记录,实现以下两个目标. •跟踪学生的知识状态变化; • 预测学生在下一个候选习题e t +1上的表现.3 方法描述3.1 影响知识遗忘的因素学生如果没有及时复习所学知识,就会产生遗忘,其对知识的掌握程度便会衰减.教育学理论艾宾浩斯曲线理论[11,12]表明,学生对于所学知识点的保留率受以下两方面影响:学生重复学习次数与时间间隔.时间间隔可以分为重复学习知识点的时间间隔和顺序学习的时间间隔.除此之外,教育心理学理论记忆痕迹衰退说[13]认为,学生对于知识点的掌握状态也影响着学生的遗忘程度.因此,本文考虑了以下4个影响知识遗忘的因素.•RT(repeated time interval):距离上次学习相同知识点的时间间隔; •ST(sequence time interval):距离上次学习的时间间隔; •LT(repeated learn times):重复学习知识点的次数; • KL(past knowledge level):学生原本对于该知识点的掌握程度.RT,ST 和LT 这3个标量组合在一起得到向量C t (i )=[RT t (i ),ST t (i ),LT t (i )],表示影响学生对于知识点i 遗忘程 度的前3个因素.每个知识点所对应的向量C t (i )组成矩阵C t (d c ×|K |).本文用向量1()V t i −M 度量学生原本对于知识点i 的掌握程度,并且将其作为影响学生对于知识点遗忘程度的第4个因素(KL).将矩阵C t 与1V t −M 组合在一起,得到矩阵1[,]V t t t −=F M C ,表示4个影响知识遗忘的因素.3.2 LFKT 模型本文提出了一种融合学习与遗忘的深度知识追踪模型LFKT,如图1所示.Fig.1 LFKT model图1 LFKT 模型LFKT 模型分为注意力层(attention layer)、遗忘层(forget layer)、预测层(prediction layer)、学习层(learn layer)822 Journal of Software 软件学报 V ol.32, No.3, March 2021 以及知识水平输出层(knowledge level output layer):注意力层以习题e t 以及习题所涵盖的知识点集合k t 作为输入,计算习题与各个知识点的知识相关权重;遗忘层根据第3.1节中提出的影响知识遗忘的4个因素将学生上一 时刻学习结束时知识掌握嵌入矩阵1V t −M 更新为当前时刻学习开始时知识掌握嵌入矩阵,FV t M 有效地对学生遗忘行为进行建模;预测层根据t +1时刻开始答题时的1FV t +M 预测学生的答题表现;学习层通过LSTM 网络将本次学习开始时的FV t M 更新为本次学习结束时的V t M ,对学生学习行为进行建模;知识水平输出层以学生本次学习结束时的V t M 作为输入,输出学生知识水平向量value t ,实时输出学生对于各个知识点的掌握程度.3.2.1 注意力层注意力层的作用是计算习题与知识点之间的知识相关权重.注意力层的输入是学生当前练习题目e t 以及题目所涉及的知识点集合k t .为了将e t 映射到连续的向量空间,用e t 乘以习题嵌入矩阵A (d k ×|E |),生成一个d k 维度的习题嵌入向量v t .其中,矩阵A 中每一个d k 维列向量为一道习题的嵌入表示.专家标注了每道习题所涵盖的知识点,存放于集合k t 中.本文通过知识点过滤器(K -Filter)滤掉不相关知识点,保留习题涵盖知识点.矩阵R t 存储习题涵盖知识点的嵌入向量,其中,矩阵R t 中每一个d k 维列向量为习题涵盖的一个知识点的嵌入向量.计算习题嵌入向量v t 与涵盖知识点嵌入向量R t (i )之间的内积,再计算内积的Softmax 值并存入向量RS t 中.向量RS t 表示习题v t 与习题涵盖知识点之间的知识相关权重,如公式(1)所示:()(())T t t t i Softmax i =RS v R (1) |K |维向量w t 为习题与全部知识点之间的知识相关权重向量.由于习题涵盖知识点被知识点过滤器滤出,模型需要将习题涵盖知识点的知识相关权重放入到w t 对应的位置上.首先,初始化|K |维全零向量w t ,即w t ←[0,…,0];之后,将RS t 每一维的权重放入到w t 的相应位置上,即w t [k t [i ]]←RS t [i ],得到习题与各个知识点的知识相关权重.3.2.2 遗忘层遗忘层根据第3.1节中介绍的影响知识遗忘的因素F t 对上一次学习结束时学生的知识掌握嵌入矩阵1V t −M 进行遗忘处理,得到学生本次学习开始时的知识掌握嵌入矩阵.FV t M 受LSTM 中遗忘门与输入门的启发,根据影响知识遗忘的因素F t 更新1V t −M 时,首先要擦除1V t −M 中原有的信息,再写入信息.对于遗忘行为建模来说,擦除 过程控制学生知识点掌握程度的衰退,写入过程控制学生知识点掌握程度的更新.遗忘层如图2所示.Fig.2 Forget layer图2 遗忘层利用一个带有Sigmoid 激活函数的全连接层将影响学生对知识点i 遗忘程度的因素F t (i )转换为知识点i 对应的记忆擦除向量fe t (i ):fe t (i )=Sigmoid (FE T F t (i )+b fe ) (2)全连接层的权重矩阵FE 的形状是(d v +d c )×d v ,全连接层的偏置向量b fe 为d v 维.记忆擦除向量fe t (i )是一个维李晓光 等:LFKT:学习与遗忘融合的深度知识追踪模型 823 度为d v 的列向量,向量中的所有值都是在(0,1)范围之内.利用一个带有Tanh 激活函数的全连接层将影响学生对知识点i 遗忘程度的因素F t (i )转换为知识点i 对应的记忆更新向量fa t (i ):fa t (i )=Tanh(FA T F t (i )+b fa ) (3)全连接层的权重矩阵FA 的形状是(d v +d c )×d v ,全连接层的偏置向量b fa 为d v 维,记忆更新向量fa t (i )是一个维 度为d v 的列向量.根据记忆擦除向量与记忆更新向量对1V t −M 更新得到FV t M :1()()(1())(1())FV V t t t t i i i i −=−+M M fe fa (4)遗忘层输出本次学习开始时知识掌握嵌入矩阵FV t M .遗忘层可以为分别建模学生对不同知识点的遗忘过 程,根据学生对于不同知识点学习历史的不同,计算学生对于不同知识点的遗忘程度.3.2.3 学习层学习层根据学生答题结果追踪学习过程中的知识掌握变化,将学生开始学习时知识掌握嵌入矩阵FV t M 更新为学生学习结束时知识掌握嵌入矩阵V t M ,建模学习行为.元组(e t ,r t )表示学生在时间t 的答题结果,为了将元 组(e t ,r t )映射到连续的向量空间,元组(e t ,r t )与大小为d v ×2|E |的答题结果嵌入矩阵B 相乘,得到d v 维答题结果嵌入向量s t .学习层将答题结果嵌入向量s t 和习题对应的知识相关权重向量w t 作为输入,利用学生在在线教育系统中的练习结果作为学生学习效果的间接反馈,通过LSTM 网络更新学生学习过程中的知识掌握状态,建模学习行为:()(,()();)V FV t t t t i LSTM i i =M s w M θ (5)3.2.4 预测层预测层的目的是预测学生在下一个候选习题e t +1上的表现.由于学生在两次答题间隔内的遗忘行为会影响 其知识掌握状态,预测层根据当前时刻开始答题时的知识掌握嵌入矩阵1FV t +M 预测学生正确回答习题e t +1的概率.将知识相关权重w t +1与当前时刻开始答题时学生知识掌握嵌入1FV t +M 加权求和,得到的向量d t +1为学生对于 习题涵盖知识点的加权掌握程度嵌入向量:1111()()K FV t t t i i i +++==∑d w M (6) 学生成功解答习题,不仅与学生对于习题涵盖知识点的综合掌握程度有关,还与习题本身有关,所以将向量d t +1和v t +1连接得到的组合向量[d t +1,v t +1]输入到带有Tanh 激活函数的全连接层,输出向量h t +1包含了学生对习题涵盖知识点的综合掌握程度和习题本身的特点.其中,矩阵W 1和向量b 1分别表示全连接层的权重与偏置: 11111([,])T t t t Tanh +++=+h W d v b (7)最后,将向量h t +1输入到一个带有Sigmoid 激活函数的全连接层中,得到表示学生答对习题的概率p t +1.其中,矩阵W 2和向量b 2分别表示全连接层的权重与偏置:1212()T t t p Sigmoid ++=+W h b (8)3.2.5 知识水平输出层 知识水平输出层以学生学习结束时知识掌握嵌入矩阵V t M 为输入,输出表示学生本次学习结束时知识掌 握程度的|K |维向量value t ,向量的每一维介于(0,1)之间,表示学生对于该知识点的掌握程度.知识水平输出层如图3所示.在预测层中,每一个时间节点t ,公式(7)、公式(8)根据两种输入预测学生在特定习题e t 上的表现:学生对于该习题涵盖知识点的综合掌握嵌入向量d t 和习题嵌入向量v t .因此,如果只是想估计在没有任何习题输入情况 下学生对于第i 个知识点的掌握情况,可以省略习题嵌入v t ,同时,让学生知识掌握嵌入矩阵V t M 的第i 列()V t i M 作为等式的输入.图3展示了知识水平输出层的详细过程.具体来说:学习层输出矩阵V t M 后,为了估计对于第i 个知识点的掌握程度,构造了权重向量βi =(0,…,1,…,0),其中,第i 维的值等于1,并用公式(9)提取第i 个知识点的 知识掌握嵌入向量()V t i M ,之后,使用公式(10)、公式(11)估计知识掌握水平:824Journal of Software 软件学报 V ol.32, No.3, March 2021()V V t i t i =M M β (9)11()([(),])T V t t i Tanh i =+0y W M b (10)22()(())T t t i Sigmoid i =+value W y b (11) 向量0=(0,0,…,0)与习题嵌入v t 维度相同,用于补齐向量维度.参数W 1,W 2,b 1,b 2与公式(7)和公式(8)中的完全相同.依次计算知识空间中每一个知识的掌握程度,得到学生知识掌握程度向量value t.Fig.3 Knowledge level output layer图3 知识水平输出层3.2.6 模型优化目标模型需要训练的参数主要是习题嵌入矩阵A 、答题结果嵌入矩阵B 、神经网络权重与偏置以及知识点矩阵M K .本文通过最小化模型对于学生答题结果的预测值p t 和学生答题的真实结果r t 之间的交叉熵损失函数来优化各个参数.损失函数如公式(12)所示,本文使用Adam 方法优化参数:(log (1)log(1))t t t t t L r p r p =−+−−∑ (12) 4 实 验本文在两个真实在线教育数据集上进行实验,通过对比LFKT 模型与其他知识追踪模型的预测性能以及知识追踪表现,证明LFKT 模型的有效性.4.1 数据集首先介绍实验所使用的两个真实在线教育数据集.• ASSISTments2012:该数据集来自于ASSISTments 在线教育平台.本文删除了学习记录数过少(<3)的学生信息,经过预处理后,数据集包含45 675个学生、266个知识点以及总计5 818 868条学习记录.数据集中,user_id 字段表示学生编号,skill_id 字段表示知识点编号,problem_id 字段表示题目编号, correct 字段表示学生真实答题结果,start_time 字段表示学生本次开始学习的时间,end_time 字段表示学生本次学习结束的时间;• slepemapy.cz:该数据集来自于地理在线教育系统.本文同样删除学习记录数过少(<3)的学生信息,经过预处理后,数据集包含87 952个学生、1 458个知识点以及总计10 087 305条学习记录.数据集中, user 字段表示学生编号,place_answered 字段表示学生真实答题结果,response_time 字段表示学生学习时间,place_asked 字段表示问题编号.其中,由于该数据集中每个问题仅仅考察一个知识点,因此place_asked 也是知识点编号.4.2 评测指标本文使用平均AUC(area under the curve)、平均ACC(accurary)和平均RMSE(root mean squared error)作为评估预测性能的指标.AUC 被定义为ROC 曲线与下坐标轴围成的面积,50%的AUC 值表示随机猜测获得的预。
论文中文摘要毕业设计说明书(论文)外文摘要1 绪论图像的细化是数字图像预处理中的重要的一个核心环节。
图像细化在图像分析和图像识别中如汉子识别、笔迹鉴别。
指纹分析中有着广泛的应用。
图像的细化主要有以下几个基本要求:(1)骨架图像必须保持原图像的曲线连通性。
(2)细化结果尽量是原图像的中心线。
(3)骨架保持原来图形的拓扑结构。
(4)保留曲线的端点。
(5)细化处理速度快。
(6)交叉部分中心线不畸变。
虽然现在人们对细化有着广泛的研究,细化算法的方法有很多。
然而大多数算法如Hilditch算法和Rosenfield算法存在着细化后图形畸变,断点过多,在复杂的情况下关键信息的缺失等问题。
基于上诉考虑,传统的细化算法已经无法满足如今的数字图像处理要求。
因此,需要提出新的一种算法,来解决相关问题。
1.1 相关定义1.1.1串行与并行从处理的过程来看,主要可以分为串行和并行两类,前者对图像中当前象素的处理依据其邻域内象素的即时化结果,即对某一元素进行检测,如果该点是可删除点,则立即删除,且不同的细化阶段采用不同的处理方法;后者对当前的象素处理依据该象素及其邻域内各象素的前一轮迭代处理的结果,至始至终采用相同的细化准则。
即全部检测完汉子图像边缘上的点后再删除可删除点。
1.1.2骨架对图像细化的过程实际上是求一图像骨架的过程。
骨架是二维二值目标的重要拓扑描述,它是指图像中央的骨架部分,是描述图像几何及拓扑性质的重要特征之一。
骨架形状描述的方法是Blum最先提出来的,他使用的是中轴的概念。
如果用一个形象的比喻来说明骨架的含义,那就是设想在t=0的时刻,讲目标的边界各处同时点燃,火焰以匀速向目标内部蔓延,当火焰的前沿相交时火焰熄灭,那么火焰熄灭点的集合就构成了中轴,也就是图像的骨架。
例如一个长方形的骨架是它的长方向上的中轴线,正方形的骨架是它的中心点,圆的骨架是它的圆心,直线的骨架是它自身,孤立点的骨架也是自身。
细化的目的就是在将图像的骨架提取出来的同时保持图像细小部分的连通性,特别是在文字识别,地质识别,工业零件识别或图像理解中,先对被处理的图像进行细化有助于突出形状特点和减少冗余信息量。
降维低维空间映射条件Dimensionality reduction is a crucial technique in data analysis and machine learning, where high-dimensional data is transformed into a lower-dimensional space for easier visualization and analysis. 降维是数据分析和机器学习中的一项重要技术,它将高维数据转换为低维空间,以便更容易进行可视化和分析。
One common method of dimensionality reduction is through mapping conditions, where algorithms map the original high-dimensional data into a lower-dimensional space while preserving as much of the relevant information as possible. 降维的一个常见方法是通过映射条件,即算法将原始高维数据映射到低维空间,同时尽可能保留尽可能多的相关信息。
The purpose of dimensionality reduction is to remove redundant or irrelevant features, reduce noise, and improve computational efficiency. 降维的目的是去掉冗余或无关的特征,减少噪音,并提高计算效率。
One challenge in dimensionality reduction is to find an appropriate mapping that preserves the meaningful structure of the data while reducing its dimensionality. 在降维中的一个挑战是找到一个适当的映射,既能保留数据的有意义结构,又能减少其维数。
收稿日期:2019-11-03 修回日期:2020-03-06基金项目:国家自然科学基金(61671352)作者简介:朱周华(1976-),女,副教授,研究方向为信号处理及应用;高 凡(1994-),女,硕士研究生,研究方向为图像处理㊂基于深度学习的局部实例搜索朱周华,高 凡(西安科技大学通信与信息工程学院,陕西西安710054)摘 要:针对传统实例搜索方法准确率和视觉相似度低下的问题,提出一种利用卷积神经网络提取图像全局特征和区域特征的实例搜索方法㊂该方法经过初步搜索㊁重排和查询扩展三个阶段实现实例搜索任务,通过微调策略和在重排阶段对特征匹配方法的改进进一步提高检索性能,并将其应用到局部实例搜索任务,即利用残缺图像检索得到整幅图像,在此基础之上,加入在线检索功能㊂在Oxford 5k 和Paris 6k 两个公开数据集上进行实验验证,结果表明,整幅图像的检索mAP 值和视觉相似度都得到了很大提升,局部实例检索的mAP 值均高于其他文献中整幅图像的检索,仅比文中整幅图像的检索低0.032㊂因此,提出的实例搜索方法不仅提高了实例搜索的准确率,也增强了目标定位的准确性,同时很好地解决了局部实例搜索问题㊂关键词:深度学习;局部实例搜索;区域特征;微调;特征匹配中图分类号:TP 391 文献标识码:A 文章编号:1673-629X (2020)09-0036-07doi :10.3969/j.issn.1673-629X.2020.09.007Local Instance Search Based on Deep LearningZHU Zhou -hua ,GAO Fan(School of Communication and Information Engineering ,Xi ’an University of Science and Technology ,Xi ’an 710054,China )Abstract :In order to solve the problem of low accuracy and visual similarity of traditional instance search methods ,an instance search method for extracting global and regional features of images using convolutional neural networks is proposed.The method realizes instance search task through three stages :filtering stage ,spatial re -ranking and query expansion.The retrieval performance is further improved by fine -tuning strategy and the improvement of the feature matching method in the re -ranking stage.It is applied to the local instance search task ,that is ,using the incomplete image retrieval to obtain the whole image.On this basis ,the online search function is added.Experiments are carried out on two public datasets of Oxford building 5k and Paris building 6k that the mAP value and visual sim⁃ilarity of the whole image are greatly improved.The mAP value of the local instance retrieval is higher than that of other literatures.The retrieval of images is only 0.032lower than the retrieval of the entire image.Therefore ,the proposed method not only improves the accuracy of instance search ,but also enhances the accuracy of target location ,and solves the problem of local instance search.Key words :deep learning ;local instance search ;regional features ;fine -tuning ;feature matching0 引 言信息时代,数字图像和视频数据日益增多,人们对于图像检索的需求也随之增大㊂传统的基于内容的图像检索(CBIR )都是定义在图像级别的检索,查询图片背景都比较单一,没有干扰信息,因此可以提取整个图片的特征进行检索㊂但是,现实生活中的查询图片都是带有场景的,查询目标仅占了整幅图的一部分,直接将查询图与数据库中的整幅图像进行匹配,准确率必然会很低㊂因此,考虑使用局部特征进行实例搜索㊂实例搜索是指给定一个样例,在视频或图像库中找到包含这个样例的视频片段或者图片,即找到任意场景下的目标对象㊂早期,实例搜索大多采用词袋模型(bag -of -words ,BoW )对图像的特征进行编码,其中大部分都采用尺度不变特征变换(SIFT )[1]来描述局部特征㊂Zhu 等人[2]首先使用SIFT 提取查询图片和视频关键帧的视觉特征,接着采用词袋算法对特征进行编码得到一维向量,最后根据向量之间的相似性,返回一个排好序的视频列表,文中借鉴了传统视觉检索的一些方法,但是没有很好地结合实例搜索的特点㊂2014年有学者[3]提出采用比BoW 性能更好的空间第30卷 第9期2020年9月 计算机技术与发展COMPUTER TECHNOLOGY AND DEVELOPMENT Vol.30 No.9Sep. 2020Fisher向量[4]和局部特征聚合描述符(VLAD)[5]来描述SIFT特征的空间关系,从而进行实例检索㊂随着深度学习的发展,深度卷积神经网络特征被广泛应用于计算机视觉的各个领域,如图像分类[6-7]㊁语音识别[8]等,均取得了不错的效果,因此有学者也将其引入到图像检索领域㊂起初,研究者们利用神经网络的全连接层特征进行图像检索[9],后来很多研究者开始转向卷积层特征的研究[10],并且证明卷积层特征的性能更好㊂Eva等人[11]采用词袋模型对卷积神经网络(CNNs)提取的特征进行编码,然后分别进行初次搜索,局部重排,扩展查询,从而实现实例检索㊂实例检索需采用局部特征实现,因此许多生成区域信息的方法相继出现,最简单的是滑动窗口,之后有学者提出使用Selective Search生成物体候选框[12-13],但是这些方法将生成候选区域和特征提取分开进行㊂Faster R-CNN[14]是一个端到端的网络,它可以同时提取卷积层特征和生成候选区域㊂文献[15]提出将微调之后的目标检测网络Faster R-CNN应用到实例检索中,使用区域提议网络(region proposal network, RPN)生成候选区域,从而得到查询图区域特征与数据库图像区域特征,特征匹配之后排序得到检索结果,在两个建筑物数据集上取得了不错的效果㊂何涛在其论文中[16]针对Faster R-CNN网络效率较低的问题提出了端到端的深度区域哈希网络(DRH),使用VGG16作为特征提取器,滑动窗口和RPN网络得到候选区域,并将两种方法进行对比,整个网络最后阶段对特征进行哈希编码并计算汉明距离进行排序,从而得到检索结果,文中为了排除不同场景㊁不同光照和拍照角度产生的干扰,使用局部信息进行检索㊂以上两篇文献尽管均使用局部信息进行检索,但都是为了排除干扰信息将查询图中的目标标记出来,查询图依然是整幅图像㊂实际搜索图像时某些图片会有残缺,此时就无法通过标记目标进行检索,因此实例检索除了常见的利用局部特征进行整幅图像的检索之外,局部图像的检索亦有着非常重要的现实意义㊂现有的检索方法的检索效果不是很理想,因此文中针对以上两个问题首先改进整幅图像的检索并提高其检索性能,之后利用图像的部分信息(例如建筑物的顶部㊁嫌疑人的部分特征等)检索得到整幅图像,实现局部图像的检索㊂同时,考虑到实际检索时,输入均为一幅图像,输出为一组图像,因此,文中在局部实例检索的基础之上加入在线检索功能,可以实现局部图像的实时搜索㊂因此,主要有以下两大方面的贡献和创新:(1)基于深度学习的实例搜索㊂一方面,通过微调策略提高了实例搜索的精确度;另一方面,针对候选框得分(scores)和余弦距(cosine)两种相似性度量方法存在的不足,提出将两种方法相结合,以获得更好的检索效果㊂(2)基于深度学习的局部实例搜索㊂由于局部图像的检索具有重大的现实意义,将全局实例搜索算法应用在局部实例检索任务中,即利用残缺图片信息搜索得到整幅图像,并加入在线检索功能,输入局部查询图,便可以得到查询结果和所属建筑物的名字㊂1 相关理论1.1 基于Faster R-CNN的区域特征提取如图1所示,Faster R-CNN由卷积层(Conv layers),RPN网络,RoI pooling,分类和回归四部分构成㊂图1 Faster R-CNN结构 卷积层用于提取图像特征,可以选择不同结构的网络,输入为整张图片,输出为提取的特征称为feature maps㊂文中采用VGG16[7]的中间卷积层作为特征提取器㊂RPN网络用于推荐候选区域,这个网络是用来代替之前的selective search[13]的,输入为图片,输出为多个矩形区域以及对应每个矩形区域含有物体的概率㊂首先将一个3*3的滑窗在feature maps上滑动,每个窗口中心点映射到原图并生成9种矩形框(9anchor boxes),之后进入两个同级的1*1卷积,一个分支通过softmax进行二分类,判断anchor boxes属于foreground还是background㊂另一分支计算anchor㊃73㊃ 第9期 朱周华等:基于深度学习的局部实例搜索boxes的bounding box regression的偏移量㊂Proposal 层结合两个分支的输出信息去冗余后保留N个候选框,称为proposals㊂传统的CNN网络,输入图像尺寸必须是固定大小的,但是Faster R-CNN的输入是任意大小的,RoI pooling作用是根据候选区域坐标在特征图上映射得到区域特征并将其pooling成固定大小的输出,即对每个proposal提取固定尺寸的特征图㊂分类和回归模块,一方面通过全连接层和softmax 层确定每个proposal的类别,另一方面回归更加精确的目标检测框,输出候选区域在图像中的精确坐标㊂1.2 微调Faster R-CNN微调,即用预训练模型重新训练自己的数据,现有的VGG16FasterR-CNN预训练模型主要是基于VOC2007数据集中20类常见的物体预训练得到的㊂文中的数据集是建筑物,与VOC2007图片相似度和特征都相差较大,如果依然采用预训练模型,效果必然不好,再加上从头开始训练,需要大量的数据㊁时间和计算资源㊂而文中所选用的数据集较小,因此需要进行微调,这样不仅可以节省大量时间和计算资源,同时还可以得到一个较好的模型㊂微调主要分为以下三个步骤:(1)数据预处理㊂首先需要对数据集进行数据清洗,主要去除无效值和纠正错误数据,提高数据质量㊂其次,由于文中的数据集较小且每类样本分布不均衡,因此选择性地对每类图片作图像增强处理,其中图像增强方法包括水平翻转,增加高斯噪声,模糊处理,改变图像对比度㊂最后将每幅图片中的目标样例标记出来㊂(2)建立训练集和测试集㊂一般是按一定的比例进行分配,但是文中选用的数据集存在样本不均衡的问题,有些类别特别多,有些类别特别少㊂由于是将所有图片全部放入同一个文件夹,然后依次读取样本分配训练集和测试集,如果按比例分配,小类样本参与训练的机会就会比大类少,训练出来的模型将会偏向于大类,使得大类性能好,小类性能差㊂平衡采样策略就是把样本按类别分组,每个类别生成一个样本列表,制作训练集时从各个类别所对应的样本列表中随机选择样本,这样可以保证每个类别参与训练的机会比较均衡㊂(3)修改相关网络参数,进行训练㊂主要修改网络文件中的类别数和类名,然后不断调节超参数,使性能达到最好㊂1.3 实例搜索实例检索一般经过初次搜索,局部重排,扩展查询三部分完成㊂初次搜索首先提取查询图和数据库所有图像的全局特征,然后计算特征之间的相似度,最后经过排序得到初步的检索结果㊂文中提取VGG16网络的最后一个卷积层(Conv5_3)特征㊂局部重排是将初次搜索得到的前K幅图片作为新的数据库进行重排,基本思路是提取查询图和数据库的区域特征并进行匹配,根据匹配结果进行排序从而得到查询结果㊂这里查询图和数据库的区域特征提取方法不同,其中,查询图的区域特征是用groundtruth 给定的边界框对整幅图像特征进行裁剪得到的,而数据库的区域特征是经过RPN网络和RoI pooling池化得到的特征(pool5)㊂扩展查询(query expansion,QE)是取出局部重排返回的前K个结果,对其特征求和取平均作为新的查询图,再做一次检索,属于重排的一种㊂2 实 验2.1 数据集和评价指标(1)实验环境㊂文中所有实验均在NVIDIA GeForce RTX2080上进行,所用系统为Ubuntu18.04,使用的深度学习框架为Caffe,编程语言为Python㊂(2)数据集介绍㊂在两个公开的建筑物数据集Oxford[17]和Paris[18]上进行实验㊂其中Oxford包含5063张图片,Paris包含6412张图片,但是有20张被损坏,因此有6392张图片可用㊂两个数据集都来自Flickr,共有11类建筑物,同一种建筑物有5张查询图,因此每个数据集总共有55张查询图,除此之外,两个数据集相同类别建筑物的场景㊁拍照角度和光照都有所不同,而且有很多本来不是同一种建筑物但是从表面看上去却非常相似的图片㊂(3)评价指标㊂平均精度均值(mean average precision,mAP)是一个反映了图像检索整体性能的指标,如式(1)和(2)所示㊂MAP=∑N k=1AP查询图的个数(1) AP=∑N k=1(P(k)㊃rel(k))相关图片个数(2)其中,N表示返回结果总个数,P(k)表示返回结果中第k个位置的查准率,rel(k)表示返回结果中第k个位置的图片是否与查询图相关,相关为1,不相关为0㊂MAP是多次查询后,对每次检索的平均精度AP值求㊃83㊃ 计算机技术与发展 第30卷和取平均㊂这里对是否相关做进一步解释:两个数据集的groundtruth有三类,分别是good,ok和junk,如果检索结果在good和ok中,则判为与查询图相关,如果在junk中,则判为不相关㊂2.2 基于深度学习的实例检索2.2.1 方 法先尝试使用VGG16Faster R-CNN预训练模型进行检索,在两个数据集上MAP值仅0.5左右,接着文中使用微调策略,只冻结了前两个卷积层,更新了之后所有的网络层权值,通过不断调参,训练一个精度尽可能高的模型㊂其中,微调过程中分别采用数据增强和平衡采样技术对数据进行处理㊂具体地,对于Oxford 数据集,建筑物radcliffe_camera的数量高达221张,而建筑物pitt_rivers仅有6张,其他9类样本数量在7至78之间不等,数量差距相当大,因此,选择性地对数量小的6类样本通过上面提到的方法进行数据增强,使小类样本数量增大㊂对其进行数据增强之后,虽然样本数量差距缩小,但是依然存在不均衡的问题,如果将所有样本放入一个文件夹中,按比例分配训练集和测试集,则依然会导致训练出来的模型小类性能差的问题㊂因此,将每类样本生成一个列表,再从每个列表中随机选取一定数量的样本作为该类的训练样本㊂除此之外,文献[15]在局部重排部分使用了两种特征匹配方法,一种是直接利用候选框对应得分(scores)进行排序㊂数据库中每幅图片经过Proposal layer会得到300个区域提议(proposal)的坐标和对应得分,找到查询图对应类的最高得分作为查询图和数据库每幅图片的相似度,再从高到低进行排序就可以得到检索结果㊂另一种是利用余弦距(cosine)进行排序㊂数据库中的每幅图片经过RoI pooling可以得到300个特征向量,计算查询图与数据库中每幅图片的300个区域特征的余弦距,最小距离对应的候选框即就是和查询图最相似的区域提议,然后把所有的最小距离再从小到大进行排序,就可以得到相似度排序㊂第一种方法虽然得到的边界框(bounding box regression)定位较准,mAP值也很高,但是视觉相似度并不是很高㊂而且根据候选框得分进行排序,每类建筑物的得分和排序都是一定的,因此每次相同类别的不同查询返回结果都是相同的,不会根据查询图片的不同而返回不同的排序㊂第二种方法得到的检索结果,图像相似度很高,但是边界框定位不是很准确㊂文中将两种方法结合(scores+cosine),利用余弦距计算相似度并排序,选择得分最高的候选框进行目标定位,这样既解决了视觉相似度不够的问题,也解决了相同类别的不同查询返回结果相同的问题,同时又解决了目标定位不准的问题㊂2.2.2 参数设置本节主要讨论扩展查询中的参数k对实验结果的影响,并选取一个最优值作为本实验的默认值㊂在特征匹配方法选用余弦距的情况下,在两个数据集上分别测试了k等于3㊁4㊁5㊁6㊁7时的mAP值,实验结果如表1所示㊂表1 不同k值对mAP值的影响k Oxford Paris30.9090.88940.9100.88850.9100.88560.9120.88970.9120.888 从表1综合来看,文中选用6作为k的默认值㊂2.2.3 结果与分析表2是文中与其他文献mAP值的对比,最后两项是文中三种方法的实验结果㊂可以看出,相比于其他文献,文中方法的检索性能得到很大的提升,比目前最好的方法分别高出6.1%和4%,说明文中方法优于其他检索方法㊂文中方法与文献[15]所采用方法类似,但结果却得到了很大的改善,通过分析认为,虽然所用数据集都相同,但是生成的训练集和测试集完全不同,而且文中采用数据增强方法,使得样本的数量增加了3~4倍,使用平衡采样的方法保证小类样本可以得到和大类样本同样的训练机会㊂除此之外,网络超参数对于模型的影响非常大,因此文中对参数进行调优,使得训练出来的模型更好㊂表2 文中方法与其他方法的mAP值对比方法Oxford ParisR.Tao等人[3]0.765-Razavian等人[10]0.5560.697Eva等人[11]0.7390.819CA-SR[15]0.7720.824CS-SR[15]0.7860.842DRH[16]0.8510.849Ours(scores)0.9030.890 Ours(cosine/cosine+scores)0.9120.890 图2是两组对比图,图(a)和图(b)是对建筑物all_ souls和louvre分别用候选框得分和余弦距进行排序的结果,其中,左边1列是查询图,右边5列是查询返回结果,表示与查询相关(下同)㊂图3是all_souls的两个不同查询用得分进行排序得到的检索结果㊂图4是将两种匹配方法结合得到的检索结果㊂从图2可以看出利用候选框得分排序得到的结果目标定位较准确,但是返回结果的背景㊁光照㊁拍照角㊃93㊃ 第9期 朱周华等:基于深度学习的局部实例搜索度㊁颜色㊁对比度和样例大小与查询图相差很大,而利用余弦距排序得到的结果候选框定位不是很准,但从背景㊁样例大小等视觉角度来看相似度较高㊂图3中all _souls 的两个不同查询图返回结果中不仅图片一样,而且顺序也相同㊂因此可以看出两种方法各有缺陷㊂图2 不同建筑物的同一个查询分别利用得分和余弦距得到的排序图3 建筑物all _souls 的两个不同查询根据得分得到的排序图4是将两种方法结合得到的返回结果,与图2相比视觉相似度提高了,目标定位也更准确了,与图3相比,同一个建筑物的两个不同查询返回结果也会根据查询图片的不同而改变,且从表2最后一行可以看出该方法比使用候选框得分在Oxford 上得到的mAP 值高0.009,与使用余弦距得到的mAP 值相同㊂因此认为,以上提出的特征匹配方法得到的返回结果在不降低mAP 值的基础上提高了检索的准确率㊂图4 综合得分和余弦距两种方法得到的检索结果2.3 基于深度学习的局部实例检索2.3.1 方 法本节局部图像的检索是建立在2.2节基础之上的,正是由于整幅图像检索采用候选区域特征实现,局部图像的检索才得以实现㊂较之于整幅图像的检索,局部图像的检索具有同样重大的现实意义㊂生活中常会因为某个原因使图片变得残缺,且难以识别,那么此时就需要使用局部图像检索得到原始图像的完整信息㊂比如,通过某建筑物的顶部搜索得到整幅图像从而识别该建筑物㊂或者可以应用在刑侦工作中,当摄像机捕获到的是某犯罪嫌疑人的部分特征时,可以通过已有的部分特征在图像库或者其他摄像头下搜索得到该嫌疑人的完整信息㊂由于目前没有一个现成的残缺图像库,因此本节利用截图工具对整幅图像作裁剪处理以模拟残缺图像,即从图像库选取不同实例通过裁剪得到不同大小,不同背景,不同角度,不同颜色的局部查询图㊂由于图像库图像都是整幅图像,在尺寸和包含的信息方面与局部查询图相差很大,因此局部检索最大的难点在于如何处理局部图像㊂2.2节会对输入图片统一进行缩放,那么局部查询图片输入后,先进行放大,则会导致原始输入图像失真,提取特征后再对其进行裁剪又会进一步丢失大部分图像信息,因此根本无法得到正确的检索结果㊂文中对其进行以下处理:即输入查询图后,先将局部查询填充至与数据库图像相同大小(图像库的图像基本都是1024*768或者768*1024大小的),这样对图像进行统一缩放,提取特征,按比例㊃04㊃ 计算机技术与发展 第30卷裁剪之后,得到的正是局部图像的特征,再与图像库匹配,则会输出正确的排序结果㊂本节输入为建筑物的局部图像,输出为局部查询所属的建筑物图像,并且会标记出局部查询在所属建筑物中的位置㊂目前,很多文献(如[15])都比较注重算法的研究,基本都采用离线的形式实现图像检索,不仅离线建立特征库,查询图也是成批输入到网络中进行离线检索,得到的结果也是成批保存起来,可是实际应用中,一般都是将查询图逐幅输入进行实时检索,因此文中在前文基础之上,加入了在线检索功能,最后实现在线局部实例检索㊂2.3.2 结果与分析如图5是通过裁剪建筑物radcliffe _camera 和triomphe 的原图,得到的5个不同查询图,分别选取一幅进行检索,得到了完整的建筑物,且标记出了局部查询图像在整个建筑物中的位置,如图6所示㊂最终mAP 值分别为0.880和0.857㊂从检索结果可以看出返回结果的视觉相似度极高,目标定位准确,且mAP 值高于其他文献中整幅图像的检索准确率㊂因此,可以证明文中提出的全局搜索算法在局部图像检索任务中亦能取得很好的效果㊂图5 两种建筑物的五个局部查询图图6 两组局部查询的检索结果在此之前,只有文献[19]为了证明行人重识别系统的普适性,使用CaffeNet 和VGG 16两个网络模型在Oxford 数据集上对局部建筑物图像进行了测试,得到的mAP 值分别为0.662和0.764,远低于文中的准确率㊂因此提出的局部实例搜索的性能良好㊂在线检索功能按照输入图片,可得到查询结果和查询建筑物的名字,且文中在没有使用任何编码算法的情况下,在两个数据集上检索一幅图的平均耗时分别为5.7s 和7s ,经检测,90%的时间都耗费在利用特征向量计算相似度部分㊂如图7所示,分别是bodleian 和eiffel 的检索结果和总耗时㊂图7 在线检索结果3摇结束语为了进一步提高实例检索性能,针对以往的利用候选框得分和余弦距进行特征匹配的不足,提出将两种方法结合,即利用余弦距计算相似度并排序,选择得分最高的候选框进行目标定位㊂并使用微调策略重新训练预训练模型从而使其适用于文中的实例检索㊂相比于其他方法,文中采用的方法在性能方面有明显的提升㊂在此基础之上,利用残缺图像搜索得到整幅图像,性能高于其他文献整幅图像的检索,且仅比文中整幅图像检索低0.032,实验结果证明提出的全局搜索算法同样适用于局部图像检索任务㊂之后加入在线检索功能,在没有任何编码的情况下检索一幅图像平均耗时仅需5.7s ~7s ㊂在未来的工作中,可以进一步加入编码模块,以提高检索速度,并且可以在更大的数据集上进行测试㊂参考文献:[1] LOWE D G.Object recognition from local scale -invariant fea⁃tures [C ]//International conference on computer vision.㊃14㊃ 第9期 朱周华等:基于深度学习的局部实例搜索Kerkyra,Greece:IEEE,1999:1150-1157.[2] ZHU C,SATOH rge vocabulary quantization for searchinginstances from videos[C]//International conference on multime⁃dia retrieval.New York,NY,United States:ACM,2012:1-8.[3] TAO R,GAVVES E,SNOEK C G M,et al.Locality in genericinstance search from one example[C]//Conference on computer vision and pattern recognition.Columbus,OH,USA:IEEE, 2014:2099-2106.[4] PERRONNIN F,LIU Y,SÁNCHEZ J,et rge-scale imageretrieval with compressed Fisher vectors[C]//Conference on computer vision and pattern recognition.San-Francisco,CA, USA:IEEE,2010:3384-3391.[5] GONG Y,WANG L,GUO R,et al.Multi-scale orderless poolingof deep convolutional activation features[C]//European confer⁃ence on computer vision.Zürich,Switzerland:Springer,2014: 392-407.[6] KRIZHEVSKY A,SUTSKEVER I,HINTON G E.Imagenetclassification with deep convolutional neural networks[C]//Neural information processing systems.New York,NY,United States:ACM,2012:1097-1105.[7] SIMONYAN K,ZISSERMAN A.Very deep convolutional net⁃works for large-scale image recognition[C]//Computer vision and pattern recognition.Boston,MA,USA:IEEE,2015:1-14.[8] HINTON G,DENG L,YU D,et al.Deep neural networks for a⁃coustic modeling in speech recognition[J].IEEE Signal Process⁃ing Magazine,2012,29(6):82-97.[9] WAN J,WANG D,HOI S C H,et al.Deep learning for content-based image retrieval:a comprehensive study[C]//International conference on multimedia.New York,NY,United States:ACM, 2014:157-166.[10]RAZAVIAN A S,SULLIVAN J,CARLSSON S,et al.Visual in⁃stance retrieval with deep convolutional networks[J].ITE Trans⁃actions on Media Technology and Applications,2016,4(3):251-258.[11]MOHEDANO E,MCGUINNESS K,O’CONNOR N E,et al.Bags of local convolutional features for scalable instance search[C]//International conference on multimedia retrieval.NewYork,NY,United States:ACM,2016:327-331. [12]UIJLINGS J R R,VAN DE SANDE K E A,GEVERS T,et al.Selective search for object recognition[J].International Journal of Computer Vision,2013,104(2):154-171.[13]GIRSHICK R,DONAHUE J,DARRELL T,et al.Rich feature hi⁃erarchies for accurate object detection and semantic segmentation[C]//Computer vision and pattern recognition.Columbus,OH,USA:IEEE,2014:580-587.[14]REN S,HE K,GIRSHICK R,et al.Faster R-CNN:towards real-time object detection with region proposal networks[C]//Neural information processing systems.New York,NY,United States: ACM,2015:91-99.[15]SALVADOR A,GIRÓ-I-NIETO X,MARQUÉS F,et al.FasterR-CNN features for instance search[C]//Conference on com⁃puter vision and pattern recognition s Vegas,NV, USA:IEEE,2016:9-16.[16]何 涛.基于深度学习和哈希的图像检索的方法研究[D].成都:电子科技大学,2018.[17]PHILBIN J,CHUM O,ISARD M,et al.Object retrieval withlarge vocabularies and fast spatial matching[C]//Conference on computer vision and pattern recognition.Minneapolis,MN,USA: IEEE,2007:1-8.[18]PHILBIN J,CHUM O,ISARD M,et al.Lost in quantization:im⁃proving particular object retrieval in large scale image databases[C]//Conference on computer vision and pattern recognition.Anchorage,AK,USA:IEEE,2008:1-8.[19]ZHENG Z,ZHENG L,YANG Y.A discriminatively learnedCNN embedding for person re-identification[J].Multimedia Computing,Communications,and Applications,2017,14(1):1-20.㊃24㊃ 计算机技术与发展 第30卷。
Discriminatively Trained Sparse Code Gradientsfor Contour DetectionXiaofeng Ren and Liefeng BoIntel Science and Technology Center for Pervasive Computing,Intel LabsSeattle,W A98195,USA{xiaofeng.ren,liefeng.bo}@AbstractFinding contours in natural images is a fundamental problem that serves as thebasis of many tasks such as image segmentation and object recognition.At thecore of contour detection technologies are a set of hand-designed gradient fea-tures,used by most approaches including the state-of-the-art Global Pb(gPb)operator.In this work,we show that contour detection accuracy can be signif-icantly improved by computing Sparse Code Gradients(SCG),which measurecontrast using patch representations automatically learned through sparse coding.We use K-SVD for dictionary learning and Orthogonal Matching Pursuit for com-puting sparse codes on oriented local neighborhoods,and apply multi-scale pool-ing and power transforms before classifying them with linear SVMs.By extract-ing rich representations from pixels and avoiding collapsing them prematurely,Sparse Code Gradients effectively learn how to measure local contrasts andfindcontours.We improve the F-measure metric on the BSDS500benchmark to0.74(up from0.71of gPb contours).Moreover,our learning approach can easily adaptto novel sensor data such as Kinect-style RGB-D cameras:Sparse Code Gradi-ents on depth maps and surface normals lead to promising contour detection usingdepth and depth+color,as verified on the NYU Depth Dataset.1IntroductionContour detection is a fundamental problem in vision.Accuratelyfinding both object boundaries and interior contours has far reaching implications for many vision tasks including segmentation,recog-nition and scene understanding.High-quality image segmentation has increasingly been relying on contour analysis,such as in the widely used system of Global Pb[2].Contours and segmentations have also seen extensive uses in shape matching and object recognition[8,9].Accuratelyfinding contours in natural images is a challenging problem and has been extensively studied.With the availability of datasets with human-marked groundtruth contours,a variety of approaches have been proposed and evaluated(see a summary in[2]),such as learning to clas-sify[17,20,16],contour grouping[23,31,12],multi-scale features[21,2],and hierarchical region analysis[2].Most of these approaches have one thing in common[17,23,31,21,12,2]:they are built on top of a set of gradient features[17]measuring local contrast of oriented discs,using chi-square distances of histograms of color and textons.Despite various efforts to use generic image features[5]or learn them[16],these hand-designed gradients are still widely used after a decade and support top-ranking algorithms on the Berkeley benchmarks[2].In this work,we demonstrate that contour detection can be vastly improved by replacing the hand-designed Pb gradients of[17]with rich representations that are automatically learned from data. We use sparse coding,in particularly Orthogonal Matching Pursuit[18]and K-SVD[1],to learn such representations on patches.Instead of a direct classification of patches[16],the sparse codes on the pixels are pooled over multi-scale half-discs for each orientation,in the spirit of the Pbimage patch: gray, abdepth patch (optional):depth, surface normal…local sparse coding multi-scale pooling oriented gradients power transformslinear SVM+ - …per-pixelsparse codes SVMSVMSVM … SVM RGB-(D) contoursFigure 1:We combine sparse coding and oriented gradients for contour analysis on color as well as depth images.Sparse coding automatically learns a rich representation of patches from data.With multi-scale pooling,oriented gradients efficiently capture local contrast and lead to much more accurate contour detection than those using hand-designed features including Global Pb (gPb)[2].gradients,before being classified with a linear SVM.The SVM outputs are then smoothed and non-max suppressed over orientations,as commonly done,to produce the final contours (see Fig.1).Our sparse code gradients (SCG)are much more effective in capturing local contour contrast than existing features.By only changing local features and keeping the smoothing and globalization parts fixed,we improve the F-measure on the BSDS500benchmark to 0.74(up from 0.71of gPb),a sub-stantial step toward human-level accuracy (see the precision-recall curves in Fig.4).Large improve-ments in accuracy are also observed on other datasets including MSRC2and PASCAL2008.More-over,our approach is built on unsupervised feature learning and can directly apply to novel sensor data such as RGB-D images from Kinect-style depth ing the NYU Depth dataset [27],we verify that our SCG approach combines the strengths of color and depth contour detection and outperforms an adaptation of gPb to RGB-D by a large margin.2Related WorkContour detection has a long history in computer vision as a fundamental building block.Modern approaches to contour detection are evaluated on datasets of natural images against human-marked groundtruth.The Pb work of Martin et.al.[17]combined a set of gradient features,using bright-ness,color and textons,to outperform the Canny edge detector on the Berkeley Benchmark (BSDS).Multi-scale versions of Pb were developed and found beneficial [21,2].Building on top of the Pb gradients,many approaches studied the globalization aspects,i.e.moving beyond local classifica-tion and enforcing consistency and continuity of contours.Ren et.al.developed CRF models on superpixels to learn junction types [23].Zhu ed circular embedding to enforce orderings of edgels [31].The gPb work of Arbelaez puted gradients on eigenvectors of the affinity graph and combined them with local cues [2].In addition to Pb gradients,Dollar et.al.[5]learned boosted trees on generic features such as gradients and Haar wavelets,Kokkinos used SIFT features on edgels [12],and Prasad et.al.[20]used raw pixels in class-specific settings.One closely related work was the discriminative sparse models of Mairal et al [16],which used K-SVD to represent multi-scale patches and had moderate success on the BSDS.A major difference of our work is the use of oriented gradients:comparing to directly classifying a patch,measuring contrast between oriented half-discs is a much easier problem and can be effectively learned.Sparse coding represents a signal by reconstructing it using a small set of basis functions.It has seen wide uses in vision,for example for faces [28]and recognition [29].Similar to deep network approaches [11,14],recent works tried to avoid feature engineering and employed sparse coding of image patches to learn features from “scratch”,for texture analysis [15]and object recognition [30,3].In particular,Orthogonal Matching Pursuit [18]is a greedy algorithm that incrementally finds sparse codes,and K-SVD is also efficient and popular for dictionary learning.Closely related to our work but on the different problem of recognition,Bo ed matching pursuit and K-SVD to learn features in a coding hierarchy [3]and are extending their approach to RGB-D data [4].Thanks to the mass production of Kinect,active RGB-D cameras became affordable and were quickly adopted in vision research and applications.The Kinect pose estimation of Shotton et. ed random forests to learn from a huge amount of data[25].Henry ed RGB-D cam-eras to scan large environments into3D models[10].RGB-D data were also studied in the context of object recognition[13]and scene labeling[27,22].In-depth studies of contour and segmentation problems for depth data are much in need given the fast growing interests in RGB-D perception.3Contour Detection using Sparse Code GradientsWe start by examining the processing pipeline of Global Pb(gPb)[2],a highly influential and widely used system for contour detection.The gPb contour detection has two stages:local contrast estimation at multiple scales,and globalization of the local cues using spectral grouping.The core of the approach lies within its use of local cues in oriented gradients.Originally developed in [17],this set of features use relatively simple pixel representations(histograms of brightness,color and textons)and similarity functions(chi-square distance,manually chosen),comparing to recent advances in using rich representations for high-level recognition(e.g.[11,29,30,3]).We set out to show that both the pixel representation and the aggregation of pixel information in local neighborhoods can be much improved and,to a large extent,learned from and adapted to input data. For pixel representation,in Section3.1we show how to use Orthogonal Matching Pursuit[18]and K-SVD[1],efficient sparse coding and dictionary learning algorithms that readily apply to low-level vision,to extract sparse codes at every pixel.This sparse coding approach can be viewed similar in spirit to the use offilterbanks but avoids manual choices and thus directly applies to the RGB-D data from Kinect.We show learned dictionaries for a number of channels that exhibit different characteristics:grayscale/luminance,chromaticity(ab),depth,and surface normal.In Section3.2we show how the pixel-level sparse codes can be integrated through multi-scale pool-ing into a rich representation of oriented local neighborhoods.By computing oriented gradients on this high dimensional representation and using a double power transform to code the features for linear classification,we show a linear SVM can be efficiently and effectively trained for each orientation to classify contour vs non-contour,yielding local contrast estimates that are much more accurate than the hand-designed features in gPb.3.1Local Sparse Representation of RGB-(D)PatchesK-SVD and Orthogonal Matching Pursuit.K-SVD[1]is a popular dictionary learning algorithm that generalizes K-Means and learns dictionaries of codewords from unsupervised data.Given a set of image patches Y=[y1,···,y n],K-SVD jointlyfinds a dictionary D=[d1,···,d m]and an associated sparse code matrix X=[x1,···,x n]by minimizing the reconstruction errorminY−DX 2F s.t.∀i, x i 0≤K;∀j, d j 2=1(1) D,Xwhere · F denotes the Frobenius norm,x i are the columns of X,the zero-norm · 0counts the non-zero entries in the sparse code x i,and K is a predefined sparsity level(number of non-zero en-tries).This optimization can be solved in an alternating manner.Given the dictionary D,optimizing the sparse code matrix X can be decoupled to sub-problems,each solved with Orthogonal Matching Pursuit(OMP)[18],a greedy algorithm forfinding sparse codes.Given the codes X,the dictionary D and its associated sparse coefficients are updated sequentially by singular value decomposition. For our purpose of representing local patches,the dictionary D has a small size(we use75for5x5 patches)and does not require a lot of sample patches,and it can be learned in a matter of minutes. Once the dictionary D is learned,we again use the Orthogonal Matching Pursuit(OMP)algorithm to compute sparse codes at every pixel.This can be efficiently done with convolution and a batch version of the OMP algorithm[24].For a typical BSDS image of resolution321x481,the sparse code extraction is efficient and takes1∼2seconds.Sparse Representation of RGB-D Data.One advantage of unsupervised dictionary learning is that it readily applies to novel sensor data,such as the color and depth frames from a Kinect-style RGB-D camera.We learn K-SVD dictionaries up to four channels of color and depth:grayscale for luminance,chromaticity ab for color in the Lab space,depth(distance to camera)and surface normal(3-dim).The learned dictionaries are visualized in Fig.2.These dictionaries are interesting(a)Grayscale (b)Chromaticity (ab)(c)Depth (d)Surface normal Figure 2:K-SVD dictionaries learned for four different channels:grayscale and chromaticity (in ab )for an RGB image (a,b),and depth and surface normal for a depth image (c,d).We use a fixed dictionary size of 75on 5x 5patches.The ab channel is visualized using a constant luminance of 50.The 3-dimensional surface normal (xyz)is visualized in RGB (i.e.blue for frontal-parallel surfaces).to look at and qualitatively distinctive:for example,the surface normal codewords tend to be more smooth due to flat surfaces,the depth codewords are also more smooth but with speckles,and the chromaticity codewords respect the opponent color pairs.The channels are coded separately.3.2Coding Multi-Scale Neighborhoods for Measuring ContrastMulti-Scale Pooling over Oriented Half-Discs.Over decades of research on contour detection and related topics,a number of fundamental observations have been made,repeatedly:(1)contrast is the key to differentiate contour vs non-contour;(2)orientation is important for respecting contour continuity;and (3)multi-scale is useful.We do not wish to throw out these principles.Instead,we seek to adopt these principles for our case of high dimensional representations with sparse codes.Each pixel is presented with sparse codes extracted from a small patch (5-by-5)around it.To aggre-gate pixel information,we use oriented half-discs as used in gPb (see an illustration in Fig.1).Each orientation is processed separately.For each orientation,at each pixel p and scale s ,we define two half-discs (rectangles)N a and N b of size s -by-(2s +1),on both sides of p ,rotated to that orienta-tion.For each half-disc N ,we use average pooling on non-zero entries (i.e.a hybrid of average and max pooling)to generate its representationF (N )= i ∈N |x i 1| i ∈N I |x i 1|>0,···, i ∈N |x im | i ∈NI |x im |>0 (2)where x ij is the j -th entry of the sparse code x i ,and I is the indicator function whether x ij is non-zero.We rotate the image (after sparse coding)and use integral images for fast computations (on both |x ij |and |x ij |>0,whose costs are independent of the size of N .For two oriented half-dics N a and N b at a scale s ,we compute a difference (gradient)vector DD (N a s ,N b s )= F (N a s )−F (N b s ) (3)where |·|is an element-wise absolute value operation.We divide D (N a s ,N b s )by their norms F (N a s ) + F (N b s ) + ,where is a positive number.Since the magnitude of sparse codes variesover a wide range due to local variations in illumination as well as occlusion,this step makes the appearance features robust to such variations and increases their discriminative power,as commonly done in both contour detection and object recognition.This value is not hard to set,and we find a value of =0.5is better than,for instance, =0.At this stage,one could train a classifier on D for each scale to convert it to a scalar value of contrast,which would resemble the chi-square distance function in gPb.Instead,we find that it is much better to avoid doing so separately at each scale,but combining multi-scale features in a joint representation,so as to allow interactions both between codewords and between scales.That is,our final representation of the contrast at a pixel p is the concatenation of sparse codes pooled at all thescales s ∈{1,···,S }(we use S =4):D p = D (N a 1,N b 1),···,D (N a S ,N b S );F (N a 1∪N b 1),···,F (N a S ∪N b S ) (4)In addition to difference D ,we also include a union term F (N a s ∪N b s ),which captures the appear-ance of the whole disc (union of the two half discs)and is normalized by F (N a s ) + F (N b s ) + .Double Power Transform and Linear Classifiers.The concatenated feature D p (non-negative)provides multi-scale contrast information for classifying whether p is a contour location for a partic-ular orientation.As D p is high dimensional (1200and above in our experiments)and we need to do it at every pixel and every orientation,we prefer using linear SVMs for both efficient testing as well as training.Directly learning a linear function on D p ,however,does not work very well.Instead,we apply a double power transformation to make the features more suitable for linear SVMs D p = D α1p ,D α2p (5)where 0<α1<α2<1.Empirically,we find that the double power transform works much better than either no transform or a single power transform α,as sometimes done in other classification contexts.Perronnin et.al.[19]provided an intuition why a power transform helps classification,which “re-normalizes”the distribution of the features into a more Gaussian form.One plausible intuition for a double power transform is that the optimal exponent αmay be different across feature dimensions.By putting two power transforms of D p together,we allow the classifier to pick its linear combination,different for each dimension,during the stage of supervised training.From Local Contrast to Global Contours.We intentionally only change the local contrast es-timation in gPb and keep the other steps fixed.These steps include:(1)the Savitzky-Goley filter to smooth responses and find peak locations;(2)non-max suppression over orientations;and (3)optionally,we apply the globalization step in gPb that computes a spectral gradient from the local gradients and then linearly combines the spectral gradient with the local ones.A sigmoid transform step is needed to convert the SVM outputs on D p before computing spectral gradients.4ExperimentsWe use the evaluation framework of,and extensively compare to,the publicly available Global Pb (gPb)system [2],widely used as the state of the art for contour detection 1.All the results reported on gPb are from running the gPb contour detection and evaluation codes (with default parameters),and accuracies are verified against the published results in [2].The gPb evaluation includes a number of criteria,including precision-recall (P/R)curves from contour matching (Fig.4),F-measures computed from P/R (Table 1,2,3)with a fixed contour threshold (ODS)or per-image thresholds (OIS),as well as average precisions (AP)from the P/R curves.Benchmark Datasets.The main dataset we use is the BSDS500benchmark [2],an extension of the original BSDS300benchmark and commonly used for contour evaluation.It includes 500natural images of roughly resolution 321x 481,including 200for training,100for validation,and 200for testing.We conduct both color and grayscale experiments (where we convert the BSDS500images to grayscale and retain the groundtruth).In addition,we also use the MSRC2and PASCAL2008segmentation datasets [26,6],as done in the gPb work [2].The MSRC2dataset has 591images of resolution 200x 300;we randomly choose half for training and half for testing.The PASCAL2008dataset includes 1023images in its training and validation sets,roughly of resolution 350x 500.We randomly choose half for training and half for testing.For RGB-D contour detection,we use the NYU Depth dataset (v2)[27],which includes 1449pairs of color and depth frames of resolution 480x 640,with groundtruth semantic regions.We choose 60%images for training and 40%for testing,as in its scene labeling setup.The Kinect images are of lower quality than BSDS,and we resize the frames to 240x 320in our experiments.Training Sparse Code Gradients.Given sparse codes from K-SVD and Orthogonal Matching Pur-suit,we train the Sparse Code Gradients classifiers,one linear SVM per orientation,from sampled locations.For positive data,we sample groundtruth contour locations and estimate the orientations at these locations using groundtruth.For negative data,locations and orientations are random.We subtract the mean from the patches in each data channel.For BSDS500,we typically have 1.5to 21In this work we focus on contour detection and do not address how to derive segmentations from contours.pooling disc size (pixel)a v e r a g e p r e c i s i o na v e r a g e p r e c i s i o nsparsity level a v e r a g e p r e c i s i o n (a)(b)(c)Figure 3:Analysis of our sparse code gradients,using average precision of classification on sampled boundaries.(a)The effect of single-scale vs multi-scale pooling (accumulated from the smallest).(b)Accuracy increasing with dictionary size,for four orientation channels.(c)The effect of the sparsity level K,which exhibits different behavior for grayscale and chromaticity.BSDS500ODS OIS AP l o c a l gPb (gray).67.69.68SCG (gray).69.71.71gPb (color).70.72.71SCG (color).72.74.75g l o b a l gPb (gray).69.71.67SCG (gray).71.73.74gPb (color).71.74.72SCG (color).74.76.77Table 1:F-measure evaluation on the BSDS500benchmark [2],comparing to gPb on grayscaleand color images,both for local contour detec-tion as well as for global detection (-bined with the spectral gradient analysis in [2]).Recall P r e c i s i o n Figure 4:Precision-recall curves of SCG vs gPb on BSDS500,for grayscale and color images.We make a substantial step beyondthe current state of the art toward reachinghuman-level accuracy (green dot).million data points.We use 4spatial scales,at half-disc sizes 2,4,7,25.For a dictionary size of 75and 4scales,the feature length for one data channel is 1200.For full RGB-D data,the dimension is 4800.For BSDS500,we train only using the 200training images.We modify liblinear [7]to take dense matrices (features are dense after pooling)and single-precision floats.Looking under the Hood.We empirically analyze a number of settings in our Sparse Code Gradi-ents.In particular,we want to understand how the choices in the local sparse coding affect contour classification.Fig.3shows the effects of multi-scale pooling,dictionary size,and sparsity level (K).The numbers reported are intermediate results,namely the mean of average precision of four oriented gradient classifier (0,45,90,135degrees)on sampled locations (grayscale unless otherwise noted,on validation).As a reference,the average precision of gPb on this task is 0.878.For multi-scale pooling,the single best scale for the half-disc filter is about 4x 8,consistent with the settings in gPb.For accumulated scales (using all the scales from the smallest up to the current level),the accuracy continues to increase and does not seem to be saturated,suggesting the use of larger scales.The dictionary size has a minor impact,and there is a small (yet observable)benefit to use dictionaries larger than 75,particularly for diagonal orientations (45-and 135-deg).The sparsity level K is a more intriguing issue.In Fig.3(c),we see that for grayscale only,K =1(normalized nearest neighbor)does quite well;on the other hand,color needs a larger K ,possibly because ab is a nonlinear space.When combining grayscale and color,it seems that we want K to be at least 3.It also varies with orientation:horizontal and vertical edges require a smaller K than diagonal edges.(If using K =1,our final F-measure on BSDS500is 0.730.)We also empirically evaluate the double power transform vs single power transform vs no transform.With no transform,the average precision is 0.865.With a single power transform,the best choice of the exponent is around 0.4,with average precision 0.884.A double power transform (with exponentsMSRC2ODS OIS APgPb.37.39.22SCG.43.43.33PASCAL2008ODS OIS APgPb.34.38.20SCG.37.41.27Table2:F-measure evaluation comparing our SCG approach to gPb on two addi-tional image datasets with contour groundtruth: MSRC2[26]and PASCAL2008[6].RGB-D(NYU v2)ODS OIS AP gPb(color).51.52.37 SCG(color).55.57.46gPb(depth).44.46.28SCG(depth).53.54.45gPb(RGB-D).53.54.40SCG(RGB-D).62.63.54Table3:F-measure evaluation on RGB-D con-tour detection using the NYU dataset(v2)[27].We compare to gPb on using color image only,depth only,as well as color+depth.Figure5:Examples from the BSDS500dataset[2].(Top)Image;(Middle)gPb output;(Bottom) SCG output(this work).Our SCG operator learns to preservefine details(e.g.windmills,faces,fish fins)while at the same time achieving higher precision on large-scale contours(e.g.back of zebras). (Contours are shown in double width for the sake of visualization.)0.25and0.75,which can be computed through sqrt)improves the average precision to0.900,which translates to a large improvement in contour detection accuracy.Image Benchmarking Results.In Table1and Fig.4we show the precision-recall of our Sparse Code Gradients vs gPb on the BSDS500benchmark.We conduct four sets of experiments,using color or grayscale images,with or without the globalization component(for which we use exactly the same setup as in gPb).Using Sparse Code Gradients leads to a significant improvement in accuracy in all four cases.The local version of our SCG operator,i.e.only using local contrast,is already better(F=0.72)than gPb with globalization(F=0.71).The full version,local SCG plus spectral gradient(computed from local SCG),reaches an F-measure of0.739,a large step forward from gPb,as seen in the precision-recall curves in Fig.4.On BSDS300,our F-measure is0.715. We observe that SCG seems to pick upfine-scale details much better than gPb,hence the much higher recall rate,while maintaining higher precision over the entire range.This can be seen in the examples shown in Fig.5.While our scale range is similar to that of gPb,the multi-scale pooling scheme allows theflexibility of learning the balance of scales separately for each code word,which may help detecting the details.The supplemental material contains more comparison examples.In Table2we show the benchmarking results for two additional datasets,MSRC2and PAS-CAL2008.Again we observe large improvements in accuracy,in spite of the somewhat different natures of the scenes in these datasets.The improvement on MSRC2is much larger,partly because the images are smaller,hence the contours are smaller in scale and may be over-smoothed in gPb. As for computational cost,using integral images,local SCG takes∼100seconds to compute on a single-thread Intel Core i5-2500CPU on a BSDS image.It is slower than but comparable to the highly optimized multi-thread C++implementation of gPb(∼60seconds).Figure6:Examples of RGB-D contour detection on the NYU dataset(v2)[27].Thefive panels are:input image,input depth,image-only contours,depth-only contours,and color+depth contours. Color is good picking up details such as photos on the wall,and depth is useful where color is uniform(e.g.corner of a room,row1)or illumination is poor(e.g.chair,row2).RGB-D Contour Detection.We use the second version of the NYU Depth Dataset[27],which has higher quality groundtruth than thefirst version.A medianfiltering is applied to remove double contours(boundaries from two adjacent regions)within3pixels.For RGB-D baseline,we use a simple adaptation of gPb:the depth values are in meters and used directly as a grayscale image in gPb gradient computation.We use a linear combination to put(soft)color and depth gradients together in gPb before non-max suppression,with the weight set from validation.Table3lists the precision-recall evaluations of SCG vs gPb for RGB-D contour detection.All the SCG settings(such as scales and dictionary sizes)are kept the same as for BSDS.SCG again outperforms gPb in all the cases.In particular,we are much better for depth-only contours,for which gPb is not designed.Our approach learns the low-level representations of depth data fully automatically and does not require any manual tweaking.We also achieve a much larger boost by combining color and depth,demonstrating that color and depth channels contain complementary information and are both critical for RGB-D contour detection.Qualitatively,it is easy to see that RGB-D combines the strengths of color and depth and is a promising direction for contour and segmentation tasks and indoor scene analysis in general[22].Fig.6shows a few examples of RGB-D contours from our SCG operator.There are plenty of such cases where color alone or depth alone would fail to extract contours for meaningful parts of the scenes,and color+depth would succeed. 5DiscussionsIn this work we successfully showed how to learn and code local representations to extract contours in natural images.Our approach combined the proven concept of oriented gradients with powerful representations that are automatically learned through sparse coding.Sparse Code Gradients(SCG) performed significantly better than hand-designed features that were in use for a decade,and pushed contour detection much closer to human-level accuracy as illustrated on the BSDS500benchmark. Comparing to hand-designed features(e.g.Global Pb[2]),we maintain the high dimensional rep-resentation from pooling oriented neighborhoods and do not collapse them prematurely(such as computing chi-square distance at each scale).This passes a richer set of information into learn-ing contour classification,where a double power transform effectively codes the features for linear paring to previous learning approaches(e.g.discriminative dictionaries in[16]),our uses of multi-scale pooling and oriented gradients lead to much higher classification accuracies. Our work opens up future possibilities for learning contour detection and segmentation.As we il-lustrated,there is a lot of information locally that is waiting to be extracted,and a learning approach such as sparse coding provides a principled way to do so,where rich representations can be automat-ically constructed and adapted.This is particularly important for novel sensor data such as RGB-D, for which we have less understanding but increasingly more need.。
S ENTI W ORD N ET:A Publicly Available Lexical Resourcefor Opinion MiningAndrea Esuli∗and Fabrizio Sebastiani†∗Istituto di Scienza e Tecnologie dell’Informazione,Consiglio Nazionale delle RicercheVia Giuseppe Moruzzi1,56124Pisa,ItalyE-mail:andrea.esuli@r.it†Dipartimento di Matematica Pura e Applicata,Universit`a di PadovaVia Giovan Battista Belzoni7,35131Padova,ItalyE-mail:fabrizio.sebastiani@unipd.itAbstractOpinion mining(OM)is a recent subdiscipline at the crossroads of information retrieval and computational linguistics which is concerned not with the topic a document is about,but with the opinion it expresses.OM has a rich set of applications,ranging from tracking users’opinions about products or about political candidates as expressed in online forums,to customer relationship management.In order to aid the extraction of opinions from text,recent research has tried to automatically determine the“PN-polarity”of subjective terms,i.e. identify whether a term that is a marker of opinionated content has a positive or a negative connotation.Research on determining whether a term is indeed a marker of opinionated content(a subjective term)or not(an objective term)has been,instead,much more scarce.In this work we describe S ENTI W ORD N ET,a lexical resource in which each W ORD N ET synset s is associated to three numerical scores Obj(s), P os(s)and Neg(s),describing how objective,positive,and negative the terms contained in the synset are.The method used to develop S ENTI W ORD N ET is based on the quantitative analysis of the glosses associated to synsets,and on the use of the resulting vectorial term representations for semi-supervised synset classification.The three scores are derived by combining the results produced by a committee of eight ternary classifiers,all characterized by similar accuracy levels but different classification behaviour.S ENTI W ORD N ET is freely available for research purposes,and is endowed with a Web-based graphical user interface.1.IntroductionOpinion mining(OM–also known as“sentiment classifi-cation”)is a recent subdiscipline at the crossroads of infor-mation retrieval and computational linguistics which is con-cerned not with the topic a text is about,but with the opin-ion it expresses.Opinion-driven content management has several important applications,such as determining critics’opinions about a given product by classifying online prod-uct reviews,or tracking the shifting attitudes of the general public towards a political candidate by mining online fo-rums or blogs.Within OM,several subtasks can be iden-tified,all of them having to do with tagging a given text according to expressed opinion:1.determining text SO-polarity,as in deciding whether agiven text has a factual nature(i.e.describes a given situation or event,without expressing a positive or a negative opinion on it)or expresses an opinion on its subject matter.This amounts to performing binary text categorization under categories Subjective and Ob-jective(Pang and Lee,2004;Yu and Hatzivassiloglou, 2003);2.determining text PN-polarity,as in deciding if a givenSubjective text expresses a Positive or a Negative opinion on its subject matter(Pang and Lee,2004;Turney,2002);3.determining the strength of text PN-polarity,as in de-ciding e.g.whether the Positive opinion expressed bya text on its subject matter is Weakly Positive,MildlyPositive,or Strongly Positive(Pang and Lee,2005;Wilson et al.,2004).To aid these tasks,several researchers have attempted to automatically determine whether a term that is a marker of opinionated content has a Positive or a Negative con-notation(Esuli and Sebastiani,2005;Hatzivassiloglou and McKeown,1997;Kamps et al.,2004;Kim and Hovy,2004; Takamura et al.,2005;Turney and Littman,2003),since it is by considering the combined contribution of these terms that one may hope to solve Tasks1,2and3.The con-ceptually simplest approach to this latter problem is prob-ably Turney’s(Turney,2002),who has obtained interest-ing results on Task2by considering the algebraic sum of the orientations of terms as representative of the orienta-tion of the document they belong to;but more sophisticated approaches are also possible(Hatzivassiloglou and Wiebe, 2000;Riloff et al.,2003;Whitelaw et al.,2005;Wilson et al.,2004).The task of determining whether a term is indeed a marker of opinionated content(i.e.is Subjective or Ob-jective)has instead received much less attention(Esuli and Sebastiani,2006;Riloff et al.,2003;Vegnaduzzo,2004). Note that in these works no distinction between different senses of a word is attempted,so that the term,and not its senses,are classified(although some such works(Hatzivas-siloglou and McKeown,1997;Kamps et al.,2004)distin-guish between different POSs of a word).In this paper we describe S ENTI W ORD N ET(version 1.0),a lexical resource in which each synset of W ORD-N ET(version2.0)is associated to three numerical scores Obj(s),P os(s)and Neg(s),describing how Objective, Positive,and Negative the terms contained in the synset are.The assumption that underlies our switch from terms to synsets is that different senses of the same term mayhave different opinion-related properties.Each of the three scores ranges from0.0to1.0,and their sum is1.0for each synset.This means that a synset may have nonzero scores for all the three categories,which would indicate that the corresponding terms have,in the sense indicated by the synset,each of the three opinion-related proper-ties only to a certain degree1.For example,the synset [estimable(3)]2,corresponding to the sense“may be computed or estimated”of the adjective estimable,has an Obj score of1.0(and P os and Neg scores of0.0),while the synset[estimable(1)]corresponding to the sense “deserving of respect or high regard”has a P os score of0.75,a Neg score of0.0,and an Obj score of0.25.A similar intuition had previously been presented in(Kim and Hovy,2004),whereby a term could have both a Positive and a Negative PN-polarity,each to a cer-tain degree.A similar point has also recently been made in(Andreevskaia and Bergler,2006),in which terms that possess a given opinion-related property to a higher de-gree are claimed to be also the ones on which human an-notators asked to assign this property agree more.Non-binary scores are attached to opinion-related properties also in(Turney and Littman,2003),but the interpretation here is related to the confidence in the correctness of the labelling, rather than in how strong the term is deemed to possess the property.We believe that a graded(as opposed to“hard”)eval-uation of opinion-related properties of terms can be help-ful in the development of opinion mining applications.A hard classification method will probably label as Objective any term that has no strong SO-polarity,e.g.terms such as short or alone.If a sentence contains many such terms, a resource based on a hard classification will probably miss its subtly subjective character,while a graded lexical re-source like S ENTI W ORD N ET may provide enough infor-mation to capture such nuances.The method we have used to develop S ENTI W ORD N ET is based on our previous work on determining the opinion-related properties of terms(Esuli and Sebastiani,2005; Esuli and Sebastiani,2006).The method relies on the quan-titative analysis of the glosses associated to synsets,and on the use of the resulting vectorial term representations for 1Note that associating a graded score to a synset for a certain property(e.g.Positive)may have(at least)three different inter-pretations:(i)the terms in the synset are Positive only to a cer-tain degree;(ii)the terms in the synset are sometimes used in a Positive sense and sometimes not,e.g.depending on the context of use;(iii)a combination of(i)and(ii)is the case.Interpreta-tion(i)has a fuzzy character,implying that each instance of these terms,in each context of use,have the property to a certain de-gree,while interpretation(ii)has a probabilistic interpretation(of a frequentistic type),implying that membership of a synset in the set denoted by the property must be computed by counting the number of contexts of use in which the terms have the property. We do not attempt to take a stand on this distinction,which(to our knowledge)had never been raised in sentiment analysis and that requires an in-depth linguistic study,although we tend to believe that(iii)is the case.2We here adopt the standard convention according to which a term enclosed in square bracket denotes a synset;thus [poor(7)]refers not just to the term poor but to the synset consisting of{inadequate(2),poor(7),short(4)}.semi-supervised synset classification.The three scores are derived by combining the results produced by a committee of eight ternary classifiers,each of which has demonstrated, in our previous tests,similar accuracy but different charac-teristics in terms of classification behaviour.S ENTI W ORD N ET is freely available for research pur-poses,and is endowed with a Web-based graphical user in-terface.2.Building S ENTI W ORD N ETThe method we have used to develop S ENTI W ORD N ET is an adaptation to synset classification of our method for de-ciding the PN-polarity(Esuli and Sebastiani,2005)and SO-polarity(Esuli and Sebastiani,2006)of terms.The method relies on training a set of ternary classifiers3,each of them capable of deciding whether a synset is Positive,or Nega-tive,or Objective.Each ternary classifier differs from the other in the training set used to train it and in the learn-ing device used to train it,thus producing different classi-fication results of the W ORD N ET synsets.Opinion-related scores for a synset are determined by the(normalized)pro-portion of ternary classifiers that have assigned the corre-sponding label to it.If all the ternary classifiers agree in assigning the same label to a synset,that label will have the maximum score for that synset,otherwise each label will have a score proportional to the number of classifiers that have assigned it.2.1.Training a classifierEach ternary classifier is generated using the semi-supervised method described in(Esuli and Sebastiani, 2006).A semi-supervised method is a learning process whereby only a small subset L⊂T r of the training data T r have been manually labelled.In origin the training data in U=T r−L were instead unlabelled;it is the process itself that has labelled them,automatically,by using L(with the possible addition of other publicly available resources)as input.Our method defines L as the union of three seed(i.e. training)sets L p,L n and L o of known Positive,Negative and Objective synsets,respectively4.L p and L n are two small sets,which we have defined by manually selecting the intended synsets5for14“paradig-matic”Positive and Negative terms(e.g.the Positive term nice,the Negative term nasty)which were used as seed terms in(Turney and Littman,2003).L p and L n are then iteratively expanded,in K itera-tions,into thefinal training sets T r K p and T r K n.At each iteration step k two sets T r k p and T r k n are generated,where T r k p⊃T r k−1p⊃...⊃T r1p=L p and T r k n⊃T r k−1n⊃...⊃T r1n=L n.The expansion at iteration step k consists 3An n-ary classifier is a device that attaches to each object exactly one from a predefined set of n labels.4A W ORD N ET synset represent a unique sense,which is de-fined by a unique gloss and is associated to a set of terms all with the same POS,each one associated to a sense number,(e.g.the ad-jectives blasphemous(2),blue(4),profane(1)are all contained in the same synset,whose sense is defined by the gloss “characterized by profanity or cursing”).5For example,for the term nice we have removed the synset relative to the French city of Nice.The process has resulted in47 Positive and58Negative synsets.•in adding to T r k p(resp.T r k n)all the synsets thatare connected to synsets in T r k−1p (resp.T r k−1n)byW ORD N ET lexical relations(e.g.also-see)such that the two related synsets can be taken to have the same PN-polarity;•in adding to T r k p(resp.T r k n)all the synsets thatare connected to synsets in T r k−1n (resp.T r k−1p)byW ORD N ET lexical relations(e.g.direct antonymy) such that the two related synsets can be taken to have opposite PN-polarity.The relations we have used in(Esuli and Sebastiani, 2005;Esuli and Sebastiani,2006)are synonymy and direct antonymy between terms,as is common in related litera-ture(Kamps et al.,2004;Kim and Hovy,2004;Takamura et al.,2005).In the case of synsets,synonymy cannot be used because it is the relation that defines synsets,thus it does connect different synsets.We have then followed the method used in(Valitutti et al.,2004)for the development of W ORD N ET-A FFECT,a lexical resource that tags W ORD-N ET synsets by means of a taxonomy of affective cate-gories(e.g.Behaviour,Personality,Cognitive state):af-ter hand-collecting a number of labelled terms from other resources,Valitutti and colleagues generate W ORD N ET-A FFECT by adding to them the synsets reachable by navi-gating the relations of direct antonymy,similarity,derived-from,pertains-to,attribute,and also-see,which they con-sider to reliably preserve/invert the involved labels.Given the similarity with our task,we have used exactly these relations in our expansion.Thefinal sets T r K p and T r K n, along with the set T r K o described below,are used to train the ternary classifiers.The L o set is treated differently from L p and L n,be-cause of the inherently“complementary”nature of the Ob-jective category(an Objective term can be defined as a term that does not have either Positive or Negative char-acteristics).We have heuristically defined L o as the set of synsets that(a)do not belong to either T r K p or T r K n,and (b)contain terms not marked as either Positive or Neg-ative in the General Inquirer lexicon(Stone et al.,1966). The resulting L o set consists of17,530synsets;for any K, we define T r K o to coincide with L o.We give each synset a vectorial representation,obtained by applying a standard text indexing technique(cosine-normalized tf∗id f preceded by stop word removal)to its gloss,which we thus take to be a textual representa-tion of its semantics.Our basic assumption is that terms with similar polarity tend to have“similar”glosses:for in-stance,that the glosses of honest and intrepid will both contain appreciative expressions,while the glosses of disturbing and superfluous will both contain derogative expressions.The vectorial representations of the training synsets for a given label c i are then input to a standard supervised learner,which generates two binary classifiers.One of them must discriminate between terms that belong to the Positive category and ones that belong to its complement (not Positive),while the other must discriminate between terms that belong to the Negative category and ones that belong to its complement(not Negative).Terms that have been classified both into Positive by the former classifier and into(not Negative)by the latter are deemed to be posi-tive,and terms that have been classified both into(not Posi-tive)by the former classifier and into Negative by the latter are deemed to be negative.The terms that have been clas-sified(i)into both(not Positive)and(not Negative),or (ii)into both Positive and Negative,are taken to be Ob-jective.In the training phase,the terms in T r K n∪T r K o are used as training examples of category(not Positive), and the terms in T r K p∪T r K o are used as training examples of category(not Negative).The resulting ternary classi-fierˆΦis then applied to the vectorial representations of all W ORD N ET synsets(including those in T r K−L),to pro-duce the sentiment classification of the entire W ORD N ET.2.2.Defining the committee of classifiersIn(Esuli and Sebastiani,2006)we point out how differ-ent combinations of training set and learner perform differ-ently,even though with similar accuracy.The main three observations we recall here are the following:•Low values of K produce small training sets for Pos-itive and Negative,which produces binary classifiers with low recall and high precision for these categories.By increasing K these sets get larger,and the effect of these larger numbers is to increase recall and but to also add“noise”to the training set,which decreases precision.•Learners that use information about the prior proba-bilities of categories,e.g.naive Bayesian learners and SVMs,which estimate these probabilities from the training sets,are sensitive to the relative cardinalities of the training sets,and tend to classify more items into the categories that have more positive training items.Learners that do not use this kind of infor-mation,like Rocchio,do not exhibit this kind of be-haviour.•The variability described in the previous points does not affect the overall accuracy of the method,but only the balance in classification between Subjective and Objective items,while the accuracy in discriminating between Positive and Negative items tends to be con-stant.Following these considerations,we have decided to com-bine different configurations of training set and learner into a committee,to produce thefinal S ENTI W ORD N ET scores. Specifically,we have defined four different training sets,by choosing four different values of K(0,2,4,6),and we have alternatively used two learners(Rocchio and SVMs)6;this yields a total of eight ternary classifiers.With K=0and the SVM learner we have obtained very“conservative”bi-nary classifier for Positive and Negative,with very low re-call and high precision.For K=6SVMs produced instead 6The Rocchio learner we have used is from Andrew McCallum’s Bow package (/˜mccallum/bow/), while the SVMs learner we have used is version 6.01of Thorsten Joachims’SV M light(/).“liberal”binary classifiers for these two labels,that classifymany synsets as Positive or Negative even in the presence of very little evidence of subjectivity.The Rocchio learner has a similar behaviour,although not dependent on the prior probabilities of categories.S ENTI W ORD N ET is thus ob-tained by combining,for each synset,the scores produced by the eight ternary classifiers and normalizing them to1.0.2.3.Some statisticsTable1shows some statistics about the distribution of scores in S ENTI W ORD N ET.Thefirst remarkable fact is that the synsets judged to have some degree of opinion-related properties(i.e.not fully Objective)are a consid-erable part of the whole W ORD N ET,i.e.24.63%of it. However,as the objectivity score decreases,indicating a stronger subjectivity score(either as Positive,or as Neg-ative,or as a combination of them),the number of the synsets involved decreases rapidly,from10.45%for Ob-jective<=0.5,to0.56%for Objective<=0.125.This seems to indicate that there are only few terms that are un-questionably Positive(or Negative),where“unquestion-ably”here indicates widespread agreement among different automated classifiers;in essence,this is the same observa-tion which has independently been made in(Andreevskaia and Bergler,2006),where agreement among human classi-fiers is shown to correlate strongly with agreement among automated classifiers,and where such agreement is strong only for a small subset of“core”,strongly-marked terms.Table1reports a breakdown by POS of the scores ob-tained by synsets.It is quite evident that“adverb”and“ad-jective”synsets are evaluated as(at least partially)Sub-jective(i.e.Obj(s)<1)much more frequently(39.66% and35.7%of the cases,respectively)than“verb”(11.04%) or“name”synsets(9.98%).This fact seems to indicate that,in natural language,opinionated content is most of-ten carried by parts of speech used as modifiers(i.e.ad-verbs,adjectives)rather than parts of speech used as heads (i.e.verbs,nouns),as exemplified by expressions such as a disastrous appearance or a fabulous game. This intuition might be rephrased by saying that the most frequent role of heads is to denote entities or events,while that of modifiers is(among other things)to express a judg-ment of merit on them.3.Visualizing S ENTI W ORD N ETGiven that the sum of the opinion-related scores as-signed to a synset is always1.0,it is possible to display these values in a triangle whose vertices are the maximum possible values for the three dimensions observed.Figure1shows the graphical model we have designed to display the scores of a synset.This model is used in the Web-based graphical user interface through which S ENTI W ORD N ET can be freely accessed at r.it/˜esuli/software/ SentiWordNet.Figures2and3show two screenshots of the output for the terms estimable and short.4.Evaluating S ENTI W ORD N ETHow reliable are the opinion-related scores attached to synsets in SentiWordNet?Testing the accuracy of ourScore Positive Negative ObjectiveAdjectives0.065.77%62.81%0.08%0.12512.12%7.32% 2.14%0.258.81%8.68%7.42%0.375 4.85% 5.19%11.73%0.5 3.74% 5.63%9.50%0.625 2.94% 5.53%7.65%0.75 1.28% 3.72%9.21%0.8750.47% 1.07%7.57%1.00.03%0.04%44.71%Avg0.1060.1510.743Names0.090.80%89.25%0.00%0.125 4.53% 3.93%0.23%0.25 2.37% 2.42%0.87%0.375 1.25% 1.54% 1.84%0.50.62% 1.35% 2.32%0.6250.24%0.91% 2.57%0.750.14%0.48% 3.27%0.8750.05%0.12% 5.40%1.00.00%0.00%83.50%Avg0.0220.0340.944Verbs0.089.98%87.93%0.00%0.125 4.43% 4.94%0.21%0.25 2.66% 2.95%0.64%0.375 1.55% 1.81% 1.35%0.50.84% 1.24% 2.67%0.6250.36%0.63% 3.40%0.750.10%0.42% 4.57%0.8750.07%0.08% 6.11%1.00.00%0.00%81.05%Avg0.0260.0340.940Adverbs0.043.70%76.99%0.00%0.125 6.25%9.66%0.57%0.25 6.17% 5.32% 3.00%0.37514.44% 2.51%12.83%0.522.63% 2.70%23.91%0.625 5.70% 1.72%13.56%0.75 1.06%0.82% 6.11%0.8750.05%0.27%7.04%1.00.00%0.00%32.97%Avg0.2350.0670.698All parts of speech0.085.18%84.45%0.02%0.125 5.79% 4.77%0.54%0.25 3.56% 3.58% 1.97%0.375 2.28% 2.19% 3.72%0.5 1.85% 2.07% 4.20%0.6250.87% 1.64% 3.83%0.750.35% 1.00% 4.47%0.8750.12%0.27% 5.88%1.00.01%0.01%75.37%Avg0.0430.0540.903Table1:Breakdown by part of speech of the scores ob-tained by W ORD N ET synsets,and average scores obtained for each part of speech.tagging method experimentally is impossible,since for this we would need a full manual tagging of W ORD N ET ac-cording to our three labels of interest,and the lack of such a manually tagged resource is exactly the reason why we are interested in generating it automatically.A first,approximate indication of the quality of S EN -TI W ORD N ET can be gleaned by looking at the accuracy obtained by our method in classifying the General In-quirer (Stone et al.,1966),a lexicon which is instead fully tagged according to three opinion-related labels we have been discussing;the results of this classification exercise are reported in (Esuli and Sebastiani,2006).The reader should however bear in mind a few differences between the method used in (Esuli and Sebastiani,2006)and the one used here:(i)we here classify entire synsets,while in (Esuli and Sebastiani,2006)we classified terms,which can some-times be ambiguous and thus more difficult to classify cor-rectly;(ii)as discussed in Section 2.1.,the W ORD N ET lex-ical relations used for the expansion of the training set are different.The effectiveness results reported in (Esuli and Sebastiani,2006)may thus be considered only approxi-mately indicative of the accuracy of the S ENTI W ORD N ET labelling.A second,more direct route to evaluating S ENTI -W ORD N ET is to produce a human labelling of a subset of W ORD N ET ,and to use this subset as a “gold standard”against which to evaluate the scores attached to the same synsets in S ENTI W ORD N ET .We are currently producing this labelled corpus 7,which will consist of 1000W ORD -N ET synsets tagged by five different evaluators;for each synset each evaluator will attribute,through a graphical in-terface we have designed,a score for each of the three la-bels of interest such that the three scores sum up to parisons among the scores assigned by different eval-uators to the same synsets will also allow us to obtain inter-indexer inconsistency results for this task;the five evalua-tors have initially gone through a training session in which the meaning of the labels has been clarified,which should keep inter-indexer inconsistency within reasonable bounds.Note that 1000synsets correspond to less than 1%of the total 115,000W ORD N ET synsets;this points at the fact that,again,the accuracy obtained on this benchmark may be considered only as indicative of the (unknown)level of accuracy with which SentiWordNet has been produced.Notwithstanding this fact this benchmark will prove a use-ful tool in the comparative evaluation of future systems that,like ours,tag WordNet synsets by opinion,including pos-sible future releases of SentiWordNet .5.Conclusion and future researchWe believe that SentiWordNet can prove a useful tool for opinion mining applications,because of its wide cov-erage (all WordNet synsets are tagged according to each of the three labels Objective ,Positive ,Negative )and be-cause of its fine grain,obtained by qualifying the labels by means of numerical scores.7This work is being carried out in collaboration with Andrea Sans`o from the University of Pavia,whose help we gratefully ac-knowledge.Figure 1:The graphical representation adopted by S ENTI -W ORD N ET for representing the opinion-related properties of a termsense.Figure 2:S ENTI W ORD N ET visualization of the opinion-related properties of the term estimable .We are currently testing new algorithms for tagging WordNet synsets by sentiment,and thus plan to continue the development of SentiWordNet beyond the currently released “Version 1.0”;once developed,the gold standard discussed in Section 4.will contribute to guiding this de-velopment,hopefully allowing us to make available to the scientific community more and more refined releases of SentiWordNet .6.AcknowledgmentsThis work was partially supported by Project ONTOTEXT“From Text to Knowledge for the Semantic Web”,funded by the Provincia Autonoma di Trento under the 2004–2006“Fondo Unico per la Ricerca”funding scheme.Figure3:S ENTI W ORD N ET visualization of the opinion-related properties of the term short.7.ReferencesAlina Andreevskaia and Sabine Bergler.2006.Mining WordNet for fuzzy sentiment:Sentiment tag extraction from WordNet glosses.In Proceedings of EACL-06,11th Conference of the European Chapter of the Association for Computational Lin-guistics,Trento,IT.Forthcoming.Andrea Esuli and Fabrizio Sebastiani.2005.Determining the se-mantic orientation of terms through gloss analysis.In Proceed-ings of CIKM-05,14th ACM International Conference on Infor-mation and Knowledge Management,pages617–624,Bremen, DE.Andrea Esuli and Fabrizio Sebastiani.2006.Determining term subjectivity and term orientation for opinion mining.In Pro-ceedings of EACL-06,11th Conference of the European Chap-ter of the Association for Computational Linguistics,Trento, IT.Forthcoming.Vasileios Hatzivassiloglou and Kathleen R.McKeown.1997.Predicting the semantic orientation of adjectives.In Proceed-ings of ACL-97,35th Annual Meeting of the Association for Computational Linguistics,pages174–181,Madrid,ES. Vasileios Hatzivassiloglou and Janyce M.Wiebe.2000.Effects of adjective orientation and gradability on sentence subjectivity.In Proceedings of COLING-00,18th International Conference on Computational Linguistics,pages174–181.Jaap Kamps,Maarten Marx,Robert J.Mokken,and Maarten De ing WordNet to measure semantic orienta-tion of adjectives.In Proceedings of LREC-04,4th Interna-tional Conference on Language Resources and Evaluation,vol-ume IV,pages1115–1118,Lisbon,PT.Soo-Min Kim and Eduard Hovy.2004.Determining the senti-ment of opinions.In Proceedings of COLING-04,20th In-ternational Conference on Computational Linguistics,pages 1367–1373,Geneva,CH.Bo Pang and Lillian Lee.2004.A sentimental education:Senti-ment analysis using subjectivity summarization based on min-imum cuts.In Proceedings of ACL-04,42nd Meeting of the Association for Computational Linguistics,pages271–278, Barcelona,ES.Bo Pang and Lillian Lee.2005.Seeing stars:Exploiting class relationships for sentiment categorization with respect to rat-ing scales.In Proceedings of ACL-05,43rd Meeting of the As-sociation for Computational Linguistics,pages115–124,Ann Arbor,US.Ellen Riloff,Janyce Wiebe,and Theresa Wilson.2003.Learn-ing subjective nouns using extraction pattern bootstrapping.In Proceedings of CONLL-03,7th Conference on Natural Lan-guage Learning,pages25–32,Edmonton,CA.P.J.Stone,D.C.Dunphy,M.S.Smith,and D.M.Ogilvie.1966.The General Inquirer:A Computer Approach to Content Anal-ysis.MIT Press,Cambridge,US.Hiroya Takamura,Takashi Inui,and Manabu Okumura.2005.Extracting emotional polarity of words using spin model.In Proceedings of ACL-05,43rd Annual Meeting of the Associa-tion for Computational Linguistics,pages133–140,Ann Arbor, US.Peter D.Turney and Michael L.Littman.2003.Measuring praise and criticism:Inference of semantic orientation from associ-ation.ACM Transactions on Information Systems,21(4):315–346.Peter Turney.2002.Thumbs up or thumbs down?Semantic ori-entation applied to unsupervised classification of reviews.In Proceedings of ACL-02,40th Annual Meeting of the Associa-tion for Computational Linguistics,pages417–424,Philadel-phia,US.Alessandro Valitutti,Carlo Strapparava,and Oliviero Stock.2004.Developing affective lexical resources.PsychNology Journal,2(1):61–83.Stefano Vegnaduzzo.2004.Acquisition of subjective adjectives with limited resources.In Proceedings of the AAAI Spring Symposium on Exploring Attitude and Affect in Text:Theories and Applications,Stanford,US.Casey Whitelaw,Navendu Garg,and Shlomo Argamon.2005.Using appraisal groups for sentiment analysis.In Proceedings of CIKM-05,14th ACM International Conference on Informa-tion and Knowledge Management,pages625–631,Bremen, DE.Theresa Wilson,Janyce Wiebe,and Rebecca Hwa.2004.Just how mad are you?Finding strong and weak opinion clauses.In Proceedings of AAAI-04,21st Conference of the American As-sociation for Artificial Intelligence,pages761–769,San Jose, US.Hong Yu and Vasileios Hatzivassiloglou.2003.Towards answer-ing opinion questions:Separating facts from opinions and iden-tifying the polarity of opinion sentences.In Proceedings of EMNLP-03,8th Conference on Empirical Methods in Natural Language Processing,pages129–136,Sapporo,JP.。
第40卷第12期2023年12月控制理论与应用Control Theory&ApplicationsV ol.40No.12Dec.2023物体级语义视觉SLAM研究综述田瑞,张云洲†,杨凌昊,曹振中(东北大学信息科学与工程学院,辽宁沈阳110819)摘要:视觉同时定位与地图构建(Visual simultaneous localization and mapping,VSLAM)是自主移动机器人、自动驾驶、增强现实(AR)等领域的关键技术.随着深度学习的发展,准确高效的图像语义信息在VSLAM领域得到了广泛的应用.与传统SLAM相比,语义VSLAM利用语义信息提升了定位精度和鲁棒性,并通过物体级重建提高了环境感知能力,成为当前VSLAM领域的研究热点.本文对近年来优秀的物体级语义SLAM工作进行了阐述归纳和对比梳理,总结了该领域的4个关键问题,包括物体表达形式、物体初始化方法、融合语义信息的数据关联算法和融合物体级语义信息的后端优化方法.同时,对代表性方法进行了优缺点分析.最后,在现有技术成果和研究基础上,对物体级语义VSLAM面临的挑战和未来研究方向进行了展望和分析.当前物体级语义SLAM仍面临着物体关联不准确、物体优化框架不完善等问题.如何有效使用和维护语义地图以应用于决策规划等任务,以及融合多源信息以丰富视觉感知是未来的研究热点.关键词:视觉SLAM;数据关联;语义分割;物体级地图引用格式:田瑞,张云洲,杨凌昊,等.物体级语义视觉SLAM研究综述.控制理论与应用,2023,40(12):2160–2171DOI:10.7641/CTA.2023.30338Survey of object-oriented semantic visual SLAMTIAN Rui,ZHANG Yun-zhou†,YANG Ling-hao,CAO Zhen-zhong(College of Information Science and Technology,Shenyang Liaoning110819,China) Abstract:Visual simultaneous localization and mapping(VSLAM)is a key technology for autonomous robots,au-tonomous navigation,and AR applications.With the development of deep learning,accurate and efficient semantic infor-mation has been widely used in pared with traditional SLAM,semantic SLAM leverages semantic informa-tion to improve the accuracy and robustness of localization,and enhances environmental perception ability by object-level reconstruction,which has became the trend in VSLAM research.In this survey,we provide an overview of semantic SLAM techniques with state-of-the-art object SLAM systems.Four key issues of semantic SLAM are summarized,including ob-ject representation,object initialization methods,data association methods,and back-end optimization methods integrating semantic objects.The advantages and disadvantages of the comparison methods are provided.Finally,we propose the future work and challenges of object-level SLAM technology.Currently,semantic SLAM still faces problems such as inaccurate object association and an unified optimization framework has not yet been proposed.How to effectively use and maintain semantic maps for the application of decision and planning tasks,as well as integrate multi-source information to enrich visual perception,will be future research hotspots.Key words:visual SLAM;data association;semantic information;Semantic mappingCitation:TIAN Rui,ZHANG Yunzhou,YANG Linghao,et.al.Survey of Object-oriented Semantic visual SLAM. Control Theory&Applications,2023,40(12):2160–21711引言视觉同时定位与建图(visual simultaneous locali-zation and mapping,VSLAM)技术通过相机实现自主定位与地图构建,相较于激光雷达,相机具有低成本、低功耗、强感知等特点,且二维图像的语义信息更容易通过深度学习技术获取.结合语义信息对环境中的物体进行建模,并利用物体的语义不变性约束提升VSLAM的定位精度和鲁棒性成为当前研究的热点.本文着重对物体级语义VSLAM的发展和关键技术进行讨论:首先,阐述了物体级语义信息在SLAM中的收稿日期:2023−05−19;录用日期:2023−11−21.†通信作者.E-mail:*********************;Tel.:+86139****1976.本文责任编委:胡德文.国家自然科学基金项目(61973066,61471110)资助.Supported by the National Natural Science Foundation of China(61973066,61471110).第12期田瑞等:物体级语义视觉SLAM 研究综述2161重要作用;其次,归纳了物体级语义SLAM 技术的4个关键的问题(模型表达、物体初始化、数据关联、后端优化);最后,对语义SLAM 面临的挑战和未来发展方向进行了展望.本文结构框图如图1所示.图1本文结构框图Fig.1Structure of the survey2物体级语义VSLAM 系统架构物体级语义VSLAM 一般采用多线程的算法架构,分为前端和后端.前端主要由跟踪线程和检测线程构成:跟踪线程负责图像特征提取,并通过帧间特征匹配和局部BA(bundle adjustment)优化求解相机位姿;检测线程使用深度网络对输入图像进行语义信息提取,并将其送入到跟踪线程中.图像语义信息是基于当前帧的检测结果,因此,使用物体数据关联对不同帧的检测信息进行处理,并进行物体初始化.后端优化线程负责相机和物体位姿优化,以及对物体建模的参数进行调整.最终,系统构建了物体级的语义地图,实现环境的语义感知.语义信息的获取形式可以分为:目标检测[1–3]、语义分割[4–8]、实例分割[9–10].不同的语义信息获取方式会影响算法的实时性,通常,语义分割网络耗时更长,且语义分割得到的像素级分割结果存在信息冗余和误检,目标检测网络效率更高,但在复杂场景下容易出现漏检和误检的现象.后端优化方式可以分为独立优化和联合优化策略,例如,OA-SLAM(object assisted SLAM)[11]使用独立的线程来优化二次曲面参数,QuadricSLAM [12]则将物体和相机放在局部BA 的统一框架下优化.近年来,融合目标检测和实例分割的物体级SL-AM 成为研究的热点,该类方法通过多视图几何约束,利用物体检测框重建物体模型.重建模型可以分为二次曲面[13–20]、立方框[21]等.实例分割可以获得更准确的物体实例掩码,通常用于辅助物体特征提取和数据关联,实现更准确的目标跟踪[22].常见的物体级VSLAM 结构如图2所示.3物体级语义VSLAM 优势和应用传统的VSLAM 一般通过点、线、面等几何元素构建地图,例如,稀疏点云地图[23]、稠密点云地图[24]、网格地图[25–26]、TSDF(truncated signed distance field)地图[27]等.这些地图为自身定位和环境感知提供基础,使得VSLAM 技术得以广泛应用.随着应用场景的增加,人们发现传统VSLAM 方法在定位精度和算法鲁棒性上具有局限性,主要有如下原因:1)动态干扰,当前VSLAM 算法大多基于环境静态假设,特征匹配和优化容易受到外点干扰,导致跟踪精度变差或者丢失.2)光照变换,传统的视觉特征在光照变化或者暗光条件下,特征匹配和图像光度误差匹配失败,导致无法实现位姿估计,算法鲁棒性降低.3)高层次的语义感知需求,传统的VSLAM 在表征物体上具有局限性,不具有语义信息,无法满足人机交互等复杂任务的需求.深度学习技术的引入为VSLAM 定位和环境感知带了新的解决方法.基于深度学习的特征提取技术为VSLAM 在复杂光照条件下提供更稳定的匹配效2162控制理论与应用第40卷果[28–30],实例分割或目标检测为物体的运动属性判断提供可能,减少了VSLAM 在复杂环境中受动态干扰的影响[31–32].通过构建的物体级地图和模型表达,丰富了系统的环境感知能力[11,14–19].图2物体级语义VSLAM 结构图Fig.2Architecture of object VSLAM3.1利用物体信息提升定位精度当前,室内外场景下的VSLAM 算法已经得到了长足的发展,一些SLAM 算法能够准确地构建环境地图,并在一定程度上克服噪声、动态干扰和光照变化的影响.例如,ORB-SLAM [33]、RGBD-SLAM [34]、LS-D-SLAM [35]等.然而,在实际应用部署中,算法仍面临着场景动态干扰的影响.早期的解决方案中[23],使用运动一致性和基于外点剔除的RANSAC(random sample consensus)策略对由噪声干扰导致的错误特征匹配进行筛选,或者在优化中引入鲁棒核函数来降低动态特征的优化权重,例如,ORB-SLAM2[23]使用特征均匀提取和鲁棒核函数来降低错误匹配干扰.近年来,一些工作将物体检测结果的语义属性引入VSLAM 中,对场景中物体的动静态进行判断,并剔除动态物体的干扰[36–40].Detect-SLAM [41]通过目标检测剔除动态点,并通过特征匹配和扩展区域进行运动概率传播,在提升定位精度的同时提升了目标检测的稳定性.DS-SLAM [39]使用实例分割结果和运动一致性判断物体的运动属性,并将动态特征进行剔除以提升定位精度.Dyna-SLAM [40]将落在运动物体掩码内的特征作为外点剔除,从而提升其在动态场景下的定位鲁棒性.类似的,Kaveti 和Singh [42]提出了Light Field SLAM,通过合成孔径成像技术重建被遮挡的静态场景,不同于Bescos 等人[43]的算法,其进一步利用了重建背景的特征进行位姿跟踪以实现更好的定位性能.针对基于深度学习的动态物体检测通常存在漏检和错检问题,Ballester 等人[44]提出了DOT-SLAM,结合实例分割和多视图几何来生成动态物体掩码,并通过最小化光度误差进行跟踪.这种方法不仅提高了定位精度,还提高了语义分割的精度.上述工作的重点是通过剔除动态信息来提升自身定位的鲁棒和准确性,但忽略了对场景中移动物体状态的感知.作为VSLAM 对动态场景理解的扩展,结合运动跟踪的VSLAM 成为当前研究的热点.Wang 等人[45]首先提出了带有运动物体跟踪的SLAM,将自身位姿估计和动态物体位姿估计分解为两个独立的状态估计问题.Kundu 等人[46]结合SfM(structure from motion)和运动物体跟踪来解决运动场景下的SLAM 问题,该方法将系统输出统一到包含静态结构和运动物体轨迹的三维动态地图中.Huang 等人[47]提出了Cluster-VO,能够进行多个物体的运动估计.该方法提出了一种多层概率关联机制来高效地跟踪物体特征,利用异构条件随机场(conditional random filed,CRF)聚类方法进行物体关联,最后在滑动窗口内优化物体的运动轨迹.Bescos 等人[43]将运动物体与自身状态估计问题紧耦合到统一框架中,对跟踪点集使用主成分分析(principal component analysis,PCA)聚类和立方框建模,并使用动态路标点对自身位姿进行约束.第12期田瑞等:物体级语义视觉SLAM研究综述2163考虑到场景的先验约束,Twist SLAM[48]使用机械关节约束来限制物体在特定场景位姿估计的自由度,结合3D目标检测获得先验物体估计,使用语义信息来构建物体点簇地图,并利用静态簇(道路和房屋)来估计相机位姿.动态簇则通过速度的变化进行跟踪和约束.VDO-SLAM[49]使用聚类点的形式对物体进行状态估计,使用实例分割和稠密场景流,提高了动态物体观测的数量和关联质量,该方法将动态和静态结构集成到统一的估计框架中,实现了对相机位姿和物体位姿的联合估计.3.2利用物体信息提升定位鲁棒性传统的视觉定位大多采用手工描述了,如OR-B[50],SIFT[51]等特征,并使用基于视觉词袋(bag of words,BOW)进行定位,当图像视角变化或者光照发生明显改变时,该方案的视觉定位会失效.物体语义信息能有效克服大视角变换以及光照变换等情况,为VSLAM提供更鲁棒的定位.实时的物体级单目SLAM算法SLAM++[52]利用了一个大型物体数据库,使用单词袋来识别对象,实现鲁棒定位.Zins等[11]提出的OA-SLAM利用重建的物体级语义地图进行相机重定位.该方案结合了特征描述子和场景物体的重投影观测,利用物体的相对位置关系约束,在视角变化剧烈的场景下实现定位,提升了视觉定位的鲁棒性.Liu等[53]提出基于物体级描述符的定位方法.文献[54]提出基于深度网络的物体描述符定位方法.CubeSLAM[55]利用物体立方框和当前帧的目标检测约束,提升系统在无纹理场景下的定位鲁棒性. QuadricSLAM[12]提出基于二次曲面的物体观测约束,首次使用3D椭球作为路标,同时使用一个联合优化框架,将相机位姿和二次曲面联合优化.文献[56]利用单目视觉构建的物体级路标和物体先验大小约束,减少了单目定位的尺度漂移,提升了单目视觉的定位精度和鲁棒性.类似的方案如文献[57–58],采用物体先验尺度约束单目定位漂移.EAO-SLAM[21]则使用物体立方框约束构建观测误差,减少了定位漂移.可以看出,融合物体语义信息已经成为了提高视觉定位精度和鲁棒性的有效途径之一.语义信息已经广泛应用于SLAM系统的初始化、后端优化、重定位和闭环检测等阶段.因此,有效地处理和利用语义信息是提高定位精度的关键.3.3利用物体信息提升系统环境感知能力VSLAM构建的地图可以分为:稀疏点云地图[23]、稠密地图[27]、半稠密地图[24]、结构地图[59–60]、平面地图[61–65]、物体级地图[13–19,52]等.点云地图中仅具有点云结构信息,通常用于为SLAM提供定位约束.半稠密和稠密地图可以更精细地表达环境.结构地图和平面地图通过抽象的场景点线面的结构,为场景提供轻量级的地图表达.然而,上述的地图表达形式缺少对环境的高层次语义感知能力.近年来,随着自动驾驶、人机交互等领域的兴起,环境的语义感知越来越受到研究者的重视.语义信息的融入为SLAM的地图提供更为丰富的感知信息.早期的物体SLAM,例如,SLAM++[52]利用物体CAD模型构建语义地图,通过目标检测和识别,将先验物体数据库的物体加载在地图中.文献[37]将语义标签信息融合到稠密点云地图中,构建了稠密语义地图.CubeSLAM[55]和EAO-SLAM[21]通过立方框构建物体级地图.文献[13–19]构建了物体的二次曲面地图,同时估计了物体的大小、旋转和位置.相比于二次曲面和立方体的包络,超二次曲面可以通过调节二次模型参数适应不同形状的物体,丰富环境物体的表达.文献[66]使用超二次曲面构建室内场景的物体级地图.一些工作将抽象的语义标识加入到地图表达中,A VP-SL-AM[67]通过检测道路的车道线,交通标识等信息构建了轻量级的语义地图,用于实现准确的室外场景定位.另外,一些研究者将运动物体的感知信息加入到SLAM中,提出了SLAM-MOT[22,47,68],在构建场景稀疏点云地图的同时,表达物体的运动轨迹,构建包含运动信息的物体地图.例如,VDO-SLAM[49]提出利用语义信息构建环境结构,跟踪刚性物体的运动并估计其三维运动轨迹,其地图表示如图3所示.图3VDO-SLAM系统可视化地图[49],包含运动物体跟踪和三维轨迹Fig.3Visualization of Object tracking and trajectory estima tion of VDO-SLAM[49]可以看出,融合语义信息后,VSLAM的地图表达形式更加丰富.构建的物体级地图包含场景的高层次2164控制理论与应用第40卷语义信息,而且通过动态跟踪和联合位姿估计,可以获得动态物体的速度和运动轨迹估计,使得VSLAM 可以实时估计环境物体的运动轨迹,具有更丰富的环境感知能力.4物体语义的表达形式和初始化方法物体表达形式是物体级语义SLAM 进行环境感知的重要环节,传统的SLAM 算法使用几何特征,例如点、线、面等元素构建环境地图.这些几何特征能为SLAM 提供定位约束,并在一定程度上表征场景的感知信息,但缺少语义信息.SIFT [51],SURF [50]和ORB [50]是最常用的特征.利用稀疏点表达环境的视觉SLAM 方法[23,33]已经在三维场景重建领域取得了巨大的成功.然而,这类地图由三维空间中稀疏分布的点集构成,缺乏对物体位姿和边界的准确描述.因此,稀疏点云地图不能应用于复杂的任务,如路径规划、避障等.近年来,得益于深度学习检测技术的发展,SLAM 的地图构建已经由传统的几何表征转为语义描述,特别是物体级的描述.在物体表达上,可以分为:先验模型、几何模型、深度学习表征等.这些物体表达提升了SLAM 的语义感知能力,不同物体的表达如图4所示[12,52,55,69–71].图4物体语义的表达形式Fig.4Object representation method of object VSLAM4.1先验模型表达先验模型表达使用预先建立的先验数据库,通过检测–匹配的方式加载物体.如图4(a)所示,先验模型表达的代表为SLAM++[52].文献[72]提出使用检测立方框与先验CAD 模型进行ICP 匹配,通过物体路标约束,实现缺乏纹理的地下停车场定位.文献[73]使用预先集成或预定义的模型来进行对象跟踪,该工作的目标是建立一个具有物体标识的环境地图,并使用预集成的对象模型辅助定位,其结合了两种不同的深度网络输出结果来联合物体检测和对象的姿态估计.4.2几何模型表达几何模型通过参数化的二次曲面或者立方框实现,如图4(b)–4(c)所示.Nicholson 等人[12]提出了Quadric-SLAM,首次将二次曲面作为路标引入到SLAM 中,详细推导了如何利用多帧不同视角的目标检测观测数据构建约束,求解物体的二次曲面参数.并提出二次曲面投影观测模型,使得二次曲面参与位姿优化成为可能.后续的大多数基于二次曲面的SLAM 方案都是基于这个思路的延续[74].Hosseinzadeh 等人[75]提出了Structure Aware SLAM,在二次曲面路标的基础上加入了平面约束,使得二次曲面的建模精度进一步提高.Ok 等人[14]使用室外物体前向运动假设,提出了一种利用目标检测框、图像纹理以及语义尺度先验估计二次曲面参数的方法,降低了二次曲面初始化的难度,然而,该方法只能对车辆进行建模.Liao 等人[76]引入对称性假设,提出了物体感知S-LAM,利用物体对称性补全物体点云,进而根据物体点云拟合二次曲面.Chen 等人[77]针对物体前向平移运动假设,提出了一种基于物体凸包和目标检测的二次曲面初始化方法,为二次曲面初始化提供了新的思路.为了解决二次曲面初始化对噪声敏感的问题,Ti-第12期田瑞等:物体级语义视觉SLAM研究综述2165an等人[19]提出了一种参数分离的二次曲面初始化方法,将旋转和平移估计解耦估计,提升了初始化对检测框噪声的鲁棒性.利用物体对称性可以实现快速二次曲面初始化,Liao等人[78]提出的SO-SLAM是一种新颖的单目物体级语义SLAM,该方法使用三种具有代表性的空间约束,包括比例比例约束、对称纹理约束和平面支撑约束实现单帧视角下的二次曲面初始化.立方框表达的代表作是CubeSLAM[55],将物体模型参数化为三维立方框.EAO-SLAM[21]使用立方框和椭球对室内物体进行空间描述.然而,相比于立方框,二次曲面具有完备的数学模型表达和射影几何描述,更易于通过二次曲面重投影约束融合到SLAM的后端优化框架中,因此受到研究者的青睐.另外,一些物体模型表达方案采用物体聚类点描述,Cluster-SLAM[69]及后续的ClusterVO[47]均使用物体聚类点簇进行物体位姿估计和表达,如图4(d)所示.4.3深度学习表征粗略的几何模型往往不能表示物体的精确体积,而稠密点云需要大量的内存占用来存储地图.最近一些工作使用基于深度学习的特征进行模型表达,结合学习表征的物体级路标实现室外定位[79].DSP-SLAM[70]使用DeepSDF(signed distance fun-ction)网络[80]提取物体特征,并通过网络参数和表面重建损失函数进行物体表面恢复,构建场景的物体地图,如图4(e)所示.SceneCode[81]和Node-SLAM[71]则使用了深度网络中间层特征来表征物体.利用这些深度提取的特征和表面渲染误差函数,可以恢复物体的几何形状,如图4(f)所示.以上可知,物体的初始化表征方法决定了物体SLAM的地图表达形式,深度学习需要高算力的计算设备,且系统的实时性无法保证,几何模型可以准确描述物体的大小、旋转和位置,能完整表达物体的占据信息,且地图占用小,已经成为当前研究的热点.5物体级语义信息的数据关联方法基于深度学习的语义提取方法大多关注于单帧检测,而VSLAM在定位和建图环节均需要考虑时间和空间上的数据关联.针对物体级语义SLAM,解决不同帧之间的语义观测关联问题,确定同一语义对象在连续帧的关联性,是后续实现多帧优化的前提条件.当前数据关联方法可以分为两类:基于概率关联的方法和基于分配算法的关联方法.5.1基于概率关联方法该方法将属于物体的观测约束建模为概率分布模型,根据模型分布关系来确定帧间物体关联.Beipeng Mu等人[82]使用实例分割掩码的中心深度表征物体观测,并利用Dirichlet分布对观测进行建模,通过DP m-eans算法和最大似然估计(maximum likelihood estim-ation,MLE)迭代结果确定物体的数据关联.Bowm-an等人[83]使用期望最大化(expectation-maximization, EM)算法对物体路标进行软关联,并将物体路标作为约束因子与几何观测进行融合.文献[84]使用概率数据关联的方式解决动态环境下的物体关联.Iqbal和Gans等人[85]分析了不同物体点云深度分布之间的区别,使用层次密度聚类算法和非参数检验方法对物体进行关联.5.2基于分配算法的关联方法基于分配算法的关联方法能利用多帧观测解决帧间漏检等问题,为系统提供稳定的物体关联结果.文献[86]使用物体词袋方法构建成本矩阵,通过分配算法实现关联.OA-SLAM[11]使用目标检测结果和物体路标重投影的交并比(intersection over union,IoU)构建成本矩阵,并使用KM(kuhn-munkres,KM)算法进行分配.然而,由于有限的观测视角以及观测帧数,上述方法对于动态场景下的物体数据关联表现并不理想.为了解决上述问题,一些工作采用检测跟踪算法(track-by-detection)实现物体数据关联.Bewley等人[87]使用卡尔曼滤波器对检测框进行状态预测和更新,通过计算预测和检测结果的2D IoU 来度量匹配相似度,并使用匈牙利算法求解指派问题.针对单源相似度的局限性,Deep SORT(deep simple online and realtime tracking)[88]融入了外观信息,使用重识别网络提取的特征,增强了匹配性能,同时,其在匹配策略上增加了级联匹配模块,根据轨迹相似性进行关联,降低了遮挡目标ID切换的频率.Hosseinzadeh 等人[89]采用检测框内特征点投影匹配数量作为度量,该方法能够在一定程度上克服跟踪时的遮挡问题.可以看到,当前数据关联方法主要通过融合多源特征构建成本矩阵,并通过分配算法求解实现.然而,数据关联结果依赖于语义提取模块精度,当检测精度降低时会对关联结果产生影响,进而影系统的定位精度和鲁棒性.稳定可靠的数据关联方法是提升系统表现的有效途径之一.6融合物体级语义信息的后端优化方法在物体完成初始化后,需要利用后续观测信息对地图中的重建物体进行优化,根据物体是否参与相机位姿优化,后端优化策略可以分为独立优化策略和联合优化策略.根据是否需要跟踪场景中的动态物体,联合优化策略的因子图也有不同的形式.后端优化策略示意图如图5所示.2166控制理论与应用第40卷图5融合物体信息的语义VSLAM后端优化方法Fig.5Back-end of object VSLAM with object observations6.1独立优化策略如图5(a)所示,独立优化策略下,物体的位姿和模型参数单独进行优化,物体模型利用跟踪线程中提供的相机初始位姿进行优化.OA-SLAM[11]使用连续帧的目标检测结果对椭球参数单独优化,并在后端优化中使用优化后的物体路标对相机位姿进行优化.CubeSLAM[55]使用采样得分初始化立方框,并独立估计相机位姿和立方框参数,从而确保相机位姿估计的准确性.独立优化关注于物体重建,在进行物体位姿优化调整时无法对相机定位结果进行修正,当相机定位失败时,系统无法实现准确的自身定位和语义地图构建,没有充分利用语义信息辅助定位.6.2联合优化策略1)联合因子图,该方案将物体参数和位姿估计放在统一因子图中进行优化,并根据是否需要对动态物体进行位姿估计分别采用不同的因子图.静态场景的联合优化因子图如图5(b)所示,该方法通常适用于静态场景或采用动态特征剔除策略的SLAM算法.QuadricSLAM[12]将二次曲面参数和相机位姿优化优化放在联合优化中,构建了室内场景的语义地图.Tian等[19]提出的方法将初始化椭球和关键帧位姿放在统一优化因子图中进行优化,提升了室外场景下的定位精度和二次曲面建图准确性.动态场景的因子图如图5(c)所示,引入了动态物体位姿估计和模型参数优化的误差因子.VDO-SL-AM[49]使用物体语义信息和基于场景光流的特征关联,实现刚性物体位姿估计,将动态和静态结构放在统一的后端优化框架中.后续研究如[14–15,17,20]也将物体位姿优化放在局部建图线程中以实现联合优化.近年来,融合二次曲面路标观测的VSLAM成为了研究的热点[12,19,75,78].2)滑动窗口优化策略.相比于静态场景,动态场景下的物体观测容易受到漏检、遮挡等因素的干扰,基于关键帧的关联方案不能为动态物体提供准确的数据关联信息.为了克服这些问题,一些基于滑窗的优化方式被提出[43,47–48].滑动窗口由固定帧数的观测队列组成,当新的帧观测加入队列时,位于时序最早的帧观测被移出,同时,其维护的状态也通过滑窗边缘化的方式进行求解,如图6所示.图6滑动窗口优化结构示意图Fig.6Sliding window based optimization method滑动窗口优化将物体位姿和相机位姿放在统一优化框架中,由于运动物体的特点,使用滑窗优化可以有效利用连续帧的特征信息.DynaSLAM2[43]将场景静态结构,相机位姿以及动态物体运动轨迹维护在一个紧耦合的局部BA进行优化,通过目标检测的二维检测框构建物体位姿约束,使用舒尔补加速稀疏矩阵边缘化求解,解决滑窗优化的计算效率问题.Cluster-VO[47]使用点聚类的方式,将物体点和背景点放在滑窗内进行优化.该方法使用时间和空间双通道的关键帧管理策略保证计算效率,同时对遮挡的运动物体进行预测和跟踪.可以看出,滑窗的方式具有快速响应、参数优化更准确的特点,适用于动态物体的跟踪和位姿估计.基于因子图的联合优化可以有效利用关键帧信息,对室内场景的物体优化更准确.7未来发展和展望利用语义信息,SLAM可以适应动态和复杂环境下的定位,并通过物体级语义地图提升系统的环境感知能力.其技术可以应用于无人驾驶、机器人导航、智慧城市等领域.未来,融合语义信息的高层次信息可以为增强现实(AR)和虚拟现实(VR)提供更丰富的。
基于自然语言的Apriori关联规则的视觉挖掘方法摘要:抽象-可视化数据挖掘技术可以以图形方式向用户展示数据挖掘过程,从而使用户更易于理解挖掘过程及其结果,而且在数据挖掘中也非常重要。
然而,现在大多数视觉数据挖掘都是通过可视化的结果而进行的。
同时,它不适用于关联规则的可视化处理的图形显示。
鉴于上述缺点,本文采用自然语言处理方法,以自然语言视觉地进行Apriori关联规则的整体挖掘过程,包括数据预处理,挖掘过程和挖掘结果的可视化显示为用户提供了一套具有更多感知和更易于理解的特征的集成方案关键字:apriori 关联规则数据挖掘可视化1 引言视觉数据挖掘技术是可视化技术和数据挖掘技术的结合。
使用计算机图形、图像处理技术等方法将数据挖掘的源数据,中间结果和最终挖掘结果转换成易于理解的图形或图像,然后进行贯穿的理论,方法和技术交互式处理。
根据数据挖掘应用中可视化的不同阶段,数据挖掘的可视化可以分为源数据可视化,挖掘过程可视化和结果可视化。
(1)源数据可视化源数据可视化方法在数据挖掘之前,以可视化的形式将整个数据集呈现给用户。
目的是使用户能够快速找到有趣的地区,从而实现挖掘目标和目标的下一步。
(2)过程可视化过程可视化实现起来相当复杂。
主要有两种方法- 一种是在采矿过程中可视化地呈现中间结果,并使用户根据中间结果的反馈方便地调整参数和约束。
另一种方法是以图标和流程图的形式保持整个数据挖掘过程,根据用户可以观察数据源,数据集成,清理和预处理过程以及采矿结果的存储和可视化等等。
(3)结果可视化数据挖掘结果可视化是指在采矿过程结束时以图形和图像的形式描述挖掘结果或知识,以提高用户对结果的理解,并使用户更好地评估和利用采矿结果。
2、国外家庭视觉数据挖掘研究状况目前,视觉数据挖掘技术的研究在国内外都处于起步阶段,如何使用可视化技术来显示利用各种数据挖掘算法生成后的模型。
该方向的主要研究内容是通过一些特殊视觉图形中的关联规则、决策树和聚类等算法向用户显示生成的结果,以帮助用户更好地了解结果数据挖掘模型。
Usability Analysis of Visual Programming Environments: a ‘cognitive dimensions’ frameworkT. R. G. GreenMRC Applied Psychology Unit15 Chaucer Road, Cambridge CB2 2EF, UKM. PetreDept. of Mathematics and Computer ScienceOpen University, Milton Keynes MK7 6AA, UKCONTACT ADDRESS:T. R. G. GreenMRC Applied Psychology Unit15 Chaucer Road, Cambridge CB2 2EFUKInternet: Thomas.Green@tel: +44-1223-355294 (ext. 280)fax: +44-1223-359062To appear in Journal of Visual Languages and Computing1 Introduction - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -12 Psychology and HCI of Programming- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -3Psychology of programming- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -4HCI of programming - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -73 Sketch of a framework of cognitive dimensions- - - - - - - - - - - - - - - - - - - - - - - - - - - -84 Design alternatives in VPLs - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -10Basic - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -11LabVIEW- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -11Prograph - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -13 5 Applying the Cognitive Dimensions - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -14Abstraction Gradient - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -14Closeness of Mapping - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -16Consistency - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -18Diffuseness/Terseness - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -18Error-proneness - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -19Hard Mental Operations- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -20Hidden Dependencies - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -24Premature Commitment- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -26Progressive Evaluation- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -28Role-expressiveness - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -28Secondary Notation and Escape from Formalism - - - - - - - - - - - - - - - - - - -29Viscosity: resistance to local change - - - - - - - - - - - - - - - - - - - - - - - - - - - - -32Visibility and Juxtaposability - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -33 6 Discussion and Conclusions- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -35What the cognitive dimensions framework can tell the designer - - - - - - - -35Future progress in cognitive dimensions- - - - - - - - - - - - - - - - - - - - - - - - - -36Future progress in VPL design - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -37 Acknowledgements - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 38 References - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 44 Appendix A Viscosity Test - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 51Usability Analysis of Visual Programming Environments:a ‘cognitive dimensions’ frameworkT. R. G. Green and M. PetreAbstract:The cognitive dimensions framework is a broad-brush evaluation technique for interactivedevices and for non-interactive notations. It sets out a small vocabulary of terms designed tocapture the cognitively-relevant aspects of structure, and shows how they can be traded offagainst each other. The purpose of this paper is to propose the framework as an evaluationtechnique for visual programming environments. We apply it to two commercially-availabledataflow languages (with further examples from other systems) and conclude that it is effec-tive and insightful; other HCI-based evaluation techniques focus on different aspects andwould make good complements. Insofar as the examples we used are representative, currentVPLs are successful in achieving a good ‘closeness of match’, but designers need to considerthe ‘viscosity’ (resistance to local change) and the ‘secondary notation’ (possibility of convey-ing extra meaning by choice of layout, colour, etc.).1. IntroductionThe evaluation of full-scale programming environments presents something of a challenge to existing HCI. Many, indeed most, of the evaluative techniques that have been proposed in HCI are designed to concen-trate on physical, low-level details of interaction between a user and a device. From GOMS [10] onwards, there has been a tradition of close inspection of ‘simple tasks’, such as deleting a word in a text editor, and of trying to predict time taken to learn or to perform the task. But that tradition is not suitable for evaluating programming environments. If we tried to evaluate a programming environment that way we would be overwhelmed by a mass of detailed time predictions of every simple task that could be performed in that environment. Even if we had the timings, and could digest them, they would only address a few of the questions that designers ask. And finally, GOMS and similar HCI approaches have not so far been applied to notational issues, such as whether to use identifier declarations in a programming language; the HCI approach has concentrated on interactive situations, rather than to notational design.Green [28], [30] presented an alternative approach, called ‘cognitive dimensions of notations’, as a frame-work for a broad-brush assessment of almost any kind of cognitive artifact. Unlike many other approaches, the cognitive dimensions framework is task-specific, concentrating on the processes and activities rather than the finished product. This broad-brush framework supplements the detailed and highly specific anal-yses typical of contemporary cognitive models in HCI and it has more to say to users who are not HCI specialists.The Cognitive Dimensions framework: ‘Cognitive dimensions’ constitute a small set of terms describing the structure of the artifact, which are mutually orthogonal (in principle) and which are derived by seeking generalised statements of how that structure determines the pattern of user activity. Any cog-nitive artifact can be described in their terms, and although that description will be at a very high level it will predict some major aspects of user activity.The dimensions are not guidelines, which are handfuls of unrelated precepts for design; they are neither descriptions of devices nor descriptions of how to use devices; and they are definitely not a cognitive model of the user, although they rest on a common-sense ‘proto-theory’ of what users do. They are discussion tools, descriptions of the artifact-user relationship, intended to raise the level of discourse.Briefly, these are the claims we make for the cognitive dimensions framework:1.Broad-brush analysis is usable by non-specialists in HCI because it avoids ‘death by detail’: it offers afew striking points covering a couple of pages, rather than pages of analysis. It is extremely quick and cheap: an afternoon of careful thought about a system is probably all that is needed.2.Because this assessment is structural, it can be used at an early stage in design. (By the same token itneeds to be supplemented at a later stage by other methods.)3.We claim that the terms it uses conform to many notions which are recognisable but unnamed in thediscourse of non-HCI specialists. Readers should not expect to discover many new ideas, but they should recognise many that were previously unformulated.4.By introducing a defined vocabulary for such ideas, the framework not only makes it easier to con-verse about cognitive artifacts without having to explain all the concepts, but also provides a checklist.Designers and evaluators will find it easier to avoid gross oversights (e.g. not including a cross-refer-encer as part of a spreadsheet).5.For different types of user activity it is possible to set up a preferred profile across the dimensions.Exploratory design will require one type of profile, tightly-specified safety-critical design will require a different profile.6.With a defined vocabulary in place it becomes much easier to describe how remedies for weaknessescan be provided, and how different dimensions trade off against each other.Explicitly presenting one’s ideas as discussion tools is, we believe, a new approach to HCI, yet doing so is doing nothing more than recognising that discussion among choosers and users carries on interminably, in the corridors of institutes and over the Internet. Our hope is to improve the level of discourse and thereby to influence design in a roundabout way. Many protagonists of HCI have tried the direct route; they have explicitly attempted to develop methods of design. Such is not our aim. Indeed, we feel it would be impertinent to suggest that cognitive psychologists can tell professional designers what to do; we have no pretensions that we can design programming languages. We do not even, in this paper, attempt to lay down a set of evaluative criteria that designs should meet, such as are to be found in books of guidelines.The purpose of the cognitive dimensions framework is to lay out the cognitivist’s view of the design space in a coherent manner, and where possible to exhibit some of the cognitive consequences of making a particular bundle of design choices that position the artifact in the space. It is the designer who has to decide on thespecification and where to locate the artifact in the design space, and to invent a solution. It is the designer, not the cognitivist, who has to weigh cognitive costs and benefits against the requirements of expense, soft-ware engineering, personnel training, organisational design, etc.Structure of paper: In this paper we present the cognitive dimensions as a method of evaluating vis-ual programming languages (VPLs) and their environments. (For simplicity we shall not be pedantic in distinguishing language and environment.) We start by briefly reviewing some of the major findings about the psychology of programming and the HCI of programming; we then present an outline of the cognitive dimensions, and show how the dimensions are based on contemporary theory. We then use the dimen-sions one by one to consider the successes and weaknesses of visual programming environments.We have chosen not to attempt to review the state of the art on VPLs. To illustrate the cognitive dimensions, we shall draw on two commercially-available VPLs, LabVIEW and Prograph. These languages are not state-of-the-art, but they are genuinely usable – complex programs have been successfully and enthusias-tically built in them by end-users who are scientific specialists, not programming specialists. Both languages adopt the dataflow model, using box-and-wire representation, but they illustrate different design decisions with different usability consequences. Brief descriptions are given in Section 4.Since they are commercially available we have been able to benefit from personal experience, from com-ments and help from experts, and from a certain amount of empirical observation. Each cognitive dimension will be related to these two languages. We have also taken examples of specific issues from other VPL designs.2. Psychology and HCI of ProgrammingThe framework of ‘cognitively-relevant dimensions’ is founded (not too tightly, alas) on present-day views on the activity of programming, which we shall review very briefly. ‘Programming’ is a seriously over-loaded term, comprising a host of different activities and situations [61]; perhaps it is fortunate that we have no space to enter into the niceties here. What follows is a very high-level view, restricted to that which is relevant to the cognitive dimensions framework.To start with, we need to consider the users and the situation. If we wish to design a programming envi-ronment, who is it for? What are they doing? What aspects of the activity are affected by the programming environment?First, we shall assume that anyone may be a user; expert or novice, end-user or computer-science profes-sional. We shall ignore the large literature on what experts know that novices do not know, but we shall take into account the extra support that novices need.Second, we shall tend to limit our discussion to situations like exploratory or incremental programming. We shall pay no attention to other design criteria, such as safety-critical design or coding for efficiency; nor to other parts of the software creation process, such as communication and negotiation during require-ments elicitation; nor to the demands of the local situation or organization, even though they are known to affect choice of cognitive strategy [87]. One thing at a time!And thirdly, within those limits we shall try to consider as much of the programming process as is affected by the programming environment. Not just coding, nor just comprehension.We shall distinguish rather loosely between the ‘psychology’ and the ‘HCI’ of the programming environ-ment, using ‘psychology’ to refer to the meaning of the code (“How do I solve this problem? What does that code mean?”) and ‘HCI’ to mean interaction with the notational system (control of layout, searching for items).2.1 Psychology of programmingThe maxims of information representation: We start with a truism that may all too easily get overlooked – data is not information. Data must be presented in a usable form before it becomes informa-tion, and the choice of representation affects the usability. But usability is not simply ‘better’ or ‘worse’; how good a representation is depends on what you want to use it for.Diagrams are famously better than text for some problems, worse for others. One school of thought main-tains that the difference lies in the cognitive processes of locating and indexing the components of information, a view well analysed for pulley problems in mechanics by Larkin and Simon [43] who show that the two representations they used, diagrammatic and symbolic, carried the same information but imposed very different processing costs.Mental processing analyses apply just as cogently to differences in programming languages. Green [26], [34] and Vessey [85], [80] have independently developed bodies of research that demonstrate notational structure effects. Their work can be summarised in two maxims.Every notation highlights some kinds of information at the expense of obscuring other kinds. Not everything can be highlighted at once. If a language highlights dataflow than it may well obscure the control flow; if a lan-guage highlights the conditions under which actions are to be taken, as in a rule-based language, then it probably obscures the sequential ordering of actions. Corollary: part of the notation design problem is to make the obscured information more visible.When seeking information, there must be a cognitive fit between the mental representations and the external repre-sentation. If your mental representation is in control flow form, you will find a dataflow language hard to use; if you think iteratively, recursion will be hard.Taken together these maxims mean that a programming system (including the language and the program-mer) will not be successful unless the language fits the tasks the programmer needs to do, and the programmer’s mental representation fit the language representations.Mental representations: The mental representation of a program is at a higher level than pure code. This has been demonstrated in various ways. Soloway and Ehrlich [81] introduced the notion of a schema or ‘programming plan’, binding together several semantically-related but dispersed statements in a Pascal-like program to make a group which taken together achieved a goal, such as ‘form a running total’. Détienne [19] reviews this literature, Rist [68], [69] extends and tightens the schema concept and relates it to a full theory of program development and comprehension in novices and experts. In Prolog, the corre-sponding notion seems to be a ‘technique’ [6], which has a similar function but is more abstract.The dislocation and dispersal of related statements has been seen by some as a major problem in learning to program. Spohrer and Soloway [82] report that, contrary to the received wisdom which says that most novice bugs are caused by misconceptions about language constructs, “many bugs arise as a result of plan composition problems — difficulties in putting the ‘pieces’ of a program together”. Even though the novice knows what bits are required, linking the bits together is too difficult. Further analysis of a familiar and distressing problem, ‘Why can’t smart students solve simple programming problems?’ [78] has brought home the importance of plan composition as a cause of failure to program.Dataflow languages have not received the same degree of attention, although recently it has been shown that the schema analysis can be applied to spreadsheets [75] and visual dataflow languages [31].Concentration on ‘programming plans’ may have led some researchers to downplay other types of knowl-edge. Gilmore [23] shows that possessing strategies for planning and debugging is a prerequisite for programming success. The development of visual programming may well give more scope for visual or spatial reasoning than the older, text-based languages. Saariluoma and Sajaniemi [72], [73], [74] ingen-iously showed that spreadsheet programmers reasoned about formulae in terms of the areas on the spreadsheet, while Green and Navarro [31] showed that mental representations of programs had different structures for Basic, for spreadsheets, and for LabVIEW. Far more research is needed in this area but the message seems to be that where possible, spatial reasoning is used as a support.Order of program development: The development of a program is not linear. Programmers nei-ther write down a program in text order from start to finish, nor work top-down from the highest mental construct to the smallest. They sometimes jump from a high level to a low level or vice versa, and they fre-quently revise what they have written so far [4], [14], [16], [34], [86]. For the purposes of programming support, that is all we need to know. Although the causes and nature of deviations from top-down devel-opment have inspired much research, the implication for a programming environment is quitestraightforward; the order of working should be left to the programmer, not prescribed by the environment.Effect of environment: Green et al. [34] showed that at the coding stage, programmers using text-based languages develop their code in little spurts (possibly corresponding to mental chunks or schemas) which are knitted in to what has been written so far. It follows that programmers need to read and under-stand what has been written so far, in order to knit the new material in. They called this the ‘parsing-gnisrap’ cycle (gnisrap = parsing backwards). Davies [17] extended this to consider the relationship with the environment: an editor which allows easy access to a large window of code makes the cycle easier. Once again, there is further work to be done here (see Ormerod and Ball, [56], for recent developments), but there is a straightforward implication, that the ‘window of access’ needs to be large enough.From problem to program: Brooks [7] described program design in terms of mappings between problem domain and program domain. Subsequent research (e.g. [58], [60]) has strongly reinforced that view. (The spatial reasoning mentioned above can be seen as using an intermediate mapping where possi-ble.) There is a powerful corollary: it is not easy to deal with entities in the program domain that do not have corresponding entities in the problem domain. Lewis and Olson [45] show that for potential end user programmers, an abundance of low-level primitives is one of the great cognitive barriers to programming. Nardi [51] persuasively argues the case for task-specific languages, since by definition they have a high proportion of entities that map very directly back to problem domain. Visual programming languages are not the only possible way to create task-specific programming languages, but they can be very effective.Anderson et al. [1] distinguish between ‘inherent goals’, which exist in the problem domain, and ‘planning goals’, which exist solely in the solution domain. Computing gross profit for the year would be an inherent goal; declaring an identifier called Profit would be a planning goal. T he crux of the problem in designing a programming language for end-users, according to Lewis and Olson [45], is to avoid spawning shoals of planning goals. The problem with an abundance of low-level primitives, in their view, is precisely that weaving them together correctly creates many planning goals. The problem of plan composition, men-tioned above, can be seen as a problem of spawning planning goals.Understanding and evaluating the program: The ‘parsing-gnisrap’ cycle shows that program-mers need to read and evaluate incomplete programs as well as finished ones. The less experienced the programmer, the smaller the amount that is produced before it must be evaluated. Novices need ‘progres-sive evaluation’ [1], an environment where it is easy to check a program fragment before adding to it. Ideally, every step could be checked individually, and combinations of steps could be checked to see whether something had gone adrift. So the environment needs to allow seriously incomplete program frag-ments to be evaluated; a far cry from early systems for, say, Pascal, where a program had to pass a rigorous syntactic check before it could be executed.Experts need to check and debug their programs, too, and – hardly surprisingly – one of the means they use to locate a recalcitrant bug in a big program is to cut the program into smaller fragments. Weiser [89]identified ‘slicing’ as an expert debugging strategy, a slice being the smallest fragment of code that could be run as a program and that reproduced the errant behaviour. Textual languages support slicing by com-menting out unwanted code.By the maxims of information presentation, the understandability of a programming language depends on the match between the way it is structured and the type of question to be answered. For example, a tradi-tional GOTO-language highlights the procedural information (the order of execution of statements) at the expense of the declarative information (how many types of input are distinguished and how each type is handled). Therefore, it is easier to answer procedural questions from a GOTO language than to answer declarative questions. This has been demonstrated for textual languages [26] and its analogue has been demonstrated for visual data flow languages [34], [49].One way to improve the ‘cognitive fit’ is to include cues to improve the accessibility of information. So tex-tual languages may include perceptual cues, such as indenting or choice of typeface, or symbolic cues [26]. Alternatively, the environment may offer comprehension aids, such as software visualization tools [20], which present their information in a structure that highlights what the programmer wants to know. These tools are usually designed purely by intuition, but one day it may be possible to design them as deliber-ately-engineered complements to the programming language they support.2.2 HCI of programmingThe previous section dealt primarily with moving between the problem domain and the program domain; in this short section we need to consider what is involved in interacting with the code, on the screen or on paper.Control of layout and text: Despite an overwhelming literature on the design of text-editors and word-processors very little is known about their use for programming. ‘Engineering estimates’ of time required for simple editing tasks using conventional editors are available [55], and it would be surprising indeed if they failed to apply to programming, but little is presently known about the use of specialised editors, program synthesizers, and the like.Still less is known about code management in visual languages, not even in the common box-and-line structure. A ‘straw’ comparison (i.e. N=1) by the authors is mentioned below (see Section 5.12 on Viscos-ity), but serious research is needed to tell us how programmers plan and manage the updating of diagrammatic notations.Searching and browsing: There is similarly very little known about searching and browsing in vis-ual programming languages – indeed, not much is known about browsing in any kind of language. But it is well-established that programmers need wide access to many parts of the program. Far from reading theprogram line by line, they visit different parts to make comparisons, establish dependencies, and so on [42], [70].Sometimes programmers use existing code as a seed for new code. This is a deliberately-supported feature of object-oriented languages, where programmers frequently specialise an existing class for new purpose, and it is also used by learners making use of example programs to guide them to solutions [52].Different types of browsers (such as scrolling through a long text versus hypertext methods) favour differ-ent strategies and impose different costs [50], but few general principles are yet forthcoming. Whatever type of browser is used, however, it will impose its own cognitive overheads, both in finding the way and in finding the way back. Hypertext systems create long trails of windows which can be so confusing that some systems give explicit support in keeping track of trails [77].3. Sketch of a framework of cognitive dimensionsThe framework of cognitive dimensions consists of a small number of terms which have been chosen to be easy for non-specialists to comprehend, while yet capturing a significant amount of the psychology and HCI of programming. Moreover, it is supposed to apply beyond programming, to a wide variety of other notations and to interactive devices as well, although we shall not discuss other applications in this paper. Finally, the ‘dimensions’, so-called, are meant to be coherent with each other, like physical dimensions.There is a place for both scientific knowledge and craft knowledge in inventing such a framework. The sci-entific knowledge, in the form of the psychological and HCI-based literature just reviewed, tells us that certain empirical effects have been observed, but we have to rely on craft knowledge to tell us whether those effects are important in context, and whether there are significant effects that have been missed. So the craft knowledge sometimes acts as a check on the scientific knowledge, strengthening or discounting its conclusions. The opposite is just as true; the craft knowledge tells us what experts think they do, but controlled observation gives us a different viewpoint and separates out some of the myths1.The framework can be tested by comparing it post-hoc with its roots, to show that each ‘dimension’ taken in turn does have empirical support; and it can be tested by trying it out, seeing whether all known phe-nomena, or at least the important ones, have a place in the framework; and it can be tested by seeing whether experienced programmers, designers or end-users respond to each ‘dimension’ with recognition and understanding, perhaps with some phrase like “Yes, I know about that, I just didn’t have a name for it until now”; and it can be tested by showing that the ‘dimensions’ have internal coherency and dynamics, perhaps one day even by formulating a ‘theory of information artifacts’. And finally, the ultimate test isshow that the different narratives of software design, as seen by different stake-holders, each have their own contributions to make [18].。
070451 Controlling chaos based on an adaptive nonlinear compensatingmechanism*Corresponding author,Xu Shu ,email:123456789@Abstract The control problems of chaotic systems are investigated in the presence of parametric u ncertainty and persistent external distu rbances based on nonlinear control theory. B y designing a nonlinear compensating mechanism, the system deterministic nonlinearity, parametric uncertainty and disturbance effect can be compensated effectively. The renowned chaotic Lorenz system subject to parametric variations and external disturbances is studied as an illustrative example. From Lyapu nov stability theory, sufficient conditions for the choice of control parameters are derived to guarantee chaos control. Several groups of experiments are carried out, including parameter change experiments, set-point change experiments and disturbance experiments. Simulation results indicate that the chaotic motion can be regulated not only to stead y states but also to any desired periodic orbits with great immunity to parametric variations and external distu rbances.Keywords: chaotic system, nonlinear compensating mechanism, Lorenz chaotic systemPACC: 05451. IntroductionChaotic motion, as the peculiar behavior in deterministic systems, may be undesirable in many cases, so suppressing such a phenomenon has been intensively studied in recent years. Generally speaking chaos suppression and chaos synchronization[1-4 ]are two active research fields in chaos control and are both crucial in application of chaos. In the following letters we only deal with the problem of chaos suppression and will not discuss the chaos synchronization problem.Since the early 1990s, the small time-dependent parameter perturbation was introduced by Ott,Grebogi, and Y orke to eliminate chaos,[5]many effective control methods have been reported in various scientific literatures.[1-4,6-36,38-44,46] There are two lines in these methods. One is to introduce parameter perturbations to an accessible system parameter, [5-6,8-13] the other is to introduce an additive external force to the original uncontrolled chaotic system. [14-37,39-43,47] Along the first line, when system parameters are not accessible or can not be changed easily, or the environment perturbations are not avoided, these methods fail. Recently, using additive external force to achieve chaos suppression purpose is in the ascendant. Referring to the second line of the approaches, various techniques and methods have been proposed to achieve chaos elimination, to mention only a few:(ⅰ) linear state feedback controlIn Ref.[14] a conventional feedback controller was designed to drive the chaotic Duffing equation to one of its inherent multiperiodic orbits.Recently a linear feedback control law based upon the Lyapunov–Krasovskii (LK) method was developed for the suppression of chaotic oscillations.[15]A linear state feedback controller was designed to solve the chaos control problem of a class of new chaotic system in Ref.[16].(ⅱ) structure variation control [12-16]Since Y u X proposed structure variation method for controlling chaos of Lorenz system,[17]some improved sliding-mode control strategies were*Project supported by the National Natural Science Foundation of C hina (Grant No 50376029). †Corresponding au thor. E-mail:zibotll@introduced in chaos control. In Ref.[18] the author used a newly developed sliding mode controller with a time-varying manifold dynamic to compensate the external excitation in chaotic systems. In Ref.[19] the design schemes of integration fuzzy sliding-mode control were addressed, in which the reaching law was proposed by a set of linguistic rules. A radial basis function sliding mode controller was introduced in Ref.[20] for chaos control.(ⅲ) nonlinear geometric controlNonlinear geometric control theory was introduced for chaos control in Ref.[22], in which a Lorenz system model slightly different from the original Lorenz system was studied considering only the Prandtl number variation and process noise. In Ref.[23] the state space exact linearization method was also used to stabilize the equilibrium of the Lorenz system with a controllable Rayleigh number. (ⅳ)intelligence control[24-27 ]An intelligent control method based on RBF neural network was proposed for chaos control in Ref.[24]. Liu H, Liu D and Ren H P suggested in Ref.[25] to use Least-Square Support V ector Machines to drive the chaotic system to desirable points. A switching static output-feedback fuzzy-model-based controller was studied in Ref.[27], which was capable of handling chaos.Other methods are also attentively studied such as entrainment and migration control, impulsive control method, optimal control method, stochastic control method, robust control method, adaptive control method, backstepping design method and so on. A detailed survey of recent publications on control of chaos can be referenced in Refs.[28-34] and the references therein.Among most of the existing control strategies, it is considered essentially to know the model parameters for the derivation of a controller and the control goal is often to stabilize the embedded unstable period orbits of chaotic systems or to control the system to its equilibrium points. In case of controlling the system to its equilibrium point, one general approach is to linearize the system in the given equilibrium point, then design a controller with local stability, which limits the use of the control scheme. Based on Machine Learning methods, such as neural network method[24]or support vector machine method,[25]the control performance often depends largely on the training samples, and sometimes better generalization capability can not be guaranteed.Chaos, as the special phenomenon of deterministic nonlinear system, nonlinearity is the essence. So if a nonlinear real-time compensator can eliminate the effect of the system nonlinearities, chaotic motion is expected to be suppressed. Consequently the chaotic system can be controlled to a desired state. Under the guidance of nonlinear control theory, the objective of this paper is to design a control system to drive the chaotic systems not only to steady states but also to periodic trajectories. In the next section the controller architecture is introduced. In section 3, a Lorenz system considering parametric uncertainties and external disturbances is studied as an illustrative example. Two control schemes are designed for the studied chaotic system. By constructing appropriate L yapunov functions, after rigorous analysis from L yapunov stability theory sufficient conditions for the choice of control parameters are deduced for each scheme. Then in section 4 we present the numerical simulation results to illustrate the effectiveness of the design techniques. Finally some conclusions are provided to close the text.2. Controller architectureSystem differential equation is only an approximate description of the actual plant due to various uncertainties and disturbances. Without loss of generality let us consider a nonlinear continuous dynamic system, which appears strange attractors under certain parameter conditions. With the relative degree r n(n is the dimension of the system), it can be directly described or transformed to the following normal form:121(,,)((,,)1)(,,,)(,,)r r r z z z z za z v wb z v u u d z v u u vc z v θθθθθθθθ-=⎧⎪⎪⎪=⎪=+∆+⎨⎪ ++∆-+⎪⎪ =+∆+⎪=+∆⎩ (1) 1y z =where θ is the parameter vector, θ∆ denotes parameter uncertainty, and w stands for the external disturbance, such that w M ≤with Mbeingpositive.In Eq.(1)1(,,)T r z z z = can be called external state variable vector,1(,,)T r n v v v += called internal state variable vector. As we can see from Eq.(1)(,,,,)(,,)((,,)1)d z v w u a z v w b z v uθθθθθθ+∆=+∆+ ++∆- (2)includes system nonlinearities, uncertainties, external disturbances and so on.According to the chaotic system (1), the following assumptions are introduced in order to establish the results concerned to the controller design (see more details in Ref.[38]).Assumption 1 The relative degree r of the chaotic system is finite and known.Assumption 2 The output variable y and its time derivatives i y up to order 1r -are measurable. Assumption 3 The zero dynamics of the systemis asymptotically stable, i.e.,(0,,)v c v θθ=+∆ is asymptotically stable.Assumption 4 The sign of function(,,)b z v θθ+∆is known such that it is always positive or negative.Since maybe not all the state vector is measurable, also (,,)a z v θθ+∆and (,,)b z v θθ+∆are not known, a controller with integral action is introduced to compensate theinfluenceof (,,,,)d z v w u θθ+∆. Namely,01121ˆr r u h z h z h z d------ (3) where110121112100ˆr i i i r r r r i i ii r i i d k z k k k z kz k uξξξ-+=----++-==⎧=+⎪⎪⎨⎪=----⎪⎩∑∑∑ (4)ˆdis the estimation to (,,,,)d z v w u θθ+∆. The controller parameters include ,0,,1i h i r =- and ,0,,1i k i r =- . Here011[,,,]Tr H h h h -= is Hurwitz vector, such that alleigenvalues of the polynomial121210()rr r P s s h sh s h s h --=+++++ (5)have negative real parts. The suitable positive constants ,0,,1i h i r =- can be chosen according to the expected dynamic characteristic. In most cases they are determined according to different designed requirements.Define 1((,,))r k sign b z v θμ-=, here μstands for a suitable positive constant, and the other parameters ,0,,2i k i r =- can be selected arbitrarily. After011[,,,]Tr H h h h -= is decided, we can tune ,0,,1i k i r =- toachievesatisfyingstaticperformances.Remark 1 In this section, we consider a n-dimensional nonlinear continuous dynamic system with strange attractors. By proper coordinate transformation, it can be represented to a normal form. Then a control system with a nonlinear compensator can be designed easily. In particular, the control parameters can be divided into two parts, which correspond to the dynamic characteristic and the static performance respectively (The theoretic analysis and more details about the controller can be referenced to Ref.[38]).3. Illustrative example-the Lorenz systemThe Lorenz system captures many of the features of chaotic dynamics, and many control methods have been tested on it.[17,20,22-23,27,30,32-35,42] However most of the existing methods is model-based and has not considered the influence ofpersistent external disturbances.The uncontrolled original Lorenz system can be described by112121132231233()()()()x P P x P P x w x R R x x x x w xx x b b x w =-+∆++∆+⎧⎪=+∆--+⎨⎪=-+∆+⎩ (6) where P and R are related to the Prendtl number and Rayleigh number respectively, and b is a geometric factor. P ∆, R ∆and b ∆denote the parametric variations respectively. The state variables, 1x ,2x and 3x represent measures of fluid velocity and the spatial temperature distribution in the fluid layer under gravity , and ,1,2,3i w i =represent external disturbance. In Lorenz system the desired response state variable is 1x . It is desired that 1x is regulated to 1r x , where 1r x is a given constant. In this section we consider two control schemes for system (6).3.1 Control schemes for Lorenz chaotic system3.1.1 Control scheme 1The control is acting at the right-side of the firstequation (1x), thus the controlled Lorenz system without disturbance can be depicted as1122113231231x Px Px u xRx x x x x x x bx y x =-++⎧⎪=--⎨⎪=-⎩= (7) By simple computation we know system (7) has relative degree 1 (i.e., the lowest ordertime-derivative of the output y which is directly related to the control u is 1), and can be rewritten as1122113231231z Pz Pv u vRz z v v v z v bv y z =-++⎧⎪=--⎨⎪=-⎩= (8) According to section 2, the following control strategy is introduced:01ˆu h z d=-- (9) 0120010ˆ-d k z k k z k uξξξ⎧=+⎪⎨=--⎪⎩ (10) Theorem 1 Under Assumptions 1 toAssumptions 4 there exists a constant value *0μ>, such that if *μμ>, then the closed-loop system (8), (9) and (10) is asymptotically stable.Proof Define 12d Pz Pv =-+, Eq.(8) can be easily rewritten as1211323123z d u v Rz z v v vz v bv =+⎧⎪=--⎨⎪=-⎩ (11) Substituting Eq.(9) into Eq.(11) yields101211323123ˆz h z d dv R z z v v v z v bv ⎧=-+-⎪=--⎨⎪=-⎩ (12) Computing the time derivative of d and ˆdand considering Eq.(12) yields12011132ˆ()()dPz Pv P h z d d P Rz z v v =-+ =--+- +-- (13) 0120010000100ˆ-()()ˆ=()d k z k k z k u k d u k d k z k d d k dξξξ=+ =--++ =-- - = (14)Defining ˆdd d =- , we have 011320ˆ()()dd d P h P R z P z v P v P k d=- =+- --+ (15) Then, we can obtain the following closed-loop system101211323123011320()()z h z dvRz z v v v z v bv d Ph PR z Pz v Pv P k d⎧=-+⎪=--⎪⎨=-⎪⎪=+---+⎩ (16) To stabilize the closed-loop system (16), a L yapunovfunction is defined by21()2V ςς=(17)where, ςdenotes state vector ()123,,,Tz v v d, isthe Euclidean norm. i.e.,22221231()()2V z v v dς=+++ (18) We define the following compact domain, which is constituted by all the points internal to the superball with radius .(){}2222123123,,,2U z v v d zv v dM +++≤(19)By taking the time derivative of ()V ςand replacing the system expressions, we have11223322*********01213()()(1)V z z v v v v dd h z v bv k P d R z v P R P h z d P v d P z v d ς=+++ =----++ +++-- (20) For any ()123,,,z v v d U ∈, we have: 222201230120123()()(1)V h z v b v k P dR z v PR Ph z d P v d d ς≤----+ ++++ ++ (21)Namely,12300()(1)22020V z v v dPR Ph R h R P ς⎡⎤≤- ⎣⎦++ - 0 - - 1 - 2⨯00123(1)()2Tb PR Ph P k P z v v d ⎡⎤⎢⎥⎢⎥⎢⎥⎢⎥⎢⎥⎢⎥0 ⎢⎥2⎢⎥++⎢⎥- - - +⎢⎥⎣22⎦⎡⎤⨯ ⎣⎦(22) So if the above symmetrical parameter matrix in Eq.(22) is positive definite, then V is negative and definite, which implies that system (16) is asymptotically stable based on L yapunov stability theory.By defining the principal minor determinants of symmetrical matrix in Eq.(22) as ,1,2,3,4i D i =, from the well-known Sylvester theorem it is straightforward to get the following inequations:100D h => (23)22004RD h =-> (24)23004R b D bh =-> (25)240302001()(1)(2)821[2(1)]08P M D k P D b PR Ph PR D Pb Ph R PR Ph =+-+++--+++>(26)After 0h is determined by solving Inequalities (23) to (25), undoubtedly, the Inequalities (26) can serve effectively as the constraints for the choice of 0k , i.e.20200031(1)(2)821[2(1)]8P M b PR Ph PR D Pb Ph R PR Ph k P D ++++ ++++>- (27)Here,20200*31(1)(2)821[2(1)]8P M b PR Ph PR D Pb Ph R PR Ph P D μ++++ ++++=-.Then the proof of the theorem 1 is completed. 3.1.2 Control scheme 2Adding the control signal on the secondequation (2x ), the system under control can be derived as112211323123x P x P x x R x x x x u xx x bx =-+⎧⎪=--+⎨⎪=-⎩ (28) From Eq.(28), for a target constant 11()r x t x =,then 1()0xt = , by solving the above differential equation, we get 21r r x x =. Moreover whent →∞,3r x converges to 12r x b . Since 1x and 2x havethe same equilibrium, then the measured state can also be chosen as 2x .To determine u , consider the coordinate transform:122133z x v x v x=⎧⎪=⎨⎪=⎩ and reformulate Eq.(28) into the following normal form:1223121231231zRv v v z u vPz Pv v z v bv y z =--+⎧⎪=-⎨⎪=-⎩= (29) thus the controller can be derived, which has the same expression as scheme 1.Theorem 2 Under Assumptions 1, 2, 3 and 4, there exists a constant value *0μ>, such that if *μμ>, then the closed-loop system (9), (10) and (29) is asymptotically stable.Proof In order to get compact analysis, Eq.(29) can be rewritten as12123123z d u v P z P v vz v bv =+⎧⎪=-⎨⎪=-⎩ (30) where 2231d Rv v v z =--Substituting Eq.(9) into Eq.(30),we obtain:1012123123ˆz h z d dv P z P v v z v bv ⎧=-+-⎪=-⎨⎪=-⎩ (31) Giving the following definition:ˆdd d =- (32) then we can get22323112123212301()()()()dRv v v v v z R Pz Pv Pz Pv v v z v bv h z d =--- =--- ----+ (33) 012001000ˆ-()d k z k k z k u k d u k dξξ=+ =--++ = (34) 121232123010ˆ()()()(1)dd d R Pz Pv Pz Pv v v z v bv h z k d=- =--- --+-+ (35)Thus the closed-loop system can be represented as the following compact form:1012123123121232123010()()()(1)zh z d v Pz Pv v z v bv d R Pz Pv Pz Pv v v z v bv h z k d⎧=-+⎪⎪=-⎪=-⎨⎪=---⎪⎪ --+-+⎩(36) The following quadratic L yapunov function is chosen:21()2V ςς=(37)where, ςdenotes state vector ()123,,,Tz v v d , is the Euclidean norm. i.e.,22221231()()2V z v v dς=+++ (38) We can also define the following compact domain, which is constituted by all the points internalto the super ball with radius .(){}2222123123,,,2U z v v d zv v dM =+++≤ (39)Differentiating V with respect to t and using Eq.(36) yields112233222201230011212322321312()(1)(1)()V z z v v v v dd h z P v bv k dP R h z d P z v z v v P b v v d P v d P z v d z v d ς=+++ =----+ +++++ ++--- (40)Similarly, for any ()123,,,z v v d U ∈, we have: 2222012300112133231()(1)(1)(2V h z P v b v k dPR h z d P z v v P b d P v d d M z dς≤----+ +++++ ++++ + (41)i.e.,12300()(12)22V z v v dPR M h P h P Pς⎡⎤≤- ⎣⎦+++ - -2 - 0 ⨯ 001230(12)(1)2TP b PR M h P k z v v d ⎡⎤⎢⎥⎢⎥⎢⎥ - ⎢⎥⎢⎥⎢⎥ ⎢⎥22⎢⎥⎢⎥ +++ - - -+⎢⎥⎣22⎦⎡⎤⨯ ⎣⎦(42) For brevity, Let1001(12)[(222)82(23)]P PR M h b PR P h M P b α=++++++ ++(43) 2201[(231)(13)]8P M P b b PR h α=+-+++ (44)230201(2)[2(12)8(2)(4)]PM P b P P PR M h P b Ph P α=++ +++ ++- (45)Based on Sylvester theorem the following inequations are obtained:100D h => (46)22004PD h P =-> (47)3202PMD bD =-> (48)403123(1)0D k D ααα=+---> (49)where,1,2,3,4i D i =are the principal minordeterminants of the symmetrical matrix in Eq.(42).*0k μ>*12331D αααμ++=- (50)The theorem 2 is then proved.Remark 2 In this section we give two control schemes for controlling chaos in Lorenz system. For each scheme the control depends on the observed variable only, and two control parameters are neededto be tuned, viz. 0h and 0k . According to L yapunov stability theory, after 0h is fixed, the sufficient condition for the choice of parameter 0k is also obtained.4. Simulation resultsChoosing 10P =,28R =, and 8/3b =, the uncontrolled Lorenz system exhibits chaotic behavior, as plotted in Fig.1. In simulation let the initial values of the state of thesystembe 123(0)10,(0)10,(0)10x x x ===.x1x 2x1x 3Fig.1. C haotic trajectories of Lorenz system (a) projected on12x x -plane, (b) projected on 13x x -plane4.1 Simulation results of control the trajectory to steady stateIn this section only the simulation results of control scheme 2 are depicted. The simulation results of control scheme 1 will be given in Appendix. For the first five seconds the control input is not active, at5t s =, control signal is input and the systemtrajectory is steered to set point2121(,,)(8.5,8.5,27.1)T Tr r r x x x b =, as can be seen inFig.2(a). The time history of the L yapunov function is illustrated in Fig.2(b).t/sx 1,x 2,x 3t/sL y a p u n o v f u n c t i o n VFig.2. (a) State responses under control, (b) Time history of the Lyapunov functionA. Simulation results in the presence ofparameters ’ changeAt 9t s =, system parameters are abruptly changed to 15P =,35R =, and 12/3b =. Accordingly the new equilibrium is changedto 2121(,,)(8.5,8.5,18.1)T Tr r r x x x b =. Obviously, aftervery short transient duration, system state converges to the new point, as shown in Fig.3(a). Fig.4(a) represents the evolution in time of the L yapunov function.B. Simulation results in the presence of set pointchangeAt 9t s =, the target is abruptly changedto 2121(,,)(12,12,54)T Tr r r x x x b =, then the responsesof the system state are shown in Fig.3(b). In Fig.4(b) the time history of the L yapunov function is expressed.t/sx 1,x 2,x 3t/sx 1,x 2,x 3Fig.3. State responses (a) in the presence of parameter variations, (b) in the presence of set point changet/sL y a p u n o v f u n c t i o n Vt/sL y a p u n o v f u n c t i o n VFig.4. Time history of the Lyapunov fu nction (a) in the presence of parameter variations, (b) in the presence of set point changeC. Simulation results in the presence ofdisturbanceIn Eq.(5)external periodic disturbance3cos(5),1,2,3i w t i π==is considered. The time responses of the system states are given in Fig.5. After control the steady-state phase plane trajectory describes a limit cycle, as shown in Fig.6.t/sx 1,x 2,x 3Fig.5. State responses in the presence of periodic disturbancex1x 3Fig.6. The state space trajectory at [10,12]t ∈in the presence ofperiodic disturbanceD. Simulation results in the presence of randomnoiseUnder the influence of random noise,112121132231233xPx Px x Rx x x x u xx x bx εδεδεδ=-++⎧⎪=--++⎨⎪=-+⎩ (51) where ,1,2,3i i δ= are normally distributed withmean value 0 and variance 0.5, and 5ε=. The results of the numerical simulation are depicted in Fig.7,which show that the steady responses are hardly affected by the perturbations.t/sx 1,x 2,x 3t/se 1,e 2,e 3Fig.7. Time responses in the presence of random noise (a) state responses, (b) state tracking error responses4.2 Simulation results of control the trajectory to periodic orbitIf the reference signal is periodic, then the system output will also track this signal. Figs.8(a) to (d) show time responses of 1()x t and the tracking trajectories for 3-Period and 4-period respectively.t/sx 1x1x 2t/sx 1x1x 2Fig.8. State responses and the tracking periodic orbits (a)&( b)3-period, (c)&(d) 4-periodRemark 3 The two controllers designed above solved the chaos control problems of Lorenz chaoticsystem, and the controller design method can also beextended to solve the chaos suppression problems of the whole Lorenz system family, namely the unified chaotic system.[44-46] The detail design process and close-loop system analysis can reference to the author ’s another paper.[47] In Ref.[47] according to different positions the scalar control input added,three controllers are designed to reject the chaotic behaviors of the unified chaotic system. Taking the first state 1x as the system output, by transforming system equation into the normal form firstly, the relative degree r (3r ≤) of the controlled systems i s known. Then we can design the controller with the expression as Eq.(3) and Eq.(4). Three effective adaptive nonlinear compensating mechanisms are derived to compensate the chaotic system nonlinearities and external disturbances. According toL yapunov stability theory sufficient conditions for the choice of control parameters are deduced so that designers can tune the design parameters in an explicit way to obtain the required closed loop behavior. By numeric simulation, it has been shown that the designed three controllers can successfully regulate the chaotic motion of the whole family of the system to a given point or make the output state to track a given bounded signal with great robustness.5. ConclusionsIn this letter we introduce a promising tool to design control system for chaotic system subject to persistent disturbances, whose entire dynamics is assumed unknown and the state variables are not completely measurable. By integral action the nonlinearities, including system structure nonlinearity, various disturbances, are compensated successfully. It can handle, therefore, a large class of chaotic systems, which satisfy four assumptions. Taking chaotic Lorenz system as an example, it has been shown that the designed control scheme is robust in the sense that the unmeasured states, parameter uncertainties and external disturbance effects are all compensated and chaos suppression is achieved. Some advantages of this control strategy can be summarized as follows: (1) It is not limited to stabilizing the embeddedperiodic orbits and can be any desired set points and multiperiodic orbits even when the desired trajectories are not located on the embedded orbits of the chaotic system.(2) The existence of parameter uncertainty andexternal disturbance are allowed. The controller can be designed according to the nominal system.(3) The dynamic characteristics of the controlledsystems are approximately linear and the transient responses can be regulated by the designer through controllerparameters ,0,,1i h i r =- .(4) From L yapunov stability theory sufficientconditions for the choice of control parameters can be derived easily.(5) The error converging speed is very fast evenwhen the initial state is far from the target one without waiting for the actual state to reach the neighborhood of the target state.AppendixSimulation results of control scheme 1.t/sx 1,x 2,x 3t/sL y a p u n o v f u n c t i o n VFig.A1. (a) State responses u nder control, (b) Time history of the Lyapunov functiont/sx 1,x 2,x 3t/sx 1,x 2,x 3Fig.A2. State responses (a) in the presence of parameter variations, (b) in the presence of set point changet/sL y a p u n o v f u n c t i o n Vt/sL y a p u n o v f u n c t i o n VFig.A3. Time history of the L yapu nov fu nction (a) in the presence of parameter variations, (b) in the presence of set point changet/sx 1,x 2,x 3Fig.A4. State responses in the presence of periodic disturbanceresponsest/sx 1,x 2,x 3Fig.A5. State responses in the presence of rand om noiset/sx 1x1x 2Fig.A6. State response and the tracking periodic orbits (4-period)References[1] Lü J H, Zhou T S, Zhang S C 2002 C haos Solitons Fractals 14 529[2] Yoshihiko Nagai, Hua X D, Lai Y C 2002 C haos Solitons Fractals 14 643[3] Li R H, Xu W , Li S 2007 C hin.phys.16 1591 [4]Xiao Y Z, Xu W 2007 C hin.phys.16 1597[5] Ott E ,Greb ogi C and Yorke J A 1990 Phys.Rev .Lett. 64 1196 [6]Yoshihiko Nagai, Hua X D, Lai Y C 1996 Phys.Rev.E 54 1190 [7] K.Pyragas, 1992 Phys. Lett. A 170 421 [8] Lima,R and Pettini,M 1990 Phys.Rev.A 41 726[9] Zhou Y F, Tse C K, Qiu S S and Chen J N 2005 C hin.phys. 14 0061[10] G .Cicog na, L.Fronzoni 1993 Phys.Rew .E 30 709 [11] Rakasekar,S. 1993 Pramana-J.Phys.41 295 [12] Gong L H 2005 Acta Phys.Sin.54 3502 (in C hinese) [13] Chen L,Wang D S 2007 Acta Phys.Sin.56 0091 (in C hinese) [14] C hen G R and Dong X N 1993 IEEE Trans.on Circuits andSystem-Ⅰ:Fundamental Theory and Applications 40 9 [15] J.L. Kuang, P.A. Meehan, A.Y.T. Leung 2006 C haos SolitonsFractals 27 1408[16] Li R H, Xu W, Li S 2006 Acta Phys.Sin.55 0598 (in C hinese) [17] Yu X 1996 Int.J.of Systems Science 27 355[18] Hsun-Heng Tsai, C hyu n-C hau Fuh and Chiang-Nan Chang2002 C haos,Solitons Fractals 14 627[19] Her-Terng Yau and C hieh-Li C hen 2006 C hao ,SolitonsFractal 30 709[20] Guo H J, Liu J H, 2004 Acta Phys.Sin.53 4080 (in C hinese) [21] Yu D C, Wu A G , Yang C P 2005 Chin.phys.14 0914 [22] C hyu n-C hau Fuh and Pi-Cheng Tu ng 1995 Phys.Rev .Lett.752952[23] Chen L Q, Liu Y Z 1998 Applied Math.Mech. 19 63[24] Liu D, R en H P, Kong Z Q 2003 Acta Phys.Sin.52 0531 (inChinese)[25] Liu H, Liu D and Ren H P 2005 Acta Phys.Sin.54 4019 (inChinese)[26] C hang W , Park JB, Joo YH, C hen GR 2002 Inform Sci 151227[27] Gao X, Liu X W 2007 Acta Phys.Sin. 56 0084 (in C hinese) [28] Chen S H, Liu J, Lu J 2002 C hin.phys.10 233 [29] Lu J H, Zhang S. 2001 Phys. Lett. A 286 145[30] Liu J, Chen S H, Xie J. 2003 C haos Solitons Fractals 15 643 [31] Wang J, Wang J, Li H Y 2005 C haos Solitons Fractals 251057[32] Wu X Q, Lu JA, C hi K. Tse, Wang J J, Liu J 2007 ChaoSolitons Fractals 31 631[33] A.L.Fradkov , R .J.Evans, 2002 Preprints of 15th IF AC W orldCongress on Automatic Control 143[34] Zhang H G 2003 C ontrol theory of chaotic systems (Shenyang:Northeastern University) P38 (in C hinese)[35] Yu-Chu Tian, Moses O.Tadé, David Levy 2002Phys.Lett.A.296 87[36] Jose A R , Gilberto E P, Hector P, 2003 Phys. Lett. A 316 196 [37] Liao X X, Yu P 2006 Chaos Solitons Fractals 29 91[38] Tornambe A, V aligi P.A 1994 Measurement, and C ontrol 116293[39] Andrew Y.T.Leung, Liu Z R 2004 Int.J.Bifurc.C haos 14 2955 [40] Qu Z L, Hu,G .,Yang,G J, Qin,G R 1995 Phys.Rev .Lett.74 1736 [41] Y ang J Z, Qu Z L, Hu G 1996 Phys.Rev.E.53 4402[42] Shyi-Kae Yang, C hieh-Li Chen, Her-Terng Yau 2002 C haosSolitons Fractals 13 767。
2018年2月Journal on Communications February 2018 第39卷第2期通信学报V ol.39No.2基于格的用户匿名三方口令认证密钥协商协议王彩芬,陈丽(西北师范大学计算机科学与工程学院,甘肃兰州 730070)摘 要:随着量子理论的快速发展,离散对数问题和大整数分解问题在量子计算下存在多项式求解算法,其安全性受到严重威胁,因此,提出2个基于环上带误差学习问题的用户匿名三方口令认证密钥协商方案,包括基于格的隐式认证密钥协商方案和基于格的显式认证密钥协商方案,并证明了其安全性。
其中,隐式认证密钥协商协议通信量少、认证速度快,显式认证密钥协商协议安全性更高,同时实现用户和服务器的双向认证、可抗不可测在线字典攻击。
与其他口令认证密钥协商协议相比,所提协议有更高的效率和更短的密钥长度,能够抵抗量子攻击,因此,该协议既高效又安全,适用于大规模网络下的通信。
关键词:格密码;可证明安全;口令认证;密钥交换;环上带误差中图分类号:TP309.7文献标识码:Adoi: 10.11959/j.issn.1000-436x.2018021Three-party password authenticated key agreementprotocol with user anonymity based on latticeWANG Caifen, CHEN LiCollege of Computer Science and Engineering, Northwest Normal University, Lanzhou 730070, China Abstract: With the rapid development of quantum theory and the existence of polynomial algorithm in quantum computation based on discrete logarithm problem and large integer decomposition problem, the security of the algorithm was seriously threatened. Therefore, two authentication key agreement protocols were proposed rely on ring-learning-with-error (RLWE) assumption including lattice-based implicit authentication key agreement scheme and lattice-based explicit authentication key agreement scheme and proved its security. The implicit authentication key agreement protocol is less to communicate and faster to authentication, the explicit authentication key agreement protocol is more to secure. At the same time, bidirectional authentication of users and servers can resist unpredictable online dictionary attacks. The new protocol has higher efficiency and shorter key length than other password authentication key agreement protocols. It can resist quantum attacks. Therefore, the protocol is efficient, secure, and suitable for large-scale network communication.Key words: lattice-based cryptology, provably secure, password authentication, key exchange, ring-learning-with-error1 引言随着大数据时代的到来以及量子计算机的快速发展,人们对数据的安全有了更高的要求,格密码依靠其独特的困难问题和归约结果成为密码学研究的热点,基于离散对数问题和大整数分解问题在量子计算下存在多项式求解算法,其安全性受到严重威胁,因此,格理论被应用于抗量子攻击的密收稿日期:2017-09-18;修回日期:2018-01-17基金项目:国家自然科学基金资助项目(No.61662069, No.61562077, No.61662071);西北师范大学青年教师科研能力提升计划基金资助项目(No.NWNU-LKQN-14-7)Foundation Items: The National Natural Science Foundation of China (No.61662069, No.61562077, No.61662071), The Founda-·22·通信学报第39卷码体制中。
专利名称:Visualizing system, visualizing method, andvisualizing program发明人:Tatsuro Chiba申请号:US10533675申请日:20031105公开号:US20060262117A1公开日:20061123专利内容由知识产权出版社提供专利附图:摘要:A vector field () including its local three-dimensional attribute is substantially visualized on a two-dimensional field of view in an intuitionally visible way (pp). For the visualization, the vector field () is mapped onto a three-dimensional coordinate space () toproduce corresponding coordinate point sequences (p), the degree of elevation (A) in a local area of a plane in which the coordinate point sequences are connected (p) is determined, the degree of depression (C) in the local area is determined (p), the degree of elevation/depression (B) in the local area is determined by weight-combining the degree of elevation (A) and the degree of depression (C) (p), the coordinate space () is mapped onto a two-dimensional plane (), and gray-scale display (F) corresponding to the degree of elevation/depression is conducted on the area of the two-dimensional plane () corresponding to the local area (p).申请人:Tatsuro Chiba地址:Tokyo JP国籍:JP更多信息请下载全文后查看。
Visualization of Scalar Topology for Structural EnhancementC.L.Bajaj V.PascucciDepartment of Computer Sciences and TICAMUniversity of Texas,Austin,TX78733D.R.SchikoreCenter for Applied Scientific ComputingLawrence Livermore National Laboratory,Livermore,CA94550AbstractScalarfields arise in every scientific application.Existing scalar visualization techniques require that the user infer the global scalar structure from what is frequently an insufficient display of information.We present a visualization technique which nu-merically detects the structure at all scales,removing from the user the responsibility of extracting information implicit in the data,and presenting the structure explicitly for analysis.We further demonstrate how scalar topology detection proves use-ful for correct visualization and image processing applications such as image co-registration,isocontouring,and mesh com-pression.Keywords:Scientific Visualization,Scalar Fields,Curves and Surfaces,V ector Topology1IntroductionVisualization of scalarfields is common across all scientific dis-ciplines,including geographic data such as altitude and temper-ature,medical applications with CT and MRI values,and pres-sure and vorticity magnitude in computationalfluid dynamics. The purpose of the visualization is to aid the user in understand-ing the structure of the data[29].Common methods for visualizing scalar fields can be grouped into two broad classes.First are methods whose aim is to detect structure and present a display to the user which communicates this structure.Critical to these methods is the definition of structure,and how well the definition matches the visualization users’need.Second are those methods which attempt to display the entire scalarfield simultaneously,leav-ing interpretation of the display to the binations of the two methods serve to reinforce the information provided by each visualization.We will use for comparison one technique from each of these categories,isocontouring and colormapping. Isocontours,or constant valued curves and surfaces from continuous2D and3D scalarfields,are a common visualiza-tion technique for displaying scalarfield structure[21].By their definition,isocontours represent the data only at discrete lev-els,and as such are an effective technique for determining the“shape”of objects in the scalarfield.Shape extraction as de-fined by isocontoursis well understood and appreciated in many applications,such as Medical Imaging,as isocontours in a den-sityfield may result in realistic models of skeletal structure,skin surface,or various organs[22].Also implicit in their definition is the fact that isocontours are an incomplete representation of the scalarfield,as one can only infer from an isocontour that the data to one side is above the isovalue,and the data to the other side is below the isovalue.With multiple isocontours,the scalarfield effectively becomes segmented into afinite number of ranges,within which the structure remains unknown.The same claim of incompleteness can be made of any technique which only displays a portion of thefield.Moreover it is not obvious which isovalues one should select and how namy of them[4].Colormapping of scalar data defines a discrete or continuous range of colors onto which the scalar values are e of color,though proven to be useful in many visualization tech-niques,introduces complications due of perceptual issues,such as colorblindness.Colormaps may also mislead the user,for ex-ample when small-scale structure in the data is washed out due to the large range of values taken on by the variable.Scientific data which is time-varying in nature intensifies the problems with the methods described above.In the typical case,a scalar variable may take on a wide range of values over thecourse of a simulation,however at certain times during the sim-ulation the range may be much smaller.With both isocontours and colormapped display,it is desirable to use the same isoval-ues and colormap for each time-step being displayed in order to reduce the possibility of introducing artifacts which may be misinterpreted as features.This requirement complicates the task of choosing a good colormap or selection of isovalues fora time-varying visualization.In this paper we present a complementary scalar structure visualization technique which does not depend on the user to determine structure from the graphical display,but instead de-fines,computes,and displays the structure of a scalarfield di-rectly.Through detection of all critical points(saddles,max-ima,and minima),we construct an embedded graph by com-puting integral curves in the gradientfield from saddle points to an attached critical point,as illustrated infigure1.Curves in 1Figure1:Isocontours(dotted)of part of a scalarfield along with the critical points and integral curvesthis topological graph are always perpendicular to isocontours of the scalarfield[23],and we will demonstrate that these curves contain complementary information to that provided by display of isocontours or colormapped scalarfields,providing a method which is both useful in its own right and which also enhances the commonly used techniques for visualizing scalarfields.We further indicate that the definition of structure which is provided by the scalar topology proves useful in several additional visu-alization and image processing applications.2Related WorkMuch of the work in enhancing colormapped visualization of scalarfields has dealt with determining“good”colormaps which effectively display the data.Bergman,et.al.,define rules based on perception,user goals,and data characteristics to automatically select a colormap which will meet the user requirements[6].Histogram equalization is a technique which spreads the data evenly over the range of colors,using the avail-able color space to it’s fullest[28].The result is that each color in the colormap is used an equal number of times.Gershon[14] uses“Generalized Animation”to display otherwise static scalar data in a dynamic way,taking advantage of the ability of the visual system to detect dynamic changes.Animation draws at-tention to fuzzy details in the data which may not be detected in the static representation.There has been several papers in detecting isocontours in2d and3d scalar data[21,30].Additional work concentrates on handling problems in regions containing saddle points which cause difficulty in determining the topological structure of the surface contained in the region[25,31,26].The problem of de-tecting ridges and valleys in digital terrain has been treated in several papers[12].McCormack,et.al.consider the problem of detecting drainage patterns in geographic terrain[24].Inter-rante,et.al.have used ridge and valley detection on3d surfaces to enhance the shape of transparently rendered surfaces[19]. Extrema graphs were used by Itoh and Koyamada to speed iso-contour extraction[20].A graph containing extreme points and boundary points of a scalarfield can be guaranteed to intersect every isocontour at least once,allowing seed points to be gener-ated by searching only the cells contained in the extrema graph. Helman and Hesselink detect vectorfield topology by clas-sifying the zeros of a vectorfield and performing particle trac-ing from saddle points[17].The resulting partitioning consists of regions which are topologically equivalent to uniformflow. Globus,et.al.describe a software system for3d vector topol-ogy and briefly note that the technique may also be applied to the gradient of a scalarfield in order to identify maxima and minima[15].Bader et.al.and Collard et.al.examine the gra-dientfield of the charge density in a molecular system[2,1,10]. The topology of this scalarfield represents the bonds linking to-gether the atoms of the molecule.Bader goes on to show how features higher level structures in the topology represent chains, rings,an cages in the molecule.Bader’s example is a defin-ing motivation for developing the automatic extraction and vi-sualization of topology from a scalarfield.In many situations, topology provides a more intuitive and physically meaningful visualization.Grosse[16]also presents methods of approxi-mating the scalar topology of the electron density function of proteins.One of his methods uses tensor product B-splinefits while the other scales Fourier coefficients of the electron den-sity function.3Scalar TopologyPrevious techniques for enhancing scalarfield visualization at-tempt to address the inability of colormapping and isocontour-ing to capture and directly represent features in the data.We ad-dress this problem not through feature enhancement using ex-isting visualization techniques,but through direct feature detec-tion and display.For our purpose of detection and display,we define the topology of a scalarfield defined with domainto consist of the following:1.The local maxima of2.The local minima of3.The saddle points of4.Selected integral curves joining each of the above Integral curves are defined as curves which are everywhere tangent to the gradientfield of.Intuitively,these curves rep-resent the path followed by a heat-seeking particle in a temper-aturefield,or the path followed by a ball rolling down a hill in a field of elevation values.In vectorfield topology,the curves ad-vected in theflowfield segment thefield into regions which are topologically equivalent to uniformflow.In the case of scalar topology,integral curves segment thefield into regions in which the gradientflow is uniform,or in other words,the scalar func-tion is monotonic.Such a segmentation of the scalarfield into regions of simple behavior reveals the structure of the scalar field for the visualization user.We outline the procedure for visualization of scalar topology as follows:1.Detect stationary(critical)points in.2.Classify stationary points.3.Integrate selected integral curves in gradientfield.In the following subsections,we will define our model of a continuous scalarfield and look at each of the steps defined above.3.1Scalar Field ModelIn typical scientific applications,data is represented at the nodes of a mesh of elements and interpolated linearly across the inte-rior of the elements.Such a data model is continuous and has a discontinuous gradientfield,making it unsuitable for our purpose of tracing integral curves in the gradientfield.We seek to construct a data model such that:1.The original nodal data is interpolated.2.The gradient at the boundaries is continuous.3.Critical points in the scalarfield are not removed,and thenumber introduced is keptsmall.Figure2:Artificial extreme points introduced by central differ-encingWe could satisfy thefirst two properties by computing derivatives by a method such as central differencing,which would uniquely define a continuous bi-cubic scalar inter-polant[3].However,such a choice of interpolant is likely to violate our third requirement by introducing critical points,as illustrated for the1-D case infigure2.To address this problem,we use a“damped”central differ-encing scheme as described in the following sections.The re-sulting scalarfield will remain a piecewise continuous bi-cubic function,which we represent in Bernstein-B´e zier form as:whereAs a result,the derivatives of the scalarfield can be repre-sented as:Figure3:Damped central differences maintain critical points Otherwise,the point data at,,and are monotonic,and we dampen the central difference as follows:Original vertex weight Weight determined by fist order partial derivativesWeight determined by second order partial derivatives in two variables Weight determined by third order partial derivative in three variables===Figure 6:Constraints on the mixed partial derivatives for 3Dever,due to the special construction of our interpolant,we have knowledge about where the critical points will occur,and can compute them quite efficiently.Critical points which occur at the vertices of the mesh will be preserved,and can be computed from the bilinear or trilin-ear field respectively,with the guarantee that they exist as well in the higher order shape preserving interpolant.Critical points interior to a cell will occur in locations at which the monotonic-ity constraint could not be met.In smooth parts of the field,there will be no problem computing a monotone field,which will guarantee the absence of critical points.In cells at which constraints were violated,we perform subdivision of the cell in order to locate the critical points,followed by Newton-Rhapson iteration to refine the positions of the zeroes.Saddles from the initial bilinear or trilinear mesh can be approximated by com-puting the position of the bilinear or trilinear saddle analyti-cally,followed by iteration in the bi-cubic or tri-cubic field,re-spectively.3.3Classification of Critical PointsQualitative information about the behavior of the gradient field near a critical point is obtained by analysis of the Hessian of ,given for 2D:The eigenvalues and eigenvectors of the above matrix de-termine the behavior of the gradient field and hence the scalar field near the critical point,much the same as for the behav-ior of a general vector field[7,17].One difference to note is that for a gradient field,the matrix of derivatives is symmetric (),and therefore the eigenvalues will all be real.This is intuitively expected,as imaginary eigenval-ues indicate rotation about the critical point,and a gradient field is an irrotational vector field.This observation allows us to sim-plify the classification of critical points as depicted in figure 7.A positive eigenvalue corresponds to gradient flow away from the critical point,while a negative eigenvalue indicatesMaximaMinimaRegular SaddleConstantDegenerate SaddleFigure 7:Some of the scalar critical pointsgradient flow toward the critical point.In the case of a sad-dle point,there is gradient flow toward and away from the crit-ical point,distinguishing it from the field behavior near other critical points.In this case,the eigenvectors corresponding to the positive and negative eigenvalues define the principal direc-tions of the flow toward and away from the saddle,respectively.It is this property that will be used in the next section to compute critical curves in the gradient field.3.4Tracing Integral CurvesHaving computed and classified the critical points,the final step for computing the scalar topology is the tracing of selected crit-ical curves between the detected points.Even for three and higher dimensional scalar fields we restrict our focus to only computing critical curves,and ignore critical surfaces and hy-persurfaces and other degeneracies in the field,Saddle points have the property that the eigenvectors of the Hessian are the separatrices of the saddle.A particle following the gradient field along the these directions will come to rest at the saddle point,while particles slightlyto either side of the sep-aratrices will diverge rapidly near the point.It is for this reason that saddle points and the critical curves associated with their separatrices are useful in determining the structure of a scalar field.The number of critical curves emanating from saddles along separatrices is twice the field dimension.In 2D,four crit-ical curves are computed for each saddle point,two in the di-rection corresponding to the positive eigenvalue,and two in the direction corresponding to the negative eigenvalue.In 3D,the number is six,and so on.Integral curves are computed using the following 4th order adaptive step Runge Kutta integration in the gradient field[27],where is the time step which adapts per iteration,and is a field point :1.2.4.5.The initial position for the iterative stepping is placed a small distance from the saddle point along the appropriate eigenvec-tor.The steps are bounded such that we take no less than5steps per cell,maintaining a high level of putation of the critical curve ends when we reach the vicinity of another critical point within a certain,in which case the curve termi-nates at that point.Other curves may end at the boundaries of the mesh.4Quality ComparisonHere we compare the qualities of scalar topology visualization with those of isocontours and colormapping.Integral curves are everywhere orthogonal to isocontours. The two techniques arise from an orthogonal definition of “structure”for a scalar variable.Contours are an attempt to compute and display the exact shape of an object in a scalar field,while the topology graph attempts to show the relations among all such objects in thefield,without giving the details of shapes of particular objects.Note that scalarfield topology is invariant under translation and uniform scaling.This quality is very similar to colormapping of scalar variables,in which the entire range of variables is mapped into a color space.Transla-tion and scaling of the scalar variables changes only the map-ping function,not the result.5ExamplesFigures8,9,10and the top two pictures of the Color Plate, demonstrate the use of scalar topology along with both isocon-tours and colormapped visualizations of density in an off-axis pion collision.Figure8uses a simple greyscale colormap,and it is clear that much of the area of interest in the center is washed out.Figure9uses a hue-based colormap and adds isocontours of three isovalues to reveal more of the structure and aid the per-ception.Infigure10,we show the scalar topology of density. This image clearly brings out the detail of the structure of the variable.The topfigures of the Color Plate show a closeup of the interesting topological regions,as well as shows a combi-nation of all three visualization techniques.While small scale structure is important in many scientific applications,in some circumstances the visualization user is in-terested only in large scale structure.For this situation,we ap-ply afilter to smooth the data before applying the topology de-tection algorithm.Figures11,12show two visualizations of topology in a scalarfield representing wind speed.Infigure 11,the unfiltered scalarfield topology reveals some noise in the data.Figure12shows the topology for the same data after a Gaussianfilter has been applied.The middlefigures of the color plate shows an example of scalar topology applied to a mathematically defined surface.In the leftfigure the scalar topology is displayed.In the rightfig-ure both topology and four isocontours are displayed.Notice that even with four isolevels displayed,there are critical points within contour regions which are not revealed like the two max-ima on the bottom left that are not separated by any isocontour. The bottomfigures of the color plate show an example of scalar topology applied to a3D scalarfields(the wave function computed for a high potential iron protein).6Other ApplicationsComputation of scalar topology has the potential to serve many other visualization and image processing applications.We mention only a few here:Data Correlation-Due in part to the invariance under trans-lation and scaling,scalar topology is useful in visually de-termining linear correlation between multiple scalar vari-ables.Image Co-registration-Scalar topology in adjacent planes provides a“1D skeleton”which may be used to align the planes.Warping/Morphing-Editing of the scalar backbone may be used to apply a warping effect to an image,or to warp be-tween the backbones of two similar images.Mesh Reduction-The scalar topology may serve as a guide to aid in computation of reduced resolution meshes. Surface Triangulation-Adaptive triangulation of arbitrary mathematical surfaces by decomposition into monotonic patches which may be subdivided to an arbitrary precision.7ConclusionsExisting scalar visualization techniques lack the ability to ex-plicitly present the structure of a scalarfield to the user.We have presented a definition of scalar structure and a straight-forward algorithm for computing and displaying the structure. For typical scientific data,the scalar data model remains true to the original linear data,minimizing introduction of false critical points,and also simplifying the detection of critical points. The resulting topology visualization serves to both provide information which is not available in commonly used scalar visualization techniques,as well as reinforcing or enhancing the information provided by common visualization techniques. Furthermore,computation of scalar topology offers promise to-ward improving several visualization and image processing ap-plications.AcknowledgementsWe are grateful to Lawrence Livermore National Lab for ac-cess to the pion collision data set.The Earth Science dataset iscourtesy the Space Science and Engineering Center at the Uni-versity of Wisconsin.The Wave function data set is courtesy the Visualization lab,SUNY-Stony Brook.This research was supported in part by AFOSR grant F49620-97-1-0278and ONR grant N00014-97-1-0398.References[1]R.Bader.Atoms in Molecules.Clarendon Press,Oxford,1990.[2]R.Bader,T.Tung Nguyen-Dang,and Y.Tal.Quantumtopology of molecular charge distributions ii.molecular structure and its charge.Journal of Chemical Physics, 70:4316–4329,1979.[3]C.Bajaj.Modelling physicalfields for interrogative visu-alization.In Tim Goodman and Ralph Martin,editor,The Mathematics of Surfaces VII,pages241–rmation Geometers Ltd.,1997.[4]C.L.Bajaj,V.Pascucci,and D.R.Schikore.The contourspectrum.In R.Yagel and H.Hagen,editors,Proceeding Visualization’97,pages167–173,Phoenix,AZ,October 1997.IEEE Computer Society&ACM SIGGRAPH. [5]R.Beatson and Z.Ziegler.Monotonicity preserving sur-face interpolation.In SIAM J.Numer.Anal.,volume22, pages401–411,April1985.[6]L.Bergman,B.Rogowitz,and L.Treinish.A rule-basedtool for assisting colormap selection.In G.M.Nielson andD.Silver,editors,Visualization’95Proceedings,pages118–125,October1995.[7]W.Boyce and R.DiPrima.Elementary Differential Equa-tions and Boundary V alue Problems.John Wiley and Sons,Inc.,New York,fifth edition,1992.[8]R.Carlson and F.Fritsch.Monotone piecewise bicubicinterpolation.In SIAM J.Numer.Anal.,volume22,pages 386–400,April1985.[9]R.Carlson and F.Fritsch.An algorithm for monotonepiecewise bicubic interpolation.In SIAM J.Numer.Anal., volume26,pages230–238,February1989.[10]K.Collard and G.Hall.Orthogonal trajectories ofthe electron density.International Journal of Quantum Chemistry XII,0:623–637,1977.[11]S.Dodd,D.McAllister,and J.Roulier.Shape-preservingspline interpolation for specifying bivariate functions on grids.In IEEE Computer Graphics and Applications, pages70–79,September1983.[12]R.J.Fowler and J.J.Little.Automatic extraction of irreg-ular network digital terrain models.In Computer Graph-ics(SIGGRAPH’79Proceedings),volume13(3),pages 199–207,August1979.[13]F.Fritsch and R.Carlson.Monotone piecewise cubic in-terpolation.In SIAM J.Numer.Anal.,volume17,pages 238–246,April1980.[14]N.D.Gershon.Visualization of fuzzy data using general-ized animation.In A.E.Kaufman and G.M.Nielson,ed-itors,Visualization’92Proceedings,pages268–273,Oc-tober1992.[15]A.Globus,C.Levit,and sinski.A tool for visual-izing the topology of three-dimensional vectorfields.InG.M.Nielson and L.Rosenblum,editors,Visualization’91Proceedings,pages33–40,October1991.[16]E.Grosse.Approximation and Optimization of ElectronDensity Maps.PhD thesis,Stanford University,1980. [17]J.Helman and L.Hesselink.Visualizing vectorfieldtopology influidflows.IEEE Computer Graphics and Ap-plications,11(3),1991.[18]J.Hoscheck and sser.Fundamentals of ComputerAided Geometric Design.A K Peters,Wellesley,Mas-sachusetts,1993.[19]V.Interrante,H.Fuchs,and S.Pizer.Enhancing trans-parent skin surfaces with ridge and valley lines.In G.M.Nielson and D.Silver,editors,Visualization’95Proceed-ings,pages52–59,October1995.[20]T.Itoh and K.Koyamada.Isosurface extraction by usingextrema graphs.In R.D.Bergeron and A.E.Kaufman, editors,Visualization’94Proceedings,pages77–83,Oc-tober1994.[21]William E.Lorensen and Harvey E.Cline.Marchingcubes:A high resolution3D surface construction algo-rithm.In Maureen C.Stone,editor,Computer Graphics (SIGGRAPH’87Proceedings),volume21,pages163–169,July1987.[22]W.Lorenson.Marching through the visible man.In G.M.Nielson and D.Silver,editors,Visualization’95Proceed-ings,pages368–373,October1995.[23]J.Marsden and A.Tromba.V ector Calculus.W.H.Free-man and Company,New York,third edition,1988. [24]J.E.McCormack,M.N.Gahegan,S.A.Roberts,J.Hogg,and B.S.Hoyle.Feature-based derivation of drainage net-works.Int.Journal of Geographical Information Systems, 7(3):263–279,1993.[25]B.K.Natarajan.On generating topologically consis-tent isosurfaces from uniform samples.Technical Report HPL-91-76,Hewlett-Packard,June1991.[26]G.M.Nielson and B.Hamaan.The asymptotic decider:Resolving the ambiguity of marching cubes.In G.M.Nielson and L.Rosenblum,editors,Visualization’91Pro-ceedings,pages83–91,October1991.Figure8:Visualization of density in a pion collision simulation: GreyscalecolormappingFigure9:Visualization of density in a pion collision simulation: Isocontours overlayed on hue based colormapping[27]W.Press,S.Teukolsky,W.V etterling,and B.Flannery.Numerical Recipes in C,Second Edition.Cambridge Uni-versity Press,1992.[28]A.Rosenfeld and A.Kak.Digital Picture Processing.Academic Press,San Diego,1982.[29]E.Tufte.The Visual Display of Quantitative Information.Graphics Press,1983.[30]Jane Wilhelms and Allen V an Gelder.Octrees for fasterisosurface generation extended abstract.In Computer Graphics(San Diego Workshop on V olume Visualization), volume24,pages57–62,November1990.[31]Jane Wilhelms and Allen V an Gelder.Topological con-siderations in isosurface generation extended abstract.In Computer Graphics(San Diego Workshop on V olume Vi-sualization),volume24,pages79–86,November1990.Figure10:Visualization of density in a pion collision simula-tion:Topology overlayed on hue basedcolormappingFigure11:Visualization of wind speed from a climate model: Scalar topology for noisydataFigure12:Visualization of wind speed from a climate model: After applying a Gaussianfilter。