
Active Learning in Multimedia Annotation and Retrieval: A Survey

MENG WANG and XIAN-SHENG HUA, Microsoft Research Asia

Active learning is a machine learning technique that selects the most informative samples for labeling and uses them as training data. It has been widely explored in the multimedia research community for its capability of reducing human annotation effort. In this article, we provide a survey on the efforts of leveraging active learning in multimedia annotation and retrieval. We mainly focus on two application domains: image/video annotation and content-based image retrieval. We first briefly introduce the principle of active learning and then analyze the sample selection criteria. We categorize the existing sample selection strategies used in multimedia annotation and retrieval into five criteria: risk reduction, uncertainty, diversity, density, and relevance. We then introduce several classification models used in active learning-based multimedia annotation and retrieval, including semi-supervised learning, multilabel learning, and multiple instance learning. We also provide a discussion on several future trends in this research direction. In particular, we discuss cost analysis of human annotation and large-scale interactive multimedia annotation.

Categories and Subject Descriptors: H.3.1 [Information Storage and Retrieval]: Content Analysis and Indexing

General Terms: Algorithms, Experimentation, Human Factors

Additional Key Words and Phrases: Active learning, image annotation, video annotation, content-based image retrieval, sample selection, model learning

ACM Reference Format:

Wang, M. and Hua, X.-S. 2011. Active learning in multimedia annotation and retrieval: A survey. ACM Trans. Intell. Syst. Technol. 2, 2, Article 10 (February 2011), 21 pages.

DOI = 10.1145/1899412.1899414 http://doi.acm.org/10.1145/1899412.1899414

1. INTRODUCTION

With rapid advances in storage devices, networks, and compression techniques, multimedia data are increasing in an explosive way. The management of these data has become a challenging task, and their automatic understanding and efficient retrieval are highly desired. Active learning is a machine learning technique that learns a model in an interactive way. In comparison with passive learning, which trains models with pre-collected training data, active learning is able to select the most representative data in an iterative manner based on the model learned in each iteration. In this way, better performance can be obtained in comparison with using pre-collected or randomly collected training data. Therefore, active learning is an effective approach to reducing human labeling effort (or to achieving better performance with the same amount of human effort), and consequently it has been widely explored in multimedia annotation and retrieval.


Fig. 1. A typical learning-based multimedia annotation scheme.

In this article, we consider two application domains: (1) image and video annotation; and (2) content-based image retrieval.

1.1. Image and Video Annotation

Annotation aims to assign images and videos a set of labels that describe their content at syntactic and semantic levels [Naphade and Smith 2004b; Wang et al. 2009a, 2009b; Li and Wang 2008]. With the help of these labels, the management and manipulation of image and video data, such as delivery, summarization, and retrieval, can be greatly facilitated. Here we consider two annotation scenarios, namely multiclass and multilabel annotation. Multiclass annotation can be viewed as a categorization task, that is, categorizing images and videos into a set of exclusive and predefined concepts. Multilabel annotation is more challenging than the multiclass problem, as the labels are nonexclusive and each image or video clip can be assigned multiple labels. Both multiclass and multilabel annotation are typically accomplished with machine learning techniques. The process is illustrated in Figure 1. A labeled training set is collected and low-level features are extracted. Models are learned from the training data, and newly given images or video clips can then be predicted accordingly. The difference is that multiclass annotation is naturally a multiclass classification problem, whereas multilabel annotation is usually regarded as a set of binary classification problems. Given a concept, each unit is classified as "positive" or "negative" according to whether it is associated with this concept.

Although extensive research has been dedicated to image and video annotation, existing studies show that a large training set is often needed to achieve reasonable performance, due to the large variety of image and video content. However, manual annotation is laborious and time-consuming. For example, existing studies show that annotating 1 hour of video with 100 concepts can typically take between 8 and 15 hours [Lin et al. 2003]. Therefore, several works use active learning in image and video annotation to reduce the required training data.


Fig. 2. An illustration of active learning-based CBIR. Besides showing the retrieval results, the system also provides several images for labeling, and users can pick out the relevant images (marked with red boxes).

1.2. Content-Based Image Retrieval

Most existing commercial image retrieval systems are built upon textual information, such as image file names, ALT text, captions, and surrounding text, but they do not directly leverage the content of the images. Content-Based Image Retrieval (CBIR) has been an active research field for over 15 years; it aims to retrieve the desired images based on a user-provided or synthesized query image [Rui et al. 1999; Smeulders et al. 2000]. Given the query image, the most intuitive approach is to retrieve the closest images in a feature space. However, although many different features and similarity metrics have been explored, simply retrieving images with a query example usually cannot achieve satisfying performance. Relevance feedback is an approach to addressing this difficulty [Rui et al. 1998]. As illustrated in Figure 2, it improves retrieval performance by letting users label several returned results such that a classifier or a better similarity measurement can be learned. As a consequence, active learning finds its application here for its capability of providing the most informative images to users for labeling.

Many different active learning algorithms have been used in these two application domains. In this article, we present a comprehensive survey of these efforts. As we will introduce later, sample selection and model learning are the two major components of an active learning scheme. We analyze and categorize the existing sample selection approaches explored in multimedia annotation and retrieval. We also introduce the combination of active learning with several machine learning techniques, including semi-supervised learning, multilabel learning, and multiple instance learning. It is worth noting that active learning has been widely explored in many other applications, such as text categorization [Tong and Koller 2000], speech recognition [Hakkani-Tur et al. 2002], information extraction [Settles et al. 2008], and dataset collection [Collins et al. 2008]. In this work we focus on multimedia annotation and retrieval; for other applications, readers can refer to Settles [2009] and Olsson [2009] and the references therein. Multimedia annotation and retrieval is an

important research field, and here we have chosen two application domains from it, namely image/video annotation and content-based image retrieval. It is worth mentioning that active learning has also been applied in other applications, such as content-based music retrieval [Mandel et al. 2006; Foote 1997]. Settles [2009] and Olsson [2009] have reviewed many active learning techniques, but they mainly focus on machine learning methodologies and on applications in natural language processing and bioinformatics. Huang et al. [2008] introduced the application of active learning in interactive multimedia retrieval, but they did not cover the exploitation of active learning in multimedia annotation. In addition, Huang et al. [2008] mainly focus on introducing several typical and exemplary active learning paradigms in multimedia retrieval, and a categorization and summary of the existing algorithms is lacking. This motivates us to develop a dedicated survey on active learning-based multimedia annotation and retrieval, which analyzes and categorizes the existing research efforts as well as discusses several future trends in this direction.

The rest of the article is organized as follows. First, we briefly introduce active learning and a typical SVM-based active learning scheme in Section 2. In Section 3, we analyze the sample selection approaches of active learning in multimedia annotation and retrieval. In Section 4, we introduce several classification models of active learning in multimedia annotation and retrieval. Section 5 discusses some future directions of active learning in multimedia annotation and retrieval. Finally, we conclude the article in Section 6.

2. A BRIEF INTRODUCTION OF ACTIVE LEARNING

Active learning actually contains three different paradigms: membership query learning [Angluin 1988; King et al. 2004], stream-based active learning [Cohn et al. 1994; Dagan and Engelson 1995], and pool-based active learning [Cohn et al. 1996]. In this article, we focus on pool-based active learning, in which samples are selected from an existing pool for labeling according to certain criteria. A typical active learning system is composed of two parts, that is, a learning engine and a sample selection engine. It works in an iterative way, as illustrated in Figure 3. In each round, the learning engine trains a model based on the current training set. Then, the sample selection engine selects the most informative unlabeled samples for manual labeling, and these samples are added to the training set. In this way, the obtained training set is more informative than one gathered by random sampling.

It is obvious that the sample selection strategy plays a crucial role in an active learning scheme, as the learning engine can actually adopt any existing classification algorithm. Cohn et al. [1996] suggest that the optimal active learning approach should select the samples that minimize the expected risk. In Cohn et al. [1996] and Roy and McCallum [2001], the reduced risk is estimated with respect to each unlabeled sample, and then the most effective samples are selected. However, for most learning methods it is infeasible to estimate this risk. Therefore, in practice most active learning methods adopt an uncertainty criterion, that is, they select the samples closest to the classification boundary [Tong and Chang 2001; Tong and Koller 2000]. Wu et al. [2006] and Zhang and Chen [2003] further proposed to incorporate the density distribution of samples into the sample selection process. Brinker [2003] pointed out that the selected samples should be diverse.

Among the many different classification models and sample selection methods, we first introduce SVM-based active learning as a typical scheme. More approaches will be introduced in the following sections. For a binary classification problem, we are given a training set that contains $l$ labeled samples $\{(x_1, y_1), (x_2, y_2), \ldots, (x_l, y_l)\}$, $x_i \in \mathbb{R}^d$, $y_i \in \{-1, +1\}$.


Fig. 3. An illustration of active learning. It selects the most informative samples for manual labeling, such that the obtained training set is more effective than that collected by random sampling.

The classification hyperplane is defined as
$$\langle w, \Phi(x)\rangle + b = 0, \qquad (1)$$
where $\Phi(\cdot)$ is a mapping from $\mathbb{R}^d$ to a Hilbert space $H$, and $\langle\cdot,\cdot\rangle$ denotes the dot product in $H$. Thus, the decision function $f(x)$ is
$$f(x) = \mathrm{sign}\big(\langle w, \Phi(x)\rangle + b\big). \qquad (2)$$
SVM aims to find the hyperplane with the maximum margin between the two classes, that is, the optimal hyperplane. It can be obtained by solving the quadratic optimization problem
$$\min_{w, b, \xi} \; \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{l}\xi_i \quad \text{subject to } y_i\big(\langle w, \Phi(x_i)\rangle + b\big) \ge 1 - \xi_i, \;\; \xi_i \ge 0, \; i = 1, 2, \ldots, l. \qquad (3)$$

In Tong and Chang [2001] and Tong and Koller [2000], an active learning approach is proposed using SVM. It is derived from the minimization of the expected size of the version space, which is defined to be the set of consistent hypotheses [Mitchell 1982]: a hypothesis $f$ is in the version space if it satisfies $f(x_i) > 0$ when $y_i = 1$ and $f(x_i) < 0$ when $y_i = -1$ for every training sample $x_i$. For simplicity, we only consider an unbiased SVM model, that is, a separating hyperplane that passes through the origin. Then, the version space can be denoted by
$$V = \{\, w \mid \|w\| = 1,\; y_i\langle w, \Phi(x_i)\rangle > 0,\; i = 1, 2, \ldots, n \,\}. \qquad (4)$$

It has been proved that selecting the samples closest to the current hyperplane in each round is an approximate way of reducing the version space at the fastest rate [Tong and Chang 2001; Tong and Koller 2000].


Fig. 4. The pseudo-code of the SVM-based active learning approach.

Note that the samples closest to the hyperplane are exactly the ones with the highest uncertainty measure; therefore, this sample selection strategy can also be regarded as being based on an uncertainty measurement. The detailed implementation of the SVM-based active learning scheme is illustrated in Figure 4.
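To make the loop of Figure 4 concrete, below is a minimal sketch of pool-based SVM active learning with uncertainty sampling, assuming scikit-learn and a hypothetical `oracle` callback that stands in for the human annotator; it illustrates the scheme described above and is not the authors' implementation.

```python
import numpy as np
from sklearn.svm import SVC

def svm_active_learning(X_labeled, y_labeled, X_pool, oracle, rounds=10, batch_size=5):
    """Pool-based SVM active learning with uncertainty sampling.

    `oracle(indices)` is assumed to return the true labels of the selected
    pool samples (in practice, a human annotator provides them).
    """
    X_l, y_l = X_labeled.copy(), y_labeled.copy()
    pool_mask = np.ones(len(X_pool), dtype=bool)

    for _ in range(rounds):
        clf = SVC(kernel="rbf", C=1.0)                 # retrain on the current training set
        clf.fit(X_l, y_l)

        # Distance to the separating hyperplane; small magnitude = high uncertainty.
        pool_idx = np.where(pool_mask)[0]
        margins = np.abs(clf.decision_function(X_pool[pool_idx]))
        selected = pool_idx[np.argsort(margins)[:batch_size]]

        # Query the annotator and move the selected samples into the training set.
        y_new = oracle(selected)
        X_l = np.vstack([X_l, X_pool[selected]])
        y_l = np.concatenate([y_l, y_new])
        pool_mask[selected] = False

    return SVC(kernel="rbf", C=1.0).fit(X_l, y_l)      # final model
```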

3. SAMPLE SELECTION STRATEGIES IN ACTIVE LEARNING

As previously mentioned, sample selection plays a crucial role in active learning. In this section, we provide a review of typical sample selection strategies used in active learning-based multimedia annotation and retrieval.

3.1. Risk Reduction

Intuitively, the optimal sample selection strategy is consistent with the aim of the learner, that is, minimizing the expected risk, which can be denoted by
$$\int_x E_T\big[(\hat{y}(x) - y(x))^2 \mid x\big]\, p(x)\, dx, \qquad (5)$$
where $y(x)$ is the true label of sample $x$, $\hat{y}$ is the classifier output, $p(x)$ is the probability density function of the sample distribution, and $E_T$ denotes expectation over both the conditional density $P(y|x)$ and the training data $L$. If the reduction of the expected risk brought by labeling each unlabeled sample can be estimated, then the optimal sample selection can be achieved.

For several models, the expected risk reduction can be estimated. For example, Hoi and Lyu [2005] and Bao et al. [2009] estimated the risk reduction after labeling an image in a graph-based semi-supervised learning algorithm and in a so-called locally non-negative linear structure learning algorithm, respectively. Yan et al. [2003] proposed a method that estimates the reduced risk after labeling a sample in a multiclass setting. Qi et al. [2009] proposed a multilabel active learning method to estimate the reduced risk after labeling an image with respect to a concept. Cohn et al. [1996] decomposed the risk in Eq. (5) into three terms, that is,

$$E_T\big[(\hat{y}(x) - y(x))^2 \mid x\big] = E\big[(y(x) - E[y(x)\mid x])^2 \mid x\big] + \big(E_L[\hat{y}(x)] - E[y(x)\mid x]\big)^2 + E_L\big[(\hat{y}(x) - E_L[\hat{y}(x)])^2 \mid x\big], \qquad (6)$$

where $E_L[\cdot]$ is an expectation over the labeled data $L$ and $E[\cdot]$ is an expectation over the conditional density $P(y|x)$. The first term is the variance of the true label $y$ given only $x$, the second term is the prediction error induced by the model itself (the bias), and the third term is the variance of the prediction with respect to different training sets. Cohn et al. [1996] then proposed a training sample selection method that minimizes the variance term by estimating the reduced variance after adding each sample to the training set. Unfortunately, estimating the expected variance reduction is not an easy task for most classifiers. Roy and McCallum [2001] proposed a method to empirically estimate the reduced expected risk based on Monte Carlo sampling. But this approach is computationally intensive, since it needs to traverse all samples and, for each of them, estimate the reduced risk after adding it to the training set and updating the model. In multimedia annotation and retrieval, we usually need to deal with large-scale data, and thus many efforts resort to more heuristic sample selection criteria. Here, we categorize these strategies into the following four criteria: uncertainty, diversity, density, and relevance.
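As a rough illustration of the Monte Carlo strategy of Roy and McCallum [2001], the sketch below scores candidate samples by the expected residual uncertainty of the pool after hypothetically adding each candidate with each possible label. The risk proxy, the candidate subsampling, and the use of a generic scikit-learn classifier are all assumptions made for this example, not the original formulation.

```python
import numpy as np
from sklearn.base import clone

def expected_risk_after_labeling(model, X_l, y_l, X_pool, candidate):
    """Estimate the expected pool risk after labeling one candidate sample.

    `model` is any fitted scikit-learn classifier with predict_proba
    (e.g., LogisticRegression). The risk proxy used here (1 - max posterior,
    averaged over the pool) is a simplification of the true expected error.
    """
    probs = model.predict_proba(X_pool[candidate:candidate + 1])[0]
    risk = 0.0
    for label, p_label in zip(model.classes_, probs):
        # Hypothetically add (x_candidate, label) and retrain a copy of the model.
        m = clone(model).fit(np.vstack([X_l, X_pool[candidate]]),
                             np.append(y_l, label))
        pool_probs = m.predict_proba(X_pool)
        risk += p_label * np.mean(1.0 - pool_probs.max(axis=1))
    return risk

def select_by_risk_reduction(model, X_l, y_l, X_pool, n_candidates=50, rng=None):
    # Scoring every pool sample requires O(n) retrainings, so only a random
    # subset of candidates is scored here, as practical implementations also do.
    rng = rng or np.random.default_rng(0)
    candidates = rng.choice(len(X_pool), size=min(n_candidates, len(X_pool)),
                            replace=False)
    risks = [expected_risk_after_labeling(model, X_l, y_l, X_pool, int(c))
             for c in candidates]
    return int(candidates[int(np.argmin(risks))])
```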

3.2. Uncertainty

Applying the uncertainty criterion means that the most uncertain samples should be selected. This heuristic stems from the fact that in many learning algorithms the essential classification boundary can be preserved based solely on the nearby samples, and the samples that are far from the boundary can be regarded as redundant. For example, in an SVM model the hyperplane can be constructed based mainly on the support vectors (i.e., the samples that lie closest to the hyperplane). A typical measure of uncertainty is entropy, that is,

$$\mathrm{Uncertainty}(x) = -\sum_{i} \Pr(y_i \mid x)\log \Pr(y_i \mid x), \qquad (7)$$

where $\Pr(y_i \mid x)$ is the estimated probability of $y_i$ given $x$. For binary classification, the samples that are closest to the classification boundary will be selected according to the above equation. The uncertainty criterion can also be viewed as a greedy strategy for reducing risk (without model updating, the way to maximally reduce the expected risk is to select the most uncertain samples). This criterion has been widely explored for its simplicity [He et al. 2004; Tong and Chang 2001; Wu et al. 2006]. Since nearly all classification models can output prediction probabilities or confidence scores, uncertainty-based sample selection can be easily accomplished with an $O(n)$ cost, where $n$ is the number of samples in the unlabeled pool. For the SVM model, the uncertainty criterion selects the samples closest to the classification boundary, and it has been proved that this strategy reduces the version space at the fastest rate [Tong and Chang 2001; Tong and Koller 2000]. When multiple learners exist, a widely applied strategy is to select the samples on which the learners have the maximum disagreement [Freund et al. 1997; Seung et al. 1992]. The disagreement of multiple learners can also be regarded as an uncertainty measure, and thus this strategy is categorized under the uncertainty criterion as well. For multiclass annotation, Joshi et al. [2009] proposed an uncertainty estimation method that considers the posterior probabilities of the best and the second best predictions, that is,

$$\mathrm{Uncertainty}(x) = \Pr(y_1 \mid x) - \Pr(y_2 \mid x), \qquad (8)$$
where $y_1$ and $y_2$ are the classes with the largest and second largest posterior probabilities, respectively. If their margin is small, the model is more confused about the sample and thus the sample has high uncertainty. It can easily be proved that this approach is equivalent to the conventional entropy-based method in binary


classification, but in the multiclass setting the method has shown remarkable improvements on several benchmark datasets [Joshi et al. 2009].
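Both uncertainty measures are straightforward to compute from a model's class posteriors; the sketch below (assuming NumPy arrays of predicted probabilities, one row per sample) is illustrative only.

```python
import numpy as np

def entropy_uncertainty(probs):
    """Eq. (7): entropy of the predicted class distribution; higher = more uncertain."""
    p = np.clip(probs, 1e-12, 1.0)
    return -np.sum(p * np.log(p), axis=1)

def best_versus_second_best(probs):
    """Eq. (8): margin between the two largest posteriors; a *small* margin
    means high uncertainty, so selection picks the smallest values."""
    top_two = np.sort(probs, axis=1)[:, -2:]
    return top_two[:, 1] - top_two[:, 0]

# Example usage (probs = clf.predict_proba(X_pool)):
#   query_idx = np.argsort(best_versus_second_best(probs))[:batch_size]
```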

3.3. Diversity

The diversity criterion was first investigated in batch-mode active learning. In many applications, we need to select a batch of samples instead of just one in each active learning iteration. For example, sometimes updating or retraining a model requires extensive computation, and labeling just one sample each time would make the active learning process quite slow. Joshi et al. [2009] proposed that the samples selected in a batch should be diverse. Given a kernel $K$, the angle between two samples $x_i$ and $x_j$ is defined as

$$\cos(\angle x_i, x_j) = \frac{|K(x_i, x_j)|}{\sqrt{K(x_i, x_i)\,K(x_j, x_j)}}. \qquad (9)$$
Then, the diversity measure can be estimated as
$$\mathrm{Diversity}(x) = 1 - \max_{x_i \in S} \frac{|K(x_i, x)|}{\sqrt{K(x_i, x_i)\,K(x, x)}}, \qquad (10)$$
where $S$ is the set of samples already selected in the batch. Directly optimizing the selection of a batch of samples is difficult, and thus it is typically accomplished with a greedy process, that is, each time a sample is selected according to the combination of diversity and other criteria and then added to the batch. Several studies [Dagli et al. 2006; Gosselin and Cord 2004; Wu et al. 2006] also show that the diversity criterion should not only be investigated in batch mode but also be considered over all labeled samples, so that the selected samples are not constrained to a more and more restricted area. In Hoi et al. [2006], the Fisher information matrix is adopted for sample selection to keep the selected samples diverse, and in Dagli et al. [2006] an information-theoretic diversity measure is proposed based on Shannon's entropy. Given a set of points $\{x_1, x_2, \ldots, x_n\}$, an empirical entropy can be estimated by kernel density estimation as
$$h\big(\{x_i\}_{i=1}^{n}\big) = -\frac{1}{n}\sum_{i=1}^{n}\log\Big(\frac{1}{n}\sum_{j=1}^{n} K(x_i, x_j)\Big). \qquad (11)$$
Then, the sample that maximally reduces the empirical entropy is selected [Dagli et al. 2006].
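A hedged sketch of greedy batch selection with the angular diversity of Eqs. (9)-(10) is given below; the precomputed kernel matrix `K`, the normalized `uncertainty` scores, and the weighting `alpha` are assumptions made for the example, not values prescribed by the cited works.

```python
import numpy as np

def angular_diversity(K, candidate, selected):
    """Eq. (10): diversity of `candidate` w.r.t. the samples already in the batch."""
    if not selected:
        return 1.0
    sims = [abs(K[i, candidate]) / np.sqrt(K[i, i] * K[candidate, candidate])
            for i in selected]
    return 1.0 - max(sims)

def greedy_diverse_batch(K, uncertainty, batch_size, alpha=0.5):
    """Greedily build a batch by mixing uncertainty and diversity scores."""
    selected, remaining = [], set(range(len(uncertainty)))
    for _ in range(batch_size):
        best = max(remaining,
                   key=lambda j: alpha * uncertainty[j]
                                 + (1 - alpha) * angular_diversity(K, j, selected))
        selected.append(best)
        remaining.remove(best)
    return selected
```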

3.4. Density

Several works indicate that samples within regions of high density should be selected. Wu et al. [2006] defined a "representativeness" measure for each sample according to its distance to nearby samples. Zhang and Chen [2003] estimated the data distribution by kernel density estimation (KDE) [Parzen 1962] and then exploited it in sample selection. Given a set of points $\{x_1, x_2, \ldots, x_n\}$, the probability density function $\hat{p}(x)$ can be estimated by
$$\hat{p}(x) = \frac{1}{n}\sum_{i=1}^{n} K(x - x_i), \qquad (12)$$
where $K(x)$ is a kernel function that satisfies $K(x) > 0$ and $\int K(x)\,dx = 1$. Consequently, the density measure can be defined by normalizing to $[0, 1]$ as follows:
$$\mathrm{Density}(x) = \frac{\sum_{j=1}^{n} K(x - x_j)}{\max_i \sum_{j=1}^{n} K(x_i - x_j)}. \qquad (13)$$

Besides these, there are also several clustering-based methods that first group the samples and then only select the cluster centers in active learning [Nguyen and Smeulders 2004; Qi et al. 2004; Song et al. 2005]. Since cluster centers usually lie in regions of high density, these works can also be regarded as applying a density strategy, that is, trying to select samples in dense regions. Qi et al. [2004] proposed a method that adjusts the clusters with merging and splitting operations after each round of active learning.
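The density score of Eqs. (12)-(13) can be computed directly from a kernel matrix over the pool; the Gaussian kernel and bandwidth below are illustrative assumptions.

```python
import numpy as np

def density_scores(X, gamma=1.0):
    """Eqs. (12)-(13): Parzen-window density at each pool sample with a Gaussian
    kernel, normalized to [0, 1] by the maximum value."""
    sq = np.sum(X ** 2, axis=1)
    K = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2.0 * X @ X.T))
    dens = K.sum(axis=1)              # proportional to (1/n) * sum_j K(x - x_j)
    return dens / dens.max()
```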

3.5. Relevance

The relevance strategy is usually applied in multilabel image/video annotation and retrieval. As introduced in Section 1, in these tasks samples are classified as relevant or irrelevant according to whether they are associated with the given concept or query. Of course, the aforementioned criteria such as uncertainty can also be applied in these tasks, but in many cases it has been found that using the relevance criterion, that is, directly selecting the samples that have the highest probabilities of being relevant, is more effective [Ayache and Quénot 2007; Gosselin and Cord 2004; Vendrig et al. 2002]. Gosselin and Cord [2004] investigated SVM-based active learning in CBIR and showed that better performance can be achieved by selecting more relevant images. Ayache and Quénot [2007] conducted an empirical study of different sample selection strategies for active learning on the TRECVID video annotation benchmark. Their results clearly show that for several concepts the relevance criterion can achieve comparable or even better performance than the uncertainty criterion. This can be attributed to the fact that positive samples are usually far fewer than negative ones in these tasks, and the distribution of negative samples usually spans a very broad domain. In addition, in these tasks ranking performance, such as Average Precision, is usually adopted for evaluation (even for annotation, ranking performance is widely adopted instead of classification accuracy). This implies that finding more relevant samples with high confidence is more important than correctly classifying all samples, so relevant samples contribute more than irrelevant ones. Therefore, the relevance criterion may outperform the uncertainty criterion, since it is able to find more positive samples.

3.6. Discussion

Up to now, we have discussed five typical sample selection criteria. It is difficult to directly compare their relative merits, as these depend on the specific task as well as the classification model. For example, simply applying the relevance criterion may achieve the best results in extremely unbalanced cases where positive samples are very rare, and when we choose batch-mode active learning for computational efficiency, integrating the diversity criterion will be helpful. In many cases, these criteria are combined explicitly or implicitly. The diversity and density criteria are rarely used individually, as they are independent of the classification results; most frequently they are used to enhance the uncertainty criterion. Wang et al. [2007] combined uncertainty, diversity, density, and relevance in sample selection as
$$\mathrm{Effectiveness}(x) = \gamma\,\mathrm{Density}(x)\big(\alpha\,\mathrm{Uncertainty}(x) + (1-\alpha)\,\mathrm{Relevance}(x)\big) + (1-\gamma)\,\mathrm{Diversity}(x). \qquad (14)$$
Several sample selection algorithms implicitly explore multiple criteria. Yuan et al. [2006] adopted the uncertainty and diversity criteria in CBIR, but they also propose to shift the boundary such that more relevant samples can be selected; thus, this strategy can be viewed as a combination of the uncertainty, diversity, and relevance criteria. Hoi et al. [2006] proposed a batch-mode sample selection approach based on the

principle of maximally reducing the Fisher information ratio. They show that the derived sample selection criterion selects samples that are uncertain, dissimilar to already selected samples, and similar to unlabeled samples; it is therefore consistent with our uncertainty, diversity, and density criteria. In the Appendix, Table I lists several research works that adopt active learning in image/video annotation or retrieval, together with the application, the adopted learning method, and the sample selection criteria.
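For completeness, Eq. (14) translates into a one-line scoring function; in the sketch below the component scores are assumed to be normalized to [0, 1], and the default weights alpha and gamma are placeholders rather than the values used by Wang et al. [2007].

```python
def effectiveness(uncertainty, relevance, density, diversity, alpha=0.5, gamma=0.5):
    """Eq. (14): combined sample selection score; all inputs may be NumPy arrays."""
    return (gamma * density * (alpha * uncertainty + (1.0 - alpha) * relevance)
            + (1.0 - gamma) * diversity)
```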

4. CLASSIFICATION MODELS FOR ACTIVE LEARNING

As introduced in Section 2, an active learning process can be regarded as the repetition of a prediction step and a sample selection step. Therefore, with a certain sample selection strategy, nearly any classification algorithm can be applied to form an active learning scheme. From Table I in the Appendix we can see that many different classification models have been explored, such as k-NN, SVM, the Gaussian Mixture Model [Redner and Walker 1984], the Maximum Entropy Classifier [Berger et al. 1996], and graph-based semi-supervised learning [Zhu et al. 2003a]. Figure 4 illustrates the active learning process for SVM; we only need to replace SVM with another classification model to obtain its active learning scheme. In this section, we mainly focus on the exploitation of active learning in several special machine learning techniques. More specifically, we will consider the following combinations: (1) semi-supervised learning + active learning; (2) multilabel learning + active learning; and (3) multiple instance learning + active learning.

4.1. Semi-Supervised Learning + Active Learning

Semi-Supervised Learning (SSL) is a technique that deals with the difficulty of insufficient training data [Chapelle et al. 2006; Zhu 2009]. While labeled data are usually limited, unlabeled data can be easily obtained; for example, nowadays we can easily collect large-scale multimedia data from the Internet. In contrast to supervised learning, SSL leverages a large amount of unlabeled samples based on certain assumptions, such that the obtained models can be more accurate than those achieved by purely supervised methods. Since SSL and active learning both involve unlabeled data, they have been exploited together in many image/video annotation and retrieval works [Wang et al. 2007; Zhu et al. 2003b; Sahbi et al. 2008]. Zhu et al. [2003b] proposed an active learning approach based on a graph-based SSL method, in which the reduction of the expected risk of labeling each sample can be predicted without retraining the classification model. Hoi and Lyu [2005] further integrated SVM with the graph-based SSL method and applied it to CBIR. Bao et al. [2009] adopted a similar approach based on an improved graph-based SSL method named Locally Non-negative Linear Structure Learning (NLSL). Existing research in the machine learning community has also demonstrated the effectiveness of combining active learning and multiview semi-supervised learning (such as co-training [Blum and Mitchell 1998]) from both theoretical and empirical perspectives [Muslea et al. 2002; Wang and Zhou 2008]. Song et al. [2005] proposed an active learning method based on co-training in video annotation. They construct two classifiers with two different modalities and then estimate the posterior probabilities under an independence assumption between the two modalities. In each iteration, several uncertain samples are selected for manual labeling, and the samples that one classifier is certain about are added to the training set to teach the other classifier. The active learning process is as follows (a code sketch is given after the list):

(1) Train two initial complementary classifiers based on the complementary feature sets from a training set.

(2) Predict the labels of the testing samples using the two classifiers. The label of each testing sample is then determined by considering the outputs of the two classifiers and their confidence scores.

(3) Select the most uncertain samples for each classifier, and ask the user to label them.

(4) Take the manually labeled samples as well as several confident samples as new training data to update the classifiers.

(5) Repeat (2)-(4) for a certain number of rounds and output the final results.
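A minimal sketch of this co-training-based active learning loop is given below, assuming two feature views of the same pool, two probabilistic scikit-learn classifiers, and a hypothetical `oracle` callback for the manual labels; the confidence-based selection rules and batch sizes are illustrative choices, not those of Song et al. [2005].

```python
import numpy as np
from sklearn.base import clone

def cotraining_active_learning(clf_a, clf_b, Xa_l, Xb_l, y_l,
                               Xa_pool, Xb_pool, oracle,
                               rounds=5, n_query=5, n_confident=10):
    """Two-view active learning: uncertain samples go to the human annotator,
    while samples one view is confident about teach the other view."""
    pool = np.arange(len(Xa_pool))
    for _ in range(rounds):
        ca = clone(clf_a).fit(Xa_l, y_l)               # steps (1)/(4): retrain both views
        cb = clone(clf_b).fit(Xb_l, y_l)

        pa = ca.predict_proba(Xa_pool[pool])           # step (2): predict pool labels
        pb = cb.predict_proba(Xb_pool[pool])
        conf_a, conf_b = pa.max(axis=1), pb.max(axis=1)

        pos = np.arange(len(pool))
        uncertain_pos = pos[np.argsort(np.minimum(conf_a, conf_b))[:n_query]]
        confident_pos = pos[np.argsort(np.maximum(conf_a, conf_b))[-n_confident:]]
        confident_pos = np.setdiff1d(confident_pos, uncertain_pos)

        y_manual = oracle(pool[uncertain_pos])         # step (3): ask the user
        # Pseudo-label confident samples with the more confident view's prediction.
        y_pseudo = np.where(conf_a[confident_pos] >= conf_b[confident_pos],
                            ca.classes_[pa[confident_pos].argmax(axis=1)],
                            cb.classes_[pb[confident_pos].argmax(axis=1)])

        new_pos = np.concatenate([uncertain_pos, confident_pos])
        new_idx = pool[new_pos]
        new_y = np.concatenate([y_manual, y_pseudo])
        Xa_l = np.vstack([Xa_l, Xa_pool[new_idx]])
        Xb_l = np.vstack([Xb_l, Xb_pool[new_idx]])
        y_l = np.concatenate([y_l, new_y])
        pool = np.setdiff1d(pool, new_idx)             # step (5): iterate
    return ca, cb
```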

4.2. Multilabel Learning + Active Learning

In multilabel image/video annotation, the concepts are not exclusive and each image or video clip can be associated with multiple concepts. An intuitive active learning approach in the multilabel context is to select samples considering the models of all concepts and then manually label these samples with respect to all the concepts. Naphade and Smith [2004a] adopted this strategy in active multilabel video annotation and showed that it incurs less labor cost in comparison with conducting active learning for each concept individually. However, this conclusion is derived by counting the number of labeled samples, and existing studies show that manually labeling an image/video clip with respect to many concepts is laborious. In addition, users tend to miss relevant labels in this type of manual labeling [Volkmer et al. 2005]. Therefore, several works resort to selecting a set of samples for each individual concept or directly selecting sample-concept pairs. In these approaches, humans only need to label each sample with respect to one concept.

Wang et al. [2007] adopted a two-step process. In each iteration, first a concept is selected with the expectation of obtaining the highest performance gain, and then a batch of suitable samples is selected to be annotated for this concept based on the combination of multiple criteria. The concept selection criterion can be viewed as optimizing the average annotation performance over multiple concepts. The process is as follows:

(1) Initialization. A batch of samples is selected for each concept and a classifier is learned for each concept. This process is repeated for two rounds, so that the performance gain of each classifier can be predicted.

(2) The concept with the highest performance gain is selected.

(3) A batch of samples is selected according to certain criteria (more specifically, Wang et al. [2007] adopted Eq. (14)).

(4) A classifier is learned for the concept, and its performance gain is predicted based on the samples labeled in the last two rounds.

(5) Repeat (2)-(4) for a certain number of rounds and output the final results.

Qi et al. [2007] proposed a method that selects sample-concept pairs for manual annotation in a correlative multilabel learning approach. The sample-concept selection is derived from the reduction of a multilabel Bayesian error bound, which is formulated as
$$(x_s, y_s) = \arg\max_{x_s \in P,\; y_s \in U(x_s)} \sum_{i=1}^{m} MI\big(y_i;\, y_s \mid y_{L(x_s)}, x_s\big), \qquad (15)$$
where $P$ is the unlabeled sample pool, $(x_s, y_s)$ is the sample-concept pair to be selected, $U(x)$ and $L(x)$ indicate the index sets of unlabeled and labeled concepts of $x$, respectively, and $MI(y_i; y_s \mid y_{L(x_s)}, x_s)$ denotes the mutual information between the variables $y_i$ and $y_s$ given the known labeled part $y_{L(x_s)}$. This equation can be rewritten as
$$(x_s, y_s) = \arg\max_{x_s \in P,\; y_s \in U(x_s)} H\big(y_s \mid y_{L(x_s)}, x_s\big) + \sum_{i=1,\, i \neq s}^{m} MI\big(y_i;\, y_s \mid y_{L(x_s)}, x_s\big). \qquad (16)$$

Now we can see that the sample-concept selection is based on two terms, namely $H(y_s \mid y_{L(x_s)}, x_s)$ and $\sum_{i \neq s} MI(y_i; y_s \mid y_{L(x_s)}, x_s)$. The first term is an entropy measure that estimates the uncertainty of the selected pair $(x_s, y_s)$, which is consistent with the previously introduced uncertainty criterion. The second term measures the redundancy between the selected concept and the remaining ones; by incorporating this term, the sample-concept selection is able to reduce the uncertainty of the other concepts. Therefore, this method considers not only the uncertainty of the to-be-selected sample-concept pair but also the correlations among concepts. Zhang et al. [2009] further extended the algorithm to a multiview scenario, in which the selection of label-sample pairs also takes their uncertainty over different views into consideration.

4.3. Multiple Instance Learning + Active Learning

Multiple Instance Learning (MIL) is a technique for problems in which the label information of the training data is incomplete [Dietterich et al. 1997; Maron and Ratan 1998; Zhang and Goldman 2001; Andrews et al. 2002]. In a typical supervised learning task, every training instance is associated with a label, but in MIL the labels are only assigned to bags of instances. This technique has been widely explored in different applications, such as drug activity prediction [Dietterich et al. 1997], image annotation and retrieval [Maron and Ratan 1998; Zhang and Goldman 2001], and text categorization [Andrews et al. 2002]. It is particularly suitable for image/video annotation and retrieval, because many concepts actually refer to only part of an image/video, such as particular objects. For example, an image can be viewed as a bag that contains many instances, where each instance is a region, but labels are usually assigned only to whole images.

Settles et al.[2007]proposed an approach that labels instances in positive bags. They proposed two methods for instance selection in positive bags,one is multiple-instance uncertainty and the other is expected gradient length.Vijayanarasimhan and Grauman[2008]proposed a method that is able to label both instances and bags.They assume that labeling an instance introduces more cost than labeling a bag.This is easy to understand as labeling relevant regions will be more complex than assigning a label to a whole image.For example,Vijayanarasimhan and Grauman[2008]showed that labeling relevant regions needs about four times cost than labeling the whole image.Their proposed approach takes into account both the cost and the reduction of expected risk and then makes decision accordingly.Overall,the optimal scheme for the combination of multiple instance learning and active learning also depends on speci?c models and tasks,such as whether users are willing to label media instances(such as image regions)and models can integrate labeled media bags and instances.It may need to simultaneously consider sample and instance selection as well as the costs of labeling bags and instances.

5. FUTURE DIRECTIONS OF ACTIVE LEARNING IN MULTIMEDIA ANNOTATION AND RETRIEVAL

Extensive efforts have proved the effectiveness of active learning in multimedia annotation and retrieval, but there are also many open challenges.


Fig. 5. A typical multi-level active MIL scheme. The central problems are the choice between bag labeling and instance labeling and the bag/instance selection criteria.

It is worth noting that, as a machine learning technique, active learning per se has many problems worth studying, such as learning rates and bounds [Hanneke and Yang 2010], but here we focus on its application in multimedia annotation and retrieval. We discuss the following two subtopics: (1) cost analysis of manual annotation; and (2) large-scale interactive multimedia labeling. As an interactive approach, active learning involves two parts, namely the human and the computer. Most existing research efforts have been dedicated to the computational algorithms, such as sample selection strategies and learning models, whereas the human part has received relatively little attention. However, humans also play a very important role in active learning, and this is why we choose these two subtopics. The first topic analyzes human annotation behavior and attempts to benefit active learning-based multimedia annotation and retrieval from this analysis, and the second attempts to leverage a large number of human labelers to annotate large-scale multimedia data. We will introduce the problems, several related works, and possible solutions.

5.1. Cost Analysis of Manual Annotation

Active learning is intended to reduce humans' manual effort, but in most existing works performance evaluation is done by counting the number of labeled samples. For example, an algorithm will be declared effective if active learning needs fewer labeled samples than random sample selection to achieve the same performance. A hidden assumption here is that the cost of manually labeling each sample is identical, and the sample selection criteria are derived accordingly. However, this is not true in multimedia annotation and retrieval. Existing studies [Volkmer et al. 2005] show that different concepts may lead to different average annotation times per sample. In addition, annotating different samples may require different effort even for the same concept. For example, the time cost of annotating an image may depend on the typicality [Tang et al. 2007] and recognizability of the objects in it (see Figure 6). Therefore, it is more reasonable to take the cost of annotating different samples into account in practical active learning approaches. For example, Settles et al. [2008] take both the uncertainty measurements and the annotation costs into account in their sample selection strategy and have shown its effectiveness.


Fig. 6. Three samples that should all be labeled as positive for the concept car. The annotation costs of these three samples may decrease from left to right due to the different recognizabilities of the objects in them.

Yan et al. [2009] first tried to model the cost of manual image annotation. They categorize manual annotation into two types, namely tagging and browsing. Tagging means that users annotate each image with a chosen set of keywords, and browsing means that users browse images sequentially and judge each image's relevance to a concept. Linear models are adopted to predict the cost of both tagging and browsing. For the tagging of an image, its expected time cost is predicted as

$$t = K_l t_f + t_s, \qquad (17)$$
where $K_l$ is the number of concepts associated with the image, $t_f$ is the average time of entering a keyword, and $t_s$ is the initial setup time for annotation. For the browsing of a concept, its expected time cost is predicted as

$$t = L_k t_p + (L - L_k) t_n, \qquad (18)$$
where $L$ is the total number of images, $L_k$ is the number of relevant images, and $t_p$ and $t_n$ are the time costs of annotating a relevant image and an irrelevant image, respectively. Based on these two models, Yan et al. [2009] proposed two methods to organize the annotation task by combining tagging and browsing, such that the annotation cost can be greatly reduced.
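The two linear cost models are easy to instantiate; in the sketch below the per-action times (in seconds) are illustrative assumptions, not the values fitted by Yan et al. [2009].

```python
def tagging_cost(n_concepts, t_keyword=2.0, t_setup=5.0):
    """Eq. (17): expected time to tag one image with K_l = n_concepts keywords."""
    return n_concepts * t_keyword + t_setup

def browsing_cost(n_images, n_relevant, t_pos=1.5, t_neg=0.5):
    """Eq. (18): expected time to browse L = n_images for one concept,
    of which L_k = n_relevant are relevant."""
    return n_relevant * t_pos + (n_images - n_relevant) * t_neg

# E.g., tagging 1,000 images with 3 concepts each vs. browsing them for 3 concepts:
#   1000 * tagging_cost(3)   versus   3 * browsing_cost(1000, 100)
```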

Vijayanarasimhan and Grauman [2009] adopted a learning-based method to model the time cost of segmenting and annotating an image. They collect training data by taking advantage of Amazon's Mechanical Turk and then learn a regressor with several features that are able to indicate an image's complexity level. The model has been investigated in a multilabel multiple-instance active learning scheme, such that sample selection considers not only a sample's informativeness but also the cost of its annotation.

In addition to the manual labeling cost issue introduced above, how many samples should be labeled in each round is also a problem. As previously mentioned, the retraining or updating of models may be time-consuming, and thus it is more reasonable to select a batch of samples in each round to reduce the number of model updates. In such a scenario, the batch size needs to be chosen carefully to achieve a trade-off between computational efficiency and labeling cost.

Overall, the study of manual annotation is far from comprehensive. Manual annotation may be affected by many factors, and many simple strategies may significantly reduce its cost. For example, in the annotation of a concept we can group images into several clusters, where most of the images in a cluster should share the same relevance label. Users can then assign a global label to a cluster and only need to change the labels of a few samples within it, which is more efficient than labeling the images sequentially.

5.2. Large-Scale Interactive Multimedia Annotation

In the annotation of a large-scale multimedia corpus, it is not easy to achieve satisfying performance even by applying active learning, and therefore the amount of data to be manually annotated will also be very large. A large-scale interactive multimedia annotation scheme will thus involve two parts, that is, large-scale active learning and large-scale manual annotation.


Fig. 7. A schematic illustration of a large-scale interactive image/video annotation system.

For large-scale active learning, there are already many methods to efficiently train classifiers on large datasets [Hsieh et al. 2008; Panda et al. 2006a], and online learning is a promising approach to dealing with labeled data that are continuously increasing [Roy and McCallum 2001; Cauwenberghs and Poggio 2000]. Here, we focus on large-scale human annotation. From a global point of view, two potential groups of people can be involved in the annotation process, that is, dedicated data labelers and grassroots users. Dedicated data labelers are experienced and provide high-quality annotation results. But in the Web 2.0 era, we have witnessed the great potential of Internet grassroots users: if well motivated, they are also able to contribute greatly to large-scale interactive multimedia annotation tasks. The following scenarios can be applied to attract these users to label given data.

asked to con?rm labels of multimedia data with a friendly interface.ESP[Ahn and Dabbish2004]and Peekaboom games[Ahn et al.2006]are good examples.

(2)Pay.Pay by the estimation of annotation efforts,such as the number of annotated

images or video clips.The pay can be real currency or virtual ones which can be used to buy online products/content.A good example is to use Amazon’s Mechanical Turk1[Sorokin and Forsyth2008].

(3)reCAPTCHA.2CAPTCHA is a type of challenge-response test used to determine

that the response is not generated by a computer.A typical CAPTCHA can be generating an image with distorted text,which is supposed that it can only be recognized by human beings[Ahn et al.2008].Recently some researchers have implemented methods by which some of the effort and time spent by people who are responding to CAPTCHA challenges can be regarded as a distributed work system.

This system,called reCAPTCHA,includes solved and unrecognized elements(such as images of text which were not successfully recognized via OCR)in each chal-lenge.The respondent thus answers both elements and roughly half of his or her effort validates the challenge while the other half is collected as useful information [Ahn et al.2008].This idea can also be applied to do image and video annotation. Figure7demonstrates the schematic illustration of a large-scale interactive multi-media annotation system.The task assignment module will divide and assign annota-tion tasks to different labelers through aforementioned approaches.The quality control 1https://www.doczj.com/doc/ff15564874.html,.

2https://www.doczj.com/doc/ff15564874.html,.


module controls the quality of the annotation results. In particular, for grassroots labelers the annotations may contain significant noise, and several users may even use robots or random algorithms to accomplish the annotation tasks. The quality control component judges the annotation quality of different users, removes labelers with low-quality annotations, and performs an annotation refinement step.

6. CONCLUSION

This article has presented a survey on the efforts of leveraging active learning in multimedia annotation and retrieval. We briefly introduced the principle of active learning and categorized the existing sample selection criteria used in multimedia annotation and retrieval into five criteria: risk reduction, uncertainty, diversity, density, and relevance. We also introduced several classification models for active learning, including semi-supervised learning, multilabel learning, and multiple instance learning. Finally, we provided a discussion of several future trends of active learning in multimedia annotation and retrieval, in particular cost analysis of human annotation and large-scale multimedia labeling.

APPENDIX

Table I. Several Existing Works Applying Active Learning in Image/Video Annotation or Retrieval

Application | Work | Learning Method | Sample Selection Criteria
CBIR | [Bao et al. 2009] | Graph-based SSL | risk reduction
CBIR | [Dagli et al. 2006] | SVM | uncertainty; diversity
CBIR | [Geng et al. 2008] | SVM | uncertainty
CBIR | [Goh et al. 2004] | SVM | uncertainty; relevance
CBIR | [Gosselin and Cord 2004] | Bayes classifier; k-NN; SVM | uncertainty; diversity
CBIR | [He et al. 2004] | SVM | uncertainty
CBIR | [Hoi and Lyu 2005] | Graph-based SSL | risk reduction
CBIR | [Panda et al. 2006b] | SVM | uncertainty; diversity
CBIR | [Settles et al. 2007] | Multiple instance logistic regression | uncertainty; expected gradient length
CBIR | [Tong and Chang 2001] | SVM | uncertainty
CBIR | [Wu et al. 2006] | SVM | uncertainty; density; diversity
CBIR | [Yuan et al. 2006] | SVM | uncertainty; relevance; diversity
Multiclass Image Annotation | [Vijayanarasimhan and Grauman 2008] | SVM with multiple instance kernel | risk reduction
Multiclass Image Annotation | [Hoi and Lyu 2005] | SVM | uncertainty; diversity
Multiclass Image Annotation | [Jain and Kapoor 2009] | Probabilistic k-NN | uncertainty
Multiclass Image Annotation | [Joshi et al. 2009] | SVM | uncertainty
Multiclass Image Annotation | [Sahbi et al. 2008] | Manifold learning | uncertainty; diversity
Multiclass Image Annotation | [Yang et al. 2009] | Multiple kernel learning | uncertainty
Multiclass Image Annotation | [Zhang and Chen 2003] | Kernel regression | uncertainty; density
Multiclass Video Annotation | [Yan et al. 2003] | SVM | risk reduction
Multilabel Image Annotation | [Sychay et al. 2002] | SVM | uncertainty
Multilabel Image Annotation | [Zhang et al. 2009] | Multilabel learning | uncertainty
Multilabel Video Annotation | [Ayache and Quénot 2007] | SVM | uncertainty; positivity
Multilabel Video Annotation | [Chen et al. 2005] | SVM | uncertainty
Multilabel Video Annotation | [Hua and Qi 2008] | Multilabel learning | uncertainty
Multilabel Video Annotation | [Naphade and Smith 2004a] | SVM | uncertainty
Multilabel Video Annotation | [Qi et al. 2004] | SVM | uncertainty; density
Multilabel Video Annotation | [Song et al. 2005] | Gaussian mixture model | uncertainty; density
Multilabel Video Annotation | [Tang et al. 2007] | Graph-based SSL | uncertainty
Multilabel Video Annotation | [Vendrig et al. 2002] | Maximum entropy classifier | relevance
Multilabel Video Annotation | [Wang et al. 2007] | Manifold learning | uncertainty; density; diversity; relevance


REFERENCES

A HN,L.AND D ABBISH,https://www.doczj.com/doc/ff15564874.html,beling images with a computer game.In Proceedings of ACM CHI.

A HN,L.,L IU,R.,AND

B LUM,M.2006.Peekaboom:A game for locating objects in images.In Proceedings of the

ACM Conference on Human Factors in Computing Systems.

A HN,L.,M AURER,B.,M CMILLEN,C.,A BRAHAM,D.,AND

B LUM,M.2008.Recaptcha:Human-based character

recognition via web security measures.Science.

A NDREWS,S.,T SOCHANTARIDIS,I.,AND H OFMANN,T.2002.Support vector machines for multiple-instance learn-

ing.In Proceedings of the Neural Information Processing Systems.

A NGLUIN,D.1998.Queries and concept learning.Mach.Learn.2.

A YACHE,S.AND Q U′E NOT,G.2007.Evaluation of active learning strategies for video indexing.In Proceedings

of International Workshop on Content-Based Multimedia Indexing.

B AO,L.,

C AO,J.,X IA,T.,Z HANG,Y.,AN

D L I,J.2009.Locally non-negative linear structure learning for interactive

image retrieval.In Proceedings of ACM Multimedia.

B ERGER,A.,P IETRA,S.D.,AND P IETRA,V.D.1996.A maximum entropy approach to natural language processing.

Computat.Linguistics22,1.

B LUM,A.AND M ITCHELL,https://www.doczj.com/doc/ff15564874.html,bining labeled and unlabeled data with co-training.In Proceedings of

Workshop on Computational Learning Theory.

B RINKER,K.2003.Incorporating diversity in active learning with support vector machines.In Proceedings of

the International Conference on Machine Learning.

C AUWENBERGHS,G.AN

D P OGGIO,T.2000.Incremental and decremental support vector machine learning.In

Proceedings of Neural Information Processing Systems.

C HAPELLE,O.,Z IEN,A.,AN

D S CH¨OLKOPF,B.2006.Semi-Supervised Learning.MIT Press.

C HEN,M.,C HRISTEL,M.,H AUPTMANN,A.,AN

D W ACTLAR,H.2005.Putting active learning into multime-

dia applications:dynamic de?nition and re?nement of concept classi?ers.In Proceedings of ACM Multimedia.

C OHEN,D.A.,G HAHRAMANI,Z.,AN

D J ORDAN,M.I.1996.Active learning with statistical models.J.Artif.Intell.

Res.

C OHN,D.,A TLAS,L.,AN

D L ADNER,R.1994.Improving generalization with active learning.Mach.Learn.15,2.

C OLLINS,B.,

D ENG,J.,L I,K.,AND F EI-F EI,L.1995.Towards scalable dataset construction:an active learning

approach.In Proceedings of the European Conference on Machine Learning.

D AGAN,I.AND

E NGSELON,https://www.doczj.com/doc/ff15564874.html,mittee-based sampling for training probabilistic classi?ers.In Proceed-

ings of the International Conference on Machine Learning.

D AGLI,C.K.,R AJARAM,S.,AND H UANG,T.S.2006.Leveraging active learning for relevance feedback using an

information-theoretic diversity measure.In Proceedings of the International Conference on Image and Video Retrieval.

D IETTERICH,T.G.,L ATHROP,R.H.,AND L OZANO-P EREZ,T.1997.Solving the multiple-instance problem with

axis-parallel rectangles.Artif.Intell.

F OOTE,J.1997.Content-based retrieval of music and audio.In Proceedings of SPIE Multimedia Storage

Archiving Systems II.

F REUND,M.,S EUNG,H.S.,S HAMIR,E.,AND T ISHBY,N.1997.Selective sampling using the query by committee

algorithm.Mach.Learn.28.

G ENG,B.,Y ANG,L.,Z HA,Z.J.,X U,C.,AND H UA,X.S.2008.Unbiased active learning for image retrieval.In

Proceedings of the International Conference on Multimedia&Expo.

G OH,K.S.,C HANG,E.Y.,AND L AI,W.C.2004.Multimodal concept-dependent active learning for image trieval.

In Proceedings of ACM Multimedia.

G OSSELIN,P.H.AND C ORD,M.2004.A comparison of active classi?cation methods for content-based image

retrieval.In Proceedings of the International Workshop on Computer Vision Meets Databases.

H AKKANI-TUR,D.,R ICCARDI,G.,AND G ORIN,A.2002.Active learning for automatic speech recognition.In

Proceedings of the International Conference on Acoustics,Speech and Signal Processing.

H ANNEKE,S.AND Y ANG,L.2010.Negative results for active learning with convex losses.In Proceedings of the

International Conference on Arti?cial Intelligence and Statistics.

H E,J.R.,L I,M.,Z HANG,H.J.,T ONG,H.,AND Z HANG,C.2004.Mean version space:a new active learning method

for content-based image retrieval.In Proceedings of the ACM Workshop on Multimedia Information Retrieval.

H OI,S.C.,J IN,R.,Z HU,J.,AND L YU,M.R.2006.Batch mode active learning and its application to medical

image classi?cation.In Proceedings of the International Conference on Machine Learning.

ACM Transactions on Intelligent Systems and Technology,Vol.2,No.2,Article10,Publication date:February2011.

Active Learning in Multimedia Annotation and Retrieval:A Survey10:19 H OI,S.C.AND L YU,M.R.2005.A semi-supervised active learning framework for image retrieval.In Proceed-

ings of the International Conference on Computer Vision and Pattern Recognition.

H SIEH,C.J.,C HANG,K.W.,L IN,C.J.,K EERTHI,S.S.,AND S UNDARARAJAN,S.2008.A dual coordinate descent

method for large-scale linear svm.In Proceedings of the International Conference on Machine Learning.

HUA, X. S. AND QI, G. J. 2008. Online multi-label active annotation: towards large-scale content-based video search. In Proceedings of ACM Multimedia.

HUANG, T. S., DAGLI, C. K., RAJARAM, S., CHANG, E. Y., MANDEL, M. I., POLINER, G. E., AND ELLIS, D. P. W. 2008. Active learning for interactive multimedia retrieval. Proc. IEEE 96, 4.

JAIN, P. AND KAPOOR, A. 2009. Active learning for large multi-class problems. In Proceedings of the International Conference on Computer Vision and Pattern Recognition.

JOSHI, A. J., PORIKLI, F., AND PAPANIKOLOPOULOS, N. 2009. Multi-class active learning for image classification. In Proceedings of the International Conference on Computer Vision and Pattern Recognition.

KING, R. D., WHELAN, K. E., JONES, F. M., REISER, P. G., BRYANT, C. H., MUGGLETON, S. H., KELL, D. B., AND OLIVER, S. G. 2004. Functional genomic hypothesis generation and experimentation by a robot scientist. Nature 427, 6971.

LI, J. AND WANG, J. 2008. Real-time computerized annotation of pictures. IEEE Trans. Patt. Anal. Mach. Intell. 30, 6.

LIN, C., TSENG, B., AND SMITH, J. R. 2003. VideoAnnEx: IBM MPEG-7 annotation tool for multimedia indexing and concept learning. In Proceedings of the International Conference on Multimedia & Expo.

MANDEL, M., POLINER, G., AND ELLIS, D. 2006. Support vector machine active learning for music retrieval. Multimed. Syst. 12, 1.

MARON, O. AND RATAN, A. L. 1998. Multiple-instance learning for natural scene classification. In Proceedings of the International Conference on Machine Learning.

MITCHELL, T. 1982. Generalization as search. Artif. Intell.

MUSLEA, I., MINTON, S., AND KNOBLOCK, C. A. 2002. Active + semi-supervised learning = robust multi-view learning. In Proceedings of the International Conference on Machine Learning.

NAPHADE, M. AND SMITH, J. R. 2004a. Active learning for simultaneous annotation of multiple binary semantic concepts. In Proceedings of the International Conference on Image Processing.

NAPHADE, M. R. AND SMITH, J. R. 2004b. On the detection of semantic concepts at TRECVID. In Proceedings of ACM Multimedia.

NGUYEN, H. T. AND SMEULDERS, A. 2004. Active learning using pre-clustering. In Proceedings of the International Conference on Machine Learning.

OLSSON, F. 2009. A literature survey of active machine learning in the context of natural language processing. SICS Tech. rep., Swedish Institute of Computer Science.

PANDA, N., CHANG, E. Y., AND WU, G. 2006a. Concept boundary detection for speeding up SVMs. In Proceedings of the International Conference on Machine Learning.

PANDA, N., GOH, K., AND CHANG, E. Y. 2006b. Active learning in very large image databases. J. Multimed. Tools Appl.

PARZEN, E. 1962. On the estimation of a probability density function and the mode. Ann. Math. Stat. 33.

QI, G., HUA, X. S., RUI, Y., TANG, J., MEI, T., AND ZHANG, H. J. 2007. Correlative multi-label video annotation. In Proceedings of ACM Multimedia.

QI, G., HUA, X. S., RUI, Y., TANG, J., AND ZHANG, H. J. 2009. Two-dimensional multilabel active learning with an efficient online adaptation model for image classification. IEEE Trans. Patt. Anal. Mach. Intell. 31.

QI, G., SONG, Y., HUA, X. S., ZHANG, H. J., AND DAI, L. R. 2004. Video annotation by active learning and cluster tuning. In Proceedings of the CVPR Workshop.

REDNER, R. AND WALKER, H. 1984. Mixture densities, maximum likelihood and the EM algorithm. SIAM Rev. 26, 2.

ROY, N. AND MCCALLUM, A. 2001. Toward optimal active learning through Monte Carlo estimation of error reduction. In Proceedings of the International Conference on Machine Learning.

RUI, Y., HUANG, T. S., AND CHANG, S. F. 1999. Image retrieval: current techniques, promising directions and open issues. J. Vis. Commun. Image Rep. 10, 4.

RUI, Y., HUANG, T. S., ORTEGA, M., AND MEHROTRA, S. 1998. Relevance feedback: a power tool in interactive content-based image retrieval. IEEE Trans. Circuits Syst. Video Tech. 8, 5.

SAHBI, H., ETYNGIER, P., AUDIBERT, J., AND KERIVEN, R. 2008. Manifold learning using robust graph Laplacian for interactive image search. In Proceedings of the International Conference on Computer Vision and Pattern Recognition.

SETTLES, B. 2009. Active learning literature survey. Computer Sciences Tech. rep., University of Wisconsin-Madison.

SETTLES, B., CRAVEN, M., AND FRIEDLAND, L. 2008. Active learning with real annotation costs. In Proceedings of the NIPS Workshop on Cost-Sensitive Learning.

SETTLES, B., CRAVEN, M., AND RAY, S. 2007. Multiple-instance active learning. In Proceedings of Neural Information Processing Systems.

SEUNG, H. S., OPPER, M., AND SOMPOLINSKY, H. 1992. Query by committee. In Proceedings of the Annual Workshop on Computational Learning Theory.

SMEULDERS, A. W., WORRING, M., SANTINI, S., GUPTA, A., AND JAIN, R. 2000. Content-based image retrieval at the end of the early years. IEEE Trans. Patt. Anal. Mach. Intell. 22, 12.

SONG, Y., HUA, X. S., DAI, L. R., AND WANG, M. 2005. Semi-automatic video annotation based on active learning with multiple complementary predictors. In Proceedings of the ACM Workshop on Multimedia Information Retrieval.

SOROKIN, A. AND FORSYTH, D. 2008. Utility data annotation via Amazon Mechanical Turk. In Proceedings of the CVPR Workshop.

SYCHAY, G., CHANG, E. Y., AND GOH, K. 2002. Effective image annotation via active learning. In Proceedings of the International Conference on Image Processing.

TANG, J., HUA, X. S., QI, G., GU, Z., AND WU, X. 2007. Beyond accuracy: typicality ranking for video annotation. In Proceedings of the International Conference on Multimedia & Expo.

TONG, S. AND CHANG, E. Y. 2001. Support vector machine active learning for image retrieval. In Proceedings of ACM Multimedia.

TONG, S. AND KOLLER, D. 2000. Support vector machine active learning with applications to text classification. In Proceedings of the International Conference on Machine Learning.

VENDRIG, J., DEN HARTOG, J., VAN LEEUWEN, D., PATRAS, I., RAAIJMAKERS, S., VAN REST, J., SNOEK, C., AND WORRING, M. 2002. TREC feature extraction by active learning. In Proceedings of the TRECVID Workshop.

VIJAYANARASIMHAN, S. AND GRAUMAN, K. 2008. Multi-level active prediction of useful image annotations for recognition. In Proceedings of Neural Information Processing Systems.

VIJAYANARASIMHAN, S. AND GRAUMAN, K. 2009. What's it going to cost you? Predicting effort vs. informativeness for multi-label image annotations. In Proceedings of the International Conference on Computer Vision and Pattern Recognition.

VOLKMER, T., SMITH, J. R., AND NATSEV, A. 2005. A web-based system for collaborative annotation of large image and video collections. In Proceedings of ACM Multimedia.

WANG, M., HUA, X. S., MEI, T., TANG, J., QI, G. J., SONG, Y., AND DAI, L. R. 2007. Interactive video annotation by multi-concept multi-modality active learning. Int. J. Semant. Comput. 1, 4.

WANG, M., HUA, X. S., TANG, J., AND HONG, R. 2009a. Beyond distance measurement: constructing neighborhood similarity for video annotation. IEEE Trans. Multimed. 11, 3.

WANG, M., HUA, X. S., HONG, R., TANG, J., QI, G. J., AND SONG, Y. 2009b. Unified video annotation via multi-graph learning. IEEE Trans. Circ. Syst. Video Tech. 19, 5.

WANG, W. AND ZHOU, Z. 2008. On multi-view active learning and the combination with semi-supervised learning. In Proceedings of the International Conference on Machine Learning.

WU, Y., KOZINTSEV, I., BOUGUET, J.-Y., AND DULONG, C. 2006. Sampling strategies for active learning in personal photo retrieval. In Proceedings of the International Conference on Multimedia & Expo.

YAN, R., NATSEV, A., AND CAMPBELL, M. 2009. Hybrid tagging and browsing approaches for efficient manual image annotation. IEEE Multimed. Mag. 16, 2.

YAN, R., YANG, J., AND HAUPTMANN, A. 2003. Automatically labeling video data using multi-class active learning. In Proceedings of the International Conference on Computer Vision.

YANG, J., LI, Y., TIAN, Y., DUAN, L., AND GAO, W. 2009. Multiple kernel active learning for image classification. In Proceedings of the International Conference on Multimedia & Expo.

YUAN, J., ZHOU, X., ZHANG, J., WANG, M., ZHANG, Q., WANG, W., AND SHI, B. 2006. Positive sample enhanced angle-diversity learning for SVM-based image retrieval. In Proceedings of the International Conference on Multimedia & Expo.

ZHANG, C. AND CHEN, T. 2003. Annotating retrieval database with active learning. In Proceedings of the International Conference on Image Processing.

ZHANG, Q. AND GOLDMAN, S. A. 2001. EM-DD: An improved multiple-instance learning technique. In Proceedings of Neural Information Processing Systems.

ZHANG, X., CHENG, J., XU, C., LU, H., AND MA, S. 2009. Multi-view multi-label active learning for image classification. In Proceedings of the International Conference on Multimedia & Expo.
