A Generic Framework for Event Detection in Various Video Domains
Tianzhu Zhang1,2, Changsheng Xu1,2, Guangyu Zhu3, Si Liu1,2,3, Hanqing Lu1,2
1 National Lab of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, P. R. China
2 China-Singapore Institute of Digital Media, Singapore 119615, Singapore
3 Department of Electrical and Computer Engineering, National University of Singapore, Singapore
{tzzhang, csxu, sliu, luhq}@nlpr.ia.ac.cn, elezhug@nus.edu.sg
ABSTRACT
Event detection is essential for the extensively studied video analysis and understanding area. Although various approaches have been proposed for event detection, there is a lack of a generic event detection framework that can be applied to various video domains (e.g. sports, news, movies, surveillance). In this paper, we present a generic event detection approach based on semi-supervised learning and Internet vision. Concretely, a Graph-based Semi-Supervised Multiple Instance Learning (GSSMIL) algorithm is proposed to jointly explore small-scale expert labeled videos and large-scale unlabeled videos to train the event models to detect video event boundaries. The expert labeled videos are obtained from the analysis and alignment of well-structured video related text (e.g. movie scripts, web-casting text, closed captions). The unlabeled data are obtained by querying related events from the video search engine (e.g. YouTube) in order to give more distributive information for event modeling. A critical issue of GSSMIL in constructing a graph is the weight assignment, where the weight of an edge specifies the similarity between two data points. To tackle this problem, we propose a novel Multiple Instance Learning Induced Similarity (MILIS) measure by learning instance sensitive classifiers. We perform thorough experiments in three popular video domains: movies, sports and news. The results, compared with the state of the art, are promising and demonstrate that our proposed approach is effective.
Categories and Subject Descriptors
H.3.1 [Information Storage and Retrieval]: Content Analysis and Indexing - Abstracting methods, Indexing methods.
General Terms
Algorithms, Measurement, Performance, Experimentation
Keywords
Event Detection, Graph, Multiple Instance Learning, Semi-supervised Learning, Broadcast Video, Internet, Web-casting Text
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. MM’10, October 25–29, 2010, Firenze, Italy. Copyright 2010 ACM 978-1-60558-933-6/10/10 ...$10.00.

1. INTRODUCTION
With the explosive growth of multimedia content on broadcast and the Internet, there is an urgent need to make unstructured multimedia data accessible and searchable with great ease and flexibility. Event detection is particularly crucial to understanding video semantic concepts for video summarization, indexing and retrieval purposes. Therefore, extensive research efforts have been devoted to event detection for video analysis [23, 26, 33].
Most of the existing event detection approaches rely on video features and domain knowledge, and employ labeled samples to train event models. The semantic gap between low-level features and high-level events in different kinds of videos, together with ambiguous video cues, background clutter and variations in camera motion, further complicates video analysis and impedes the implementation of event detection systems. Moreover, due to the diverse domain knowledge in different video genres and insufficient training data, it is difficult to build a generic framework that unifies event detection across different video domains (e.g. sports, news, movies, surveillance) with high accuracy.
To solve these issues, most techniques for event detection currently rely on video content and supervised learning in the form of labeled video clips for particular classes of events. A large number of samples must be labeled in the training process to achieve good detection performance. In order to reduce the human labeling effort, one can exploit the expert supervisory information in text sources [3, 15, 23], such as movie scripts, web-casting text and closed captions, which can provide useful information to locate possible events in video sequences. However, it is costly and time-consuming to collect large-scale training data by text analysis, and there are still many videos without corresponding text information. The Internet, nevertheless, is a rich information source with many event videos taken under various conditions and roughly annotated. For example, the surrounding text is an important clue used by search engines. Our intuition is that it is convenient to obtain a large-scale collection of videos as unlabeled data to improve the performance of event detection. Based on this, we propose a semi-supervised learning framework to exploit the expert labeled and unlabeled video data together.
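To make the idea of combining a few labeled videos with many unlabeled ones concrete, the following is a minimal sketch of generic graph-based label propagation. It is not the GSSMIL algorithm itself: the Gaussian similarity is only an assumed stand-in for a learned edge weight, and the function names, feature vectors and parameters (`gaussian_similarity`, `propagate_labels`, `sigma`, `iterations`) are hypothetical choices for illustration.

```python
import math


def gaussian_similarity(x, y, sigma=1.0):
    """Edge weight between two feature vectors (assumed Gaussian kernel)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def propagate_labels(features, labels, sigma=1.0, iterations=100):
    """Iterative label propagation with clamping of labeled nodes.

    features: one feature vector per video (hypothetical features).
    labels:   1.0 (event), 0.0 (no event), or None (unlabeled).
    Returns a soft event score in [0, 1] for every video.
    """
    n = len(features)
    # Fully connected graph without self-loops.
    weights = [
        [0.0 if i == j else gaussian_similarity(features[i], features[j], sigma)
         for j in range(n)]
        for i in range(n)
    ]
    # Unlabeled nodes start at an uninformative score of 0.5.
    scores = [0.5 if lab is None else float(lab) for lab in labels]
    for _ in range(iterations):
        updated = []
        for i in range(n):
            if labels[i] is not None:
                updated.append(float(labels[i]))  # keep known labels fixed
                continue
            total = sum(weights[i])
            if total == 0.0:
                updated.append(scores[i])  # isolated node: leave unchanged
                continue
            # Weighted average of the neighbours' current scores.
            updated.append(
                sum(w * s for w, s in zip(weights[i], scores)) / total
            )
        scores = updated
    return scores


if __name__ == "__main__":
    feats = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
    labs = [1.0, None, 0.0, None]
    print(propagate_labels(feats, labs))
```

In this toy setting each unlabeled video takes on the label of the labeled video it is most similar to, which is why the choice of edge weight matters so much in a graph-based method.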
However, the data labeled by text analysis have only weakly associated labels, which means we know the video’s label, but there may be no precise information about the localization of the event in the video. For the unlabeled data obtained by Internet searching, we also do not know the precise localization of the event. To find the precise localization of the event, we temporally cut a video into multiple segments and find the segments corresponding to the event. Thus event localization can be considered as a typical multiple instance learning problem, where each segment is an instance and all segments of a