Combining Information Sources for Video Retrieval
The Lowlands Team at TRECVID 2003
Thijs Westerveld, Tzvetanka Ianeva†∗, Lioudmila Boldareva‡, Arjen P. de Vries
and Djoerd Hiemstra‡

Centrum voor Wiskunde en Informatica (CWI)
Amsterdam, The Netherlands
{thijs,arjen}@cwi.nl

†Departament d'Informàtica
Universitat de València
València, Spain
tzveta.ianeva@uv.es

‡Department of Computer Science
University of Twente
Enschede, The Netherlands
{boldarli,hiemstra}@cs.utwente.nl

∗The work was carried out while at CWI, supported by grants GV FPIB01362 and CTESRR/2003/43.
Abstract
The previous video track results demonstrated that it is far from trivial to take advantage of multiple modalities for the video retrieval search task. For almost any query, results based on ASR transcripts have been better than any other run. This year's main success in our runs is that a combination of ASR and visual information performs better than either alone! In addition, we experimented with dynamic shot models, combining topic examples, feature extraction and interactive search.
1 Introduction
It is often stated that a successful video retrieval system should take advantage of information from all available sources and modalities: merging knowledge from, for example, speech, vision, and audio should yield better results than using any one of them alone. Previous video track results, however, demonstrated that it is far from trivial to take advantage of multiple modalities for a video retrieval search task.
For this year's TRECVID workshop, we experimented with combining different types of information:
• combining different models/modalities
• combining multiple example images
• combining model similarity and human-judged similarity
The basic retrieval models we use to investigate the merging of different sources are the same models we used last year [12]; they are described briefly in Section 2 and more extensively in [13]. Section 2.1 describes a dynamic variant of the model that captures spatio-temporal information, as opposed to the purely spatial information captured in the basic models. These dynamic models can describe shots instead of still keyframes. For modelling the ASR transcripts, we use a basic language modelling approach (see Section 2.2). The interactive retrieval setup (Section 2.3) builds upon the visual models and the language models. After introducing the separate models, we present experiments on combining textual and visual information (Section 4), combining different examples (Section 5), and the feature extraction task (Section 6). Combining automatic similarity and interactive similarity is discussed in Section 7.
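As a rough illustration of the kind of score-level combination explored in the sections that follow, the sketch below ranks shots by a weighted sum of a per-shot textual (ASR language-model) log-likelihood and a visual-model log-likelihood. This is only a minimal sketch: the function names, the interpolation weight lam, and the toy scores are illustrative assumptions, not the exact formulation used in our runs (which is defined in Sections 2 and 4).

    def combine_scores(text_loglik, visual_loglik, lam=0.5):
        # Illustrative score-level fusion: interpolate the log-likelihood
        # of the ASR language model and of the visual model for one shot.
        # lam is a hypothetical interpolation weight.
        return lam * text_loglik + (1.0 - lam) * visual_loglik

    def rank_shots(shots, lam=0.5):
        # Rank shot ids by combined score, best first.
        # shots maps a shot id to a (text_loglik, visual_loglik) pair.
        return sorted(shots,
                      key=lambda s: combine_scores(*shots[s], lam=lam),
                      reverse=True)

    # Toy example with made-up log-likelihoods for three shots.
    shots = {"shot_101": (-12.3, -45.1),
             "shot_102": (-10.8, -50.7),
             "shot_103": (-15.0, -40.2)}
    print(rank_shots(shots, lam=0.6))  # ['shot_103', 'shot_101', 'shot_102']

A plain linear interpolation of log-scores is just one of several ways to merge modalities; the relative weight of text and visual evidence is exactly the kind of choice the combination experiments in Section 4 examine.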