• Multimodal conversational dialog system
• Use a model of the body to track the whole person instead of single features
• Track pose of the articulated arm
• Direction instead of a point in the plane or in space
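The last bullet above (a direction rather than a point) can be illustrated with a trivial sketch; treating the elbow-to-hand vector as the pointing ray is an assumption for illustration, not a detail from the slides:

```python
import numpy as np

def pointing_ray(elbow, hand):
    """Return origin and unit direction of the pointing ray defined by the
    tracked elbow and hand positions (3-D points)."""
    elbow, hand = np.asarray(elbow, dtype=float), np.asarray(hand, dtype=float)
    direction = hand - elbow
    return hand, direction / np.linalg.norm(direction)
```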
Efficiency of MHT
Exponential in time and memory, however:
• Enumerate the k-best hypotheses in polynomial time
• Prune unlikely hypotheses
• Use gating to avoid generating unlikely hypotheses
• Use a path coherence function to incorporate speed and direction constraints
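The gating bullet above can be made concrete with a minimal sketch. The ellipsoidal (Mahalanobis-distance) gate is the standard form in the tracking literature; the threshold value and the function name are assumptions, not taken from the slides:

```python
import numpy as np

def in_gate(measurement, predicted, innovation_cov, gate_threshold=9.21):
    """Ellipsoidal gate: accept a measurement for a track only if its
    Mahalanobis distance to the predicted track position is below a
    chi-square threshold (9.21 ~ 99% quantile for 2-D measurements)."""
    d = measurement - predicted
    d2 = d @ np.linalg.inv(innovation_cov) @ d  # squared Mahalanobis distance
    return d2 <= gate_threshold

# Only measurement/track pairs that pass the gate generate hypotheses,
# which keeps the hypothesis tree from growing exponentially in practice.
```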
Tracking Deictic Gestures
• Natural dialogues, collaborative interaction: the speaker must be untethered
• Only video sequences to evaluate, no other clues
• Approaches:
  • Feature-based: track the hand independently
  • Model-based: track the articulated pose
3-D Model for Pose Tracking
• Simple cylindrical model: head, arm, forearm, torso
• Cylinders connected through spherical joints
• Two cameras in front of the user, above a projection screen
• 3-D model through stereo correspondence
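A minimal sketch of how such a cylindrical body model could be represented in code; the class layout and all dimension values are illustrative assumptions, not parameters from the paper:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Limb:
    """One body part approximated by a cylinder, attached at a spherical joint."""
    name: str
    length: float        # cylinder length in metres (illustrative values below)
    radius: float        # cylinder radius in metres
    joint: np.ndarray    # 3-D position of the proximal spherical joint
    axis: np.ndarray     # unit vector along the cylinder axis

# Illustrative configuration: torso upright, arm raised to the side,
# forearm attached at the distal end of the upper arm.
up, side = np.array([0.0, 0.0, 1.0]), np.array([1.0, 0.0, 0.0])
torso   = Limb("torso",   0.60, 0.18, np.zeros(3), up)
head    = Limb("head",    0.25, 0.10, torso.joint + torso.length * up, up)
arm     = Limb("arm",     0.30, 0.05, torso.joint + 0.55 * up + 0.20 * side, side)
forearm = Limb("forearm", 0.28, 0.04, arm.joint + arm.length * arm.axis, side)
```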
MHT Algorithm
• Generate hypotheses based on reported measurements and existing tracks
• A measurement can
  • start a new track
  • continue an existing track
  • be a false alarm
• Multiple hypotheses in case of conflicts, i.e. more than one interpretation is possible
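A minimal, runnable sketch of the branching implied by the three interpretations above; representing a hypothesis as a list of (measurement, assignment) pairs and the function name are assumptions for illustration only (gating and pruning, covered on the efficiency slide, are omitted):

```python
from itertools import product

def enumerate_hypotheses(measurements, tracks):
    """Each measurement is interpreted as continuing one existing track,
    starting a new track ('new'), or being a false alarm ('fa').
    A hypothesis is one joint interpretation of all measurements in the
    current frame; the uniqueness constraint forbids two measurements
    from continuing the same track."""
    options = [list(tracks) + ["new", "fa"] for _ in measurements]
    hypotheses = []
    for assignment in product(*options):
        used_tracks = [a for a in assignment if a not in ("new", "fa")]
        if len(used_tracks) == len(set(used_tracks)):  # uniqueness constraint
            hypotheses.append(list(zip(measurements, assignment)))
    return hypotheses

# Example: two measurements and one existing track -> each measurement may
# continue track 1, start a new track, or be a false alarm (8 hypotheses).
print(enumerate_hypotheses(["m1", "m2"], [1]))
```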
Example Situation
Multiple Hypothesis Tracking
Track multiple features in an image sequence, explicitly modelling
• Track initiation
• Track termination
• Track continuation
• Spurious measurements
• Uniqueness constraint
Advantageous over other data association methods, e.g. nearest neighbour, correlation, etc.
Classification of gestures
Spontaneous, unplanned gestures accompanying speech:
• Iconic: depict a concrete object or event
• Metaphoric: depict an abstract idea
• Deictic: refer to an object in space
• Beat: emphasize a part of speech
Overview
1. Gestures and Deixis
2. A Tracking Framework for Collaborative Human Computer Interaction
   • Feature-based tracking of hands
   • Robust multiple hypothesis tracking
3. Tracking Articulated Pose for Untethered Deictic Reference
   • Model-based tracking of pose
   • Registering a 3-D model with 3-D data
• Collaborative natural human computer interaction
• Track hands and faces
• Extract features from color and motion
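A rough illustration of how color and motion cues could be combined into candidate hand/face regions; the HSV skin range, the frame-differencing threshold, and the function name are illustrative assumptions, not values from the paper:

```python
import numpy as np

def candidate_mask(frame_hsv, prev_gray, curr_gray,
                   hue_range=(0, 25), sat_min=40, motion_thresh=15):
    """Combine a simple skin-color test in HSV space with frame differencing
    to keep only moving, skin-colored pixels (hand and face candidates)."""
    hue, sat = frame_hsv[..., 0], frame_hsv[..., 1]
    skin = (hue >= hue_range[0]) & (hue <= hue_range[1]) & (sat >= sat_min)
    motion = np.abs(curr_gray.astype(int) - prev_gray.astype(int)) > motion_thresh
    return skin & motion
```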
Body Tracking for Pointing Gesture Interfaces
Seminar Multimodal Rooms
Christoph Sticksel
Interactive Systems Lab (ISL)
Fakultät für Informatik
Universität Karlsruhe (TH)
Gesture in Multimodal Dialogues
• Input modality: pointing instead of clicking, or gestures to trigger actions
• Multimodal dialogue systems: speech and gesture for natural human computer interaction
• Speech and gesture in a multimodal system:
  • Web browsing
  • Querying information
• Gestures do not duplicate speech, but support it or carry important semantics
MHT in conflict case
[Figure: Track 1 and Track 2 at times t−2 and t−1, with their gates, objects and measurements in a conflict situation.]
Deixis is important
Experiment: describe a mechanical device with the aid of a pre-drawn diagram (Eisenstein, 2003), compared with other experiments.
Results:
• Gestures refer to the diagram
• Speakers often use both hands
• Deixis is more frequent
Feature-based Approach
E. Polat, M. Yeasin, and R. Sharma. A tracking framework for collaborative human computer interaction. In Proceedings of the Fourth IEEE International Conference on Multimodal Interfaces (ICMI ’02), pages 27–32, October 2002
Registering the 3-D Model
• Register the model with the data: calculate the motion transformation (rotation, translation) for the model to fit the data
• Register each limb separately, using the ICP algorithm
• Joint constraints: limbs must be spherically connected
• Enforce the constraints on the estimated motion of the individual limbs
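The per-limb ICP bullet above rests on a rigid-alignment step that can be sketched directly. The sketch below assumes point correspondences have already been found by nearest-neighbour search and uses the standard SVD (Kabsch) solution; the constraint projection of the paper is only indicated by a comment:

```python
import numpy as np

def rigid_align(model_pts, data_pts):
    """One ICP update for one limb: find rotation R and translation t that
    best map the limb's model points onto their corresponding data points
    (least squares, solved with the SVD / Kabsch method)."""
    mu_m, mu_d = model_pts.mean(axis=0), data_pts.mean(axis=0)
    H = (model_pts - mu_m).T @ (data_pts - mu_d)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:   # avoid a reflection solution
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = mu_d - R @ mu_m
    return R, t

# In the constrained setting, each limb's (R, t) would afterwards be
# corrected so that adjacent limbs still meet at their spherical joints.
```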
Tracking Articulated Pose
D. Demirdjian and T. Darrell. 3-d articulated pose tracking for untethered deictic reference. In Proceedings of the Fourth IEEE International Conference on Multimodal Interfaces (ICMI ’02), pages 267–272, October 2002
MHT Algorithm (2)
• Calculate the likelihood of each hypothesis $\Omega_i^k$:
  $P_i^k = P(\Omega_i^k \mid Z^k)$
• Using probability assumptions about:
  • Measurements $Z^k$
  • Prior hypothesis $\Omega_g^{k-1}$
  • Current association hypothesis $\Psi_h$
  • Path coherence function
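One standard way to combine these ingredients is the recursive decomposition used in Reid-style MHT; the exact factorization is not given on the slide, so the form below is an assumption based on that literature:

```latex
P_i^k = P\!\left(\Omega_i^k \mid Z^k\right)
  \;\propto\;
  \underbrace{P\!\left(Z(k) \mid \Psi_h, \Omega_g^{k-1}, Z^{k-1}\right)}_{\text{likelihood of current measurements}}
  \;
  \underbrace{P\!\left(\Psi_h \mid \Omega_g^{k-1}, Z^{k-1}\right)}_{\text{prior of the association}}
  \;
  \underbrace{P\!\left(\Omega_g^{k-1} \mid Z^{k-1}\right)}_{\text{parent hypothesis}}
```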