Visual Tracking and Recognition using Appearance-Adaptive Models in Particle Filters

格式：pdf
大小：1.21 MB
文档页数：36

下载文档原格式

机械工程及自动化专业外文翻译--可以行走、翻身并站立的有两手和两足的机器人

外文原文：Two-Armed Bipedal Robot that can Walk, Roll Over and Stand upMasayuki INABA, Fumio KANEHIROSatoshi KAGAMI, Hirochika INOUEDepartment of Mechano-InformaticsThe University of Tokyo7-3-l Hongo, Bunkyo-ku, 113 Tokyo, JAPANAbstractFocusing attention on flexibility and intelligent reactivity in the real world, it is more important to build, not a robot that won’t fall down, but a robot that can get up if it does full down. This paper presents a research on a two-armed bipedal robot, an apelike robot, which can perform biped walking, rolling over and standing up. The robot consists of a head, two arms, and two legs. The control system of the biped robot is designed based on the remote-brained approach in which a robot does not bring its own brain within the body and talks with it by radio links. This remote-brained approach enables a robot to have both a heavy brain with powerful computation and a lightweight body with multiple joints. The robot can keep balance in standing using tracking vision, detectwhether it falls down or not by a set of vertical sensors, and perform getting up motion colaborating two arms and two legs. The developed system and experimental results are described with illustrated real examples.1 IntroductionAs human children show, it is indispensable to have capability of getting up motion in order to learn biped locomotion. In order to build a robot which tries to learn biped walking automatically, the body should be designed to have structures to support getting up as well as sensors to know whether it lays or not.When a biped robot has arms, it can perform various behaviors as well as walking. Research on biped walking robots has presented with realization[1][2][3].It has mainly focused on the dynamics in walking,treating it as an advanced problem in control[3][4][5].However, focusing attention on the intelligent reactivity in the real world, it is more important to build, not a robot that won’t fall down, but a robot that can get up if it does fall down.In order to build a robot that can get up if it falls down, the robot needs sensing system to keep the body balance and to know whether it falls down or not. Although vision is one ofthe most important sensing functions of a robot, it is hard to build a robot with a powerful vision system on its own body because of the size and power limitation of a vision system. If we want to advance research on vision-based robot behaviors requiring dynamic reactions and intelligent reasoning based on experience, the robot body has to be lightweight enough to react quickly and have many DOFS in actuation to show a variety of intelligent behaviors.As for the legged robot [6] [7] [8],there is only a little research on vision-based behaviors[9]. The difficulties in advancing experimental research for vision-based legged robots are caused by the limitation of the vision hardware. It is hard to keep developing advanced vision software in limited hardware. In order to solve the problems and advance the study of vision-based behaviors, we have adopted a new approach through building remote-brained robots. The body and the brain are connected by wireless links by using wireless cameras and remote-controlled actuators.As a robot body does not need computers on-board,it becomes easier to build a lightweight body with many DOFS in actuation.In this research, we developed a two-armed bipedal robot using the remote-brained robot environment and made it toperform balancing based on vision and getting up through cooperating arms and legs. The system and experimental results are described below.2 The Remote-Brained SystemThe remote-brained robot does not bring its own brainwithin the body. It leaves the brain in the mother environment and communicates with it by radio links. This allows us to build a robot with a free body and a heavy brain. The connection link between the body and the brain defines the interface between software and hardware. Bodies are designed to suit each research project and task. This enables us advance in performing research with a variety of real robot systems[10].A major advantage of remote-brained robots is that the robot can have a large and heavy brain based on super parallel computers. Although hardware technology for vision has advanced and produced powerful compact vision systems, the size of the hardware is still large. Wireless connection between the camera and the vision processor has been a research tool. The remote-brained approach allows us to progress in the study of a variety of experimental issues in vision-based robotics.Another advantage of remote-brained approach is that the robot bodies can be lightweight. This opens up the possibility of working with legged mobile robots. As with animals, if a robot has 4 limbs it can walk. We are focusing on vision-based adaptive behaviors of 4-limbed robots,mechanical animals, experimenting in a field as yet not much studied.The brain is raised in the mother environment in-herited over generations. The brain and the mother environment can be shared with newly designed robots. A developer using the environment can concentrate on the functional design of a brain. For robots where the brain is raised in a mother environment, it can benefit directly from the mother’s ‘evolution’,meaning that the software gains power easily when the mother is upgraded to a more powerful computer. Figure 1 shows the configuration of the remote-brained system which consists of brain base, robot body and brain-body interface.In the remote-brained approach the design and theperformance of the interface between brain and body is the key. Our current implementation adopts a fully remotely brained approach, which means the body has no computer onboard. Current system consists of the vision subsystems, the non-vision sensor subsystem and the motion control subsystem. A block can receive video signals from cameras on robot bodies. The vision subsystems are parallel sets each consisting of eight vision boards.A body just has a receiver for motion instruction signals and a transmitter for sensor signals. The sensor information is transmitted from a video transmitter. It is possible to transmit other sensor information such as touch and servo error through the video transmitter by integrating the signals into a video image[11]. The actuator is a geared module which includes an analog servo circuit and receives a posit.ion reference value from the motion receiver. The motion control subsystem can handle up to 104 actuators through 13 wave bands and send the reference values to all the actuators every 20msec.3 The Two-Armed Bipedal RobotFigure 2 shows the structure of the two-armed bipedal robot. The main electric components of the robot are joint servo actuators, control signal receivers, an orientation sensor with transmitter, a battery set for actuators and sensors sensor and a camera with video transmitter. There is no computer on-board. A servo actuator includes a geared motor and analog servo circuit in the box. The control signal to each servo module is position reference. The torque of servo modules available cover 2Kgcm - 14Kgcm with the speed about 0.2sec/60deg. The control signal transmitted onradio link encodes eight reference values. The robot in figure 2 has two receiver modules onboard to control 16 actuators.Figure 3 explains the orientation sensor using a set of vertical switches. The vertical switch is a mercury switch. When the mercury switch (a) is tilted, the drop of mercury closes the contact between the two electrodes. The orientation sensor mount two mercury switches such as shown in (b). The switches provides two bits signal to detect four orientation of the sensor as shown in (c). The robot has this sensor at its chest and it can distinguish four orientation; face up, face down, standing and upside down.The body structure is designed and simulated in the mother environment. The kinematic model of the body is described in an object-oriented lisp, Euslisp which has enabled us to describe the geometric solid model and window interface for behavior design.Figure 4 shows some of the classes in the programming environent for remote-brained robot written in Euslisp. The hierachy in the classes provides us with rich facilities for extending development of various robots.4 Vision-Based BalancingThe robot can stand up on two legs. As it can change the gravity center of its body by controling the ankle angles, it can perform static bipedal walks. During static walking the robot has to control its body balance if the ground is not flat and stable.In order to perform vision-based balancing it is re-quired to have high speed vision system to keep ob-serving moving schene. We have developed a tracking vision board using a correlation chip[l3]. The vision board consists of a transputer augmented with a special LSI chip(MEP[14]: Motion Estimation Processor) which performs local image block matching.The inputs to the processor MEP are an image as a reference block and an image for a search window.The size of the reference block is up to 16 by 16 pixels.The size of the search window depends on the size of the reference block is usually up to 32 by 32 pixels so that it can include 16 * 16 possible matches. The processor calculates 256 values of SAD (sum of absolute difference) between the reference block and 256 blocks in the search window and also finds the best matching block, that is, the one which has the minimumSAD value.Block matching is very powerful when the target moves only in translation. However, the ordinary block matching method cannot track the target when it rotates. In order to overcome this difficulty, we developed a new method which follows up the candidate templates to real rotation of the target. The rotated template method first generates all the rotated target images in advance, and several adequate candidates of the reference template are selected and matched is tracking the scene in the front view. It remembers the vertical orientation of an object as the reference for visual tracking and generates several rotated images of the reference image. If the vision tracks the reference object using the rotated images, it can measures the body rotation. In order to keep the body balance, the robot feedback controls its body rotation to control the center of the body gravity. The rotational visual tracker[l5] can track the image at video rate.5 Biped WalkingIf a bipedal robot can control the center of gravity freely, itcan perform biped walk. As the robot shown in Figure 2 has the degrees to left and right directions at the ankle position, it can perform bipedal walking in static way.The motion sequence of one cycle in biped walking consists of eight phases as shown in Figure 6. One step consists of four phases; move-gravity-center-on-foot,lift-leg, move-forward-leg, place-leg. As the body is described in solid model, the robot can generate a body configuration for move-gravity-center-on-foot according to the parameter of the hight of the gravity center. After this movement, the robot can lift the other leg and move it forward. In lifting leg, the robot has to control the configuration in order to keep the center of gravity above the supporting foot. As the stability in balance depends on the hight of the gravity center, the robot selects suitable angles of the knees.Figure 7 shows a sequence of experiments of the robot in biped walking.6 Rolling Over and Standing UpFigure 8 shows the sequence of rolling over, sitting and standing up. This motion requires coordination between arms and legs.As the robot foot consists of a battery, the robot can make use of the weight of the battery for the roll-over motion. When the robot throws up the left leg and moves the left arm back and the right arm forward, it can get rotary moment aroundthe body. If the body starts turning, the right leg moves back and the left foot returns its position to lie on the face. This rollover motion changes the body orientation from face up to face down. It can be verified by the orientation sensor.After getting face down orientation, the robot moves the arms down to sit on two feet. This motion causes slip movement between hands and the ground. If the length of the arm is not enough to carry the center of gravity of the body onto feet, this sitting motion requires dynamic pushing motion by arms. The standing motion is controlled in order to keep the balance.7 Integration through Building Sensor-Based Transition NetIn order to integrate the basic actions described above, we adopted a method to describe a sensor-based transition network in which transition is considered according to sensor status. Figure 9 shows a state transition diagram of the robot which integrates basic actions: biped walking, rolling over, sitting, and standing up. This integration provides the robot with capability of keeping walking even when it falls down. The ordinary biped walk is composed by taking two states, Left-leg Fore and Right-leg Fore, successively.The poses in ‘Lie on the Back’ and ‘Lie on the Face’are as same as one in ‘Stand’. That is, the shape ofthe robot body is same but the orientation is different.The robot can detect whether the robot lies on the back or the face using the orientation sensor. When the robot detects falls down, it changes the state to ‘Lie on the Back’ or ‘Lie on the Front’ by moving to the neutral pose. If the robot gets up from ‘Lie on the Back’, the motion sequence is planned to execute Roll-over, Sit and Stand-up motions. If the state is ‘Lie on the Face’, it does not execute Roll-over but moves arms up to perform the sitting motion.8 Concluding RemarksThis paper has presented a two-armed bipedal robot whichcan perform statically biped walk, rolling over and standing up motions. The key to build such behaviors is the remote-brained approach. As the experiments have shown, wireless technologies permit robot bodies free movement. It also seems to change the way we conceptualize robotics. In our laboratory it has enabled the development of a new research environment, better suited to robotics and real-world AI.The robot presented here is a legged robot. As legged locomotion requires dynamic visual feedback control, its vision-based behaviors can prove the effectiveness of the vision system and the remote-brained system. Our vision system is based on high speed block matching function implemented with motion estimation LSI. The vision system provides the mechanical bodies with dynamic and adaptive capabilities in interaction with human. The mechanical dog has shown adaptive behaviors based on distance measurement by tracking. The mechanical ape has shown tracking and memory based visual functions and their integration in interactive behaviors.The research with a two-armed bipedal robot provides us with a new field in intelligent robotics research because of itsvariety of the possible behaviors created from the flexiblility of the body. The remote-brained approach will support learning-based behaviors in this research field. The next tasks in this research include how to learn from human actions and how to allow the robots to improve their own learned behaviors.References[1] I. Kate and H. Tsuik. The hydraulically powered biped walking machine with a high carrying capacity. In Proc. Of 4th Int. Sys. on External Control of Human Extremities,1972.[2] H. Miura and I. Shimoyama. Dynamic walk of a biped. International Journal of Robotics Research, Vol. 3, No. 2,pp. 60-74, 1984.[3] S. Kawamura, T. Kawamura, D. fijino, F. Miyazaki, and S. Arimoto. Realization of biped locomotion by motion pattern learning. Journal of the Robotics Society of Japan,Vol. 3, No. 3, pp. 177-187, 1985.[4] Jessica K. Hodgins and Marc H. Raibert.Biped gymnastics.International Journal of Robotics Research, Vol. 9,No. 2, pp. 115-132, 1990.[5] A. Takanishi, M. Ishida, Y. Yamazaki, and I. Kato. The realization of dynamic walking by the biped walking robotwl-lord. Journal of the Robotics Society of Japan, Vol. 3, No. 4, pp. 325-336, 1985.[6] R.B. McGhee and G.I. Iswandhi. Adaptive locomotion of a multilegged robot over rough terrain. IEEE Trans.On Systems, Man and Cybernetics,Vol.SMC-9,No.4,pp. 176-182,1979.[7] M. H. Raibert, Jr. H. B. Brown, and S. S. Murthy. 3-d balance using 2-d algorithms. Robotics Research : the First International Symposium on Robotics Research (ISRRI),pp. 279-301, 1983.[8] S. Hirose, M. Nose, H. Kikuchi, and Y. Umetani. Adaptive gait control of a quadruped walking vehicle. Robotics Research : the First International Symposium on Robotics Research (ISRRI), pp. 253-369, 1983.[9] R.B. McGhee, F. Ozguner, and S.J. Tsai. Rough terrain locomotion by a hexapod robot using a binocular ranging system. Robotics Research:the First International Symposium on Robotics Research (ISRR1),pp.228-251, 1984.[10] M. Inaba. Remote-Brained Robotics: Interfacing AI with Real World Behaviors. in Proceedings of the 6th International Symposium 1993; Robotics Research: The SixthInternational Symposium,pp.335-344.International Foundation for Robotics Research, 1993.[11] M. Inaba, S. Kagami, K. Sakakki, F. Kanehiro, and H. In-oue.Vision-Based Multisensor Integration in Remote- Brained Robots. In 1994 IEEE International Conference on Multisensor Fusion and Integration fo Intelligent Systems,pp. 747-754, 1994.[I2] M. Inaba, T. Kamada, and H. Inoue. Rope Handling by Mobile Hand-Eye Robots. In Proceedings of International Conference on Advanced Robotics ICAR’93,pp. 121-126,1993.[13] H. Inoue, T. Tachikawa, and M. Inaba. Robot vision system with a correlation chip for real-time tracking, optical flow and depth map generation. In Proceedings of the 1992 IEEE International Conference on Robotics and Automation,pp. 1621-1626. 1992.[14] SGS-THOMSON Microelectronics. STI3220 motion estimation processor (tentative data). In Image Processing Data Book, pp. 115-138. SGS-THOMSON Microelectronics, 1990.[15] Masayuki Inaba, Satoshi Kagami, and Hirochika Inoue. Real time vision-based control in sumo playing robot. InProceedings of the 1993 JSME International Conference on Advanced Mechatronics, pp.854-859,1993.中文译文：可以行走、翻身并站立的有两手和两足的机器人摘要在实践中把注意力集中在灵活性和智能反应，更重要的是创想，不是一个不会倒下的机器人，而是一个倒下来可以站起来的机器人。

孪生网络目标跟踪算法

第37卷第2期2021年2月福建电脑Journal of F ujian ComputerVol.37 No.2Feb.2021孪生网络目标跟踪算法程栋栋1吕宗旺1祝玉华2\河南工业大学信息科学与工程学院河南郑州450000)2(黄河水利职业技术学院河南开封475004)摘要在计算机视觉领域中，卷积神经网络发挥着越来越重要的作用。

在海量数据的驱动下，深度学习表现出了比传统方法更为优越的特征表达能力。

基于孪生网络的目标跟踪算法由于准确性和实时性等优点，相关研宄受到越来越多的重视。

本文首先阐述了计算机视觉的研宄意义，着重介绍了几种基于孪生网络的目标跟踪算法，最后总结了这些算法的优点以及未来的研宄方向。

关键词深度学习；孪生网络；目标跟踪中图法分类号 TP391 D0I:10.16707/ki.fjpc.2021.02.026Target Tracking Algorithms Based on Siamese NetworkCHENG Dongdong1,LV Zongwang1,ZHU Yuhua21(School of Information Science and Engineering,Henan University of Technology,Zhengzhou,China,450000)2(Yellow River Water Conservancy Vocational and Technical College,Kaifeng,China,475004)1引言计算机视觉的研究工作与人类现代化的生产生活密不可分，相关技术可以应用在智能视频监控、工厂自动化生产、无人驾驶等方面[1]。

对于目标跟踪的研究是计算机视觉领域的一个重要方向。

通常情况下，目标跟踪被定义为在一个连续的视频序列中，得到指定物体的位移信息，从而描绘出该物体的位移轨迹，并对其位移数据进行分析，最终达到理解物体运动行为的目的[2]。

高水平游泳运动员训练全过程监控系统设计

2020年12月第20卷第4期廊坊师范学院学报(自然科学版)Journal of Langfang Normal University(Natural Science Edition)Dec.2020Vol.20No.4高水平游泳运动员训练全过程监控系统设计王晓宇(淮南师范学院，安徽淮南232038)【摘要】构建高水平游泳运动员训练全过程的视频采集系统，对采集的视频图像进行信息转换分析,通过视觉特征信息跟踪识别方法进行训练过程的动作特征点融合，利用视频动作图像的信息识别，建立训练全过程视频图像的关键动作特征点的提取模型，继而提取视频图像的关键特征,通过空间三维信息融合方法建立训练动作三维重构模型，结合视频追踪和过程轨迹识别，提取有效动作信息检测，实现高水平游泳运动员训练全过程监控系统优化。

【关键词】高水平游泳运动员;训练全过程;监控;系统设计Design of Monitoring System for the Whole TrainingProcess of High-level SwimmersWang Xiaoyu(Huainan Normal University,Huainan232038,China)[Abstract]This paper designs a video acquisition system for the whole training process of high-level swimmers.It conducts information conversion analysis on the collected video images of the entire training process of high-level swimmers,and uses visual feature information tracking and recognition methods to integrate the movement feature points of the training process. Using the information recognition of the video action image,the system establishes the extraction model of the key movement feature points of the high-level swimmer training video image throughout the training process,then extracts the key features of the video image,and establishes three-dimensional reconstruction model of training movements based on information fusion bined with video tracking and process trajectory recognition,the system extracts effective action information for detection,and optimizes the monitoring system for the whole training process of high-level swimmers.[Key words]high-level swimmers;the whole training process;monitoring;system design〔中图分类号〕TP391〔文献标识码〕A〔文章编号]1674-3229(2020)04-0095-050引言随着机器视觉信息识别技术的发展,采用图像视觉监控识别方法,进行高水平游泳运动员训练全过程监控,采用优化的图像处理技术,建立机器视觉下的训练全过程视频监控和图像分析模型,通过对训练全过程视频特征采样和图像的信息提取，结合运动视频跟踪识别，提高高水平游泳运动员训练全过程监控和特征分析能力，对促进游泳训练水平提升具有重要意义⑴。

图像处理和计算机视觉中的经典论文

前言：最近由于工作的关系，接触到了很多篇以前都没有听说过的经典文章，在感叹这些文章伟大的同时，也顿感自己视野的狭小。

想在网上找找计算机视觉界的经典文章汇总，一直没有找到。

失望之余，我决定自己总结一篇，希望对 CV领域的童鞋们有所帮助。

由于自
己的视野比较狭窄，肯定也有很多疏漏，权当抛砖引玉了
1990年之前
1990年
1991年
1992年
1993年
1994年
1995年
1996年
1997年
1998年
1998年是图像处理和计算机视觉经典文章井喷的一年。

大概从这一年开始，开始有了新的趋势。

由于竞争的加剧，一些好的算法都先发在会议上了，先占个坑，等过一两年之后再扩展到会议上。

1999年
2000年
世纪之交，各种综述都出来了
2001年
2002年
2003年
2004年
2005年
2006年
2007年
2008年
2009年
2010年
2011年
2012年。

基于动态视觉显著性的感兴趣目标提取与跟踪_李蕙

［12 ］［11 ］
图1
基于动态视觉显著性的感兴趣目标提取模型
形成的，大小依次为输入视频帧的 1 /2 ～ 1 /256 。然后在 2 个金字塔层之间利用中央周边差操作进行特具体算法如下：通过将代表周边背景信息征的计算，的较小尺度的图像进行线性插值，使之与代表中心信息的较大尺度的图像大小相同，然后进行点对点的减操作，以符号 Θ 表示。假沿当前帧为 f n ，图像的亮度特征用下式计算： I f n （ c，s） = I f n （ c） ΘI f n （ s），（ 1）
c =2 s = c +3
（ 5） fn CO = N 〔 N（ O f （ c，s，θ））〕。 ∑ c =0 45 ， 90 ， 135° ） θ = （ 0，
n
3
（ 6）（ 4）图 2 为一幅彩色图像的静态分量显著图。图 2 b、 c、 d 分别为应用上述方法提中 a 为输入视频帧，取的颜色分量显著图、亮度分量显著图以及方向分量显著图。
Abstract ： When the tracking moving object in video image sequences by particle filter， usually the object to be tracked was manually selected in the first video frame or segmented by background subtraction． Inspired by the human visual mechanism，an algorithm which can locate the object of interest automatically based on dynamic visual saliency modeled by scale invariant feature transform （ SIFT） flow was proposed． To detect the object of interest， some other static features such as color， intensity， and orientation were also extracted． A saliency map was developed by combining both the static saliency and the dynamic saliency， and the most salient object was selected as the object of interest． This method was further applied to object tracking with particle filter． The object template was constructed by fusing the color，gradient， and local binary pattern（ LBP ） texture features． The results show that the proposed method can simulate the human’ s dynamic attention process to some extent when an object starts to move in a scene and track the object of interest robustly． Keywords： visual saliency； motion saliency； scale invariant feature transform flow； particle filter； local binary pattern； object tracking

视觉心理测试英文文章

视觉心理测试英文文章Visual Perception Test: Unraveling the Secrets of Your Mind.Visual perception is a fascinating and complex process that allows us to make sense of the world around us. Through our eyes, we receive countless amounts of information that our brains must swiftly and efficiently interpret, creating a coherent and meaningful representation of our surroundings. Visual perception tests are designed to assess various aspects of this remarkable ability, providing insights into how we see, process, and respond to visual stimuli.Types of Visual Perception Tests.Visual perception tests encompass a wide range of assessments, each designed to evaluate specific aspects of this cognitive function. Some of the most common types include:Acuity Tests: These tests measure the sharpness or clarity of vision, determining how well you can perceive fine details at different distances.Contrast Sensitivity Tests: These tests evaluate your ability to distinguish between objects of different brightness levels, assessing your sensitivity to contrast.Color Vision Tests: Color vision tests determine whether you can differentiate between different colors and detect color deficiencies.Depth Perception Tests: These tests measure yourability to perceive depth and three-dimensional space, assessing your binocular vision and stereopsis.Visual Field Tests: These tests map the extent of your peripheral vision, determining how much of your surroundings you can see without moving your eyes.Motion Perception Tests: These tests assess yourability to perceive movement and detect moving objects, evaluating your visual tracking and coordination skills.Visual Memory Tests: These tests measure your ability to remember and recognize visual information, assessing your visual working memory and long-term visual memory.Purpose and Applications.Visual perception tests serve a variety of purposes, including:Diagnosis of Visual Impairments: These tests help diagnose visual problems such as nearsightedness, farsightedness, astigmatism, and color blindness.Monitoring Eye Health: Regular visual perception tests can monitor changes in your vision over time, detecting potential eye diseases or conditions that may require treatment.Evaluating Cognitive Function: Visual perception testscan provide insights into cognitive abilities such as attention, memory, and processing speed, which are often impaired in neurodegenerative diseases like Alzheimer's.Research and Development: Visual perception tests are essential in research to understand how the visual system works and develop new treatments for visual impairments.Occupational Screening: Many industries requirecertain levels of visual acuity and other visual abilities for employment, and these tests help ensure that candidates meet the necessary standards.How to Prepare for a Visual Perception Test.Preparing for a visual perception test is generally straightforward. Here are some tips:Get a good night's sleep before the test.Avoid caffeine and alcohol before the test.Bring your eyeglasses or contact lenses if you normally wear them.Inform the examiner about any medications you are taking that may affect your vision.Understanding Your Results.After completing a visual perception test, your examiner will interpret your results and provide you with a report. Your results will indicate:Your visual acuity and other measures of visual function.Any visual impairments or abnormalities that may require further evaluation or treatment.Recommendations for follow-up care or lifestyle modifications to enhance your visual health.Conclusion.Visual perception tests are invaluable tools for assessing the health and function of our visual system. By providing insights into how we see and process visual information, these tests help diagnose visual impairments, monitor eye health, evaluate cognitive function, and guide research and development. By understanding the results of visual perception tests, we can take proactive steps to maintain optimal visual health and function throughout our lives.。

visual tracking 跟踪

Video content production and post-production (compositing, augmented reality, editing, re-purposing, stereo3D authoring, motion capture for animation, clickable hyper videos, etc.) Video content management (indexing, annotation, search, browsing) Valuable video
4 7/29/2013
With 3D (cinematic) shape prior
http://cvlab.epfl.ch/research/completed/realtime_tracking/
/~black/3Dtracking.html
5 7/29/2013
9
7/29/2013
With no appearance prior

Tracking bounding box and segmentation from user selection
/~cbibby/index.shtml
10
7/29/2013
Why?
Elementary or principal tool for multiple CV systems

Other sciences (neuroscience, ethology, biomechanics, sport, medicine, biology, fluid mechanics, meteorology, oceanography) Defense, surveillance, safety, monitoring, control, assistance Robotics, Human-Computer Interfaces Disposable video (camera as a sensor)

结合可变形卷积和注意力机制的目标跟踪方法

Science and Technology & Innovation｜科技与创新2024年第01期DOI：10.15913/ki.kjycx.2024.01.008结合可变形卷积和注意力机制的目标跟踪方法＊游丽萍，贝绍轶（江苏理工学院，江苏常州213000）摘要：为解决多数孪生网络目标跟踪算法特征提取能力弱、目标形变和遮挡场景适应性差等问题，提出一种结合可变形卷积和注意力机制的孪生网络目标跟踪算法。

首先，在特征提取网络中采用可变形卷积，使其能够自适应学习目标偏移量，提升模型适用性；然后，在骨干网络中引进SimAM注意力机制，在提升特征提取能力的同时减少计算量；最后，将公开数据集OTB2015和VOT2018与其他算法进行性能对比实验。

实验结果表明，所提方法的精确度和成功率比基准算法SiamFC在形变和遮挡等场景下有较好的鲁棒性。

关键词：孪生网络；目标跟踪；可变形卷积；注意力机制中图分类号：TP391.41 文献标志码：A 文章编号：2095-6835（2024）01-0031-04目标跟踪是计算机视觉的重要研究方向，被广泛应用于智能驾驶和现代军事等领域，但跟踪过程中存在的目标形变和遮挡等问题，使目标跟踪任务仍具有挑战性。

近年来，深度卷积因提取特征能力强被引入到目标跟踪领域。

DANELLJAN等（2016）[1]提出的C-COT 算法及NAM＆HAN（2016）[2]提出的MDNet算法都采用了深度卷积提取特征，虽然跟踪器效果较好，但跟踪速度慢，无法实时跟踪；BERTINETTO等（2016）[3]提出的SiamFC算法因实现了跟踪精度和速度的良好平衡而受到广泛关注。

SiamFC算法通过2个权值共享的AlexNet[4]网络提取特征，但AlexNet层数较浅，提取特征能力弱，跟踪效果不佳。

WANG等（2018）[5]提出的RASNet使用通道、残差和全局注意力模块对特征的空间和通道进行加权，提高了跟踪器的性能。

基于压缩感知的鲁棒性人脸表情识别

基于压缩感知的鲁棒性人脸表情识别施徐敢;张石清;赵小明【摘要】为了有效提高噪声背景下的人脸表情识别性能，提出一种基于压缩感知的鲁棒性人脸表情识别方法。

先通过对腐蚀的测试样本表情图像进行稀疏表示，再利用压缩感知理论寻求其最稀疏的解，然后采用求得的最稀疏解信息实现人脸表情的分类。

在标准的Cohn-Kanade表情数据库的实验测试结果表明，该方法取得的人脸表情识别性能优于最近邻法、支持向量机以及最近邻子空间法。

可见，该方法用于人脸表情识别，识别效果较好，鲁棒性较高。

%In order to effectively improve the performance of facial expression recognition under the noisy background, a method of robust facial expression recognition based on compressed sensing is proposed. Firstly, the sparse representation of corrupted expression images of the identified test sample is sought, then the compressed sensing theory is used to solve its sparsest solution. Finally, according to the sparsest solution, facial expression classification is performed. Experimental results on benchmarking Cohn-Kanade database show that facial expression performance obtained by this method is better than the nearest neighbor (NN), support vector machine (SVM) and the nearest subspace (NS). Therefore, the proposed method shows both good recognition performance and high robustness on facial expression recognition tasks.【期刊名称】《计算机系统应用》【年(卷),期】2015(000)002【总页数】4页(P159-162)【关键词】压缩感知;稀疏表示;表情识别;鲁棒性;腐蚀【作者】施徐敢;张石清;赵小明【作者单位】浙江理工大学机械自动控制学院，杭州 310018; 台州学院图像处理与模式识别研究所，临海 317000;台州学院图像处理与模式识别研究所，临海317000;浙江理工大学机械自动控制学院，杭州 310018; 台州学院图像处理与模式识别研究所，临海 317000【正文语种】中文人脸表情是人们观察情感的重要标志, 如何使得机器能够认识人脸表情, 是一个既实用又有趣的研究方向. 如何让机器自动、高效、准确地来识别人类的情绪状态, 比如高兴、悲伤、愤怒、恐惧等, 即所谓的“人脸表情识别”[1]方面的研究, 是当前信号处理、模式识别、计算机视觉等领域的热点研究课题. 该研究在智能人机交互、人工智能等方面有着重要的应用价值.尽管人脸表情识别经过了多年的发展, 已经取得了较多的研究成果, 但现有的人脸表情识别研究[2-10]大多没有考虑表情图像受到噪声的影响. 在自然环境中, 人脸表情图像的获取、传输和存贮过程中常常也会受到各种噪声(如姿态、光照、腐蚀、遮挡等)的干扰而使图像降质, 从而导致人脸表情识别的性能会随之下降. 因此, 如何提高人脸表情识别的鲁棒性仍然是一个亟需解决的问题.压缩感知(Compressed sensing)或压缩采样(Compressive sampling)[11,12], 是近年来新出现的一种信号采样理论, 它可以在远小于Nyquist采样率的条件下获得信号的离散样本, 然后通过非线性重建无失真的完美信号. 压缩感知理论指出, 采样速率由信号中的内容和结构所决定, 而不再决定于信号的带宽. 目前, 压缩感知理论在图像处理[13]、人脸识别[14]、视频追踪[15]等领域受到了研究者的高度关注, 并表现出了极其强大的生命力, 但在人脸表情识别领域, 尤其针对鲁棒性的人脸表情识别问题, 国内外相关的文献报道甚少.压缩感知理论研究的初衷主要用于信号的压缩和表示, 但其最稀疏的表示具有很好的判别性. 本文利用压缩感知理论中的稀疏表示分类(Sparse Representation-based Classification, SRC)思想[14], 提出一种基于压缩感知的鲁棒性人脸表情识别方法. 先通过对腐蚀的测试样本表情图像进行稀疏表示, 再利用压缩感知理论寻求其最稀疏的解, 然后采用求得的最稀疏解信息实现人脸表情的分类. 在标准的Cohn-Kanade表情数据库[16]的实验结果表明了该方法的可行性.设A=[A1,A2,…,AC]是一组训练样本集, 总数量为ｎ, 其中为第i类训练样本,y∈Rm是第i类的测试样本, 它可以由线性表示为:然而在实际情况中, 由于测试样本的类别一般是未知的, 所以式1可以写为式中, .由矩阵原来可知, m>n时, 矩阵(2)有唯一解; 但是在大多数情况下, m≤n, 此时矩阵(2)有无穷多个解. 为了使测试样本能够用自身所在类的训练样本进行线性表示, 这样的话系数向量x0中的非零向量应该尽可能少些. 所以对矩阵(2)求解可转换对矩阵(3)进行求解式中, ||·||0 表示l0范数, 它的作用是计算向量中非零元素的个数. 但是, 式(3)的求解非常困难, 这是个NP难题.由压缩感知理论可知: 当所求的系数足够稀疏时,可以把最小化l1范数的NP难题转化成最小化l1范数问题来求解．因此, 把式(3)改写为:然而在实际情况中, 获得的数据中经常含有噪声, 因此ｙ很难由A进行比较准确的线性表示, 因此, 把式(4)改写为式(5)可以通过以下的式(6)来求解SRC算法可归纳如下:1)对训练样本集A中的每一个列向量进行归一化．2)求解最小化l1范数问题:或者求解3) 计算残差4) . 是的标记.本文采用标准的Cohn-Kanade[16]数据库进行实验. 通过对Cohn-Kanade数据库的原始图像采样得到32×32像素图像, 然后分别采用稀疏表示分类方法SRC、最近邻法(Nearest neighbor, NN), 支持向量机(Support Vector Machine, SVM), 以及近年来流行的最近邻子空间法(Nearest subspace, NS)[17]进行人脸表情识别实验, 并比较它们的性能.除了SRC方法, 使用的其它分类方法的基本思想表述如下: 最近邻法(NN)是基于样本学习的K近邻分类器(KNN), 当K=1时的一种情况. 支持向量机(SVM)是一种基于统计学习理论的分类器. 本文SVM采用“一对一”多类分类算法, 核函数为径向基函数, 并对核函数参数值进行最优化, 即在训练样本数据上使用交叉验证方法实现. 最近邻子空间法(NS)是一种基于信号重构的无参数分类器, 其分类思想是将测试样本表示为各类所有训练样本的线性组合, 从中选择最优解来进行分类. 2.1 表情数据库Cohn-Kanade数据库含有210个对象的大约2000个左右的具有充足的正面光照的灰度图像序列. 图像序列的分辨率都是640×490. 该数据库总共含有七种基本的表情, 如生气, 高兴、悲伤、惊奇、讨厌、害怕以及中性, 如图1所示. 我们从数据库中选用来自96个对象的320图像序列用于实验测试. 选择图像序列的标准是能够标记出除中性之外的六种表情. 然后对每个选择的图像序列中提取出一帧中性表情图像以及一帧含六种表情之一的图像. 最后我们提取出包括七种表情的470幅图像, 其中生气32个, 高兴100个, 悲伤55个, 惊奇75个, 害怕47个, 讨厌45个和中性116个.2.2 无腐蚀的人脸表情识别实验在该实验中, 直接使用32×32像素大小的图像样本用于表情识别, 图像中不存在任何腐蚀现象. 表1列出了SRC、NN, SVM和NS四种不同方法所取得的人脸表情识别性能. 由表1可知, 在无任何腐蚀图像的条件下, 稀疏表示分类方法SRC取得的人脸表情识别性能最好, 达到94.76%的识别率. 这表明了SRC用于人脸表情识别具有优越的分类性能.为了进一步给出七种表情中不同表情的具体识别性能, 表2给出了在Cohn-Kanade数据库上SRC方法采用32×32像素所取得的不同表情的识别结果. 从表2的实验结果可见, 在Cohn-Kanade数据库上七种表情中大部分表情的正确识别率达到了100%.2.3 有腐蚀的人脸表情识别实验为了检验SRC的鲁棒性人脸表情识别性能, 对32×32像素大小的测试图像随机添加像素腐蚀(Pixel Corruption). 随机添加像素腐蚀就是从测试图像中随机选择一定比例的像素, 采用范围之内的随机值进行替代, 其中表示第个测试图像的最大像素值. 实验中, 像素腐蚀比例从0%到90%, 依次递增10%. 图2展示了Cohn-Kanade数据库中一副原始图像从采样到腐蚀的过程, 其中图(a)为原始640×490像素的图像, 图(b)为采样之后的32×32像素的图像, 图(c)对32×32像素图像添加50%的腐蚀比例之后的图像.图3列出了NN、SVM、NS和SRC四种方法在Cohn-Kanade数据库上随机添加像素腐蚀比例从0%到90%取得的识别结果. 由图3实验结果可见, 随着图像腐蚀比例的增大, 图像越来越模糊, 人脸表情识别率也随之下降. 在图像腐蚀比例由0%增长到30%为止, SRC的正确识别率下降速度缓慢, 而其他三种方法的识别率下降非常快. 随之腐蚀比例的不断增大(30%至90%), 各种方法的识别率都一致下降, 但是SRC方法的识别率平均超过其它三种方法10%以上. 显然, 我们看到了SRC方法在处理人脸表情问题上有着良好的鲁棒性. 这主要是SRC方法提取了信号的稀疏结构, 并利用l1范数来作为来求解信号的稀疏表示系数. 由于采用正则化技术, SRC 的稀疏表示系数具有非常稳定的数值解.本文通过考虑测试图像是否存在像素腐蚀的现象, 对基于压缩感知理论的稀疏表示分类方法SRC的鲁棒性人脸表情识别性能进行了探讨. 在无任何像素腐蚀的人脸表情识别实验中, SRC取得的人脸表情识别性能比其他方法高出2%左右, 而在有像素腐蚀图像的人脸表情识别实验中, SRC展示出了良好的鲁棒性性能, 尤其在像素腐蚀比例30%至90%之间, SRC比其他方法的识别率平均高出10%以上. 这表明本文采用的基于压缩感知理论的稀疏表示分类方法SRC用于鲁棒性人脸表情识别时, 拥有良好的分类性能和鲁棒性.1 Tian Y, Kanade T, Cohn JF. Facial expression recognition. Handbook of Face Recognition, 2011: 487–519.2 刘晓旻,章毓晋.基于Gabor直方图特征和MVBoost的人脸表情识别.计算机研究与发展,2007,44(7):1089–1096.3 刘帅师,田彦涛,万川.基于Gabor多方向特征融合与分块直方图的人脸表情识别方法.自动化学报,2012,37(12): 1455–1463.4 易积政,毛峡,薛雨丽.基于特征点矢量与纹理形变能量参数融合的人脸表情识别.电子与信息学报,2013,35(10): 2403–2410.5 朱晓明,姚明海.基于局部二元模式的人脸表情识别.计算机系统应用,2011,20(6):151–154.6 Aleksic PS, Katsaggelos AK. Automatic facial expression recognitionusing facial animation parameters and multistream HMMs. IEEE Trans. on Information Forensics and Security, 2006, 1(1): 3–11.7 Zheng W, Zhou X, Zou C, et al. Facial expression recognition using kernel canonical correlation analysis (KCCA). IEEE Trans. on Neural Networks, 2006, 17(1): 233–238.8 Zhao G, Pietikainen M. Dynamic texture recognition using local binary patterns with an application to facial expressions. IEEE Trans. on Pattern Analysis and Machine Intelligence, 2007, 29(6): 915–928.9 Zhao X, Zhang S. Facial expression recognition using local binary patterns and discriminant kernel locally linear embedding. EURASIP Journal on Advances in Signal Processing, 2012, (1): 20.10 Yurtkan K, Demirel H. Feature selection for improved 3D facial expression recognition. Pattern Recognition Letters, 2014, 38: 26–33.11 Candes EJ, Wakin MB. An introduction to compressive sampling. IEEE Signal Processing Magazine, 2008, 25(2): 21–30.12 Donoho DL. Compressed sensing. IEEE Trans. on Information Theory, 2006, 52(4): 1289–1306.13 Yang J, Wright J, Huang TS, et al. Image super-resolution via sparse representation. IEEE Trans. on Image Processing, 2010, 19(11): 2861–2873.14 Wright J, Yang AY, Ganesh A, et al. Robust face recognition via sparse representation. IEEE Trans. on Pattern Analysis and Machine Intelligence, 2009, 31(2): 210–227.15 Mei X, Ling H. Robust Visual Tracking and Vehicle Classification via Sparse Representation. IEEE Trans. on Pattern Analysis and MachineIntelligence, 2011, 33(11): 2259–2272.16 Kanade T, Tian Y, Cohn J. Comprehensive database for facial expression analysis. International Conference on Face and Gesture Recognition. Grenoble, France. 2000. 46–53.17 Lee KC, Ho J, Kriegman DJ. Acquiring linear subspaces for face recognition under variable lighting. IEEE Trans. on Pattern Analysis and Machine Intelligence, 2005, 27(5): 684–698.。

基于仿生眼的无人机视觉跟踪云台摄像机控制系统(英文)

In the UAV visual tracking system, the UAV, the onboard camera and the ground target are all in motion. Therefore, the system has the following characteristics. Firstly, UAV is inherently unstable and capable of exhibiting high acceleration rates, in addition there are the engine vibration and the wind, thus the image from the onboard camera is very unstable. Moreover, the image can't be processed using normal spatio-temporal filtering of the camera image sequence for estimation of local motion. Next, the system demands high-performance real-time processing. But UAV has limited on-board power and payload capacity, so that it limits the usage of the on-board hardware. Thus, the new appropriate hardware must be designed; meanwhile, the adaptive robust algorithm must be introduced. The vision and the control system must be compact, efficient, and lightweight for effective on-board integration. Thirdly, the on-board camera is always in motion, the distance and direction between the on-board camera and ground moving target change constantly, so the existing target recognition and tracking method, which fit for the image from the static camera, is no longer suitable. It is insufficient to only control the camera for UAV visual tracking system. The rolling angle and pitching angle of the on-board camera are requested to be extremely accurate. Once the angle has a tiny deflection, the target may be lost. Moreover, the flight attitude seriously influences the view of the on-board camera. The stable tracking must coordinate to ontrol the flight attitude of UAV and the movement of the onboard camera.

目标跟踪Visual tracking总结汇报

非线性:
w

i
(
x
i
)
i
求解（闭环解）：
n
f (z) wT z i (z, xi) i 1
=（K+ I）1 y K ij (xi, x j)
检测：
f(z) (K Z )T
CSK的实现
检测阶段，从新的一帧中采样图像块 z，利用学习到的目标表观 x 和分器 α，可得出 z 内所有位置处的检测响应
训练：Ridge regression (岭回归)
线性:
f (z) wT z
x y min
(f( )
2
)
w2
w
i
i
i
求解(闭环解):
w ( X T X I )1 X T y
若样本在原始特征空间线性不可分，则通过一个映射函数 φ (x) 将输入的特征值映射到一个更
高维的特征空间实现线性可分
DCF：
d
S (x) xl * f l f l 1
Loss
function: SRDCF：
t
2
d
2
t( f )
S x y ( ) f l
k
fk
k
k 1
l 1
t
2
d
2
t( f )
S x y ( ) W f l
k
fk
k
k 1
Input image
Extraction
Sample
x patch
Cyclic
sxh(imfts, n)
Kernelized least
squaresA=Yuxu x(m, n) xm,n, x

CVPR2013总结

CVPR2013总结前不久的结果出来了，⾸先恭喜我⼀个已经毕业⼯作的师弟中了⼀篇。

完整的⽂章列表已经在CVPR的主页上公布了（），今天把其中⼀些感兴趣的整理⼀下，虽然论⽂下载的链接⼤部分还都没出来，不过可以follow最新动态。

等下载链接出来的时候⼀⼀补上。

由于没有下载链接，所以只能通过题⽬和作者估计⼀下论⽂的内容。

难免有偏差，等看了论⽂以后再修正。

显著性Saliency Aggregation: A Data-driven Approach Long Mai, Yuzhen Niu, Feng Liu 现在还没有搜到相关的资料，应该是多线索的⾃适应融合来进⾏显著性检测的PISA: Pixelwise Image Saliency by Aggregating Complementary Appearance Contrast Measures with Spatial Priors Keyang Shi, Keze Wang, Jiangbo Lu, Liang Lin 这⾥的两个线索看起来都不新，应该是集成框架⽐较好。

⽽且像素级的，估计能达到分割或者matting的效果Looking Beyond the Image: Unsupervised Learning for Object Saliency and Detection Parthipan Siva, Chris Russell, Tao Xiang, 基于学习的的显著性检测Learning video saliency from human gaze using candidate selection , Dan Goldman, Eli Shechtman, Lihi Zelnik-Manor这是⼀个做视频显著性的，估计是选择显著的视频⽬标Hierarchical Saliency Detection Qiong Yan, Li Xu, Jianping Shi, Jiaya Jia的学⽣也开始做显著性了，多尺度的⽅法Saliency Detection via Graph-Based Manifold Ranking Chuan Yang, Lihe Zhang, Huchuan Lu, Ming-Hsuan Yang, Xiang Ruan这个应该是扩展了那个经典的 graph based saliency，应该是⽤到了显著性传播的技巧Salient object detection: a discriminative regional feature integration approach , Jingdong Wang, Zejian Yuan, , Nanning Zheng⼀个多特征⾃适应融合的显著性检测⽅法Submodular Salient Region Detection , Larry Davis⼜是⼤⽜下⾯的⽂章，提法也很新颖，⽤了submodular。

人脸识别算法在考勤系统的应用

人脸识别算法在考勤系统的应用①王治强, 孙晓东, 杨　永, 孙　鹏(大连东软信息学院智能与电子工程学院, 大连 116023)通讯作者: 王治强摘　要: 深度学习与大数据技术的相遇, 促使人脸识别技术在精度上已经达到很高水平, 然而在实际应用场景中, 尤其在复杂背景、移动中以及自然状态下的人脸识别, 还没有达到令人满意的效果. 针对人脸识别在考勤应用中的问题进行算法设计与改进, 提出递归最小窗口算法, 对M:N 多人脸识别场景下人脸跟踪算法进行优化设计, 通过多角度采样提高识别精度和识别鲁棒性, 并在人脸考勤系统中进行应用实现与验证, 取得多人同步3 s 内完成考勤的成绩, 在用户体验上获得了较明显的提升.关键词: 人脸识别; 人脸跟踪; M:N 识别; 多角度人脸识别引用格式: 王治强,孙晓东,杨永,孙鹏.人脸识别算法在考勤系统的应用.计算机系统应用,2021,30(1):89–93. /1003-3254/7755.htmlApplication of Face Recognition Algorithm in Attendance SystemWANG Zhi-Qiang, SUN Xiao-Dong, YANG Yong, SUN Peng(School of Intelligence and Electronic Engineering, Dalian Neusoft University of Information, Dalian 116023, China)Abstract : The encounter between deep learning and big data technology has prompted face recognition technology to achieve a high level of accuracy. However, in actual application scenarios, especially in complex background, moving and natural face recognition, it has not yet achieved people’s satisfactory. Aiming at the problem of face recognition in attendance system, we propose using recursive minimum window algorithm to optimize the design of the face tracking algorithm in the M:N multi-face recognition scenario, and using multi angle sampling to improve the recognition accuracy and robustness. The proposed method is implemented and verified in the face attendance system. Achievement of multi-person synchronization within 3 seconds is achieved, and the user experience has been significantly improved.Key words : face recognition; face tracking; M:N recognition; multi angle face recognition随着深度学习的飞跃发展, 人脸识别技术趋于成熟, 被广泛应用于企业、学校、机构等各种考勤场景.使用人脸识别进行考勤克服了传统IC 卡考勤易忘、易替代的缺陷, 也比指纹考勤更方便卫生, 易被用户所接受[1]. 然而, 现今的人脸识别考勤系统普遍识别速度慢, 在早晚高峰的时候排队考勤成为常事, 并且考勤的时候头部要摆的很端正才可以识别成功, 因此, 用户体验并没有达到令人满意的程度[2].本文针在系统设计上对现今人脸考勤系统存在的速度慢、只能逐人识别不能群识别、需要摆正头部体验不自然等缺陷进行改善, 实现体验自然的人脸识别考勤系统.1 人脸识别算法解析在人脸识别应用中, 涉及到的人脸识别相关算法主要包括人脸检测、人脸跟踪以及人脸匹配3方面[3].由于大数据技术与深度学习的结合, 使得人脸识别在精度和鲁棒性方面得到很大的提高[4], 然而由于人脸的计算机系统应用 ISSN 1003-3254, CODEN CSAOBNE-mail: Computer Systems & Applications,2021,30(1):89−93 [doi: 10.15888/ki.csa.007755] ©中国科学院软件研究所版权所有.Tel: +86-10-62661041① 收稿时间: 2020-06-05; 修改时间: 2020-06-30; 采用时间: 2020-07-10; csa 在线出版时间: 2020-12-31非刚性特点和角度等多种因素的影响, 在实际应用场景中并没有达到令人满意的效果, 尤其是在移动中和人的自然状态下多角度人脸识别还存在诸多难点和挑战[5,6]. 此外, 由于人脸跟踪抓拍算法差异较大, 导致人脸识别算法的运行效率和匹配效果产生了较大差异.有的研究中通过分布式技术, 采用服务器机群解决速度提升[7], 虽然有一定成效, 但成本巨大, 难于推广.本文针对考勤场景中的人脸识别算法进行了多种实验与探索, 最终归结出导致在实际应用场景中效果不佳的最核心的问题主要有两方面: 一是要解决人脸的跟踪, 尤其在M:N模式下的多人脸跟踪问题; 二是多角度人脸的精确匹配问题. 由于人脸精确匹配的运算量极大, 如果对图像进行逐帧的特征提取及匹配运算, 在普通处理器的计算能力下是无法满足的. 另一方面, 运动中人脸的角度变幻莫测, 若采用逐帧匹配, 在角度不佳情况下漏识率会大大提高[8]. 因此, 本文重点关注这两点问题, 优化人脸跟踪算法降低实时图像处理运算量, 同时在跟踪过程中捕捉最佳角度和图片质量的人脸图像, 从而提高精确匹配成功率[9,10].此外, 提出多角度采样的方法提高自然状态下人脸识别通过率. 在人脸识别考勤系统中进行实现和验证, 实验证实这样的解决方案提高了人脸识别的速度与识别通过率.2 人脸识别算法设计2.1 人脸跟踪算法人脸跟踪算法主要有3种: 基于模型匹配的跟踪、基于区域匹配的跟踪和基于特征匹配的跟踪[11].在实际应用中基于特征匹配的方法使用较为广泛, 如Azarbayejani等提出的递归估计算法[12]和清华大学的艾海舟等采用的基于人脸检测的跟踪算法[13]. 但是在多人脸跟踪情况下, 如何降低人脸检测中的负荷以及如何保证后进入摄像机视野的人脸被快速检测仍然是研究的重点.本文的人脸跟踪算法属于特征匹配, 是在递归估计算法思想基础上进行改进, 提出一种递归最小检测窗口的方法(Recursion Minimum Window, RMW)降低人脸检测的负荷, 并进行多线程扩展, 从而实现多人脸同步跟踪. 本文采用的人脸检测算法是的基于模板的方法, 该方法鲁棒性较好, 检测运算量与图像像素大小成正比[14]. RMW方法工作原理就是尽可能缩小人脸检测的范围, 从而提升处理速度. 具体处理方法为, 根据人脸尺寸和运动速度预测目标在下一帧中最大可能的运动范围, 也就是包含目标物体在内的最小图像, 然后基于预测的目标范围裁剪图像, 作为人脸检测算法的输入. 如此便可以在图像输入到人脸检测模块之前极大的裁剪掉无效图像, 提升目标检测的效率和准确性.如图1所示为RMW方法工作原理.原始图像剪裁图像人脸检测RMW 预处理器预测当前帧最小窗口图1 RMW方法原理图W,H W′H′V RσRMW算法对单人脸检测跟踪处理速度提升计算公式如式(1)所示, 为原始图像的宽高, 和为最小预测窗口的宽高, 为人脸移动速度, 为视频图像的帧率, 为常量. 由公示可见, 速度提升倍数约为原始图像面积和人脸边框面积的比值. 即人脸在图像中尺寸越小, 则跟踪速度越快.RMW算法的整体处理流程如图2所示. 首先根据上一帧跟踪状态预测当前帧最小检测窗口, 如果是起始帧则窗口初始化为整幅图像, 然后根据最小窗口剪裁图像, 调用人脸检测算法检测人脸并根据人脸信息更新跟踪状态, 如此循环跟踪直到退出.开始预测当前帧最小检测窗口根据预测窗口剪裁图像对图像进行人脸检测更新人脸跟踪状态退出？否是结束图2 RMW算法处理流程多人脸同步跟踪情况下, 对RMW算法进行多线程设计, 每张人脸使用独立线程跟踪, 多人脸跟踪的处理线程模型如图3所示. 结构上包含全幅人脸检测线程和RMW人脸跟踪线程两部分, 全幅检测线程执行周期较长, 可以间隔多个图像帧执行一次, 为发现新进计算机系统应用2021 年第 30 卷第 1 期入摄像机视野的人脸做整幅图像的人脸检测, 一旦检测到新人脸, 则启动RMW 跟踪线程对人脸进行跟踪,RMW 跟踪线程检测目标丢失的时候则退出, 全幅检测线程发现新人脸则继续添加新的跟踪线程, 从而持续维护跟踪列表.全幅人脸检测全幅人脸检测添加跟踪线程添加跟踪线程RMW 人脸跟踪(人脸#n)RMW 人脸跟踪(人脸#m)RMW 人脸跟踪(目标#m)目标丢失跟踪结束图3 多人脸跟踪线程模型从摄像机获取一帧实时图像, 采用RMW 算法预测截取最小检测窗口, 然后调用人脸识别库检测人脸,更新跟踪列表并计算人脸角度、速度、图像质量等状态信息, 通过线程池方式并发执行跟踪处理, 并在一帧检测的开头和结束进行线程同步. 具体一帧图像的检测跟踪的详细流程图如图4所示.一帧跟踪开始一帧跟踪结束多线程实现判断删除条件并删除人脸启动 RMW 线程调用颜行人脸库进行人脸比对确定最相似人脸判断人脸检测数量根据 RMW 算法预测最小窗口从帧图像中截取窗口图像调用颜行人脸库检测人脸根据检测结果更新跟踪列表从摄像头获取图像帧从跟踪列表获取一张人脸的跟踪信息 (尺寸、位置、速度等)图4 RMW 人脸跟踪处理详细流程图2.2 多角度人脸匹配精度优化自然状态下的人脸被摄像机捕捉到的角度与状态存在各种可能, 传统多角度人脸识别算法有很多种, 如线性判别分析(LDA)、主成分分析(PCA)、回归函数法等[15,16]. 然而在实际应用中, 在采集人脸信息环节并不容易实现. 为了让用户体验更加自然, 本文在人脸采集的时候对采集角度做了扩展, 每张脸采集: 正中、偏上、偏下、偏左、偏右5张照片. 在精确匹配的时候根据抓拍人脸的当时状态完成匹配, 同时加上人脸角度的约束条件, 则会使匹配结果更加精确. 人脸采集的图样如图5所示.图5 人脸多角度采集3 人脸考勤应用系统设计3.1 系统功能需求本文所设计的人脸识别考勤系统主要包含如下几方面功能:1) 用户管理功能: 采集用户的人脸信息, 还有对员工信息的增、删、改、查管理功能;2) 人脸识别功能: 采用实时M:N 的方式进行多人脸同步识别, 支持多人同时考勤, 并支持人脸大角度识别, 增强用户体验;3) 考勤管理功能及其他: 包含员工考勤管理、数据统计、考勤设置、考勤语音提示和系统设置功能.3.2 系统软件设计.人脸识别验证功能是本系统的核心功能, 系统使用第三方颜行人脸识别库实现人脸检测与匹配, 软件模块图如图6所示. RMW 人脸跟踪以及人脸匹配的应用算法在人脸跟踪识别模块中实现, 调用颜行Facevall lib 实现人脸检测与匹配. 调用Speech lib 实现语音提示功能.2021 年第 30 卷第 1 期计算机系统应用GUI 用户界面人脸考勤管理人脸跟踪匹配模块Camera LibFacevall LibSpeech LibDatabase数据库业务逻辑用户管理考勤记录报表统计系统设置图6 系统软件结构框图4 系统实现4.1 系统硬件实现为满足计算性能要求, 考勤终端硬件设计采用X86架构CPU 为Intel i7-8550u, RAM 为8 GB 的工控机为主控单元, 使用1080P 帧频30 fps 的高清摄像机采集人脸图像, 外接触摸屏等人机交互外设.4.2 系统软件实现软件开发基于Windows C# Visual Studio 开发环境, 依据面向对象设计思想, 采用多线程方式, 实现了用户界面、图像采集、人脸跟踪和检索算法、考勤管理和网络通信等功能. 软件界面基于WPF 窗体设计,采用C#5新特性async await 关键字和lambda 表达式方便地实现多线程异步执行, 提升系统并行效率. 在多线程并发执行方面, 用线程池技术实现多线程并发处理. 数据存储采用MySQL 数据库. 考勤终端与数据库采用C/S 架构, 支持多终端同步运行, 多终端同步考勤.人脸考勤界面采用单独的窗口, 实现效果图如图7所示. 人脸考勤成功后会记录数据库, 保存考勤时人脸照片, 用于考勤记录查询及统计报表功能.图7 人脸考勤界面人脸信息采集窗口如图8所示, 采集上下左右及正中5张人脸照片.5 系统测试针对人脸识别在实际应用中的效果, 从两方面验证算法性能, 一方面是处理每个图像帧的时间效率, 另一方面从实际应用的效果, 即识别通过率进行测试.图8 人脸信息采集界面5.1 图像帧处理效率测试为测试人脸跟踪与识别算法的效率, 对一分钟1800帧视频图像进行识别, 统计平均每帧图像的处理时间. 为检验RMW 人脸跟踪算法性能, 与不使用RMW 算法, 即与每帧图像都全幅检测的普通跟踪算法相对比, 计算RMW 算法在处理效率方面的效果. 测试视频中人脸移动速度较慢, 小于0.5 m/s, 分别在不同人数和距离情况下测试了多组数据.如图9所示, 从测试结果可以看出, RMW 算法大大降低了每帧图像的平均处理时间, 效率提升很明显,为普通算法的两倍以上, 并且人数越少效果越明显. 另外, 从RMW 算法的两组数据可以看出, 人脸距离摄像机的距离对处理时间的影响也很大, 因为距离越远, 人脸所占像素越小, 检索与匹配的计算量也越小.1234同时识别人脸数量普通算法 (1 m)RMW 算法 (1 m)普通算法 (2 m)RMW 算法 (2 m)5678910图9 人脸数与每帧处理时间曲线图5.2 人脸识别通过率测试为检验人脸识别实际应用的通过率, 使用1000人共5千张人脸的人脸检索库, 测试从单人到6人同时考勤效果. 运动状态包括静止、缓慢行走(0.5 m/s)、正常速度行走(1 m/s). 测试结果记录识别时间和漏识次数. 时间记录从距离摄像机3 m 开始计时到识别到人脸后屏幕提示姓名为止的时间.计算机系统应用2021 年第 30 卷第 1 期测试结果如图10所示, 静止状态下识别率都达到100%, 人数的增加和移动速度的加快都会降低识别通过率, 平均识别时间会随人数增加而增加. 漏识的主要原因是由于人脸移动速度快的情况下, 还没等到系统捕捉到高质量的图像, 人就已经走过去了, 并且人数越多, 捕捉人脸需要的时间就越长, 从而导致漏识几率增大.1234同步识别人数静止通过率静止识别时间慢行通过率慢行识别时间常速行通过率常速行识别时间56图10 识别通过率和平均识别时间结果图5.3 自然状态下人脸识别测试为测试人脸检索库引入正中和上下左右5张人脸图片的效果, 对缓慢行走状态下人脸识别通过率进行测试, 分别对人脸角度为正中、偏左、偏右、偏上、偏下5种状态进行测试. 测试结果如表1所示.表1 单张采集与多张采集识别通过率对比(单位: %)采集数正中偏上偏下偏左偏右平均单张1007095808586多张100901001009597由测试结果可见, 采集多角度采集的情况与单张采集情况相比, 除了正中角度外, 其余4个角度都有明显提升, 平均通过率提升11%, 可见该方法在实际应用中是切实有效的.6 结论本文在人脸识别考勤系统的设计中引入RMW跟踪算法, 提高了多人脸同步跟踪的效率, 并通过采集正中、上下左右5张人脸照片的方式增强自然状态下人脸识别通过率, 取得了较有成效的实际应用效果. 但在同时考勤人数大于4的时候, 由于处理性能和摄像机图像质量下降的原因, 会导致漏识率明显增高. 然而, 对于一般应用场景, 本文的算法基于现有人脸识别库、普通处理器以及相机条件下, 已经可以达到实际应用需求.如要进一步提高应用效果, 除了继续优化算法, 还可以从提高处理器性能和采用高速相机角度进行解决.参考文献李雄, 文开福, 钟小明, 等. 基于深度学习的人脸识别考勤管理系统开发. 实验室研究与探索, 2019, 38(7): 115–118.[doi: 10.3969/j.issn.1006-7167.2019.07.027]1景晨凯, 宋涛, 庄雷, 等. 基于深度卷积神经网络的人脸识别技术综述. 计算机应用与软件, 2018, 35(1): 223–231.2Kan MN, Wu JT, Shan SG, et al. Domain adaptation for face recognition: Targetize source domain bridged by common subspace. International Journal of Computer Vision, 2014, 109(1): 94–109.3Kim M, Kumar S, Pavlovic V, et al. Face tracking and recognition with visual constraints in real-world videos.Proceedings of 2008 IEEE Conference on Computer Vision and Pattern Recognition. Anchorage, AK, USA. 2008. 1–8. 4Rencher AC, Christensen WF. Methods of Multivariate Analysis. 3rd ed. New York: Wiley, 2012.5Zhao W, Chellappa R, Phillips PJ, et al. Face recognition: A literature survey. ACM Computing Surveys, 2003, 35(4): 399–458. [doi: 10.1145/954339.954342]6王昱, 阎苹, 丁明跃. 一个快速人脸识别系统. 计算机与数字工程, 2004, 32(3): 16–18. [doi: 10.3969/j.issn.1672-9722.2004.03.006]7张涛, 蔡灿辉. 基于多特征Mean Shift的人脸跟踪算法. 电子与信息学报, 2009, 31(8): 1816–1820.8Zheng YY, Yao J. Multi-angle face detection based on DP-Adaboost. International Journal of Automation and Computing, 2015, 12(4): 421–431. [doi: 10.1007/s11633-014-0872-8]9高修峰, 张培仁, 李子青. 人脸图像质量评估标准. 小型微型计算机系统, 2009, 30(1): 95–99.10伊昭荣, 郑豪, 岳国宾. 人脸跟踪技术综述. 福建电脑, 2012, 28(11): 7–9, 15. [doi: 10.3969/j.issn.1673-2782.2012.11.005]11Azarbayejani A, Horowitz B, Pentland A. Recursive estimation of structure and motion using relative orientation constraints. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. New York, CA, USA. 1993.294–299.12梁路宏, 艾海舟. 基于人脸检测的人脸跟踪算法. 计算机工程与应用, 2001, 37(17): 42–45. [doi: 10.3321/j.issn:1002-8331.2001.17.016]13梁路宏, 艾海舟, 何克忠, 等. 基于多关联模板匹配的人脸检测. 软件学报, 2001, 12(1): 94–102.14林宇生, 王建国, 杨静宇. 一种对角LDA算法及其在人脸识别上的应用. 中国图象图形学报, 2008, 13(4): 686–690.[doi: 10.11834/jig.20080415]15李春明, 李玉山, 张大朴, 等. 多角度不同表情下的人脸识别. 计算机科学, 2006, 33(2): 223–224, 229. [doi: 10.3969/j.issn.1002-137X.2006.02.063]162021 年第 30 卷第 1 期计算机系统应用。

行为识别相关文献

动作识别和行为理解相关文献
1
文章信息
Richard Souvenir and Justin Baoint Manifold for Action Recognition", CVPR2008
摘要
研究者们越来越关注基于视频的视角不变运动识别问题。这项研究将有利于对运动和医学中收集的非控制视频进行精确的建模与分析。之前的视角不变方法通常需要在训练与测试阶段均利用多个摄像头，或者需要存储单个运动模式多个视角的大量样本。
我们把该方法与常用的图像/视频相似性度量方法相比较，并且将其利用于目标检测，图像检索和行为检测中。
4
文章信息
Bo Wu and Ram Nevatia, "Tracking of Multiple, Partially Occluded Humans based on Static Body Part Detection", CVPR2006
8
文章信息
Siying Liu, Guo Dong, Chye Hwang Yan, Sim Heng Ong, "Video Segmentation: Propagation, Validation and Aggregation of a Preceding Graph", CVPR 2008
摘要
提出了在视频中识别真实感行为的系统框架。主要的挑战是如何从无约束的混杂视频中提取可靠而有意义的特征。我们从视频中同时抽取运动和静态特征。
由于这两种原始特征比较稠密并带有噪波，我们提出了剪除这些特征的策略。用运动语义来得到稳定的运动特征，清除静态特征。
为了进一步创建紧凑的可视化词典，使用判别式的信息理论算法从语义上来分类相关特征。最后合成所有异构且相互辅助的特征用来识别。

孙富春简介pdf

孙富春简介pdf孙富春简历孙富春，清华大学计算机科学与技术系教授，博士生导师，国家863计划专家组成员，国家自然基金委重大研究计划“视听觉信息的认知计算”指导专家组成员，计算机科学与技术系学术委员会副主任, 智能技术与系统国家重点实验室常务副主任; 兼任国际刊物《IEEE Trans. on Fuzzy Systems》，《Mechatronics》和《International Journal of Control, Automation, and Systems》副主编或大区主编，《International Journal of Computational Intelligence Systems》和《Robotics and Autonumous Systems》编委；兼任国内刊物《中国科学F：信息科学》和《自动化学报》编委；兼任中国人工智能学会认知系统与信息处理专业委员会主任，IEEE CSS智能控制技术委员会委员。

98年3月在清华大学计算机应用专业获博士学位。

98年1月至2000年1月在清华大学自动化系从事博士后研究，2000年至今在计算机科学与技术系工作。

工作期间获得的主要奖励有：2000年全国优秀博士论文奖，2001年国家863计划十五年先进个人，2002年清华大学“学术新人奖”，2003年韩国第十八届Choon-Gang 国际学术奖一等奖第一名，2004年教育部新世纪人才奖，2005年清华大学校先进个人，2006年国家杰出青年基金。

获奖成果5项，两项分别获2010年教育部自然科学奖二等奖（排名第一）和2004年度北京市科学技术奖（理论类）二等奖（排名第一）、一项获2002年度教育部提名国家科技进步二等奖（排名第二）、三项获省部级科技进步三等奖。

译书一部，专著两部，在国内外重要刊物发表或录用论文150余篇，其中在IEE、IEEE汇刊、Automatica等国际重要刊物发表论文90余篇，80余篇论文收入SCI，SCI期刊他人引用700余次，200多篇论文收入EI，有两篇论文曾被评为国内二级学会的最佳优秀论文奖。

基于视觉引导的工业机器人定位抓取系统设计

基于视觉引导的工业机器人定位抓取系统设计一、本文概述Overview of this article随着工业自动化技术的不断发展，工业机器人在生产线上的应用越来越广泛。

其中，定位抓取系统是工业机器人的重要组成部分，其准确性和稳定性直接影响到生产效率和产品质量。

本文旨在设计一种基于视觉引导的工业机器人定位抓取系统，以提高工业机器人的智能化水平和抓取精度。

With the continuous development of industrial automation technology, the application of industrial robots on production lines is becoming increasingly widespread. Among them, the positioning and grasping system is an important component of industrial robots, and its accuracy and stability directly affect production efficiency and product quality. This article aims to design a visual guided industrial robot positioning and grasping system to improve the intelligence level and grasping accuracy of industrial robots.本文首先介绍了工业机器人在现代工业生产中的应用及其重要性，并指出了定位抓取系统在设计中的关键性。

接着，阐述了基于视觉引导的定位抓取系统的基本原理和优势，包括通过摄像头捕捉目标物体的图像信息，利用图像处理算法提取目标物体的特征，并通过机器人控制系统实现精准定位与抓取。

基于视觉的旋翼无人机地面目标跟踪(英文)

scale-space extrema detection
I. INTRODUCTION UAV is one of the best platforms to perform dull, dirty or dangerous (3D) tasks [1]. UAV can be used in various applications where human is impossible to intervene. It greatly expands the application space of visual tracking. Research on the technology of vision based ground target tracking for UAV has been a great concern among cybernetic experts and robotic experts, and has become one of the most active research directions in UAV applications. Currently, researchers from America, Britain, France and Sweden are on the cutting edge in this field [2]. Typical visual tracking platforms for UAV include Scan Eagle, GTMax, RQ-11, RQ-16, DragonFly, etc. Because of many advantages, such as small size, light weight, flexible, easy to carry and low cost, rotor UAV has a broad application prospect in the fields of traffic monitoring, resource exploration, electricity patrol, forest fire prevention, aerial photography, atmospheric monitoring, etc [3]. Vision based ground target tracking system for rotor UAV is such a system that gets images by the camera installed on a low-flying rotor UAV, then recognizes the target in the images and estimates the motion state of the target, and finally according to the visual information regulates the pan-tilt-zoom (PTZ) camera automatically to keep the target at the center of the camera view. In view of the current situation of international researches, the study of ground target tracking system for

1、下载文档前请自行甄别文档内容的完整性，平台不提供额外的编辑、内容补充、找答案等附加服务。
2、"仅部分预览"的文档,不可在线预览部分如存在完整性等问题,可反馈申请退款(可完整预览的文档不适用该条件!)。
3、如文档侵犯您的权益，请联系客服反馈,我们会尽快为您处理(人工客服工作时间：9:00-18:30)。

MITSUBISHI ELECTRIC RESEARCH LABORATORIES
Visual Tracking and Recognition using Appearance-Adaptive Models in Particle Filters
Shaohua Zhou Rama Chellappa Baback Moghaddam TR2004-028 June 2004 Abstract We propose an approach that incorporates appearance-based models in a particle ﬁlter to realize robust visual tracking and recognition algorithms. In conventional tracking algorithms, the appearance model is either ﬁxed or rapidly changing, and the motion model is simply a random walk with ﬁxed noise variance. Also, the number of particles is typically ﬁxed. All these factors make the visual tracker unstable. To stabilize the tracker, we propose the following features: an observation model arising from an adaptive appearance model, an adaptive velocity motion model with adaptive noise variance, and an adaptive number of particles. The adaptive-velocity model is derived using a ﬁrst-order linear predictor based on the appearance difference between the incoming observation and the previous particle conﬁguration. Occlusion analysis is implemented using robust statistics. Experimental results on tracking visual objects in long outdoor and indoor video sequences demonstrate the effectiveness and robustness of our tracking algorithm. We then perform simultaneous tracking and recognition by embedding them in one particle ﬁlter. For recognition purposes, we model the appearance changes between frames and gallery images by constructing the intra- and extra-personal spaces. Accurate recognition is achieved when confronted by pose and view variations.
IEEE TRANSACTION ON IMAGE PROCESSING., VOL. X, NO. Y, MONTH 2004
2
Index Terms Visual tracking, visual recognition, particle ﬁltering, appearance-adaptive model, occlusion.
Publication History:– 1. First printing, TR2004-028, June 2004
IEEE TRANSACTION ON IMAGE PROCESSING., VOL. X, NO. Y, MONTH 2004
1
Visual tracking and recognition using appearance-adaptive models in particle ﬁlters
Published in: IEEE Transactions on Image Processing, 2004
This work may not be copied or reproduced in whole or in part for any commercial purpose. Permission to copy in whole or in part without payment of fee is granted for nonproﬁt educational and research purposes provided that all such whole or partial copies include the following: a notice that such copying is by permission of Mitsubishi Electric Research Laboratories, Inc.; an acknowledgment of the authors and individual contributions to the work; and all applicable portions of the copyright notice. Copying, reproduction, or republishing for any other purpose shall require a license with payment of fee to Mitsubishi Electric Research Laboratories, Inc. All rights reserved. Copyright c Mitsubishi Electric Research Laboratories, Inc., 2004 201 Broadway, Cambridge, Massachusetts 02139
Shaohua Kevin Zhou1 , Rama Chellappa1 , and Baback Moghaddam2
1
Center for Automation Research (CfAR) and
Department of Electrical and Computer Engineering University of Maryland, College Park, MD 20740 Email: {shaohua, rama}@ 201 Broadway, Cambridge, MA 02139 Email: {baback}@
2
Mitsubishi Electric Research Laboratories (MERL)
Abstract We present an approach that incorporates appearance-adaptive models in a particle ﬁlter to realize robust visual tracking and recognition algorithms. Tracking needs modeling inter-frame motion and appearance changes whereas recognition needs modeling appearance changes between frames and gallery images. In conventional tracking algorithms, the appearance model is either ﬁxed or rapidly changing, and the motion model is simply a random walk with ﬁxed noise variance. Also, the number of particles is typically ﬁxed. All these factors make the visual tracker unstable. To stabilize the tracker, we propose the following modiﬁcations: an observation model arising from an adaptive appearance model, an adaptive velocity motion model with adaptive noise variance, and an adaptive number of particles. The adaptivevelocity model is derived using a ﬁrst-order linear predictor based on the appearance difference between the incoming observation and the previous particle conﬁguration. Occlusion analysis is implemented using robust statistics. Experimental results on tracking visual objects in long outdoor and indoor video sequences demonstrate the effectiveness and robustness of our tracking algorithm. We then perform simultaneous tracking and recognition by embedding them in a part we model the appearance changes between frames and gallery images by constructing the intra- and extra-personal spaces. Accurate recognition is achieved when confronted by pose and view variations.