Monocular 3D Pose Estimation and Tracking by Detection

Mykhaylo Andriluka¹,², Stefan Roth¹, Bernt Schiele¹,²
¹Department of Computer Science, TU Darmstadt   ²MPI Informatics, Saarbrücken

Abstract

Automatic recovery of 3D human pose from monocular image sequences is a challenging and important research topic with numerous applications. Although current methods are able to recover 3D pose for a single person in controlled environments, they are severely challenged by real-world scenarios, such as crowded street scenes. To address this problem, we propose a three-stage process building on a number of recent advances. The first stage obtains an initial estimate of the 2D articulation and viewpoint of the person from single frames. The second stage allows early data association across frames based on tracking-by-detection. These two stages successfully accumulate the available 2D image evidence into robust estimates of 2D limb positions over short image sequences (= tracklets). The third and final stage uses those tracklet-based estimates as robust image observations to reliably recover 3D pose. We demonstrate state-of-the-art performance on the HumanEva II benchmark, and also show the applicability of our approach to articulated 3D tracking in realistic street conditions.

1. Introduction

This work addresses the challenging problem of 3D pose estimation and tracking of multiple people in cluttered scenes using a monocular, potentially moving camera. This is an important problem with many applications including video indexing, automotive safety, or surveillance. There are multiple challenges that contribute to the difficulty of this problem and need to be addressed simultaneously. Probably the most important challenge in articulated 3D tracking is the inherent ambiguity of 3D pose from monocular image evidence. This is particularly true for cluttered real-world scenes with multiple people that are often partially or even fully occluded for longer periods of time. Another important challenge, even for 2D pose recovery, is the complexity of human articulation and appearance. Additionally, complex and dynamically changing backgrounds of realistic scenes complicate data association across multiple frames. While many of these challenges have been addressed individually, we are not aware of any work that has addressed all of them simultaneously using a monocular, potentially moving camera.

Figure 1. Example results: 3D tracking in a challenging scene.

The goal of this paper is to contribute a sound Bayesian formulation to address this challenging problem. To that end we build on some of the most powerful approaches proposed for people detection and tracking in the literature. In three successive stages we accumulate the available 2D image evidence to enable robust 3D pose recovery.
Overview of the approach. Ultimately, our goal is to estimate the 3D pose $Q_m$ of each person in all frames $m$ of a sequence of length $M$, given the image evidence $E_{1:M}$ in all frames. To that end, we define a posterior distribution over pose parameters given the evidence:

$$p(Q_{1:M} \mid E_{1:M}) \propto p(E_{1:M} \mid Q_{1:M})\, p(Q_{1:M}). \quad (1)$$

Here, $Q_{1:M}$ denotes the 3D pose parameters over the entire sequence. Clearly, a key difficulty is that the posterior in Eq. 1 has many local optima, as the estimation of 3D poses is highly ambiguous given monocular images. To address this problem, this paper proposes a new three-stage approach that sequentially reduces the ambiguity in 3D pose recovery.

Before giving an overview of the three-stage process, let us define the observation likelihood $p(E_{1:M} \mid Q_{1:M})$. We assume conditional independence of the evidence in each frame given the 3D pose parameters $Q_m$. The likelihood thus factorizes into single-frame likelihoods:

$$p(E_{1:M} \mid Q_{1:M}) = \prod_{m=1}^{M} p(E_m \mid Q_m). \quad (2)$$

In this paper, the evidence in each frame is represented by the estimate of the person's 2D viewpoint w.r.t. the camera and the posterior distribution of the 2D positions and orientations of body parts. To estimate these reliably from single frames, the first stage (Sec. 2) builds on a recently proposed part-based people detection and pose estimation framework based on discriminative part detectors [3].

To accumulate further 2D image evidence, the second stage (Sec. 3) extracts people tracklets from a small number of consecutive frames using a 2D-tracking-by-detection approach. Here, the output of the first stage is refined in the sense that we obtain more reliable 2D detections of the people's body parts as well as more robust viewpoint estimates.

The third stage (Sec. 4) then uses the image evidence accumulated in the previous two stages to recover 3D pose. As described later, we model the temporal prior $p(Q_{1:M})$ over 3D poses as a hierarchical Gaussian process latent variable model (hGPLVM) [17]. We combine this with a hidden Markov model (HMM) that allows us to extend the people tracklets, which cover only a small number of frames at a time, to possibly longer 3D people tracks. Note that our 3D model is assumed to generate the bottom-up evidence from 2D body models and thus constitutes a hybrid generative/discriminative approach (c.f. [26]).

The contributions of this paper are two-fold. The main contribution is a novel approach to human pose estimation, which combines 2D position, pose and viewpoint estimates into an evidence model for 3D tracking with a 3D motion prior, and is able to accurately estimate 3D poses of multiple people from monocular images in realistic street environments. The second contribution, which serves as a building block for 3D pose estimation, is a new pedestrian detection approach based on a combination of multiple part-based models. While the power of part-based models for people detection has already been demonstrated (e.g., [3]), here we show that combining multiple part-based models leads to significant performance gains and, while improving over the state-of-the-art in detection, also allows us to estimate viewpoints of people in monocular images.
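Because the evidence factorizes across frames (Eq. 2), accumulating it over a sequence reduces, in log-space, to summing per-frame terms. The following minimal sketch illustrates only this bookkeeping; `frame_log_likelihood` is a hypothetical stand-in for the per-frame evidence model developed in Secs. 2-4, not the paper's actual likelihood.

```python
import numpy as np

def sequence_log_likelihood(evidence, poses, frame_log_likelihood):
    """Log of Eq. 2: conditional independence across frames turns the
    product of single-frame likelihoods into a sum of log terms."""
    return sum(frame_log_likelihood(E_m, Q_m)
               for E_m, Q_m in zip(evidence, poses))

# Toy usage with a stand-in Gaussian log-likelihood (illustration only):
toy_ll = lambda E, Q: -0.5 * float(np.sum((np.asarray(E) - np.asarray(Q)) ** 2))
print(sequence_log_likelihood([[0.1], [0.2]], [[0.0], [0.25]], toy_ll))
```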
Related Work. Due to the difficulties involved in reliable 3D pose estimation, this task has often been considered in controlled laboratory settings [4, 6, 13, 24, 29], with solutions frequently relying on background subtraction and simple image evidence, such as silhouettes or edge maps. In order to constrain the search in high-dimensional pose spaces these approaches often use multiple calibrated cameras [13], complex dynamical motion priors [29], or detailed body models [4]. Their combination allows impressive results to be achieved [13, 15], similar in performance to commercial marker-based motion capture systems.

However, realistic street scenes do not satisfy many of the assumptions made by these systems. For such scenes multiple synchronized video streams are difficult to obtain, the appearance of people is significantly more complex, and robust extraction of evidence is challenged by frequent full and partial occlusions, clutter, and camera motion. In order to address these challenges, a number of methods leverage recent advances in people detection and either use detection for prefiltering and initialization [11, 14], or integrate detection, tracking and pose estimation within a single "tracking-by-detection" framework [2].

This paper builds on state-of-the-art people detection and 2D pose estimation and leverages recent work in this area [3, 7, 10, 20], which we combine with a dynamic motion prior [2, 28]. While [2] has been shown to enable 2D pose estimation for people in side views, this paper goes beyond it by estimating poses in 3D from multiple viewpoints. Compared to [14], we are able to estimate poses in monocular images, while their approach uses stereo. [11] proposes to combine detection and 3D pose estimation for monocular tracking, but relies on the ability to detect characteristic poses of people, which we do not require here.

Estimation of 3D poses from 2D body part positions was previously proposed in [25]. However, this approach was evaluated only in laboratory conditions for a single subject, and it remains unclear how well it generalizes to more complex settings with multiple people as considered here.

There is also a lot of work on predicting 3D poses directly from image features using regression [1, 16, 27], classification [21], or a search over a database of exemplars [19, 22]. These methods typically require a large database of training examples to achieve good performance and are challenged by the high variability in appearance of people in realistic settings. In contrast, we represent the complex appearance of people using separate appearance models for each body part, which reduces the number of required training images and makes our representation more flexible.
2. Multiview People Detection in Single Frames

2D people detection and pose estimation serves as one of our key building blocks for 3D pose estimation and tracking. Our approach is driven by three major goals: (1) we want to take advantage of the recent developments in 2D people detection and pose estimation to define robust appearance models for 3D pose estimation and tracking; (2) we aim to reduce the search space of possible 3D poses by taking advantage of inferred 2D poses; and (3) we want to extract the viewpoint from which people are visible to reduce the inherent 2D-to-3D ambiguity. To that end we build on and extend a recent 2D people detection approach [3].

2.1. Basic pictorial structures model

Pictorial structures [8] represent objects, such as people, as a flexible configuration of $N$ different parts $L_m = \{l_{m0}, l_{m1}, \ldots, l_{mN}\}$, where $m$ denotes the current frame of the sequence. The state of part $i$ is given by $l_{mi} = \{x_{mi}, y_{mi}, \theta_{mi}, s_{mi}\}$, where $x_{mi}$ and $y_{mi}$ denote its image position, $\theta_{mi}$ the absolute orientation, and $s_{mi}$ the part scale. The posterior probability of the 2D part configuration $L_m$ given the single-frame image evidence $D_m$ is given as

$$p(L_m \mid D_m) \propto p(D_m \mid L_m)\, p(L_m). \quad (3)$$

The prior on body configurations $p(L_m)$ has a tree structure and represents the kinematic dependencies between body parts. It factorizes into a unary term for the root part (here, the torso) and pairwise terms along the kinematic chains:

$$p(L_m) = p(l_{m0}) \prod_{(i,j) \in K} p(l_{mi} \mid l_{mj}), \quad (4)$$

where $K$ is the set of edges representing kinematic relationships between parts. $p(l_{m0})$ is assumed to be uniform, and the pairwise terms are taken to be Gaussian in the transformed space of the joints between the adjacent parts [3, 8]. The likelihood term is assumed to factorize into a product of individual part likelihoods:

$$p(D_m \mid L_m) = \prod_{i=0}^{N} p(d_{mi} \mid l_{mi}). \quad (5)$$

To define the part likelihood, we rely on the boosted part detectors from [3], which use the truncated output of an AdaBoost classifier [12] and a dense shape context representation [5, 18]. Our model is composed of 8 body parts: left/right lower and upper legs, torso, head, and left/right upper and lower arms (later, side-view detectors also use left/right feet for better performance).

Apart from its excellent performance in complex real-world scenes [3, 7], the pictorial structures model also has the advantage that inference is both optimal and efficient due to the tree structure of the model. We perform sum-product belief propagation to compute the marginal posteriors of individual body parts, which can be computed efficiently using convolutions [8].

Data and evaluation. While the detector of [3] is in principle capable of detecting people from arbitrary views, its detection performance has only been evaluated on side views. To evaluate its suitability for our multiview setting, we collected a dataset of 1486 images for training, 248 for validation, and 248 for testing, which we carefully selected so that sufficiently many people are visible from all viewpoints. In addition to the persons' bounding boxes, we also annotated the viewpoint of all people in our dataset by assuming 8 evenly spaced viewpoints, each 45 degrees apart from each other (front/back, left/right, and diagonal front/back left/right). Fig. 2 shows example images from our training set, one for each viewpoint.

Figure 2. Training samples shown for each viewpoint: (a) right, (b) right-back, (c) back, (d) left-back, (e) left, (f) left-front, (g) front, (h) right-front.

Figure 3. Calibrated output of the 8 viewpoint classifiers.
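To make the inference step of Sec. 2.1 concrete, here is a minimal sketch of sum-product belief propagation on a tree of discrete part states. It is a toy version only: in the actual model the pairwise terms are Gaussians in the transformed joint space and the sums over states are computed as convolutions [8], whereas this sketch uses explicit matrix products and made-up potentials.

```python
import numpy as np

def tree_marginals(unary, pairwise, parent):
    """Sum-product BP on a tree-structured pictorial structures model
    with discrete part states. unary[i]: (S,) part likelihood p(d_i | l_i);
    pairwise[i]: (S, S) kinematic prior, pairwise[i][a, b] = p(l_i=a | l_par=b);
    parent[i]: parent index (-1 for the root, node 0); parents precede children."""
    N = len(unary)
    belief = [np.asarray(u, dtype=float).copy() for u in unary]
    msg_up = [None] * N
    # Upward pass (leaves to root): fold each node's message into its parent.
    for i in range(N - 1, 0, -1):
        msg_up[i] = pairwise[i].T @ belief[i]          # sum over child states
        belief[parent[i]] *= msg_up[i]
    # Downward pass (root to leaves): send parent evidence minus the node's own.
    for i in range(1, N):
        from_parent = pairwise[i] @ (belief[parent[i]] / msg_up[i])
        belief[i] *= from_parent
    return [b / b.sum() for b in belief]               # marginal posteriors

# Tiny example: torso (root) with head and one leg, 3 discrete states each.
u = [np.array([1., 2., 1.]), np.array([3., 1., 1.]), np.array([1., 1., 4.])]
P = [None, np.full((3, 3), 1 / 3.), np.eye(3) * 0.8 + 0.1]
print(tree_marginals(u, P, parent=[-1, 0, 0]))
```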
As expected, and as can be seen in Fig. 4(a), the detector trained on side views as in [3] shows only modest performance levels on our multiview dataset. By retraining the model on our multiview training set we obtain a substantial performance gain, but still do not achieve the performance levels of monolithic, discriminative HOG-based detectors [30] or HOG-based detectors with parts [9] (see Fig. 4(b)). However, since we not only need to detect people, but also estimate their 2D pose, such monolithic or coarse part-based detectors are not appropriate for our task.

2.2. Multiview Extensions

To address this shortcoming, we develop an extended multiview detector that allows 2D pose estimation as well as viewpoint estimation. We train 8 viewpoint-specific detectors using our viewpoint-annotated multiview data. These viewpoint-specific detectors not only have the advantage that their kinematic prior is specific to each viewpoint, but also that part detectors are tuned to each view. We enrich this set of detectors with one generic detector trained on all views, as well as two side-view detectors as in [3] that additionally contain feet (which improves performance).

We explored two strategies for combining the output of this bank of detectors: (1) we simply add up the log-posterior of a person being at a particular image location as determined by the different detectors; and (2) we train a linear SVM using the 11-dimensional vector of the mean/variance-normalized detector outputs as features. The SVM detector was trained on the validation set of 248 images. Fig. 4(a) shows that the simple additive combination of viewpoint-specific detectors improves over the detection performance of each individual viewpoint-specific detector. It also outperforms the approach from [3].

Interestingly, the new SVM-based detector not only substantially improves performance, but also outperforms the current state-of-the-art in multiview people detection [9, 30]. As shown in Fig. 4(b), the performance improves even further when we extend our bank of detectors with the HOG-based detector from [30]. While this is not the main focus of our work, this clearly shows the power of the first stage of our approach. Several example detections in Fig. 4(c) demonstrate the benefits of combining viewpoint-specific detectors.

Figure 4. Comparison between (a) viewpoint-specific models and the combined model, and (b) comparison to the state-of-the-art on the "MultiviewPeople" dataset; (c) sample detections obtained with the side-view detector of [3] (top), the generic detector trained on our multiview dataset (middle), and the proposed detector combining the output of viewpoint-specific detectors with a linear SVM (bottom).

           Right  R-Back  Back  L-Back  Left  L-Front  Front  R-Front  Avg. %
  Max       53.7    35.5  45.7    22.6  37.9      8.6   40.0      8.3    31.1
  SVM       72.6    12.7  48.6    12.3  55.7     44.5   70.4     16.2    42.2
  SVM-adj   71.4    22.3  29.5    18.0  84.7     18.1   50.7     29.2    35.4

Table 1. Viewpoint estimation on the "MultiviewPeople" dataset. The task is to classify one of 8 viewpoints (chance level 12.5%).
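The two combination strategies are simple enough to sketch directly. The sketch below assumes an array of raw detector scores per candidate location and uses random stand-in data and labels; in the paper the SVM is trained on the 248-image validation set (scikit-learn's LinearSVC is used here purely for illustration).

```python
import numpy as np
from sklearn.svm import LinearSVC

# det_scores: (n_windows, 11) raw outputs of the detector bank
# (8 viewpoint-specific + 1 generic + 2 side-view detectors).
rng = np.random.default_rng(0)
det_scores = rng.normal(size=(200, 11))   # stand-in for real detector outputs
labels = rng.integers(0, 2, size=200)     # 1 = person, 0 = background (toy)

# Strategy 1: additive combination of mean/variance-normalized outputs.
mu, sigma = det_scores.mean(axis=0), det_scores.std(axis=0)
normalized = (det_scores - mu) / sigma
additive_score = normalized.sum(axis=1)

# Strategy 2: a linear SVM over the 11-dim normalized score vector.
svm = LinearSVC(C=1.0).fit(normalized, labels)
svm_score = svm.decision_function(normalized)
```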
Viewpoint estimation. Next we aim to estimate the person's viewpoint, since such viewpoint estimates allow us to significantly reduce the ambiguity in 3D pose. To that end, we rely on the bank of viewpoint-specific detectors from above, and train 8 viewpoint classifiers, linear SVMs, on the detector outputs of the validation set. We consider two training and evaluation strategies: (SVM) only training examples from one of the viewpoints are used as positive examples, the remainder as negative ones; and (SVM-adj), where we group viewpoints into triplets of adjacent ones and train separate classifiers for each such triplet. As a baseline approach (Max), we estimate the viewpoint by taking the maximum over the outputs of the 8 viewpoint-specific detectors. Results are shown in Tab. 1. SVM improves over the baseline by approx. 11% when we require exact recognition of the viewpoint, but SVM-adj also performs well. In addition, when we also consider the two adjacent viewpoints as being correct, SVM obtains an average performance of 70.0% and SVM-adj of 76.2%. This shows that SVM-adj degrades more gracefully across viewpoints, which is why we adopt it in the remainder.

Since the scores of the viewpoint classifiers are not directly comparable with each other, we calibrate them by computing the posterior of the correct label given the classifier score, which maps the scores to the unit interval. The posterior is computed via Bayes' rule from the distributions of classifier scores on the positive and negative examples. We assume these distributions to be Gaussian, and estimate their parameters from classifier scores on the validation set. Fig. 3 shows calibrated outputs of all 8 classifiers computed for a sequence of 40 frames in which the person first appears from the "right" and then from the "right-back" viewpoint. The correct viewpoint is the most probable for most of the sequence, and failures in estimation often correspond to adjacent viewpoints.

3. 2D Tracking and Viewpoint Estimation

As discussed in the introduction, our goal is to accumulate all available 2D image evidence prior to the third, 3D tracking stage in order to reduce the ambiguity of 2D-to-3D lifting as much as possible. While the person detector described in the previous section is capable of estimating 2D positions of body parts and viewpoints of people from single frames, the second stage (described here) aims to improve these estimates by 2D tracking-by-detection [2, 31]. To exploit temporal coherency already in 2D, we extract short tracklets of people. This, on the one hand, improves the robustness of estimates for the 2D position, scale and viewpoint of each person, since they are jointly estimated over an entire tracklet. Improved body localization in turn aids 2D pose estimation. On the other hand, it also allows us to perform early data association. This is important for sequences with multiple people, where we can associate "anonymous" single-frame hypotheses with the track of a specific person.

Tracklet extraction. From the first stage of our approach we obtain a set of $N_m$ potentially overlapping bounding box hypotheses $H_m = [h_{m1}, \ldots, h_{mN_m}]$ for each frame $m$ of the sequence, where each hypothesis $h_{mi} = \{h^x_{mi}, h^y_{mi}, h^s_{mi}\}$ corresponds to a bounding box at a particular image position and scale. In order to obtain a set of tracklets, we follow the HMM-based tracking procedure introduced in [2] (a detailed description is given in Sec. 3.3 of [2]). To that end we treat the person hypotheses in each frame as states and find state subsequences that are consistent in position, scale and appearance by iteratively applying Viterbi decoding. The emission probabilities for each state are derived from the detection score. The transition probabilities between states $h_{mi}$ and $h_{m-1,j}$ are modeled using first-order Gaussian dynamics and appearance compatibility:

$$p_{\text{trans}}(h_{mi}, h_{m-1,j}) = \mathcal{N}(h_{mi} \mid h_{m-1,j}, \Sigma_{\text{pos}}) \cdot \mathcal{N}(d_{\text{app}}(h_{mi}, h_{m-1,j}) \mid 0, \sigma^2_{\text{app}}),$$

where $\Sigma_{\text{pos}} = \text{diag}(\sigma^2_x, \sigma^2_y, \sigma^2_s)$, and $d_{\text{app}}(h_{mi}, h_{m-1,j})$ is the Euclidean distance between RGB color histograms computed for the bounding rectangle of each hypothesis. We set $\sigma_x = \sigma_y = 5$, $\sigma_s = 0.1$ and $\sigma_{\text{app}} = 0.05$.

Figure 5. People detection based on single frames (top) and tracklets found by our 2D tracking algorithm; different tracklets are identified by color and the estimated viewpoints are indicated with two letters (bottom). Note that several false positives in the top row are filtered out and additional, often partially occluded, detections are filled in (e.g., on the left side of the leftmost image).
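The transition model above is fully specified by the published parameter values, so it can be sketched compactly. The 24-bin histogram in the usage example is an arbitrary stand-in; the paper does not specify how the RGB histograms are binned.

```python
import numpy as np

SIGMA_POS = np.array([5.0, 5.0, 0.1])   # sigma_x, sigma_y, sigma_s from the paper
SIGMA_APP = 0.05

def gauss(x, sigma):
    return np.exp(-0.5 * (x / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)

def transition_prob(h_curr, h_prev, hist_curr, hist_prev):
    """First-order Gaussian dynamics on (x, y, scale) times appearance
    compatibility, with d_app the Euclidean distance between the RGB
    color histograms of the two bounding boxes."""
    dyn = np.prod(gauss(np.asarray(h_curr) - np.asarray(h_prev), SIGMA_POS))
    d_app = np.linalg.norm(np.asarray(hist_curr) - np.asarray(hist_prev))
    return dyn * gauss(d_app, SIGMA_APP)

# Example: nearby boxes with identical appearance score highly.
h1, h2 = (100.0, 200.0, 1.0), (103.0, 198.0, 1.02)
hist1 = hist2 = np.full(24, 1 / 24.)
print(transition_prob(h1, h2, hist1, hist2))
```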
Viewpoint tracking. Finally, for each of the tracklets we estimate the viewpoint sequence $\omega_{1:N} = (\omega_1, \ldots, \omega_N)$, again using a simple HMM and Viterbi decoding. We consider the 8 discrete viewpoints as states, the viewpoint classifiers described in Sec. 2 as unary evidence, and Gaussian transition probabilities that enforce similar subsequent viewpoints, to reflect that people tend to turn slowly.

Evaluation. Fig. 5 shows an example of a short subsequence in which we compare the detection results of the single-frame 2D detector with the extracted tracklets. Note how tracking helps to remove the spurious false positive detections in the background, and corrects failures in scale estimation, which would otherwise hinder correct 2D-to-3D lifting. In Fig. 3 we visualize the single-frame prediction scores for each viewpoint for the tracklet corresponding to the person with index 22 in Fig. 5. Note that while viewpoint estimation from single frames is reasonably robust, it can still fail at times (the correct viewpoint is "right" for frames 4 to 30 and "right-back" for frames 31 to 40). The tracklet-based viewpoint estimation, in contrast, yields the correct viewpoint for the entire 40-frame sequence. Finally, as we demonstrate in Fig. 5, the tracklets also provide data association even in the case of realistic sequences with frequent full and partial occlusions.

4. 3D Pose Estimation

To estimate and track poses in 3D, we take the 2D tracklets extracted in the previous stage and lift the 2D pose estimated for each frame into 3D (c.f. [25]), which is done with the help of a set of 3D exemplars [19, 22]. Projections of the exemplars are first evaluated under the 2D body part posteriors, and the exemplar with the most probable projection is chosen as an initial 3D pose. This initial pose is propagated to all frames of the tracklet using the known temporal ordering on the exemplar set. Note that this yields multiple initializations for 3D pose sequences, one for each frame of the tracklet. This 2D-to-3D lifting procedure is robust, because it is based on reliable 2D pose posteriors, and on detections and viewpoint estimates from the 2D tracklets. Starting from these initial pose sequences, the actual pose estimation and tracking is done in a Bayesian framework by maximizing the posterior defined in Eq. (1), for which they serve as powerful initializations.

The 3D pose is parametrized as $Q_m = \{q_m, \phi_m, h_m\}$, where $q_m$ denotes the parameters of the body joints, $\phi_m$ the rotation of the body in world coordinates, and $h_m = \{h^x_m, h^y_m, h^{\text{scale}}_m\}$ the position and scale of the person projected to the image. The 3D pose is represented using a kinematic tree with $P = 10$ flexible joints, in which each joint has 2 degrees of freedom. A configuration example is shown in Fig. 6(a).

The evidence at frame $m$ is given by $E_m = \{D_m, \omega_m\}$ and consists of the single-frame image evidence $D_m$ and the 2D viewpoint estimate $\omega_m$ obtained from the entire tracklet. Assuming conditional independence of 2D viewpoint and image evidence given the 3D pose, the single-frame likelihood in Eq. 2 factorizes as

$$p(E_m \mid Q_m) = p(\omega_m \mid Q_m)\, p(D_m \mid Q_m). \quad (6)$$

Based on the estimated 2D viewpoint $\omega_m$, we model the viewpoint likelihood of the 3D viewpoint $\phi_m$ as a Gaussian centered at the rotation component of $\phi_m$ along the y-axis:

$$p(\omega_m \mid Q_m) = \mathcal{N}(\omega_m \mid \text{proj}_y(\phi_m), \sigma^2_\omega). \quad (7)$$
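A minimal sketch of the lifting step just described: score every 3D exemplar by the probability of its 2D projection under the part posteriors and keep the best. `project` and `part_posterior` are hypothetical interfaces; in the paper the posterior is the kernel density representation of Eq. 9 below, and the projection depends on the estimated viewpoint and scale.

```python
import numpy as np

def lift_to_3d(exemplars, project, part_posterior, n_parts):
    """Pick the 3D exemplar whose 2D projection is most probable under
    the 2D part posteriors. project(exemplar, part) -> 2D location;
    part_posterior(part, loc) -> posterior density at that location."""
    def score(ex):
        # Product over parts of the 2D posterior at the projected location,
        # accumulated in log-space for numerical stability.
        return sum(np.log(part_posterior(n, project(ex, n)) + 1e-12)
                   for n in range(n_parts))
    best = max(exemplars, key=score)
    return best, score(best)
```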
We define the likelihood of the 3D pose $p(D_m \mid Q_m)$ with the help of the part posteriors given by the 2D body model:

$$p(D_m \mid Q_m) = \prod_{n=1}^{N} p(\text{proj}_n(Q_m) \mid D_m), \quad (8)$$

where $\text{proj}_n(Q_m)$ denotes the projection of the $n$-th 3D body part into the image. While such a 3D likelihood is typically defined as the product of individual part likelihoods similarly to Eq. 5, this leads to highly multimodal posteriors and difficult inference. By relying on 2D part posteriors instead, the 3D model is focused on hypotheses for which there is sufficient 2D image evidence from the previous stages.

To avoid expensive 3D likelihood computations, we represent each 2D part posterior using a non-parametric representation. In particular, for each body part $n$ in frame $m$ we find the $J$ locations with the highest posterior probability, $E_{mn} = \{(l^j_{mn}, w^j_{mn}),\ j = 1, \ldots, J\}$, where $l^j_{mn} \in \mathbb{R}^4$ corresponds to the 2D location (image position and orientation) and $w^j_{mn}$ to the posterior density at this location. Given that, we approximate the 2D part posterior as a kernel density estimate with Gaussian kernel $\kappa$:

$$p(\text{proj}_n(Q_m) \mid D_m) \approx \sum_j w^j_{mn}\, \kappa(l^j_{mn}, \text{proj}_n(Q_m)). \quad (9)$$

Dynamical model. In our approach we represent the temporal prior in Eq. 1 as the product of two terms:

$$p(Q_{1:M}) = p(q_{1:M})\, p(h_{1:M}), \quad (10)$$

which correspond to priors on the parameters of the 3D pose as well as on the image position and scale. The prior on the person's position and scale $p(h_{1:M})$ is taken to be a broad Gaussian and models smooth changes of both the scale of the person and its position in the image.

We model the prior over the parameters of the 3D pose $q_{1:M}$ with a hierarchical Gaussian process latent variable model (hGPLVM) [17]. We denote the $M$-dimensional vector of the values of the $i$-th pose parameter across all frames as $q_{1:M,i}$. In the hGPLVM, each dimension of the original high-dimensional pose is modelled as an independent Gaussian process defined over a shared low-dimensional latent space $Z_{1:M}$:

$$p(q_{1:M} \mid Z_{1:M}) = \prod_{i=1}^{P} \mathcal{N}(q_{1:M,i} \mid 0, K_z), \quad (11)$$

where $P$ is the number of parameters in our pose representation, and $K_z$ is a covariance matrix of the elements of the shared latent space $Z_{1:M}$ defined by the output of the covariance function $k(z_i, z_j)$, which in our case is taken to be squared exponential.

The values of the shared latent space $Z_{1:M}$ are themselves treated as the outputs of Gaussian processes with a one-dimensional input, the time $T_{1:M}$. Our implementation uses a $d_l = 2$-dimensional shared latent space. Such a hierarchy of Gaussian processes allows us to effectively model both the correlations between different dimensions of the original input space and their dynamics.

Figure 6. (a): Representation of the 3D pose in our model (parametrized joints are marked with arrows). (b): Initial pose sequence after 2D-to-3D lifting (top) and pose sequence after optimization of the 3D pose posterior (bottom).

MAP estimation. The hGPLVM prior requires two sets of auxiliary variables, $Z_{1:M}$ and $T_{1:M}$, which need to be dealt with during maximum a-posteriori estimation. Our strategy is to optimize $Z$ only and keep the values of $T$ fixed. This is possible since the values of $T$ roughly correspond to the person's state within a walking cycle, which can be reliably estimated using the 2D tracklets. The full posterior over 3D pose parameters being maximized is given by:

$$p(Q_{1:M}, Z_{1:M} \mid E_{1:M}, T_{1:M}) \propto p(E_{1:M} \mid Q_{1:M}) \cdot p(q_{1:M} \mid Z_{1:M})\, p(Z_{1:M} \mid T_{1:M})\, p(h_{1:M}). \quad (12)$$

We optimize the posterior using scaled conjugate gradients, initializing the optimization with the lifted 3D poses.
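Eq. 9 amounts to evaluating a weighted Gaussian kernel density over the J strongest posterior modes of a part. A minimal sketch, assuming an isotropic kernel and treating the 4-dimensional part location as a plain vector (the paper does not spell out the kernel's bandwidth structure):

```python
import numpy as np

def kde_part_posterior(modes, weights, loc, bandwidth):
    """Eq. 9: approximate the 2D part posterior as a Gaussian kernel
    density over the J strongest modes. modes: (J, 4) part locations;
    weights: (J,) posterior densities w_mn; loc: query proj_n(Q_m)."""
    d2 = np.sum((modes - np.asarray(loc)) ** 2, axis=1) / bandwidth ** 2
    return float(np.sum(weights * np.exp(-0.5 * d2)))

# Toy usage: two modes, query near the stronger one.
modes = np.array([[10., 20., 1., 0.], [40., 25., 0., 1.]])
weights = np.array([0.7, 0.3])
print(kde_part_posterior(modes, weights, [11., 20., 1., 0.], bandwidth=3.0))
```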
4.1. 3D pose estimation for longer sequences

The MAP estimation approach just described is only tractable if the number of frames $M$ is sufficiently small. In order to estimate 3D poses in longer sequences we apply a strategy similar to the one used in [2]: first we estimate 3D poses in short ($M = 10$) overlapping subsequences of the longer sequence. Since for each subsequence we initialize and locally optimize the posterior multiple times, this leaves us with a large pool of 3D pose hypotheses for each of the frames, from which we find the optimal sequence using a hidden Markov model and Viterbi decoding. To that end we treat the 3D pose hypotheses in each frame as discrete states, with emission probabilities given by Eq. 6, and define transition probabilities between states using the hGPLVM.

5. Experiments

We evaluate our model in two diverse scenarios. First, we show that our approach improves the state-of-the-art in monocular human pose estimation on the standard "HumanEva II" benchmark, for which ground-truth poses are available. Additionally, we evaluate our approach on two cluttered and complex street sequences with multiple people, including partial and full occlusions.
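The stitching procedure of Sec. 4.1 is a standard Viterbi pass over per-frame pools of pose hypotheses. A minimal sketch, with the hGPLVM-based transition term abstracted into a hypothetical `transition_log` callable:

```python
import numpy as np

def viterbi_pool(emission_log, transition_log):
    """Viterbi over per-frame pools of 3D pose hypotheses.
    emission_log[m][i]: log emission of hypothesis i in frame m (Eq. 6);
    transition_log(m, i, j): log transition from hypothesis j at m-1 to
    hypothesis i at m. Returns the chosen hypothesis index per frame."""
    M = len(emission_log)
    score = [np.asarray(emission_log[0], dtype=float)]
    back = []
    for m in range(1, M):
        prev = score[-1]
        n_cur = len(emission_log[m])
        cur, arg = np.empty(n_cur), np.empty(n_cur, dtype=int)
        for i in range(n_cur):
            cand = prev + np.array([transition_log(m, i, j)
                                    for j in range(len(prev))])
            arg[i] = int(cand.argmax())
            cur[i] = emission_log[m][i] + cand.max()
        score.append(cur)
        back.append(arg)
    path = [int(score[-1].argmax())]
    for arg in reversed(back):          # trace the best sequence backwards
        path.append(int(arg[path[-1]]))
    return path[::-1]
```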
Multicamera People Tracking with a Probabilistic Occupancy Map

François Fleuret, Jérôme Berclaz, Richard Lengagne, and Pascal Fua, Senior Member, IEEE

Abstract—Given two to four synchronized video streams taken at eye level and from different angles, we show that we can effectively combine a generative model with dynamic programming to accurately follow up to six individuals across thousands of frames in spite of significant occlusions and lighting changes. In addition, we also derive metrically accurate trajectories for each of them. Our contribution is twofold. First, we demonstrate that our generative model can effectively handle occlusions in each time frame independently, even when the only data available comes from the output of a simple background subtraction algorithm and when the number of individuals is unknown a priori. Second, we show that multiperson tracking can be reliably achieved by processing individual trajectories separately over long sequences, provided that a reasonable heuristic is used to rank these individuals and that we avoid confusing them with one another.

Index Terms—Multipeople tracking, multicamera, visual surveillance, probabilistic occupancy map, dynamic programming, Hidden Markov Model.

1 INTRODUCTION

In this paper, we address the problem of keeping track of people who occlude each other using a small number of synchronized videos such as those depicted in Fig. 1, which were taken at head level and from very different angles. This is important because this kind of setup is very common for applications such as video surveillance in public places.

To this end, we have developed a mathematical framework that allows us to combine a robust approach to estimating the probabilities of occupancy of the ground plane at individual time steps with dynamic programming to track people over time. This results in a fully automated system that can track up to six people in a room for several minutes by using only four cameras, without producing any false positives or false negatives in spite of severe occlusions and lighting variations.
As shown in Fig. 2, our system also provides location estimates that are accurate to within a few tens of centimeters, and there is no measurable performance decrease if as many as 20 percent of the images are lost, and only a small one if 30 percent are. This involves two algorithmic steps:

1. We estimate the probabilities of occupancy of the ground plane, given the binary images obtained from the input images via background subtraction [7]. At this stage, the algorithm only takes into account images acquired at the same time. Its basic ingredient is a generative model that represents humans as simple rectangles that it uses to create synthetic ideal images that we would observe if people were at given locations. Under this model of the images, given the true occupancy, we approximate the probabilities of occupancy at every location as the marginals of a product law minimizing the Kullback-Leibler divergence from the "true" conditional posterior distribution. This allows us to evaluate the probabilities of occupancy at every location as the fixed point of a large system of equations.

2. We then combine these probabilities with a color and a motion model and use the Viterbi algorithm to accurately follow individuals across thousands of frames [3]. To avoid the combinatorial explosion that would result from explicitly dealing with the joint posterior distribution of the locations of individuals in each frame over a fine discretization, we use a greedy approach: we process trajectories individually over sequences that are long enough so that using a reasonable heuristic to choose the order in which they are processed is sufficient to avoid confusing people with each other.

In contrast to most state-of-the-art algorithms that recursively update estimates from frame to frame and may therefore fail catastrophically if difficult conditions persist over several consecutive frames, our algorithm can handle such situations since it computes the global optima of scores summed over many frames. This is what gives it the robustness that Fig. 2 demonstrates.

In short, we combine a mathematically well-founded generative model that works in each frame individually with a simple approach to global optimization. This yields excellent performance by using basic color and motion models that could be further improved. Our contribution is therefore twofold. First, we demonstrate that a generative model can effectively handle occlusions at each time frame independently, even when the input data is of very poor quality and is therefore easy to obtain. Second, we show that multiperson tracking can be reliably achieved by processing individual trajectories separately over long sequences.
In the remainder of the paper, we first briefly review related works. We then formulate our problem as estimating the most probable state of a hidden Markov process and propose a model of the visible signal based on an estimate of an occupancy map in every time frame. Finally, we present our results on several long sequences.

2 RELATED WORK

State-of-the-art methods can be divided into monocular and multiview approaches, which we briefly review in this section.

2.1 Monocular Approaches

Monocular approaches rely on the input of a single camera to perform tracking. These methods provide a simple and easy-to-deploy setup but must compensate for the lack of 3D information in a single camera view.

2.1.1 Blob-Based Methods

Many algorithms rely on binary blobs extracted from single video [10], [5], [11]. They combine shape analysis and tracking to locate people and maintain appearance models in order to track them, even in the presence of occlusions. The Bayesian Multiple-Blob tracker (BraMBLe) system [12], for example, is a multiblob tracker that generates a blob likelihood based on a known background model and appearance models of the tracked people. It then uses a particle filter to implement the tracking for an unknown number of people.

Approaches that track in a single view prior to computing correspondences across views extend this approach to multicamera setups. However, we view them as falling into the same category because they do not simultaneously exploit the information from multiple views. In [15], the limits of the field of view of each camera are computed in every other camera from motion information. When a person becomes visible in one camera, the system automatically searches for him in other views where he should be visible. In [4], a background/foreground segmentation is performed on calibrated images, followed by human shape extraction from foreground objects and feature point selection.
Feature points are tracked in a single view, and the system switches to another view when the current camera no longer has a good view of the person.

2.1.2 Color-Based Methods

Tracking performance can be significantly increased by taking color into account. As shown in [6], the mean-shift pursuit technique based on a dissimilarity measure of color distributions can accurately track deformable objects in real time and in a monocular context.

In [16], the images are segmented pixelwise into different classes, thus modeling people by continuously updated Gaussian mixtures. A standard tracking process is then performed using a Bayesian framework, which helps keep track of people, even when there are occlusions. In such a case, models of persons in front keep being updated, whereas the system stops updating occluded ones, which may cause trouble if their appearances have changed noticeably when they re-emerge.

More recently, multiple humans have been simultaneously detected and tracked in crowded scenes [20] by using Monte-Carlo-based methods to estimate their number and positions. In [23], multiple people are also detected and tracked in front of complex backgrounds by using mixture particle filters guided by people models learned by boosting. In [9], multicue 3D object tracking is addressed by combining particle-filter-based Bayesian tracking and detection using learned spatiotemporal shapes. This approach leads to impressive results but requires shape, texture, and image depth information as input. Finally, Smith et al. [25] propose a particle-filtering scheme that relies on Markov chain Monte Carlo (MCMC) optimization to handle entrances and departures. It also introduces a finer modeling of interactions between individuals as a product of pairwise potentials.

2.2 Multiview Approaches

Despite the effectiveness of such methods, the use of multiple cameras soon becomes necessary when one wishes to accurately detect and track multiple people and compute their precise 3D locations in a complex environment.
Occlusion handling is facilitated by using two sets of stereo color cameras [14]. However, in most approaches that only take a set of 2D views as input, occlusion is mainly handled by imposing temporal consistency in terms of a motion model, be it Kalman filtering or more general Markov models. As a result, these approaches may not always be able to recover if the process starts diverging.

2.2.1 Blob-Based Methods

In [19], Kalman filtering is applied on 3D points obtained by fusing, in a least squares sense, the image-to-world projections of points belonging to binary blobs. Similarly, in [1], a Kalman filter is used to simultaneously track in 2D and 3D, and object locations are estimated through trajectory prediction during occlusion.

Fig. 1. Images from two indoor and two outdoor multicamera video sequences that we use for our experiments. At each time step, we draw a box around people that we detect and assign to them an ID number that follows them throughout the sequence.

Fig. 2. Cumulative distributions of the position estimate error on a 3,800-frame sequence (see Section 6.4.1 for details).

In [8], a best-hypothesis and a multiple-hypotheses approach are compared to find people tracks from 3D locations obtained from foreground binary blobs extracted from multiple calibrated views. In [21], a recursive Bayesian estimation approach is used to deal with occlusions while tracking multiple people in multiview. The algorithm tracks objects located in the intersections of 2D visual angles, which are extracted from silhouettes obtained from different fixed views. When occlusion ambiguities occur, multiple occlusion hypotheses are generated, given predicted object states and previous hypotheses, and tested using a branch-and-merge strategy. The proposed framework is implemented using a customized particle filter to represent the distribution of object states.

Recently, Morariu and Camps [17] proposed a method based on dimensionality reduction to learn a correspondence between the appearance of pedestrians across several views. This approach is able to cope with severe occlusion in one view by exploiting the appearance of the same pedestrian in another view and the consistency across views.

2.2.2 Color-Based Methods

Mittal and Davis [18] propose a system that segments, detects, and tracks multiple people in a scene by using a wide-baseline setup of up to 16 synchronized cameras. Intensity information is directly used to perform single-view pixel classification and match similarly labeled regions across views to derive 3D people locations. Occlusion analysis is performed in two ways: first, during pixel classification, the computation of prior probabilities takes occlusion into account; second, evidence is gathered across cameras to compute a presence likelihood map on the ground plane that accounts for the visibility of each ground plane point in each view.
Ground plane locations are then tracked over time by using a Kalman filter. In [13], individuals are tracked both in image planes and in top view. The 2D and 3D positions of each individual are computed so as to maximize a joint probability defined as the product of a color-based appearance model and 2D and 3D motion models derived from a Kalman filter.

2.2.3 Occupancy Map Methods

Recent techniques explicitly use a discretized occupancy map into which the objects detected in the camera images are back-projected. In [2], the authors rely on a standard detection of stereo disparities, which increases counters associated to square areas on the ground. A mixture of Gaussians is fitted to the resulting score map to estimate the likely locations of individuals. This estimate is combined with a Kalman filter to model the motion.

In [26], the occupancy map is computed with a standard visual hull procedure. One originality of the approach is to keep, for each resulting connected component, an upper and a lower bound on the number of objects that it can contain. Based on motion consistency, the bounds on the various components are estimated at a certain time frame based on the bounds of the components at the previous time frame that spatially intersect with it.

Although our own method shares many features with these techniques, it differs in two important respects that we will highlight: first, we combine the usual color and motion models with a sophisticated approach based on a generative model to estimating the probabilities of occupancy, which explicitly handles complex occlusion interactions between detected individuals, as will be discussed in Section 5; second, we rely on dynamic programming to ensure greater stability in challenging situations by simultaneously handling multiple frames.

3 PROBLEM FORMULATION

Our goal is to track an a priori unknown number of people from a few synchronized video streams taken at head level.
In this section, we formulate this problem as one of finding the most probable state of a hidden Markov process, given the set of images acquired at each time step, which we will refer to as a temporal frame. We then briefly outline the computation of the relevant probabilities by using the notations summarized in Tables 1 and 2, which we also use in the following two sections to discuss in more detail the actual computation of those probabilities.

3.1 Computing the Optimal Trajectories

We process the video sequences by batches of T = 100 frames, each of which includes C images, and we compute the most likely trajectory for each individual. To achieve consistency over successive batches, we only keep the result on the first 10 frames and slide our temporal window. This is illustrated in Fig. 3.

We discretize the visible part of the ground plane into a finite number G of regularly spaced 2D locations, and we introduce a virtual hidden location H that will be used to model entrances and departures from and into the visible area. For a given batch, let $L_t = (L^1_t, \ldots, L^{N^*}_t)$ be the hidden stochastic processes standing for the locations of individuals, whether visible or not. The number $N^*$ stands for the maximum allowable number of individuals in our world. It is large enough so that conditioning on the number of visible ones does not change the probability of a new individual entering the scene. The $L^n_t$ variables therefore take values in $\{1, \ldots, G, H\}$.

Given $I_t = (I^1_t, \ldots, I^C_t)$, the images acquired at time $t$ for $1 \le t \le T$, our task is to find the values of $L_1, \ldots, L_T$ that maximize

$$P(L_1, \ldots, L_T \mid I_1, \ldots, I_T). \quad (1)$$

As will be discussed in Section 4.1, we compute this maximum a posteriori in a greedy way, processing one individual at a time, including the hidden ones who can move into the visible scene or not. For each one, the algorithm performs the computation under the constraint that no individual can be at a visible location occupied by an individual already processed.

In theory, this approach could lead to undesirable local minima, for example, by connecting the trajectories of two separate people. However, this does not happen often because our batches are sufficiently long. To further reduce the chances of this, we process individual trajectories in an order that depends on a reliability score, so that the most reliable ones are computed first, thereby reducing the potential for confusion when processing the remaining ones.
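The batch scheme just described (optimize over T = 100 frames, commit the first 10, slide the window) can be sketched as a simple driver loop; `optimize_batch` stands in for the whole per-batch MAP machinery of Section 4.

```python
def process_sequence(frames, optimize_batch, T=100, keep=10):
    """Sliding-window batch processing: optimize trajectories over T
    frames, commit only the first `keep` frames of the result, then
    slide the window forward by `keep` frames (as in Fig. 3)."""
    committed, start = [], 0
    while start < len(frames):
        batch = frames[start:start + T]
        trajectories = optimize_batch(batch)   # per-frame MAP result
        committed.extend(trajectories[:keep])  # keep 10% of the window
        start += keep
    return committed
```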
This order also ensures that if an individual remains in the hidden location, then all the other people present in the hidden location will also stay there and, therefore, do not need to be processed.

Our experimental results show that our method does not suffer from the usual weaknesses of greedy algorithms, such as a tendency to get caught in bad local minima. We therefore believe that it compares very favorably to stochastic optimization techniques in general, and more specifically to particle filtering, which usually requires careful tuning of metaparameters.

3.2 Stochastic Modeling

We will show in Section 4.2 that since we process individual trajectories, the whole approach only requires us to define a valid motion model $P(L^n_{t+1} \mid L^n_t = k)$ and a sound appearance model $P(I_t \mid L^n_t = k)$.

The motion model $P(L^n_{t+1} \mid L^n_t = k)$, which will be introduced in Section 4.3, is a distribution into a disc of limited radius and center k, which corresponds to a loose bound on the maximum speed of a walking human. Entrance into the scene and departure from it are naturally modeled thanks to the hidden location H, for which we extend the motion model. The probabilities to enter and to leave are similar to the transition probabilities between different ground plane locations.

Table 1. Notations (deterministic quantities).

Table 2. Notations (random quantities).

Fig. 3. Video sequences are processed by batches of 100 frames. Only the first 10 percent of the optimization result is kept, and the rest is discarded. The temporal window is then slid forward and the optimization is repeated on the new window.

In Section 4.4, we will show that the appearance model $P(I_t \mid L^n_t = k)$ can be decomposed into two terms. The first, described in Section 4.5, is a very generic color-histogram-based model for each individual. The second, described in Section 5, approximates the marginal conditional probabilities of occupancy of the ground plane, given the results of a background subtraction algorithm, in all views acquired at the same time. This approximation is obtained by minimizing the Kullback-Leibler divergence between a product law and the true posterior. We show that this is equivalent to computing the marginal probabilities of occupancy so that, under the product law, the images obtained by putting rectangles of human size at occupied locations are likely to be similar to the images actually produced by the background subtraction.

This represents a departure from more classical approaches to estimating probabilities of occupancy that rely on computing a visual hull [26]. Such approaches tend to be pessimistic and do not exploit trade-offs between the presence of people at different locations. For instance, if due to noise in one camera a person is not seen in a particular view, then he would be discarded, even if he were seen in all others. By contrast, in our probabilistic framework, sufficient evidence might be present to detect him. Similarly, the presence of someone at a specific location creates an occlusion that hides the presence behind it, which is not accounted for by the hull techniques but is by our approach.

Since these marginal probabilities are computed independently at each time step, they say nothing about identity or correspondence with past frames. The appearance similarity is entirely conveyed by the color histograms, which has experimentally proved sufficient for our purposes.
4 COMPUTATION OF THE TRAJECTORIES

In Section 4.1, we break the global optimization of several people's trajectories into the estimation of optimal individual trajectories. In Section 4.2, we show how this can be performed using the classical Viterbi algorithm based on dynamic programming. This requires a motion model given in Section 4.3 and an appearance model described in Section 4.4, which combines a color model given in Section 4.5 and a sophisticated estimation of the ground plane occupancy detailed in Section 5.

We partition the visible area into a regular grid of G locations, as shown in Figs. 5c and 6, and from the camera calibration, we define for each camera c a family of rectangular shapes $A^c_1, \ldots, A^c_G$, which correspond to crude human silhouettes of height 175 cm and width 50 cm located at every position on the grid.

4.1 Multiple Trajectories

Recall that we denote by $L^n = (L^n_1, \ldots, L^n_T)$ the trajectory of individual n. Given a batch of T temporal frames $I = (I_1, \ldots, I_T)$, we want to maximize the posterior conditional probability

$$P(L^1 = l^1, \ldots, L^{N^*} = l^{N^*} \mid I) = P(L^1 = l^1 \mid I) \prod_{n=2}^{N^*} P(L^n = l^n \mid I, L^1 = l^1, \ldots, L^{n-1} = l^{n-1}). \quad (2)$$

Simultaneous optimization of all the $L^i$s would be intractable. Instead, we optimize one trajectory after the other, which amounts to looking for

$$\hat{l}^1 = \arg\max_l P(L^1 = l \mid I), \quad (3)$$
$$\hat{l}^2 = \arg\max_l P(L^2 = l \mid I, L^1 = \hat{l}^1), \quad (4)$$
$$\ldots$$
$$\hat{l}^{N^*} = \arg\max_l P(L^{N^*} = l \mid I, L^1 = \hat{l}^1, L^2 = \hat{l}^2, \ldots). \quad (5)$$

Note that under our model, conditioning one trajectory given other ones simply means that it will go through no already occupied location. In other words,

$$P(L^n = l \mid I, L^1 = \hat{l}^1, \ldots, L^{n-1} = \hat{l}^{n-1}) = P(L^n = l \mid I, \forall k < n, \forall t,\ L^n_t \ne \hat{l}^k_t), \quad (6)$$

which is $P(L^n = l \mid I)$ with a reduced set of admissible grid locations.

Such a procedure is recursively correct: if all trajectories estimated up to step n are correct, then the conditioning only improves the estimate of the optimal remaining trajectories. This would suffice if the image data were informative enough so that locations could be unambiguously associated to individuals. In practice, this is obviously rarely the case. Therefore, this greedy approach to optimization has undesired side effects. For example, due to partly missing localization information for a given trajectory, the algorithm might mistakenly start following another person's trajectory. This is especially likely to happen if the tracked individuals are located close to each other.

To avoid this kind of failure, we process the images by batches of T = 100 and first extend the trajectories that have been found with high confidence, as defined below, in the previous batches. We then process the lower-confidence ones. As a result, a trajectory that was problematic in the past and is likely to be problematic in the current batch will be optimized last and, thus, prevented from "stealing" somebody else's location. Furthermore, this approach increases the spatial constraints on such a trajectory when we finally get around to estimating it.

We use as a confidence score the concordance of the estimated trajectories in the previous batches and the localization cue provided by the estimation of the probabilistic occupancy map (POM) described in Section 5. More precisely, the score is the number of time frames where the estimated trajectory passes through a local maximum of the estimated probability of occupancy. When the POM does not detect a person in a few frames, the score naturally decreases, indicating a deterioration of the localization information. Since there is a high degree of overlap between successive batches, a challenging segment of a trajectory, due for instance to a failure of the background subtraction or a change in illumination, is met in several batches before it actually happens during the 10 kept frames. Thus, the heuristic will have ranked the corresponding individual among the last to be processed when such a problem occurs.
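The greedy decomposition of Eqs. 3-5 reduces to a loop that forbids locations claimed by already-processed individuals. A minimal sketch, with the per-individual Viterbi search abstracted as a hypothetical `argmax_trajectory` callable:

```python
def greedy_trajectories(individuals, argmax_trajectory):
    """Greedy decomposition of Eqs. 3-5: optimize one individual at a
    time, excluding (frame, location) pairs already claimed by the
    previously processed, more reliable individuals.
    argmax_trajectory(n, occupied) is assumed to run Viterbi for
    individual n over the remaining admissible locations."""
    occupied = set()                      # {(t, location), ...}
    result = {}
    for n in individuals:                 # sorted by the confidence score
        traj = argmax_trajectory(n, occupied)
        result[n] = traj
        occupied.update((t, loc) for t, loc in enumerate(traj))
    return result
```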
4.2 Single Trajectory

Let us now consider only the trajectory $L^n = (L^n_1, \ldots, L^n_T)$ of individual n over T temporal frames. We are looking for the values $(l^n_1, \ldots, l^n_T)$ in the subset of free locations of $\{1, \ldots, G, H\}$. The initial location $l^n_1$ is either a known visible location, if the individual is visible in the first frame of the batch, or H if he is not. We therefore seek to maximize

$$P(L^n_1 = l^n_1, \ldots, L^n_T = l^n_T \mid I_1, \ldots, I_T) = \frac{P(I_1, L^n_1 = l^n_1, \ldots, I_T, L^n_T = l^n_T)}{P(I_1, \ldots, I_T)}. \quad (7)$$

Since the denominator is constant with respect to $l^n$, we simply maximize the numerator, that is, the probability of both the trajectories and the images. Let us introduce the maximum of the probability of both the observations and the trajectory ending up at location k at time t:

$$\Phi_t(k) = \max_{l^n_1, \ldots, l^n_{t-1}} P(I_1, L^n_1 = l^n_1, \ldots, I_t, L^n_t = k). \quad (8)$$

We model the processes $L^n_t$ and $I_t$ jointly with a hidden Markov model, that is,

$$P(L^n_{t+1} \mid L^n_t, L^n_{t-1}, \ldots) = P(L^n_{t+1} \mid L^n_t) \quad (9)$$

and

$$P(I_t, I_{t-1}, \ldots \mid L^n_t, L^n_{t-1}, \ldots) = \prod_t P(I_t \mid L^n_t). \quad (10)$$

Under such a model, we have the classical recursive expression

$$\Phi_t(k) = \underbrace{P(I_t \mid L^n_t = k)}_{\text{appearance model}}\; \max_{\tau} \underbrace{P(L^n_t = k \mid L^n_{t-1} = \tau)}_{\text{motion model}}\; \Phi_{t-1}(\tau) \quad (11)$$

to perform a global search with dynamic programming, which yields the classic Viterbi algorithm. This is straightforward, since the $L^n_t$s are in a finite set of cardinality G + 1.

4.3 Motion Model

We chose a very simple and unconstrained motion model:

$$P(L^n_t = k \mid L^n_{t-1} = \tau) = \begin{cases} 1/Z \cdot e^{-\rho \|k - \tau\|} & \text{if } \|k - \tau\| \le c, \\ 0 & \text{otherwise}, \end{cases} \quad (12)$$

where the constant $\rho$ tunes the average human walking speed, and c limits the maximum allowable speed. This probability is isotropic, decreases with the distance from location k, and is zero for $\|k - \tau\|$ greater than a constant maximum distance. We use a very loose maximum distance c of one square of the grid per frame, which corresponds to a speed of almost 12 mph.

We also define explicitly the probabilities of transitions to the parts of the scene that are connected to the hidden location H. This is a single door in the indoor sequences and all the contours of the visible area in the outdoor sequences in Fig. 1. Thus, entrances and departures of individuals are taken care of naturally by the estimation of the maximum a posteriori trajectories. If there is enough evidence from the images that somebody enters or leaves the room, then this procedure will estimate that the optimal trajectory does so, and a person will be added to or removed from the visible area.

4.4 Appearance Model

From the input images $I_t$, we use background subtraction to produce binary masks $B_t$ such as those in Fig. 4. We denote as $T_t$ the colors of the pixels inside the blobs and treat the rest of the images as background, which is ignored. Let $X^k_t$ be a Boolean random variable standing for the presence of an individual at location k of the grid at time t. In Appendix B, we show that

$$\underbrace{P(I_t \mid L^n_t = k)}_{\text{appearance model}} \propto \underbrace{P(L^n_t = k \mid X^k_t = 1, T_t)}_{\text{color model}}\; \underbrace{P(X^k_t = 1 \mid B_t)}_{\text{ground plane occupancy}}. \quad (13)$$

The ground plane occupancy term will be discussed in Section 5, and the color model term is computed as follows.
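Putting Eqs. 11 and 12 together gives the per-individual dynamic program. The sketch below implements the Viterbi recursion on a grid of G locations; the hidden location H and the normalization constant Z are omitted, and scores are rescaled at each step instead of kept as true probabilities.

```python
import numpy as np

def motion_model(dist, rho=1.0, c=1):
    """Eq. 12: isotropic, distance-decaying transition probability, zero
    beyond c grid squares (1/Z omitted: Viterbi needs relative scores)."""
    return np.where(dist <= c, np.exp(-rho * dist), 0.0)

def viterbi_single(appearance, dist):
    """Eq. 11 for one individual: appearance[t, k] = P(I_t | L_t = k),
    dist[k, l] = grid distance between locations k and l."""
    T, G = appearance.shape
    trans = motion_model(dist)                    # (G, G) transition matrix
    phi = appearance[0].copy()
    back = np.zeros((T, G), dtype=int)
    for t in range(1, T):
        cand = trans * phi[None, :]               # cand[k, l] = P(k | l) * phi(l)
        back[t] = cand.argmax(axis=1)             # best predecessor of each k
        phi = appearance[t] * cand.max(axis=1)
        phi /= phi.max()                          # rescale to avoid underflow
    path = [int(phi.argmax())]
    for t in range(T - 1, 0, -1):                 # backtrack the optimum
        path.append(int(back[t][path[-1]]))
    return path[::-1]
```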
4.5 Color Model

We assume that if someone is present at a certain location k, then his presence influences the color of the pixels located at the intersection of the moving blobs and the rectangle $A^c_k$ corresponding to the location k. We model that dependency as if the pixels were independent and identically distributed and followed a density in the red, green, and blue (RGB) space associated to the individual. This is far simpler than the color models used in either [18] or [13], which split the body area into several subparts with dedicated color distributions, but it has proved sufficient in practice.

If an individual n was present in the frames preceding the current batch, then we have an estimation, for any camera c, of his color distribution $\mu^n_c$, since we have previously collected the pixels in all frames at the locations

Fig. 4. The color model relies on a stochastic modeling of the color of the pixels $T^c_t(k)$, sampled in the intersection of the binary image $B^c_t$ produced by the background subtraction and the rectangle $A^c_k$ corresponding to the location k.
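A minimal sketch of this color term: build a per-individual RGB histogram from previously collected pixels (standing in for the density $\mu^n_c$) and score new pixels as i.i.d. draws from it. The 8-bins-per-channel quantization is an assumption; the paper does not specify the histogram resolution.

```python
import numpy as np

def rgb_histogram(pixels, bins=8):
    """Estimate an individual's RGB density from collected pixels
    (pixels: (P, 3) uint8 values)."""
    hist, _ = np.histogramdd(pixels, bins=(bins,) * 3,
                             range=((0, 256),) * 3, density=True)
    return hist + 1e-9                               # avoid zero bins

def color_log_likelihood(pixels, hist, bins=8):
    """Score the pixels inside the blob/rectangle intersection under the
    individual's histogram, treating them as i.i.d. draws."""
    idx = (pixels // (256 // bins)).astype(int)      # quantize each channel
    probs = hist[idx[:, 0], idx[:, 1], idx[:, 2]]
    return float(np.sum(np.log(probs)))

# Toy usage: score pixels against a histogram built from similar pixels.
rng = np.random.default_rng(0)
train = rng.integers(0, 256, size=(500, 3))
print(color_log_likelihood(train[:10], rgb_histogram(train)))
```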
Advances in Visual-Inertial Navigation Fusion Algorithms

Abstract: Visual-inertial navigation has long been a central and challenging component of autonomous driving and robotics. This survey first reviews the classical algorithm frameworks, presenting classical visual-inertial fusion algorithms from two perspectives: optimal smoothing algorithms and preintegration theory. It then reviews the emerging algorithm frameworks, which are mainly based on deep learning; according to how much of the overall pipeline is learned, these new visual-inertial fusion algorithms fall into two classes, partially learned and fully (end-to-end) learned. Finally, the key techniques of mainstream algorithms and future prospects are discussed.
Keywords: visual-inertial navigation; optimization; coupling algorithms; deep learning

0 Introduction

In recent years, along with advances in graphics processing, computer vision, the metaverse, and other related technologies, achieving high-precision, highly robust positioning and navigation for unmanned platforms has remained a key and still-unsolved link in unmanned systems. The U.S. defense research and engineering agency has organized robot survival challenge competitions for several consecutive years, seeking to explore and improve navigation performance in complex environments and to secure the U.S. military's lead in the unmanned domain. In 2020, the China Association for Science and Technology likewise listed high-precision intelligent navigation for unmanned vehicles as one of the top ten engineering challenges [1].

Navigation for unmanned systems has long centered on Simultaneous Localization And Mapping (SLAM) technology. Purely visual navigation, lacking depth information, senses motion poorly and must rely on loop-closure detection to correct its errors, which leaves it insufficiently robust. For integrated visual/satellite navigation in complex urban scenes, changing light and shadow make it harder to select reliable feature points, and environments such as underground garages strongly disturb satellite signals; the visual-inertial navigation reviewed in this paper has clear advantages in these settings. First, the errors are complementary: the stable pose estimates of visual navigation can compensate for the accumulated error inherent in inertial navigation, while the high-frequency dynamic signals of the Inertial Measurement Unit (IMU) can supply the depth information the camera lacks. Second, the usable speed ranges are complementary: the excellent performance of visual navigation in low-speed, near-static conditions can suppress the IMU's zero-bias drift, while in high-speed motion scenarios inertial navigation resolves the failure of feature extraction that visual navigation suffers when consecutive image frames lack overlapping regions.
1 Classical fusion frameworks

Within the classical fusion frameworks, loosely coupled algorithms were the early research focus of the visual-inertial field. Loosely coupled algorithms rely mainly on the Kalman filter and its later refinements; loosely coupled schemes based on the widely used Extended Kalman Filter (EKF), the sigma-point-based Unscented Kalman Filter (UKF), and the Particle Filter (PF), which performs well in nonlinear, non-Gaussian settings, have been proposed in succession.
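As a rough illustration of loose coupling, the sketch below fuses IMU propagation with low-rate camera pose fixes in a plain linear Kalman filter on position and velocity. This is a deliberately reduced toy (no attitude, no bias states, gravity assumed pre-removed), not any published VIO filter; a real loosely coupled EKF would propagate a full error-state model with attitude and IMU biases.

```python
import numpy as np

class LooselyCoupledKF:
    """Toy loosely coupled fusion: the IMU propagates a 6-state
    [position(3), velocity(3)] model at high rate; camera pose estimates
    arrive as low-rate position measurement updates."""

    def __init__(self):
        self.x = np.zeros(6)
        self.P = np.eye(6)

    def imu_propagate(self, accel, dt, q_noise=1e-3):
        F = np.eye(6)
        F[:3, 3:] = dt * np.eye(3)           # p += v * dt
        self.x = F @ self.x
        self.x[3:] += accel * dt             # v += a * dt (gravity removed)
        self.P = F @ self.P @ F.T + q_noise * np.eye(6)

    def vision_update(self, pos_meas, r_noise=1e-2):
        H = np.hstack([np.eye(3), np.zeros((3, 3))])   # camera observes position
        S = H @ self.P @ H.T + r_noise * np.eye(3)
        K = self.P @ H.T @ np.linalg.inv(S)
        self.x += K @ (np.asarray(pos_meas) - H @ self.x)
        self.P = (np.eye(6) - K @ H) @ self.P

# Usage: many IMU steps between each camera fix.
kf = LooselyCoupledKF()
for _ in range(100):
    kf.imu_propagate(np.array([0.1, 0.0, 0.0]), dt=0.01)
kf.vision_update([0.005, 0.0, 0.0])
print(kf.x[:3])
```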
PETS 2009 Benchmark Data (PETS 2009 benchmark dataset)
Data summary: The datasets are multisensor sequences containing different crowd activities.
Keywords: tracking, event detection, crowd person count, person density, PETS 2009
Data format: TEXT
Data use: detection of crowd surveillance characteristics/events within a real-world environment (see Aims and Objectives below).
Dataset details:

PETS 2009 Benchmark Data

Overview
The datasets are multisensor sequences containing different crowd activities. Please e-mail ********************* if you require assistance obtaining these datasets for the workshop.

Aims and Objectives
The aim of this workshop is to employ existing or new systems for the detection of one or more of 3 types of crowd surveillance characteristics/events within a real-world environment. The scenarios are filmed from multiple cameras and involve up to approximately forty actors. More specifically, the challenge includes estimation of crowd person count and density, tracking of individual(s) within a crowd, and detection of flow and crowd events.

News
06 March 2009: The PETS 2009 crowd dataset is released.
01 April 2009: The PETS 2009 submission details are released. Please see Author Instructions.

Preliminaries
Please read the following information carefully before processing the dataset, as the details are essential to the understanding of when notification of events should be generated by your system. Please check regularly for updates.

Summary of the Dataset structure
The dataset is organised as follows:
Calibration Data
S0: Training Data (contains sets background, city center, regular flow)
S1: Person Count and Density Estimation (contains sets L1, L2, L3)
S2: People Tracking (contains sets L1, L2, L3)
S3: Flow Analysis and Event Recognition (contains sets Event Recognition and Multiple Flow)
Each subset contains several sequences and each sequence contains different views (4 up to 8). This is shown in the diagram below:

Calibration Data
The calibration data (one file for each of the 8 cameras) can be found here. The ground plane is assumed to be the Z=0 plane. C++ code (available here) is provided to allow you to load and use the calibration parameters in your program (courtesy of project ETISEO). The provided calibration parameters were obtained using the freely available Tsai Camera Calibration Software by Reg Willson. All spatial measurements are in metres.

The cameras used to film the datasets are:
View  Model                     Resolution  Frame rate  Comments
001   Axis 223M                 768x576     ~7          Progressive scan
002   Axis 223M                 768x576     ~7          Progressive scan
003   PTZ Axis 233D             768x576     ~7          Progressive scan
004   PTZ Axis 233D             768x576     ~7          Progressive scan
005   Sony DCR-PC1000E 3xCMOS   720x576     ~7          ffmpeg de-interlaced
006   Sony DCR-PC1000E 3xCMOS   720x576     ~7          ffmpeg de-interlaced
007   Canon MV-1 1xCCD w        720x576     ~7          Progressive scan
008   Canon MV-1 1xCCD w        720x576     ~7          Progressive scan

Frames are compressed as JPEG image sequences. All sequences (except one) contain Views 001-004. A few sequences also contain Views 005-008.
Please see below for more information.OrientationThe cameras are installed at the locations shown below to cover an approximate area of 100m x 30m (the scale of the map is 20m):The GPS coordinates of the centre of the recording are: 51°26'18.5N 000°56'40.00WThe direct link to the Google maps is as follows: Google MapsCamera installation points are shown above and sample frames are shown below:view 001 view 002 view 003 view 004view 005 view 006 view 007 view 008SynchronisationPlease note that while effort has been made to make sure the frames from different views are synchronised, there might be slight delays and frame drops in some cases. In particular View 4 suffers from frame rate instability and we suggest it be used as a supplementary source of information. Please let us know if you encounter any problems or inconsistencies.DownloadDataset S0: Training DataThis dataset contains three sets of training sequences from different views provided to help researchers obtain the following models from multiple views: Background model for all the cameras. Note that the scene can contain people or moving objects. Furthermore, the frames in this set are not necessarily synchronised. For Views 001-004 Different sequences corresponding to the following time stamps: 13-05, 13-06, 13-07, 13-19, 13-32, 13-38, 13-49 are provided. For views 005-008 (DV cameras) 144 non-synchronised frames are provided.Includes random walking crowd flow. Sequence 9 with time stmp 12-34 using Views 001-008 and Sequence 10 with timestamp 14-55 using Views 001-004. Includes regular walking pace crowd flow. Sequences 11-15 with timestamps 13-57, 13-59, 14-03, 14-06, 14-29 and for Views 001-004.DownloadBackground set [1.8 GB]City center [1.8 GB]Regular flow [1.3 GB]Dataset S1: Person Count and Density EstimationThree regions, R0, R1 and R2 are defined in View 001 only (shown in the example image). The coordinates of the top left and bottom right corners (in pixels) are given in the following table.Region Top-left Bottom-rightR0 (10,10) (750,550)R1 (290,160) (710,430)R2 (30,130) (230,290)Definition of crowd density (%): crowd density is based on a maximum occupancy (100%) of 40 people in 10 square metres on the ground. One person is assumed to occupy 0.25 square metres on the ground.Scenario: S1.L1 walkingElements: medium density crowd, overcastSequences: Sequence 1 with timestamp 13-57; Sequence 2 with timestamp 13-59. Sequences 1-2 use Views 001-004.Subjective Difficulty: Level 1Task: The task is to count the number of people in R0 for each frame of the sequence in View 1 only. As a secondary challenge the crowd density in regions R1 and R2 can also be reported (mapped to ground plane occupancy, possibly using multiple views).Download [502 MB]Sample Frames:Scenario: S1.L2 walkingElements: high density crowd, overcastSequences: Sequence 1 with timestamp 14-06; Sequence 2 with timestamp 14-31. Sequences 1-2 use Views 001-004.Subjective Difficulty: Level 2Task: This scenario contains a densely grouped crowd who walk from onepoint to another. There are two sequences corresponding timestamps 14-06 and 14-31.The task related to timestamp 14-06 is to estimate the crowd density in Region R1 and R2 at each frame of the sequence.The designated task for the sequence Time_14-31 is to determine both the total number of people entering through the brown line from the left side AND the total number of people exiting from purple and red lines, shown in the opposite figure, throughout the whole sequence. 
The coordinates of the entry and exit lines are given below for reference.Line Start EndEntry : brown (730,250) (730,530)Exit1 : red (230,170) (230,400)Exit2 : purple (500,210) (720,210)Download [367 MB]Sample Frames:Scenario: S1.L3 runningElements: medium density crowd, bright sunshine and shadows Sequences: Sequence 1 with timestamp 14-17; Sequence 2 with timestamp 14-33. Sequences 1-2 use Views 001-004.Subjective Difficulty: Level 3Task: This scenario contains a crowd of people who, on reaching a point in the scene, begin to run. The task is to measure the crowd density in Region R1 at each frame of the sequence.Download [476 MB]Sample Frames:Dataset S2: People TrackingScenario: S2.L1 walkingElements: sparse crowdSequences: Sequence 1 with timestamp 12-34 using Views 001-008 except View_002 for cross validation (see below).Subjective Difficulty: L1Task: Track all of the individuals in the sequence. If you undertake monocular tracking only, report the 2D bounding box location for each individual in the view used; if two or more views are processed, report the 2D bounding box location for each individual as back projected into View_002 using the camera calibration parameters provided (this equates to a leave-one-out validation). Note the origin (0,0) of the image is assumed top-left. Validation will be performed using manually labelled ground truth.Download [997 MB]Sample Frames:Scenario: S2.L2 walking Elements: medium density crowd Sequences: Sequence 1 with timestamp 14-55 using Views 001-004. Subjective Difficulty: L2Task: Track the individuals marked A and B (see figure) in the sequence and provide 2D bounding box locations of the individuals in View_002 which will be validated using manually labelled ground truth. Note the origin (0,0) of the image is assumed top-left. Note that individual B exits the field of view and returns toward the end of the sequence.Download [442 MB]Scenario: S2.L3 WalkingElements: dense crowdSequences: Sequence 1 with timestamp 14-41 using Views 001-004. Subjective Difficulty: L3Task: Track the individuals marked A and B in the sequence and provide 2D bounding box information in View_002 for each individual which will be validated using manually labelled ground truth.Download [259 MB]Dataset S3: Flow Analysis and Event RecognitionScenario: S3.Multiple FlowsElements: dense crowd, runningSequences: Sequences 1-5 with timestamps 12-43 (using Views 1,2,5,6,7,8) , 14-13, 14-37, 14-46 and 14-52. Sequences 2-5 use Views 001-004. Subjective Difficulty: L2Task: Detect and estimate the multiple flows in the provided sequences, mapped onto the ground plane as a occupancy map flow. Further details of the exact task requirements are contained under Author Instructions. These would be compared with ground truth optical flow of major flows in the sequences on the ground plane.Download [760 MB]Sample Frames:Scenario: S3.Event RecognitionElements: dense crowdSequences: Sequences 1-4 with timestamps 14-16, 14-27, 14-31 and 14-33. Sequences 1-4 use Views 001-004.Subjective Difficulty: L3Task: This dataset contains different crowd activities and the task is to provide a probabilistic estimation of each of the following events: walking, running, evacuation (rapid dispersion), local dispersion, crowd formation and splitting at different time instances. Furthermore, we are interested in systems that can identify the start and end of the events as well as transitions between them. 
Download [1.2 GB]
Sample Frames:
Additional Information
The scenarios can also be downloaded from ftp:///pub/PETS2009/ (use anonymous login). Warning: ftp:// is not listing files correctly on some ftp clients. If you experience problems you can connect to the http server at /PETS2009/.
Legal note: The video sequences are copyright University of Reading and permission is hereby granted for free download for the purposes of the PETS 2009 workshop and academic and industrial research. Where the data is disseminated (e.g. publications, presentations) the source should be acknowledged.
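As a worked illustration of the S1 density metric defined above (100% density means 40 people in 10 m^2, i.e. one person occupies 0.25 m^2 of ground), here is a minimal sketch. It assumes detections have already been back-projected to ground-plane coordinates in metres, e.g. with the provided Tsai calibration, and that the region is an axis-aligned ground rectangle; the benchmark itself defines regions R0-R2 in View 001 pixels.

```python
# people_xy: iterable of (x, y) ground positions in metres.
# region: (xmin, ymin, xmax, ymax) ground rectangle in metres.
def crowd_density_percent(people_xy, region):
    xmin, ymin, xmax, ymax = region
    inside = [(x, y) for (x, y) in people_xy
              if xmin <= x <= xmax and ymin <= y <= ymax]
    area_m2 = (xmax - xmin) * (ymax - ymin)
    occupied_m2 = 0.25 * len(inside)      # one person = 0.25 m^2 of ground
    return 100.0 * occupied_m2 / area_m2

# Example: 12 people in a 3 m x 2 m patch -> 12 * 0.25 / 6 = 50% density.
```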
Journal of Railway Science and Engineering, Vol. 18, No. 4, April 2021
DOI: 10.19713/j.cnki.43-1423/u.T20200583

Research on detection method of freight train gauge-exceeding based on 3D reconstruction of monocular vision
DU Lunping1, ZHU Tianci1, LIU Qibai2, LIANG Lidong2, WANG Quandong1, FU Qinyi1, LIU Sisi1
(1. School of Traffic and Transportation Engineering, Central South University, Changsha 410075, China; 2. Freight Department of China Railway Guangzhou Group Co., Ltd., Guangzhou 510088, China)

Abstract: In view of the defects of freight-train gauge-exceeding detection systems based on radar and laser technology, namely that the detection area is incomplete and measurement is only possible while the train is moving, a method of gauge-exceeding detection of freight trains based on monocular visual 3D reconstruction was proposed. The 3D point cloud model of the target freight train was obtained by modeling and processing the acquired sequence images with the monocular visual 3D reconstruction algorithm. Global coordinate system conversion and slice projection were performed on the three-dimensional point cloud model to obtain two-dimensional point cloud graphics of several cross sections of the target freight train. Combined with the railway freight train gauge-exceeding detection standard, a standard limit graph was constructed, and the obtained two-dimensional point cloud graphics were tested against the standard boundary graphics for gauge-exceeding detection and discrimination. Experimental results show that the method has high detection accuracy and efficiency, and can meet the requirements of railway freight train gauge-exceeding detection.

Key words: freight train; gauge-exceeding detection; monocular vision; three-dimensional reconstruction; point cloud model processing

Received: 2020-06-24. Funding: Guangzhou Railway Group research project (Guangtie Freight Department (2019) 0002); Young Scientists Fund of the National Natural Science Foundation of China ((2018) 61806222). Corresponding author: LIU Sisi (1988-), female, from Changsha, Hunan; lecturer, Ph.D., working on computer vision and big-data mining; E-mail: ********************.cn

Railway transport occupies a decisive position in China's transportation industry.
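The slice-and-check step of the method above lends itself to a short sketch: cut the reconstructed point cloud into thin slices along the track axis, project each slice onto a cross-section plane, and flag points falling outside the standard clearance outline. The polygon vertices, coordinate convention, and slice thickness below are illustrative assumptions, not values from the railway standard.

```python
import numpy as np

def point_in_polygon(p, poly):
    # Standard ray-casting test; poly is a list of (lateral, height) vertices.
    x, y = p
    inside = False
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        if (y1 > y) != (y2 > y):                     # edge crosses the ray's height
            xcross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < xcross:
                inside = not inside
    return inside

def gauge_violations(points, clearance_poly, slice_mm=100.0):
    # points: (N, 3) array in track coordinates (x along track, y lateral, z up).
    bins = np.floor(points[:, 0] / slice_mm).astype(int)
    bad = {}
    for b in np.unique(bins):
        cross_sec = points[bins == b][:, 1:3]        # project slice to (y, z)
        outliers = [tuple(p) for p in cross_sec
                    if not point_in_polygon(tuple(p), clearance_poly)]
        if outliers:
            bad[int(b)] = outliers                   # slice index -> offending points
    return bad
```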
Ultraleap TouchFree
Ultraleap 3Di is part of Ultraleap's TouchFree end-to-end solution: camera hardware, reliable hand tracking software, and developer tooling. The camera uses Ultraleap's patented stereo infrared technology and world-leading Gemini hand tracking software. It is robust even in challenging lighting conditions, and reliably tracks a wide variety of hand sizes and shapes.
Ultraleap also provides a suite of developer tools for development of touchless experiences:
• TouchFree Application: Detects a user's hand in mid-air and converts it to an on-screen cursor. Retrofit existing touchscreen interfaces with touchless gesture control or evaluate using your desktop monitor.
• TouchFree Tooling for Web or Unity: Add touchless cursor control into kiosk applications in minutes, and have total control over how your application reacts to users' hand movements.

3Di Stereo Hand Tracking Camera
The Ultraleap 3Di Stereo Hand Tracking Camera is designed to be connected to an interactive screen. Together with Ultraleap's world-leading software, it transforms displays into touchless, three-dimensional, immersive surfaces.
Ultraleap 3Di allows easy integration of Ultraleap hand tracking in a form factor ruggedized for use in permanent settings, such as self-serve kiosks, digital out-of-home installations, interactive displays in retail, museums, and theme parks, or medical/industrial uses.
Simulation of Ultraleap 3Di tracking range when connected above a 42" screen. Reverse of Ultraleap 3Di. Front of Ultraleap 3Di.

Specifications
Ultraleap reserves the right to update or modify this specification without notice.
Device dimensions: all dimensions in mm.
Contact: ****************** | UK: +44 117 325 9002 | US: +1 650 600 9916
UL-005619-DS - 3Di Datasheet (Issue 10)

Mounting
Ultraleap 3Di is designed to be integrated directly into permanent installations. Provided with the camera in the product box is a Multi Mount. This mount holds the camera to the kiosk or screen and provides optimum camera placement. It can also sit on any horizontal surface to support desktop evaluation or deployments.
Detailed guidelines on camera placement, mounting options, TouchFree software setup and designing for touchless experiences can be found on https:///touchfree-user-manual/.
Multi Mount and Ultraleap 3Di.
China Computer & Communication (《信息与电脑》), 2021, No. 5

Reconstruction of a Three-dimensional Human Body Model Using Monocular Images
QIAN Rong, WANG Yong, WANG Ying
(School of Computer, Guangdong University of Technology, Guangzhou, Guangdong 510006, China)

Abstract: Three-dimensional human body models are widely used in science-fiction movies, virtual fitting for online shopping, and similar scenarios, but monocular image reconstruction suffers from missing 3D information and from reconstructed models that lack a well-fitting 3D surface. To solve these problems, a human-body 3D model reconstruction algorithm based on the SMPL model is proposed. The algorithm first estimates the 2D joint locations of the person and matches the SMPL model joints to the estimated 2D joints; finally, pose information from a 3D human-body model database is used as a pose prior on the reconstructed body model, so that the reconstruction has a reasonable pose and shape. The algorithm was tested on the MPI-INF-3DHP dataset. Experimental results show that the algorithm can effectively estimate the 3D positions of human joints and can reconstruct a 3D human body model similar in pose and shape to the person in the image.

Keywords: human pose estimation; 3D human reconstruction; monocular image reconstruction; human shape and pose; SMPL model
CLC: TP391.41  Document code: A  Article ID: 1003-9767(2021)05-060-05

0 Introduction
A three-dimensional human body model carries far more information than a 2D image of the body and can meet the needs of high-level vision tasks, for example providing an online fitting experience in e-commerce or supplying large amounts of 3D human data for science-fiction films.
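The fitting loop the abstract describes, minimizing the 2D reprojection error of model joints under a pose prior, can be sketched as follows. `smpl_joints`, `project`, and the Gaussian prior statistics are placeholders standing in for a real SMPL implementation, a camera model, and a learned prior; they are assumptions for illustration, not the paper's actual implementation.

```python
import numpy as np
from scipy.optimize import minimize

def fit_body(joints_2d, smpl_joints, project, prior_mean, prior_cov_inv,
             n_pose=72, n_shape=10, w_prior=1e-2):
    # joints_2d: (K, 2) detected joints; smpl_joints(pose, shape) -> (K, 3);
    # project: maps (K, 3) model joints to (K, 2) image points.
    def cost(params):
        pose, shape = params[:n_pose], params[n_pose:]
        resid = project(smpl_joints(pose, shape)) - joints_2d  # reprojection error
        d = pose - prior_mean
        return np.sum(resid**2) + w_prior * d @ prior_cov_inv @ d  # + pose prior

    x0 = np.concatenate([prior_mean, np.zeros(n_shape)])  # start at the prior mean
    res = minimize(cost, x0, method="L-BFGS-B")
    return res.x[:n_pose], res.x[n_pose:]                  # fitted pose, shape
```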
HIKVISION Dual-Lens People Counting Technology
More lenses, greater accuracy.

Table of Content
1. Background
2. Key Technologies
2.1. Binocular Stereo Vision
2.2. 3D People Detection and Tracking
2.3. Height Filtering
3. Applications
4. Summary

1. BACKGROUND
With the rise in human resource cost and competition becoming fiercer, government agencies, enterprises, public institutions, and self-employed workers are all transforming their management mode from extensive to refined. Precise people counting data can provide the data reference for effective and accurate management and decision making. The technology can be applied to several industries, including retail and chains, real estate, transportation, public service, tourism, etc.
Superseding manual counting and IR curtain counting methods, people counting based on video analysis has become the main technology. Moreover, dual-lens people counting technology based on binocular stereo vision offers better counting accuracy, application scene compatibility, anti-interference ability, and configuration convenience. It is a good choice for a people counting strategy.

2. KEY TECHNOLOGIES
Hikvision dual-lens people counting technology is based on binocular stereo vision; it performs people detection and tracking in 3D space, analyzes each person's moving track, and finally produces precise counting data.
The technology consists of the following three sub-technologies: binocular stereo vision, 3D people detection and tracking, and height filtering.

2.1. BINOCULAR STEREO VISION
Binocular stereo vision is a technology based on observing the same object from two different positions: from the position offset between the two images, the 3D geometric information of the environment and the objects can be recovered.
The parallax is the effect whereby the position or direction of an object appears to differ when viewed from different positions. The parallax angle is the angle between the two observing positions, and the parallax range (baseline) is the distance between them. For binocular stereo vision, the optical axes of the right lens and the left lens should be parallel. As shown in Figure 1, the parallax at point P is d = u1 − u2.

Figure 1: Diagram of binocular stereo vision theory. A scene point P(X, Y, Z) projects to P1(u1, v1) in the left image and P2(u2, v2) in the right image; b is the baseline (parallax range) and f the focal length.

Then, using the parallax, the coordinates of point P in the left camera frame follow as

X = b · u1 / d,  Y = b · v1 / d,  Z = b · f / d.

In this way, binocular stereo vision can obtain the 3D coordinates of all matched points in the image and support further people detection and tracking.
See Figure 2 for the images from the right lens and the left lens at the same time.
Figure 2: The images from the right lens and the left lens.
See the following pictures for the parallax greyscale image and the parallax pseudo-color image.
Figure 3: The parallax greyscale image and the parallax pseudo-color image.
See the following pictures for the image from the left lens and the corresponding top view of the 3D image.
Figure 4: The image from the left lens and the corresponding top view of the 3D image.
Based on the binocular stereo vision coordinates, the camera creates a stereo model that is updated in real time according to the images.
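A minimal sketch of the triangulation formulas just above: recover camera-frame coordinates from a matched pixel pair, assuming rectified images with parallel optical axes (as the whitepaper states) and image coordinates (u, v) measured from the principal point. Units: pixels for (u, v) and f, metres for the baseline b.

```python
def triangulate(u1, v1, u2, b, f):
    d = u1 - u2                      # parallax (disparity) in pixels
    if d <= 0:
        raise ValueError("matched point must have positive disparity")
    Z = b * f / d                    # depth along the optical axis
    X = b * u1 / d                   # equals Z * u1 / f
    Y = b * v1 / d                   # equals Z * v1 / f
    return X, Y, Z

# Example: b = 0.06 m, f = 800 px, disparity 16 px -> Z = 3.0 m.
```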
Compared to 2D modeling, 3D modeling is more accurate and has better adaptability, being less affected by illumination changes.
See the following figure for the 3D environment modeling diagram.
Figure 5: 3D environment modeling diagram.

2.2. 3D PEOPLE DETECTION AND TRACKING
The technology exploits the spatial-distribution feature that the head is always higher than the shoulders, and is thus able to detect the head accurately so as to lock onto each person. It easily adapts to different installation heights and to simple or complicated environments.
See the following figure for the 3D people (head) detection result; the detected targets are marked with colored cubes.
Figure 6: 3D people (head) detection.
After locking onto a target, the camera can track it until it disappears from the image. This effectively prevents repeated counting of a loitering target.
See the following figure for the moving track of each target.
Figure 7: 3D tracking.

2.3. HEIGHT FILTERING
Based on the height information on the Z-axis, the target height should be within the range of average human height, to reject interference from carts, baby strollers, etc.

3. APPLICATIONS
See the picture below for the actual output image of the dual-lens people counting camera.
Figure 8: Snapshot of the dual-lens people counting camera.
The dual-lens people counting technology can be widely applied to entrances and exits of shopping malls, supermarkets, chain stores, stations, etc. With the help of statistics software, the data can be compiled into daily, monthly, and annual reports, providing a basis for commercial analysis.
Figure 9: Application scenes (museum, shopping mall).

4. SUMMARY
Hikvision dual-lens people counting technology is a self-developed technology mainly based on binocular stereo vision, aimed at providing a people counting solution for commercial industries. Compared to traditional monocular people counting technology, the dual-lens technology has the following advantages:
• More accurate
• Easier configuration
• Better environment adaptability
• Better anti-interference ability, against interference such as clothing color, hair style, shadows, carts, target loitering, etc.
航海及海运专业英语词汇(M6)mono-channel facsimile 单路传真机mono-control 单一控制mono-control=monocontrol 单一控制mono-crystalline 单晶的mono-crystalline=monocrystalline 单晶的mono-gyro compass 单转子陀螺罗经mono-hull ship 单壳体船mono-hull ship 单体船mono-hull 单体船mono-hull 单体船单船体的mono-pontoon hull 单箱型船体mono-pump 单螺杆泵mono-rail motor crab 单轨电动起重绞车mono-tube type boiler 单盘管式直流锅炉mono-wall 整体式水冷壁monoacid 一酸价的monoacid 一酸价的一元的monoammonium phosphate 磷酸一铵monoatomic gas 单原子气体monoatomic 一原子的monoatomic 一原子的一元的一价的monobasic 一碱价的monobasic 一碱价的一元的monoblock casting 整体铸造monoblock cylinder 整体铸造气缸monoblock cylinder 整体铸造汽缸monoblock engine 单排发动机monoblock engine 整体发动机monoblock pump 单体泵monoblock pump 整装泵monoblock unit 整装机组monoblock 整体monobuoy 单浮筒monoceros 麒麟星座monocerotis 麒麟座的monochloro difluoromethane 一氯二氟甲烷monochlorodifluoromethane 一氯二氟甲烷monochlorodifluoromethane 一氧二氟甲烷monochloromonobromomethane type 一氯一溴、甲烷型monochloromonobromomethane type extinguisher 一氯一溴甲烷灭火器monochloromonobromomethane type 一氯一溴、甲烷型monochloromonobromomethane 一氯一溴甲烷monochromatic display 单色显示monochromatic light 单色光monochromatic signal 单色信号monochromatic 单色的monochrome television 黑白电视monoclinic system 单斜系monoclinic 单斜的monocolor 单色monocontrol 单一控制monocoque container 薄壳式集装箱monocrystal 单晶monocrystalline silicon 单晶硅monocrystalline 单晶的monocular observation 单目观察monocular observation 单眼观察monocular rangefinder 单眼测距仪monocular 单筒望远镜单眼的monocular 单眼的monocular 单眼的单筒望远镜monocyclic start induction motor 单周期起动感应电动机monocyclic 单调的monocyclic 单周的monodromy 单值monofrequency 单频率monogamy 一一对应monogyro compass 单转子陀螺罗经monohedron lines 单曲面型线monolayer 单层monolithic circuit 单块电路monolithic construction 整体建造monolithic da converter 单片数-模转换器monolithic integrated circuit 单块集成电路monolithic integrated circuit 单片集成电路monolithic operational amplifier 单片运算放大器monolithic processor 单片处理机monolithic ship 单体结构船monolithic 单片的monomer 单体monomial 单项式monomolecular 单分子的monomotor 单发动机monophonic recorder 单声道录音机monopod jack-up 单柱自升式平台monopolar 单极的monopole antenna 单极天线monopolized commodities 专卖货物monopoly clause 独家经营条款monopoly contract 独家经营契约monopoly 垄断monopoly 专营monoprocessor 单处理机monopulse autotracking 单脉冲自动跟踪monopulse radar 单脉冲雷达monopump 单螺杆泵monorail motor crab 单轨电动起重绞车monoscope camera 单假象管摄像机monoscope camera 单象管摄象机monoscope 单像管monoscope 单象管monoscope 简单静像管monosilane 甲硅烷monostable circuit 单稳态电路monostable flip-flop 单稳态触发电路monostable flip-flop 单稳态触发器monostable multivibrator 单稳态多谐振荡器monostable 单稳态式monotone decreasing function 单调递减函数monotone function 单调函数monotone increasing function 单调递增函数monotone 单调的monotron 直越式速调管monotube boiler 单管式锅炉monotube 单管monovalent 单价的monowall tube 膜式水冷壁管monoxide 一氧化物monsoon current 季风海流monsoon current 季风洋流monsoon drift 季风流monsoon fog 季风雾monsoon 季风monster tanker 超巨型油船monster tanker 超巨型油船巨型油轮mont 山montage 安装;装配montage 安装装配montanha 山monte 山monte 山山month of the phases 太阴月month 月month 月份month 月份;月month 月份月month 月月月份monthly general account 月度往来帐monthly inspection 每月检查monthly installment 按月付monthly overtime 每月超时monthly total of entries 每月总进口数monthly 月刊a.每月的monthly 月刊每月的months after date 到期以后的月数months after payment 支付后……月months after sight 见票后……月months 月montreal compromise 蒙特利尔妥协案montreal draft convention 蒙特利尔公约草案monument 大木笔;纪念碑monument 大木笔纪念碑大索插接用)大木笔墓碑monument 纪念碑monument 墓碑monument 墓碑纪念碑moon culminating stars 伴月星moon culmination 月亮中天moon dog 假月moon month 太阴月moon pool 月池moon rise 月出moon set 月没moon sheered 船首尾上翘的船型moon 月亮moon's age 月龄moon's horizontal parallax 月球地平视差moon's horizontal semidiameter 月球地平半径差moon's nodes 赤道白道交点moon's path 白道moon's phases 月相moonbow 月虹moonorbiting satellite 月球轨道卫星moonpool door 船阱门moonrise and moonset tables 
月亮出没时间表moonrise and moonset 月出没moonrise 月出moonset 月没moonsheered vessel 首尾高舷弧船moonstone 月长石moor alongside 靠泊moor anchors ahead 船首双锚泊moor fore and aft 船)首尾系泊首尾系泊moor fore and aft 首尾系泊moor fore and-aft 首尾系泊moor head and stern 船)首尾系泊首尾系泊首尾系泊moor light 停泊灯moor with an open hawse 迎风抛八字锚moor with stern way 后退抛双锚法moor with two anchors 抛双锚moor .moor .vi.系泊moor 系泊moor 系泊;双锚泊moorage 系泊moorage 系泊费;系船处;停泊场moorage 系泊系泊处moorage 系泊系泊处系泊费moore tube lamp 穆尔放电管moored mine 锚雷moored array 锚系基阵moored floating body 锚泊浮体moored sonobuoy 锚系声纳浮标moored surveillance system 锚泊监测系统moored vessel 锚泊浮体mooredmine 锚雷moorfast anchor 快速系泊锚mooring anchor 固定锚mooring appliance 系缆具mooring area 锚泊区mooring arm imodco型单点系泊浮筒上的系缆臂mooring armimodco 型单点系泊浮筒上的系缆臂mooring arrangement test 系泊设备试验mooring arrangement 系泊设备mooring attachment point 系泊联结点mooring attachment point 系泊连接点mooring base 系泊底座mooring basin 泊地mooring basin 系泊港池mooring berth 系泊船位mooring berth 系泊地mooring bitt 带缆桩mooring bitt 系船柱mooring bitt 系缆桩mooring bitts 双柱系缆桩mooring block 浮筒沉坠mooring block 系泊锚mooring boat 带缆艇mooring bollard 带缆桩mooring bollards 双柱系缆桩mooring bracket 系泊托架mooring bridle 固定锚锚链mooring bridle 系在浮筒上的链或短缆mooring buoy 系泊浮筒mooring buoy 系泊浮筒系泊浮mooring by the head and stern 首尾系泊mooring by the head and stern 首尾系泊船)首尾系泊mooring cable 系泊索mooring capacity 系泊能力mooring capstan 系泊绞盘mooring chafing chain 系泊防擦链mooring chain 同上mooring chain 系泊链mooring chain 系船缆mooring chain 系船链mooring chock 导缆钩mooring chock 导缆器mooring cleat 系泊羊角mooring cleat 系缆耳mooring clump 浮筒沉坠mooring compensator 系缆伸缩性mooring component 系泊部件mooring craft 系缆艇mooring craft 运锚艇mooring craft 运锚小艇mooring cup 系缆穴mooring deck 系船甲板mooring dog 系缆马钉mooring dolphin 系船柱mooring drag 活动锚mooring equipment 系泊设备mooring fast 系泊缆mooring fibre rope 系船纤维缆mooring fittings 系泊设备mooring fittings 系泊属具mooring fittings 系缆设备mooring fore and aft 首尾系泊mooring gear 系泊装置mooring gear 系留装置mooring ground tackle 碇泊装置mooring guy 系留索mooring hardware 系泊用五金mooring hawse 导索孔mooring hawser 系泊缆mooring hawser 系船缆mooring head and stern 首尾系泊mooring head and stern 首尾系泊首尾系泊mooring head in 船首朝港内的系泊mooring head out 船首朝港外的系泊mooring hole 导缆孔mooring hook 系泊钩mooring hulk 系泊趸船mooring island 系泊岛mooring layout 系缆布置mooring leg 系泊腿mooring lever 系泊杠杆mooring lighter 港内运送长期系泊设备的小船mooring lighter 系泊驳mooring lighter 小船mooring line catenary 锚缆悬链线mooring line connector 锚缆联结器mooring line connector 锚缆连接器mooring line load 锚缆载荷mooring line manipulation 系泊缆操作mooring line of sea spider 海蜘蛛式系泊索mooring line 系泊缆mooring line 系缆mooring machinery 系泊机械mooring mast 系泊杆mooring mat 护缆垫mooring mat 缆索防擦垫mooring operating mode management 系泊工况管理mooring pale 码头系缆桩mooring pattern 系泊缆布局mooring pendant 浮筒链mooring pendant 连接系船浮筒与浮筒底链的小链mooring pile 系泊桩mooring pipe 导缆管导缆孔mooring pipe 导缆孔mooring point 系泊点mooring pontoon 系泊趸船mooring port 导缆孔mooring post 码头系缆桩mooring post 码头系缆桩系缆柱mooring post 系缆柱mooring rack 靠船用排桩mooring restraint 锚泊限制mooring right 系泊权mooring ring 浮筒系船环mooring ring 系索环mooring riser 系泊立管mooring rope 系泊缆mooring rope 系泊索mooring screw 螺旋锚mooring screw 螺旋式锚mooring shack 系船卸扣mooring shackle 系船卸扣mooring sinker 浮筒沉坠mooring sinker 浮筒沉坠浮筒沉坠mooring speed 系缆速度mooring spud 带缆柱mooring spud 带缆桩mooring stall 船台mooring steamer 系泊船mooring stiffness 系泊劲度mooring strain 系泊变形mooring stump 系泊桩mooring swivel 双链转环mooring swivel 系泊旋转台mooring system 锚泊系统mooring system 系泊系统mooring tackle 系泊索具mooring telegraph 抛、起锚传令钟mooring telegraph 系泊传令钟mooring terminal 系泊点mooring test 系泊试验mooring tractor 船坞牵引机mooring trial 系泊试车mooring trial 系泊试验mooring winch 带缆绞车mooring winch 绞缆机mooring winch 系泊绞车mooring wire 系泊钢缆mooring yoke 
系泊刚臂mooring 锚泊;系泊mooring 系泊mooring 系泊属具mooring 系泊属具系泊属具mooring 系泊系泊属具系泊处mooring-dolphin 系船簇桩moorings 港内系船设备;系泊处moorings 港内系船设备系泊处mooringunmooring charges 系解缆费moorsom rules 莫逊法则moorsom system 莫尔斯姆法moorsom tonnage 莫逊吨位moorsom's measure 莫逊吨位丈量法moory 荒野的;泽地的moory 荒野的泽地的moot 提出讨论;问题moot 提出讨论问题mop point 最高工作压力点mop 拖把mop 拖把拖把mop-catamaran 扫油双体船mopboard 壁脚板moraine 冰碛morass 沼泽地moratorium 缓期偿债moratorium 缓期偿债;合法的延缓moratorium 暂禁moray firth case 默里湾案mordant 留色料mordant 留色料媒染剂腐蚀剂留色的媒染的腐蚀性的more of less 增减条款more or less 溢短装more or less charterer's option 由租船人选择上下限内的装载量more or less clause 溢短装条款more or less owner's option 船东有增减选择权由船东选择上下限内的装货数量more or less supplier's option 由供货人选择上下限内的装货数量more or less terms 数量增减范围条款more or less 增减条款morning and evening twilight 晨昏蒙影morning fog 晨雾morning orderbook 早晨工作命令簿morning sight 上午测天morning star 晨星morning twilight 晨影morning watch 早班morning 早晨morro 陡岬morse code fog signal 莫尔斯码雾号morse code light flashing signal 通信闪光灯morse code light or morse code fog signal 莫尔斯灯光或雾号信号morse code light 莫尔斯信号灯morse code 摩斯灯morse code 莫尔斯电码morse code 莫尔斯电码莫尔斯符号morse code 莫尔斯符号morse fog signals 莫尔斯码语雾号morse key 莫尔斯键morse lamp 莫尔斯信号灯morse lamp 莫斯信号灯morse light bulb 莫斯信号灯泡morse light 莫尔斯信号灯morse signal 莫尔斯信号morse signalling lamp 莫尔斯信号灯morse signals 莫尔斯符号morse standard taper 莫氏标准锥度morse taper shank drill 莫氏锥柄钻morse taper 莫氏锥度morse telegraphy 莫尔斯电报morse twist drill 莫氏麻花钻morse 莫斯电码莫氏mortal injury 致命伤害mortality 死亡率mortar and rocket apparatus 抛绳设备mortar jetter 喷枪mortar jetting method 喷浆法mortar line 抛射绳mortar mark 砂浆标号mortar 砂浆mortar 研钵砂浆mortar 研钵沙桨mortgage bond 抵押单mortgage contract 抵押合同mortgage of vessel under construction 建造中船舶抵押权mortgage ofship 船舶抵押mortgage 抵押mortgage 抵押权mortgage 抵押权抵押mortgagee clause 抵押权人条款mortgagee 承受抵押者mortgagee 抵押权人mortgagee 抵押权人受押人mortgagee 受抵人mortgagee 受押人mortgagor 抵押人mortgagor 抵押人出押人mortgagor 抵押者mortice and tenon joint 镶榫接合mortice chisel 笋眼凿mortice lock 榫眼锁mortice mortise and tenon joint镶榫接合mortice mortise chisel榫凿mortice mortise lock榫眼锁mortice mortisemortice 榫眼mortise block 链滑轮mortise chisel 笋眼凿mortise 榫眼mortise 榫眼;用榫接合mortised block 镶轮滑车mortising machine 制榫机mortising machine 榫眼机mos field effect transistor 金属氧化物半导体场效应晶体管mos memory 金属氧化物半导体存储器mos transistor 金属氧化物半导体晶体管mos 金属氧化物半导体mosaic cracking 龟裂mosaic 嵌镶图案mosaic 嵌镶图案感光嵌镶幕嵌镶光电阴极mosaic 镶嵌细木工moslem tomb 穆斯林墓地mosque 清真寺mosque 清真寺回教寺院mosquito boat 快艇mosquito fever 疟疾mosquito fleet 快艇队mosquito incense 蚊香mosquito net 蚊帐mosquito 蚊子most adverse heeling moment 最不利的横倾力矩most favored nation treatment 最惠国待遇most favored nation 最惠国most favoured licence clause 最惠特许条款most favoured nation clause 最惠国条款most favoured nation treatment 最惠国待遇most favoured nation 最惠国most pressing priority 最迫切的优先性most probable position 最大或然船位最大可能位置most probable position 最大或然船位最或然船位most probable position 最大可能位置most probable position 最或然船位most severe condition 最恶劣条件most significant bit 最高位most significant digit 最高位most stringent packing group 最严格的包装组most stringent ship type requirement 最严格的船型要求most stringent ship type 最严格的船型most unfavourable condition 最不利的状态most-favored nation clause 最惠国条款most-favored nation treatment 最惠国待遇most-favored 最惠的mother board 母板mother board 主板;母板mother board 主板母板mother carey's chickens! 
海燕来了!海燕来了mother clock 母钟mother liquor 母液mother of pearl cloud 贝母云mother of pearl clouds 贝母云mother rod 主连杆mother ship 母船mother ship 母舰mother-plate 母材mother-plate 母材母体金属mother-ship 母船motion axes 运动坐标轴系motion characteristics 运动特性motion link 运动联杆motion link 运动链系motion of ship 船舶运动motion of translation 线性运动motion orientation 运动定向motion recorder 摇摆参数记录器motion recorder 运动参数记录器motion simulation 运动模拟motion 运动motion 运动摆动motion 运行motion-work wheel 主动轮motional admittance 动生导纳motional feed back 动圈反馈motional impedance 动生阻抗motional 运动的motivator 操纵机构motive force 动力motive force 原动力motive power 动力motive 发动的motivity 原动力motometer 转数表motometer 转数计motor driven pump 电动泵motor alternator 电动变流机motor alternator 电动交流发电机motor alternator 电动交流发电机组motor antisubmarine boat 反潜快艇motor armature 电动机电枢motor assisted 马达助动的motor base 电动机底座motor bearing 电动机轴承motor bearing 电动机轴承发动机轴承motor bearing 发动机轴承motor bedplate 电动机底座motor bedplate 电动机底座发动机底座motor bedplate 发动机底座motor block 电动滑车motor board 电动机配电板motor boat 机动艇motor boat 机动艇摩托艇motor braking 电动机制动motor capacity 电动机容量motor car carrier 汽车运输船motor car ferry 汽车轮渡motor carbon 电动机碳刷motor cargo boat light 轻型机动货船motor cargo boat 机动货船motor cargo boat 机动货船内燃机货船motor casing 电动机机壳motor casing 电动机碳刷motor chamber diameter 发动机燃烧室直径motor characteristic 电动机特性motor circuit 动力电路motor clipper 机帆船motor coaster 沿海动船motor collier 柴油机运煤船motor control center 电动机控制台motor control cubicle 电动机控制室motor control panel 电动机控制板motor control panel 电动机控制盘motor control relay 电动机控制用继电器motor control 电动机控制motor controller 电动机控制器motor converter 电动变流机motor drive 电动机传动motor drive 电动机传动电动机传动装置motor driven compressor 电动压缩机motor driven feed pump 电动式舱口盖motor driven pump 电动泵motor driven starter 电动起动器motor driven welding machine 电动机拖动式焊机motor driven 电动机驱动的motor driven 电力驱动的motor dynamo 电动发电机motor dynamo 电动发电机电动发电机motor dynamometer 电动机功率计motor eccentricity 电动机偏心率motor efficiency 发动机效率motor efficiency 发动机效率电动机效率motor enclosure 电动机机壳motor excitation 电动机励磁motor fan 电动风扇motor fan 电动鼓风机motor fault 电动机故障motor fault 电动机故障发动机故障motor fault 发动机故障motor fishing vessel 机动渔船motor fleet vessel 机动舰船motor frame 电动机故障发动机座motor frame 电动机座motor frame 发动机座motor fuels 发动机燃料motor fuse 电动机熔断器motor generator panel 电动发电机控制板motor generator panel 电动发电机仪表盘motor generator panel 电动发电机仪表盘电动发电机控制板motor generator set 电动发电机组motor generator substation 电动发电机分站motor generator 电动发电机motor governor section 发动机调速器部分motor gunboat 机动炮艇motor hunting 同步电动机的追随motor inclosure 电动机机壳motor interrupter 电动断续器motor launch 机动艇motor launch 机动艇摩托艇机动小艇汽艇motor launch 机动小艇摩托艇汽艇motor lifeboat 机动救生艇motor lifeboat 机动救生艇摩托救生艇motor lifting rig 发动机起吊设备motor lighter 机动驳船motor load 电动机负载motor man 机工motor man 机匠motor man=motorman 机工motor mechanism 发动机构motor meter 电动机型仪表motor mine sweeper 摩托扫雷艇motor octane number 马达法辛烷值motor oil 电动机润滑油motor oil 机油motor oil 机油电动机油motor operated switch 电动断路器motor operated 电动机驱动的motor order telegraph 传令钟motor order telegraph 对机舱传令钟motor order telegraph 机舱车钟motor panel 电动机配电盘motor performance 发动机性能motor protection relay 电动机保继电器motor pump 电动泵motor reduction unit 发动机减速装置motor rescue boat 机动救生艇motor room 电机舱motor sailer 机帆船motor sailing vessel 机帆船motor ship 柴油机船motor ship 内燃机船motor ship=motorship 柴油机船motor silencer 发动机消声器motor silencer 发动机消音器motor siren 电笛motor siren 电动警报器motor slide rails 电机导轨motor speed adjustment 电动机调速motor speed control 发动机转速控制motor speed control 发动机转速控制电动机转速控制motor speed switch 转子速度加减键motor speed 电动机转速motor speed 电动机转速发动机转速motor speed 发动机转速motor spirit 发动机燃料motor spirits 汽油motor starter panel 电动机起动板motor starter 电动机起动器motor starterpanel 电动机起动板motor 
starting 电动机起动motor surfboat 机动救生艇motor switch 电动开关motor switch 发动机开关motor tanker 柴油机油船motor tanker 内燃机油船motor tanker 内燃机油船内燃机油轮motor terminal voltage 电动机端子电压motor torpedo vessel 机动鱼雷船motor tractor 牵引车motor tricycle 三轮摩托车motor truck 卡车motor type relay 电动机式继电器motor vehicle combination 机动车组motor vehicle 机动车motor vessel 柴油机船motor vessel 机动船motor vessel 内燃机船motor winch 电动绞车motor winch 电动起货机motor winding type elctro-clock 用电动机上弦的电钟motor winding type electro-clock 用电动机上弦的电钟motor winding 电动机绕组motor yacht 机动快艇motor yacht 机动快艇机动游艇motor yacht 机动游艇motor 电动机motor 电机motor 电机发动机motor 控制电动机motor 马达motor-alternator 电动发电机motor-driven hatchcover 电动机驱动舱口盖motor-driven 电动机驱动的motor-dynamo 电动直流发电机motor-generator panel 电动发电机仪表盘motor-generator set 电动发电机组motor-generator 电动发电机motor-operated switch 电动断路器motor-operated 电动机驱动的motor-winch 电动绞车motora's ship stabilizer 船舶防摇器的一种motorboat 汽船motorcar deck 汽车甲板motorcraft 机动艇motorcycle 机器脚踏车motorhand 机电工motorization 摩托化电动化motorize 使摩托化使电动化motorized junk 机帆船motorized non-return valve 电动止回阀motorized survival craft 机动救生艇筏motorized valve 电动阀motorized 摩托化的电动化的motorless 无动力的motorless 无动力的无发动motorman 机工motorman 加油工motormeter 转数表motormeter 转数表电动机型仪表motormeter 转速表motorship 柴油机船motorway 高速公路mottle cast iron 杂晶铸铁mottle 斑点mottle 班点mouillage 锚地mould 模子moulded base line 型基线moulded beam 型宽moulded breadth 型宽moulded depth midship 船中处型深moulded depth 型深moulded displacement 型排水量moulded draft 型吃水moulded draught amidship 船中部型吃水moulded draught 型吃水moulded 模的moulded 型moulder 铸工moulding 造型mouldy 霉烂moulin 磨坊冰川锅穴mound breakwater 防波堤mound 土墩mound 小山mount checker 管脚检验器mount for power system 动力系统支架mount 大山;安装;架mount 山丘支架mount 座mountain breeze 山风mountain daylight saving time 山区夏令时mountain lift 登山缆车mountain range 山脉mountain standard time 山区标准时间mountain 山mountain 山岳mounted panel 支持板mounted spare 装配备件mounter 安装工mounting appliance 安装用具mounting arm 支臂mounting block 固定件mounting bolt hole 装配螺栓孔mounting bolt 装配螺栓mounting clip 装配夹mounting cost 安装费mounting fittings 装配附件mounting flange 支承法兰mounting hole 安装孔mounting of propeller by hydraulic jack 螺旋桨液压无键装配mounting of rotor 转子安装mounting plate 装配底板mounting rack 装配架mounting ring 装配环mounting 安置mounting 座mountings 配件mouse a hook 缠扎钩口mouse 鼠狭窄的出入口mouse 狭窄的出入口;缠扎钩口;鼠mouse 狭窄的出入口缠扎钩口鼠mousing hook 用小绳缠扎钩口的钩mousing shackle 吊钩止滑卸扣mousing 缠扎钩口结绳;吊钩的封口闩mousing 钩闩mousing-hook 防脱钩mouth of a hatch 舱口mouth of a river 河口mouth of hook 钩口mouth of river bays 河口湾mouth of river ports 河口港mouth of shackle 卸扣开口mouth of shears 冲剪口mouth piece 口承mouth piece=mouthpiece 口承mouth plug 锚链孔塞mouth ring 防磨衬套垫环mouth to mouth method 口吸法mouth 河口mouth 口;河口;出入口mouth 收敛部分mouth 嘴mouthpiece 口承mouths of rivers 河口moutonnee hummock 圆冰丘movable appendage 活动附体movable arm 活动臂movable bearing dial 活动方位刻度盘movable bearing plate 测向仪探向器movable bilge block 活动舭墩movable block 动滑车movable block 动滑车动滑车绞辘中)动滑车movable bulwark 可拆舷墙movable car deck 可卸汽车甲板movable car decks 可移动汽车甲板符号movable causeway 可移堤道movable center 弹性顶尖movable coil 动圈movable contact 动触点movable coupling 活动联轴节movable coupling 活动联轴器movable decking 活动平台movable eccentric 活动偏心轮movable eye bolt 带眼螺栓movable eye-bolt 带眼螺栓movable fit 动配合movable foot bridge 活动引桥movable gantry 活动起重台架movable jaw 活爪movable joint 活接头movable leg 活动桩腿movable lever davit 可移动吊杆movable load 活动负载movable load 可动载荷movable mast 可移动桅杆movable mast 可折桅movable pillar 活动支柱movable propeller blade 可动螺旋桨叶movable property 动产movable reflector 动镜movable reflector 动镜;指标镜movable vane 枝叶movable winch 移动绞车movable mast 可倒桅movable mastmovable 活动的movable 可移动的movables 动产move ahead … meters!前移……米move astern … meters!后移……米move 
使移动使运行移动moveable leg 活动桩腿movement information distribution station 航行情报收发站movement of goods 货物移动movement of vessel 船舶动态movement 移动movement 移动移动运转movement 运动movement reporting system 船舶动态报告系统mover 发动机mover 推进器moving armature 动衔铁moving axes 动坐标moving blade 动叶moving blade 动叶片moving blade 转动叶片moving blading 转动叶片组moving block 动滑车moving block 移动滑车moving block 移滑车moving charge 移泊费moving coil ammeter 动圈式安培表moving coil instrument 动圈式测量仪表moving coil loudspeaker 动圈式扬声器moving coil meter 动圈式扬仪表moving coil relay 动圈式继电器moving coil speaker 动圈式扬声器moving coil type 动圈式moving coil voltmeter 动圈式伏特计moving coil 动圈moving coil 动圈式moving contact assembly 动触点组moving contact pin 动触针moving contact 动触点moving diaphragm element 活动膜片戊元件moving diaphragm element 活动膜片形元件moving element 活动元件moving iron instrument 磁电式测量仪表moving iron instrument 动铁式测量仪表moving iron instrument 动铁式仪表moving iron type 可动铁片式moving load 动载荷moving load 活动负载moving magnet type 可动磁铁式moving member 运动构件moving part fluidic element 带运动件射流元件moving part 活动部分moving parts 运动部件moving rail 输送机moving sleeve 运套筒moving staircase 自动梯moving surface 运动表面moving tackle 动绞辘moving target detection system 动目标探测系统moving target detection system 动目标探测系统移动物标探测系统moving target detection system 移动物标探测系统moving target indication radar 运动目标显示雷达moving target indicator 活动目标指示器moving target indicator 移动式目标指示器moving target indicator 运动目标显示器moving vane instrument 动叶式仪表moving 动的moving 活动的moving 移动moving 移动的moving 移动移动的moya 火山泥mozambique current 莫三比克洋流mpmp 型链mr boiler 船用全辐射型锅炉mr reheat type boiler 船用全辐射再热型锅炉mr 船用全辐射型msw system 主海水系统mts units 米·吨·秒制单位mu factor = μ factor 放大系数mu factor 放大系数μmu metal μ合金mu metal=mumetal μ合金mu metron 微米测微表mu millimicronmu μmu 微muara 河口muara 河口河口much 的最高级)最mucilage 胶浆mucilage 胶水mud anchor (灯船mud anchor 泥底锚mud baffle 泥箱挡板mud bank 泥滩mud berth 泥锚地mud berth 坐泥泊位mud boat 泥驳mud boat 泥船mud boat 运泥船mud box 沉淀箱mud box 泥箱mud bucket 泥斗mud carrier 泥浆船mud catcher 集泥器mud chamber 泥箱mud chamber 泥箱污水滤箱mud cock 排泥旋塞mud cock 排污旋塞mud collector 集泥器mud collector 集泥箱集泥器mud collector 集泥箱污水过滤器mud cover 挡泥罩mud cover 泥箱盖mud drag 拖网渔船mud dredge 同上mud dredge 挖泥船mud dredger 拖网渔船mud dredger 挖泥船mud drum 泥箱mud fillter 滤泥器mud filten 滤泥器mud filter 滤泥器mud flat 泥滩mud foreshore 淤泥岸mud guard dented 挡泥板瘪mud guard 挡泥板mud hold 泥舱mud hole 出泥渣孔mud hole=mudhole 出泥渣孔mud land 低潮时露出的泥岸地mud lighter 泥驳mud line 淤泥线mud mat 泥垫mud pilot 凭眼力观察水色变化而估计水深的引航人员mud plug 泥箱管道开关mud pump room 泥浆泵舱mud pump 泥浆泵mud ring cross brace 底圈横梁mud scow 平底泥驳mud sump 泥箱mud valve 底部吹泄阀mud valve 泥阀mud valve 排泥阀mud valve 排泥阀底部吹泄阀mud 泥mud 泥浆mud 泥泥底mud 泥淤泥mud-hook 小锚muddy and sandy bottom 泥砂底muddy bottom 泥底muddy ice 含有泥沙贝壳石等的冰块muddy ice 含有泥沙贝壳石等的冰块含有泥沙贝壳石头等的冰块muddy 多泥的mudhole 出泥渣孔muelle 防波堤muerto 墨西哥夏季的强北风muff coupling 套筒接合muff coupling 套筒联轴节muff coupling 套筒联轴器muff joint 套筒接合muff 衬套muffle 消声器muffle 消音器muffled oars 消音桨muffler device 消声装置muffler 消声器muffler 消音器muggy monsoon 湿热季节风muggy 闷热的mule 轻便牵引机mule 轻便牵引机小型机车mule 小型机车multi cell plenum craft 多气室气垫船multi channel and dual id operation 多信道双识别码操作multi core cable 多心电缆multi cylinder engine 多缸发动机multi electrode tube 多极管multi fuel engine 多种燃料发动机multi meter 万用表multi mode disturbance 多模干扰multi mu tube 可变放大系数管multi phase 多相multi pinplug 多路插头multi ply wood 多层板multi pulse mode chain mp型链multi purpose ship 多用途船multi range testmeter 多量程检测计multi stage amplification 多级放大multi stage skirt 多节围裙multi supported rudder 多支承舵multi valve 多管multi way 多路的multi- 表示多的意思multi- 多multi-address 多地址的multi-beam echo-sounder 多束测深仪multi-bladed rudder 多叶舵multi-burners cutting machine 多头气割机multi-burners cutting machine 
多头切割机multi-casing steam turbine 多缸汽轮机multi-casing steam turbine 多缸式汽轮机multi-cell plenum craft 多气室气垫船multi-change speed 多级变速multi-channel amplifier 多通道放大器multi-channel communication 多路通信multi-channel receiver 多频接收机multi-channel recorder 多线记录器multi-channel slip ring 多槽滑环multi-channel television 多频道电视机multi-channel traffic analyser 多频道通信分析器multi-channel 多波道的multi-channel 多频道multi-channel 多信道multi-coil feed heater 多盘管式给水加热器multi-coil 多盘管multi-coiled ship 多消磁绕组船multi-collar thrust bearing 多环式推力轴承multi-colour flow diagram 彩色流程图multi-conductor cable 多芯电缆multi-conductor plug 多触点插头multi-conductor 多导线的multi-connector 多路插头multi-contact relay 多触点继电器multi-contact switch 多触点开关multi-contact 多触点的multi-core cable 多芯电缆multi-core cable 多心电缆multi-cycle sampling 多循环取样multi-cylinder engine 多缸式发动机multi-cylinder steam turbine 多缸汽轮机multi-cylinder 多缸的multi-cylinder 多汽缸multi-cylinder 多芯电缆multi-deck open space 多层甲板开敞处所multi-deck ship 多层甲板船multi-decked ship 多甲板船multi-decker ferry 多甲板渡船multi-diesel propulsion 多台柴油机推进multi-disc friction clutch 多片摩擦离合器multi-discclutch 多片离合器multi-disciplinary research 多学科研究multi-effect compressor 多效压缩机multi-effect evaporator 多级蒸发器multi-electrode mercury arc rectifier 多极汞弧整流器multi-electrode tube 多极管multi-engine geared drive power plant 多台发动机共轴齿轮传动动力装置multi-engine geared drive 多台发动机共轴齿轮传动multi-engine geared installation 多台发动机共轴齿轮装置multi-engine installation 多台发动机推进装置multi-engined 多台发动机的multi-feed oiler 多点加油器multi-flow heater 多流式加热器multi-flow 多流式multi-fold 多重的multi-folding hatchcover 组合折叠式舱口盖multi-frequency code signalling 多频编码信号方式multi-frequency pulse 多频脉冲multi-frequency reed gauge 振簧式多频频率表multi-fuel diesel 多种燃料柴油机multi-fuel engine 多种燃料发动机multi-fuel internal combustion engine 多种燃料内燃机multi-function array radar 多功能雷达multi-function array radar 多功能雷达多功能相控阵雷达multi-function array radar 多功能天线阵雷达multi-function array radar 多功能天线阵雷达多功能相控阵雷达multi-function support vessels 多功能支援船multi-functional berth 多用途泊位multi-grab 多用途夹具multi-grade lubrication oil 多种等级润滑油multi-hull craft 多体艇multi-hull passenger ship 多体客船multi-hull type 多体式multi-hulled ship 多体船multi-hulled vessel 多体船multi-lateral 多方向的multi-lateral=multilayer 多方向的multi-layer welding 多层焊multi-layer 多层的multi-level control system 分级控制系统multi-level 多能级的multi-limbed tube 多分支管multi-location control 多处控制multi-loop 多回路的多匝的多环的蛇形的multi-media conference 多媒介会议multi-meter 万用电表multi-meter 万用电表多量程仪表multi-meter=multimeter 万用电表多量程仪表multi-modal transport document 多式联运单据multi-modal transport operators liability for concealed cargo damage 多式联运中经营人对发生区段不明的货物损害的责任multi-modal transport 综合运输multi-mode disturbance 多模干扰multi-mutube 可变放大系数管multi-national company 多国公司multi-needle compass card 多磁针罗经标度盘multi-node 多节的multi-orifice nozzle 多孔喷嘴multi-pass welding 多道焊multi-passwelding 多道焊multi-phase 多相multi-pin connector 多芯插头multi-pin plug 多路插头multi-pintle rudder 多支承舵multi-plate clutch 多片离合器multi-plate freezer 多板式冻结装置multi-plate friction clutch 多片摩擦离合器multi-plug connector 多插头联结器multi-plug connector 多插头连接器multi-plug 多插头式multi-plunger pump 多柱塞泵multi-ply wood 多层板multi-ply 多股的multi-ply 多股的多层的multi-point mooring 多点系泊multi-point mooring 多点系泊设施multi-point mooring 多点系泊设施多点系泊multi-point 多点的multi-pole switch 多级开关multi-ported valve 多口阀multi-position 多位置multi-pressure stage 多压级。
Simultaneous Localization and Mapping (SLAM) is currently one of the key technologies in robotics, autonomous driving, augmented reality, and related fields. Localization and map building are two fundamental problems in the navigation and control of autonomous mobile platforms, and SLAM is precisely the most effective approach that solves both at once. The ever-growing demand for autonomous mobile products in industry and daily life has also made SLAM a hot research topic for autonomous mobile platforms. Looking back over the development of SLAM, early work focused mainly on SLAM based on probabilistic estimation, such as the extended Kalman filter [1], the particle filter [2], and combined filtering [3]. Reference [4] performs mapping and localization with the extended Kalman filter.

Improved Monocular ORB-SLAM for Semi-dense 3D Reconstruction
ZHOU Yan1, KUANG Hongzhang1, MU Jinzhen2, WANG Dongli1, LIU Zongming2
1. School of Automation and Electronic Information, Xiangtan University, Xiangtan, Hunan 411105, China
2. Shanghai Aerospace Control Technology Institute, Shanghai 201109, China

Abstract: Building more detailed 3D maps and estimating more accurate camera poses have always been the goals pursued by Simultaneous Localization And Mapping (SLAM) technology, but these goals conflict with SLAM's real-time requirements, low computational cost, and limited computing resources. A novel semi-dense 3D reconstruction method is proposed, based on monocular Oriented FAST and Rotated BRIEF SLAM (ORB-SLAM) and using line-segment features extracted from keyframes. Specifically, the improved method builds upon ORB-SLAM, which provides a series of map points and a set of keyframes with their corresponding camera pose information in real time. A Keyframe Re-Culling (KRC) algorithm is proposed to further reduce the number of redundant keyframes. A line-segment extraction algorithm is employed to extract line segments in each keyframe, and a purely geometric constraint method matches 2D line segments across keyframes to generate a semi-dense 3D scene model composed of line segments. Experimental results show that the method runs steadily and reliably, and can perform online 3D reconstruction quickly at low computational cost.

Key words: monocular vision; semi-dense; 3D reconstruction; Simultaneous Localization And Mapping (SLAM); keyframe re-culling

doi: 10.3778/j.issn.1002-8331.1912-0198
Funding: National Natural Science Foundation of China (61773330, 61100140, 61104210); Natural Science Foundation of Hunan Province (2017JJ2253); Science and Technology Commission of Shanghai Municipality (19511120900); Scientific Research Project of Hunan Provincial Department of Education (17B259).
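The abstract does not spell out the KRC rule, but keyframe culling in the spirit of the original ORB-SLAM heuristic (discard a keyframe when most of its map points are already observed by enough other keyframes) can be sketched as follows. The 90% ratio and min_obs = 3 follow the standard ORB-SLAM rule, and the data structures are assumed; the paper's KRC variant may differ.

```python
def cull_redundant_keyframes(keyframes, ratio=0.9, min_obs=3):
    # Each keyframe is assumed to expose a `map_points` container of the
    # 3D points it observes (a hypothetical interface for this sketch).
    kept = []
    for kf in keyframes:
        redundant = sum(
            1 for pt in kf.map_points
            if sum(other is not kf and pt in other.map_points
                   for other in keyframes) >= min_obs
        )
        if kf.map_points and redundant / len(kf.map_points) > ratio:
            continue                  # drop: its points are already well covered
        kept.append(kf)
    return kept
```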
A Review of Monocular 3D Reconstruction

Monocular 3D Reconstruction: A Comprehensive Review

Introduction.
Monocular 3D reconstruction is the task of estimating the 3D structure of a scene from a single 2D image. This is a challenging problem, as the lack of multiple viewpoints makes it difficult to disambiguate depth and 3D relationships. However, monocular 3D reconstruction has a wide range of applications, including robotics, autonomous driving, and augmented reality.

Methods.
There are a variety of methods for monocular 3D reconstruction. These methods can be broadly divided into two categories:
Geometric methods: These methods use geometric constraints to infer the 3D structure of a scene. For example, vanishing points can be used to estimate the location of the camera and the orientation of planes in the scene.
Learning-based methods: These methods use machine learning techniques to learn the mapping from a single 2D image to a 3D representation. For example, convolutional neural networks (CNNs) can be trained to predict the depth map of a scene from a single image.

Evaluation.
The performance of monocular 3D reconstruction methods is typically evaluated using a variety of metrics, including:
Accuracy: The accuracy of a method is measured by the mean absolute error (MAE) between the predicted 3D structure and the ground truth.
Completeness: The completeness of a method is measured by the percentage of the ground-truth 3D structure that is correctly predicted.
Robustness: The robustness of a method is measured by its ability to handle challenging conditions, such as noise, occlusions, and motion blur.

Applications.
Monocular 3D reconstruction has a wide range of applications, including:
Robotics: Monocular 3D reconstruction can be used to create maps of the environment for robots. This information can be used for planning, navigation, and object manipulation.
Autonomous driving: Monocular 3D reconstruction can be used to create depth maps of the road ahead. This information can be used to detect obstacles, plan safe paths, and control the vehicle's speed.
Augmented reality: Monocular 3D reconstruction can be used to create virtual objects that can be placed in the real world. This technology can be used for gaming, education, and training.

Conclusion.
Monocular 3D reconstruction is a challenging but important problem with a wide range of applications. In recent years, there has been significant progress in this area, thanks to the development of new methods and the availability of large datasets. As this field continues to develop, we can expect to see even more accurate, complete, and robust monocular 3D reconstruction methods.
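The evaluation metrics above translate directly into code for the depth-map case. This is a minimal sketch: MAE for accuracy, and completeness taken as the fraction of valid ground-truth pixels predicted within a relative tolerance; the review does not fix a tolerance, so the 10% threshold is an assumption.

```python
import numpy as np

def depth_metrics(pred, gt, tol=0.1):
    # pred, gt: depth maps of identical shape; invalid GT is NaN or <= 0.
    valid = np.isfinite(gt) & (gt > 0)
    err = np.abs(pred[valid] - gt[valid])
    mae = err.mean()                                 # accuracy: mean absolute error
    completeness = (err <= tol * gt[valid]).mean()   # fraction within 10% of GT
    return mae, completeness
```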
2022 National College Entrance Examination, Zhejiang Paper: English

Developing good answering habits is one of the decisive factors in success. Before answering, read the task requirements, question stems, and options carefully and make reasonable predictions about the answers. While answering, do not simply follow your intuition; it is best to work through the questions in order, marking any you cannot do or are unsure about, learning to spot the key point of each question, answering in the required format, and writing neatly. When you have finished, check your work carefully, fill in any gaps, and correct mistakes.

Part I. Listening
1. What will the speakers do next?
A. Check the map. B. Leave the restaurant. C. Park the car.
2. Where are the speakers?
A. At a bus stop. B. At home. C. At the airport.
3. What did the speakers do last week?
A. They had a celebration dinner. B. They went to see a newborn baby. C. They sent a mail to their neighbors.
4. Why does the man make the phone call?
A. To cancel a weekend trip. B. To make an appointment. C. To get some information.
5. What does the man probably want to do?
A. Do some exercise. B. Get an extra key. C. Order room service.
Listen to the material and answer questions 6 and 7.
6. Why does the woman come to the man?
A. To ask for permission. B. To extend an invitation. C. To express thanks.
7. When are the students going to the museum?
A. On Friday. B. On Saturday. C. On Sunday.
Listen to the material and answer questions 8 to 10.
Lecture 11: Unit 3 Sports and Fitness
Objective 1: Master the usage of the key words and chunks.
一、核心单词讲解清单一----词表词汇速记(限时25分钟)1. set an example 树立榜样set an example to/for sb 给某人树立榜样follow one's example 效仿某人;以某人为榜样take...for example 以……为例for example/instance 例如eg. We Chinese people are all called on to learn from Lei Feng, who we think sets an example to us all. 我们中国人都被号召向雷锋学习,我们认为他为我们树立了榜样。
2.honour(又作honor)n.荣誉;尊敬;荣幸● It's an honour for sb to do sth 对某人来说做某事很荣幸feel it an honour to do sth 为做某事感到荣幸in honour of...向……表示敬意;为纪念……● be/feel honoured to do sth 做某事感到荣幸● honourable adj.可敬的;体面的honoured adj.受尊敬的st but not least,I feel it a great honour to represent my class to take part in the group dancing competition.最后但同等重要的是,代表我们班参加团体舞比赛,我感到很荣幸。
3.determination n.决心;决定● with determination 有决心● determine vt.决定;确定;下决心;影响determine to do sth 决定做某事determine on/upon(doing)sth 决定(做)某事● determined adj.有决心的be determined to do sth 决心做某事(表示状态)【联想】“决定做某事”表达还有∶①decide to do sth ②make a decision to do sth③make up one's mind to do sth4.injure vt.使受伤;损害(hurt)● injured adj.受伤的;有伤的get injured 受伤the injured 伤员● injury n.伤害;损伤do an injury to sb 伤害某人5. lose heart 丧失信心;泄气lose one's heart to 某人倾心于……;爱上……put one's heart into 某人专心于……heart and soul全心全意;完全地learn...by heart 记忆;背诵eg. In no sense should you lose heart; keep trying and you will make it sooner or later. 决不应丧失信心;继续努力,你迟早会成功的。
• 1/3” Progressive Scan CMOS• Dual lens, dual CPU, support stereo views • 2 mm lens• Max. resolution 640 × 960 • H.264• Flash memory storing people counting data• 3D digital noise reduction • Up to 3 m IR rangeiDS-2XM6810/C-IM/ND is a specialized people counting camera, supporting separate counting of people entering, exiting, and passing by.The camera combines various crucial and advanced technologies, including binocular stereo vision, 3D people detection and tracking, and height filtering, to guarantee high accuracy.SpecificationsCameraImage Sensor1/3″ Progressive Scan CMOSShutter Speed1/25 s to 1/10,000 sAuto-iris ManualDigital Noise Reduction 3D DNRAngle Adjustment Tilt: 0° to 45°LensFocal length 2 mmAperture F2.2FOV Horizontal FOV 105°, vertical FOV 89°Lens Mount M12IRIR Range Up to 3 mCompression StandardVideo Compression Main stream: H.264 Sub-stream: H.264H.264 Type Baseline Profile/Main Profile/High ProfileVideo Bit Rate 32 Kbps to 8 MbpsPeople CountingStatistics Support separate counting of people entering, exiting, and passing by Data Uploading Support real-time uploading and uploading by statistic cycleEmail Report Daily, weekly, monthly, and annual report supportedHeight Filter SupportDoor Status Indication SupportImageMax. Resolution 640 × 960Main Stream Max. Frame Rate 50Hz: 25fps (640 × 480) 60Hz: 24fps (640 × 480)Sub-stream Max. Frame Rate 50Hz: 25fps (640 × 960) 60Hz: 24fps (640 × 960)Image Enhancement 3D DNRImage Setting Brightness and sharpness are adjustable by client software or web browserDay/Night Switch Auto, scheduled, day, and nightNetworkAlarm Trigger Video tampering, network disconnected, IP address conflict, illegal loginProtocols TCP/IP, ICMP, HTTP, HTTPS, FTP, DHCP, DNS, DDNS, RTP, RTSP, RTCP, NTP, UPnP, SMTP, SNMP, IGMP,802.1X, QoS, IPv6, UDP, BonjourGeneral Function Anti-flicker, heartbeat, password protection, watermark, door status detection, email data report Firmware Version V5.4.71API ISAPI, HTTPSimultaneous Live View Up to 6 channelsUser/Host Up to 32 users3 levels: Administrator, Operator and UserClient iVMS-4200Unit: mm [inch]Web BrowserIE8+, Chrome31-44, Firefox 30.0 - 51InterfaceCommunication Interface1 M12 aviation plugGeneralOperating Conditions -30°C to +60°C (-22 °F to +140 °F), humidity 95% or less (non-condensing) Power SupplyTerminal blockPoE (802.3af, class 3)Power Consumption and Current PoE (802.3af, 36 V to 57 V), max. 7 W, 0.2A to 0.13A Material Body: metal; enclosure: plasticDimensions Camera: 216 × 87 × 57 mm (8.51″ × 3.41″ × 2.22″) Package: 260 × 125 × 125 mm (10.24″ × 4.92″ × 4.92″) WeightApprox.850 g (1.87 lb.), 1100 g (2.43 lb.) with packageDimensions Available Model: iDS-2XM6810/C-IM/ND(2 mm)Installation RecommendationFollowing table shows the relations among focal length, mounting height, and max. detection span of typical values. For detailed installation advice, please contact the technical support of HIKVISION.Focal Length (mm) Mounting Height (m) Max. Detection Span (m)2.0 2.5 1.82.03.0 2.92.03.54.0Mounting HeightVehicle FloorMax. Detection SpanNote: The camera should be installed at 2.1 to 3.0 meters heightabove the first step of the entry/exit area of the vehicle if severalsteps are available.05047120180928。
Journal of Dalian Minzu University, Vol. 20, No. 3, May 2018. Article ID: 2096-1383(2018)03-0218-04.
CLC number: TP391.41; Document code: A

Detection Algorithm of Jaywalkers with Velocity Abnormity
XU Ye-hao, MAO Lin, YANG Da-wei
(School of Electromechanical Engineering, Dalian Minzu University, Dalian, Liaoning 116605, China)

Abstract: Aiming at the problem that pedestrians' abnormal movements in mixed pedestrian-vehicle areas greatly interfere with drivers' judgment, this study uses a monocular camera and proposes a detection algorithm for jaywalkers with velocity abnormity, based on pedestrian detection and tracking with TLD and a particle filter. The method detects pedestrians walking abnormally fast or slowly on the road in front of the vehicle and flags them as dangerous to alert the driver, achieving early safety warning for pedestrians. Simulation shows that the algorithm effectively estimates pedestrian velocity on real roads, detects pedestrians with abnormal velocity, and provides necessary reference information to drivers and driver-assistance systems.
Key words: object detection; pedestrian safety; velocity abnormity; driver assistance

A pedestrian crossing the road too fast, too slowly, or with abrupt speed changes may impair the driver's judgment and lead to traffic accidents.
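The abstract describes the velocity-abnormity test only at a high level. The following minimal Python sketch illustrates one plausible reading of it, assuming a tracker (e.g., TLD plus a particle filter) already supplies per-frame pedestrian positions in ground-plane coordinates; the function names, thresholds, and smoothing window are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def walking_speeds(positions, fps):
    """Finite-difference speed estimates (m/s) from a track of
    ground-plane positions, one (x, y) pair per frame."""
    positions = np.asarray(positions, dtype=float)
    step = np.linalg.norm(np.diff(positions, axis=0), axis=1)  # metres per frame
    return step * fps

def is_velocity_abnormal(positions, fps, v_min=0.5, v_max=2.5, window=15):
    """Flag a pedestrian whose smoothed speed leaves [v_min, v_max] m/s.
    Thresholds and window length are illustrative assumptions."""
    v = walking_speeds(positions, fps)
    if len(v) < window:
        return False
    kernel = np.ones(window) / window            # moving-average smoothing
    v_smooth = np.convolve(v, kernel, mode="valid")
    return bool(np.any((v_smooth < v_min) | (v_smooth > v_max)))

# Example: a pedestrian accelerating into a run while crossing.
track = [(0.0, 0.1 * t ** 1.5) for t in range(60)]
print(is_velocity_abnormal(track, fps=25))
```

Smoothing before thresholding avoids flagging single-frame tracking jitter as a speed anomaly.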
3D Human Body Scanning (English-Chinese translation exercise)

English answer:
Three-dimensional human body scanning is a cutting-edge technology that has revolutionized the way we capture and visualize the human form. It involves using specialized 3D scanners to create detailed and accurate digital models of the human body, providing valuable insights into human anatomy, movement, and posture. This technology has a wide range of applications in various fields, including healthcare, fitness, fashion, and entertainment.

In healthcare, 3D human body scanning is used for creating personalized prosthetics, surgical planning, and assessing body composition. It allows doctors to obtain precise measurements and create customized prosthetics that fit perfectly, improving patient comfort and mobility. In surgical planning, 3D body scans provide surgeons with detailed anatomical information, enabling them to visualize complex procedures and make informed decisions. Additionally, body composition analysis using 3D scanning helps monitor body fat, muscle mass, and other health indicators, supporting personalized fitness and nutrition plans.

In the fitness industry, 3D body scanning is utilized for analyzing posture, tracking progress, and designing personalized workout programs. It provides accurate measurements of body dimensions, posture alignment, and range of motion, helping fitness professionals to identify imbalances and create tailored exercise plans that maximize results. By tracking changes over time, individuals can monitor their progress and adjust their training strategies accordingly.

The fashion industry has also embraced 3D human body scanning to enhance the design and fitting process. It allows designers to create virtual garments that can be fitted to precise body measurements, reducing the need for multiple physical fittings and improving the accuracy of garment sizing. For consumers, 3D body scanning offers personalized shopping experiences by enabling them to virtually try on clothes and find garments that fit perfectly, reducing the hassle of returns and exchanges.

In the entertainment industry, 3D human body scanning plays a crucial role in creating realistic character models for movies, video games, and virtual reality experiences. It provides animators with accurate body proportions and movements, allowing them to create lifelike characters that move and interact naturally. This technology also enables the creation of custom avatars for virtual reality and gaming environments, enhancing the immersive experience for users.

Overall, 3D human body scanning has emerged as a transformative technology with diverse applications across multiple sectors. It empowers individuals to understand their bodies better, enables healthcare professionals to provide more precise and personalized care, and enhances the design and entertainment industries by creating realistic and tailored experiences. As the technology continues to evolve, we can anticipate even more groundbreaking applications in the future.
BU CS TR98-019 rev. 2. To appear in Proc. IEEE Conf. on Computer Vision and Pattern Recognition, June 1999

3D Trajectory Recovery for Tracking Multiple Objects and Trajectory Guided Recognition of Actions
Rómer Rosales and Stan Sclaroff
Computer Science Department, Boston University

Abstract
A mechanism is proposed that integrates low-level (image processing), mid-level (recursive 3D trajectory estimation), and high-level (action recognition) processes. It is assumed that the system observes multiple moving objects via a single, uncalibrated video camera. A novel extended Kalman filter formulation is used in estimating the relative 3D motion trajectories up to a scale factor. The recursive estimation process provides a prediction and error measure that is exploited in higher-level stages of action recognition. Conversely, higher-level mechanisms provide feedback that allows the system to reliably segment and maintain the tracking of moving objects before, during, and after occlusion. The 3D trajectory, occlusion, and segmentation information are utilized in extracting stabilized views of the moving object. Trajectory-guided recognition (TGR) is proposed as a new and efficient method for adaptive classification of action. The TGR approach is demonstrated using "motion history images" that are then recognized via a mixture-of-Gaussians classifier. The system was tested in recognizing various dynamic human outdoor activities, e.g., running, walking, roller blading, and cycling. Experiments with synthetic data sets are used to evaluate the stability of the trajectory estimator with respect to noise.

1 Introduction
Tracking non-rigid objects and classifying their motion is a challenging problem. The importance of tracking and motion recognition problems is evidenced by the increasing attention they have received in recent years [26]. Effective solutions to these problems would lead to breakthroughs in areas such as video surveillance, motion analysis, virtual reality interfaces, robot navigation and recognition.
Low-level image processing methods have been shown to work surprisingly well in restricted domains despite the lack of high-level models [9, 8, 30]. Unfortunately, most of these techniques assume a simplified version of the general problem; e.g., there is only one moving object, objects do not occlude each other, or objects appear at a limited range of scales and orientations. While higher-level, model-based techniques can address some of these problems [6, 12, 15, 16, 18, 21, 23], such methods typically require careful placement of the initial model.
These limitations arise because object tracking, 3D trajectory estimation, and action recognition are treated as separable problems. In fact, these problems are inexorably intertwined. For instance, an object needs to be tracked if its 3D trajectory is to be recovered; while at the same time, some approaches simplify the problem by restricting tracking to a plane, where the projective effect is avoided through the use of a top, distant view [14]. It is also possible to use some heuristics about body-part relations and motion on the image plane, as in [13]. In our work, we do not assume planar motion or detailed knowledge about the object, and our formulation can handle some changes in shape.
More complicated representations, as in [16, 12, 18, 23, 15, 21], use model-based techniques, generally articulated models comprised of 2D or 3D solid primitives, sometimes accounting for self-occlusion by having an explicit object model. Shape models were used by [2] for human tracking.
The most relevant motion recognition approaches related to our work are those that employ view-based models [3, 9, 19]. In particular, [9] uses motion
history images (MHI) and motion energy images (MEI), temporal templates that are matched using a nearest-neighbor approach against examples of given motions already learned. The main problem of this method is the requirement of having stationary objects, and the insufficiency of the representation to discriminate among similar motions. Motion analysis techniques have had the problem that registration of useful, filtered information is a hard labor by itself [9, 4, 19]. Our system provides a functional front end that supports such tasks.

3 Basic Approach
A diagram of our approach is illustrated in Fig. 1. The first stage of the algorithm is based on the background subtraction methods of [30]. The system is initialized by acquiring statistical measurements of the empty scene over a number of video frames. The statistics acquired are the mean and covariance of image pixels in 3D color space. The idea is to have a confidence map to guide segmentation of the foreground from the background.
Once initialized, moving objects in the scene are segmented using a log-likelihood measure between the incoming frames and the current background model. The input video stream is low-pass filtered to reduce noise effects. A connected-components analysis is then applied to the resulting image. Initial segmentation is usually noisy, so morphological operations and size filters are applied.
If there are occlusions, or strong similarities between the empty-scene model and the objects, then it is possible that regions belonging to the same object may be classified as part of different objects, or vice versa. To solve this problem, two sources of information are used: temporal analysis and trajectory prediction.
In temporal analysis, a map of the previous segmented and processed frame is kept. This map is used as a possible approximation of the current connected elements. Connectivity in the current frame is compared with it, and regions are merged if they were part of the same object in previous frames. This can account for some shadow effects, local background-foreground similarities, and brief occlusions.
In trajectory prediction, an extended Kalman filter (EKF) provides an estimate of each object's image bounding-box position and velocity. The input to the EKF is a 2D bounding box that encloses the moving object in the image. The extended Kalman filter then estimates the relative 3D motion trajectories for each object, based on a 3D linear trajectory model. In contrast to trajectory prediction based on a 2D model, the 3D approach is explicitly designed to handle nonlinear effects of perspective projection.

Figure 1: System diagram.

Occlusion prediction is performed based on the current EKF estimates. Given that we know object position and the occupancy map, we can detect occlusions or collisions in the image plane. Our EKF formulation estimates the trajectory parameters for these objects assuming locally linear 3D motion; the bounding-box parameters are also estimated. During an occlusion, the EKF can be used to give the maximum-likelihood estimate of the current region covered by the object, along with its velocity and position.
For each frame, the estimated bounding box is used to resize and resample the moving blob into a canonical view that can be used as input to motion recognition modules.
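As a concrete illustration of the first stage, the sketch below implements per-pixel foreground segmentation with a Gaussian background model in 3D color space, thresholding a log-likelihood measure in the spirit of the description above. The diagonal-covariance simplification and the threshold value are our own assumptions, not details taken from [30].

```python
import numpy as np

def fit_background(frames):
    """Per-pixel mean and variance in 3D color space from empty-scene frames.
    frames: array of shape (T, H, W, 3)."""
    frames = np.asarray(frames, dtype=float)
    mean = frames.mean(axis=0)
    var = frames.var(axis=0) + 1e-6          # diagonal covariance (assumption)
    return mean, var

def foreground_mask(frame, mean, var, threshold=20.0):
    """Pixels whose negative log-likelihood under the background model
    exceeds `threshold` are labeled foreground."""
    d = np.asarray(frame, dtype=float) - mean
    # Negative log-likelihood of a diagonal Gaussian, up to a constant.
    nll = 0.5 * np.sum(d * d / var + np.log(var), axis=-1)
    return nll > threshold
```

A connected-components pass (e.g., scipy.ndimage.label) followed by morphological operations and size filtering would then clean this mask, as the text describes.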
This resampling yields a stabilized view of the moving object throughout the tracking sequence, despite changes in scale and position. The resulting translation/scale-stabilized images of the object are then fed to an action recognition module. Actions are represented in terms of motion energy images (MEIs) and motion history images (MHIs) [4, 9]. An MEI is a cumulative motion image, and an MHI is a function of the recency of the motion at every pixel. By using stabilized input sequences, it is possible to make the MEI/MHI approach invariant to unrestricted 3D translational motion. The stabilized representation is then fed to a moment-based action classifier. The action recognition module employs a mixture-of-Gaussians classifier, which is learned via expectation-maximization (EM).
In theory it is necessary to learn representations of every action for all possible trajectory directions. However, the complexity of such an exhaustive approach would be impractical. We therefore propose a formulation that avoids this complexity without decreasing recognition accuracy. The problem is made tractable via trajectory-guided recognition (TGR), and is a direct consequence of our tracking and 3D trajectory estimation mechanisms. In TGR, we partition the hemisphere of possible trajectory directions based on the trajectories estimated in the training data. Each partition corresponds to a group of similar trajectory directions. During training and classification, trajectory direction information obtained via the EKF is used to determine the direction-partitioned feature space. This allows automatic learning and adaptation of the direction space to those directions that are commonly observed.

4 3D Trajectory from 2D Image Motion
Our method requires moving-blob segmentation and connected-components analysis as input to the tracking module. Due to space limitations, readers are referred to [24] for details of these modules. To reduce the complexity of the tracking problem, two feature points are selected: two opposite corners of the blob's bounding box. Using a blob's bounding box alleviates the need to search for corresponding point features in consecutive frames. In general we think that a detailed tracking of features is neither necessary nor easily tenable for non-rigid motion tracking at low resolution.
It is assumed that although the object to be tracked is highly non-rigid, the 3D size of the object's bounding box will remain approximately the same, or at least vary smoothly. This assumption might be too strong in some cases; e.g., if the internal motion of the object's parts cannot be roughly self-contained in a bounding box. However, when analyzing basic human locomotion, we believe that these assumptions are a fair approximation.
For our representation, a 3D central projection model similar to [28, 1] is used:

(u, v) = ( X / (1 + Z\beta), \; Y / (1 + Z\beta) ),   (1)

where \beta = 1/f is the inverse focal length. The origin of the coordinate system is fixed at the image plane.
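To make the camera model of Eq. 1 concrete, here is a small sketch of the measurement function that projects the two 3D bounding-box corners (sharing depth Z) to image coordinates. The state layout follows the reconstruction of the state vector given in the next paragraph and should be read as an assumption, not the paper's exact data structure.

```python
import numpy as np

def project_state(s):
    """Measurement function h(s): project the two bounding-box corners.
    s = [X1, Y1, X2, Y2, Z, dX1, dY1, dX2, dY2, dZ, beta]
    (assumed layout; beta = 1/f is the inverse focal length)."""
    X1, Y1, X2, Y2, Z = s[:5]
    beta = s[10]
    w = 1.0 + Z * beta            # central-projection denominator
    return np.array([X1 / w, Y1 / w, X2 / w, Y2 / w])

# beta -> 0 recovers orthographic projection, as noted in the text.
print(project_state(np.array([1, 2, 3, 4, 5, 0, 0, 0, 0, 0, 0.0])))
```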
The model is numerically well defined even in the case of orthographic projection. Our state models a 3D planar rectangular bounding box moving along a linear trajectory at constant velocity. Because we are considering the objects as being planar, the depth at both feature points should be the same. The reduction in the number of degrees of freedom improves the speed of convergence of the EKF and the robustness of the estimates. Our state vector then becomes:

x = (X_1, Y_1, X_2, Y_2, Z, \dot{X}_1, \dot{Y}_1, \dot{X}_2, \dot{Y}_2, \dot{Z}, \beta)^T,   (2)

where (X_1, Y_1) and (X_2, Y_2) are the corners of the 3D planar bounding box at common depth Z, the dotted components represent a corner's 3D velocity relative to the camera, and \beta is the inverse focal length. The sensitivity in X and Y is directly dependent on the object depth, as objects that are farther away from the camera tend to project to fewer image pixels. The sensitivity of Z is an inverse function of camera focal length, becoming zero in the orthographic case.
The 3D trajectory and velocity are recovered up to a scale factor. However, the family of allowable solutions all project to a unique solution on the image plane. We can therefore estimate objects' future positions on the image plane given their motion in space. The use of this 3D trajectory model offers significantly improved robustness over methods that employ a 2D image trajectory model. Due to perspective foreshortening effects, trajectories in the image plane are nonlinear, and 2D models are therefore inaccurate.

4.1 Extended Kalman Filter Formulation
Trajectory estimates are obtained via an extended Kalman filter (EKF) formulation. Our state is governed by the following linear equation:

x_{t+1} = A x_t + w_t,   (3)

where x_t is our state at time t, w_t is the process noise, and A, the system evolution matrix, is based on first-order Newtonian dynamics in 3D space and assumed time-invariant. If additional prior information on dynamics is available, then A can be changed to better describe the system evolution [22]. Our measurement vector is z_t = (u_1, v_1, u_2, v_2)^T, where (u_j, v_j) are the image-plane coordinates of the observed corner features at time t. The measurement vector is related to the state vector via the measurement equation z_t = h(x_t) + \eta_t. Note that h is non-linear. The EKF time update equations become:

\hat{x}^-_t = A \hat{x}_{t-1}   (4)
P^-_t = A P_{t-1} A^T + Q_{t-1}   (5)

where Q is the process noise covariance. The measurement relationship to the process is nonlinear. At each step, the EKF linearizes around our current estimate using the measurement and state partial derivatives.
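The complete predict/correct cycle, the time update above together with the measurement update stated in closed form just below (Eqs. 6-8), can be sketched in a few lines of numpy. The numerical Jacobian and generic noise settings here are our simplifications, not the paper's implementation.

```python
import numpy as np

def numerical_jacobian(h, x, eps=1e-6):
    """Finite-difference Jacobian of the measurement function h at x."""
    z0 = h(x)
    J = np.zeros((len(z0), len(x)))
    for i in range(len(x)):
        dx = np.zeros_like(x)
        dx[i] = eps
        J[:, i] = (h(x + dx) - z0) / eps
    return J

def ekf_step(x, P, z, h, A, Q, R):
    """One EKF cycle: time update (Eqs. 4-5), then measurement update (Eqs. 6-8)."""
    # Time update.
    x_pred = A @ x
    P_pred = A @ P @ A.T + Q
    # Linearize the measurement model around the prediction (Eq. 9).
    H = numerical_jacobian(h, x_pred)
    # Measurement update.
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)          # Kalman gain (Eq. 6)
    x_new = x_pred + K @ (z - h(x_pred))         # state correction (Eq. 7)
    P_new = (np.eye(len(x)) - K @ H) @ P_pred    # covariance update (Eq. 8)
    return x_new, P_new

# A reset, as described below, can be triggered when ||z - h(x_pred)||
# stays large over several frames.
```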
The measurement update equations become:

K_t = P^-_t H_t^T (H_t P^-_t H_t^T + R_t)^{-1}   (6)
\hat{x}_t = \hat{x}^-_t + K_t (z_t - h(\hat{x}^-_t))   (7)
P_t = (I - K_t H_t) P^-_t   (8)

where H_t is the Jacobian of h with respect to x, evaluated at the predicted state:

H_t = \partial h / \partial x |_{x = \hat{x}^-_t}   (9)

and R_t is the measurement noise covariance at time t. The general assumptions are: w_t and \eta_t are Gaussian random vectors with w_t ~ N(0, Q_t) and \eta_t ~ N(0, R_t). For more detail, see [27, 29].
Obviously, as more measurements are collected, the error covariance of our estimates tends to decrease. Experimentally, 40 frames were needed for convergence with real data. As will be seen in our experiments, motions that are not linear in 3D can also be tracked, but the estimate at locations of sudden change in velocity or direction is more prone to instantaneous error. The speed of convergence when a change in trajectory occurs depends on the filter's expected noise. A resetting mechanism is used to detect when the EKF does not represent the true observations. This is done by comparing the current projection of the estimate with the observation.

5 Motion Recognition
Our tracking approach allows the construction of an object-centered representation. The resulting translation/scale-stabilized images of the object are then fed to an action recognition module. Actions are represented in terms of motion energy images (MEIs) and motion history images (MHIs) [4, 9]. An MEI is a cumulative motion image, and an MHI is a function of the recency of the motion at every pixel. By using stabilized input sequences, it is possible to make the MEI/MHI approach invariant to unrestricted 3D translational motion.
The seven Hu moment invariants are then computed for both the MHI and the MEI [4, 9]. The resulting features are combined in a 14-dimensional vector. The dimension of this vector is reduced via principal components analysis (PCA) [11]. In our experiments, the PCA allowed a dimensionality reduction of 64% (dim = 5), while capturing 90% of the variance of our training data. The reduced feature vector is then fed into a maximum-likelihood classifier that is based on a mixture-of-Gaussians model. In the mixture model, each action class is represented by a set of mixture model parameters. The model parameters and prior distribution for each action class are estimated during a training phase using the expectation-maximization (EM) algorithm [10]. Given the class-conditional likelihoods, we calculate the class posterior using Bayes' rule.
The motivation for using a mixture model is that there may be a number of variations (or body configurations) for motions within the same action class. The model should adequately span the space of standard configurations for a given action. However, we are only interested in finding a representative set of these modes. We therefore use an information-theoretic technique to select the best number of mixture components via the MDL principle.
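To make the temporal-template representation of Sec. 5 concrete, the sketch below computes a motion history image from a stack of binary foreground masks and extracts Hu moment invariants via OpenCV; concatenating the MHI and MEI moments gives the 14-dimensional vector the text describes. The linear decay duration is an illustrative choice, not the paper's parameter.

```python
import numpy as np
import cv2

def motion_history_image(masks, duration=30):
    """MHI: each pixel holds the recency of motion, linearly decaying.
    masks: iterable of binary (H, W) foreground masks, oldest first."""
    masks = list(masks)
    mhi = np.zeros(masks[0].shape, dtype=float)
    for mask in masks:
        mhi = np.where(mask, float(duration), np.maximum(mhi - 1.0, 0.0))
    return mhi / duration                      # normalize to [0, 1]

def hu_features(image):
    """Seven Hu moment invariants of a single-channel image."""
    return cv2.HuMoments(cv2.moments(image.astype(np.float32))).ravel()

def mhi_mei_features(masks):
    mhi = motion_history_image(masks)
    mei = (mhi > 0).astype(np.float32)         # MEI: cumulative motion support
    return np.concatenate([hu_features(mhi), hu_features(mei)])  # 14-D vector
```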
Figure 2: Tracking example: two bodies walking along different trajectories, occluding each other.
Figure 3: Normalized views of the two bodies in the sequence, one row per body. Six views with their respective regions of support.
Figure 4: Recovered (top-view) motion. Note that the motion of body 1 is from right to left; body 2 goes in the opposite direction. Estimates at the beginning of the sequence are not very accurate, because the error covariance in the EKF is higher; uncertainty is reduced as enough samples are given.

As shown in Fig. 4, one body moves from right to left, while the second body moves in the opposite direction. While the early position estimates are noisy, after 20-40 frames the EKF converges to a stable estimate of the trajectories.

6.1 Learning and Recognition of Actions
In order to test the full action recognition system, we collected sequences (3 hours total) of random people performing different actions on a pedestrian road in an outdoor environment (Charles River sequences). We trained the system to recognize four different actions (walking, running, roller blading, biking) gathered from two different camera viewpoints. The camera was located at a fixed distance and angle with respect to the road.
Video sequences showing 56 examples of each action were extracted from this data set. The duration of each example ranged from one to five seconds (30-150 frames). The recognition experiment was conducted as follows. For each trial, a subset of 40 training examples per action was selected at random from the full example set. The remaining examples per action (excluded from the training set) were then classified.
Example frames from the data set are shown in Fig. 5. As before, the estimated bounding boxes for each moving object are shown overlaid on the input video image. Our approach indicated that only two trajectory directions were mainly observed: either direction along the foot path. Therefore, the system learned just two sets of PDFs. Results for either view were almost the same; average rates are therefore presented.

Figure 5: Example frames taken from the river sequences.
Table 1: Confusion matrix for classifying four action classes: walking, running, roller blading, and biking. In the experimental trials, the total probability of an error in classification was small relative to chance (chance = 0.75).

Classification performance is summarized in the confusion matrix shown in Tab. 1. Each confusion-matrix cell represents the probability of error in classification. The entry at (i, j) is the probability of action i being misclassified as action j. The last column is the total probability of a given action being misclassified as another action. The last row represents the probability of misclassification to a given action class.
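A minimal version of the per-class mixture-of-Gaussians classifier described in Sec. 5 can be written with scikit-learn. The PCA dimension follows the figures quoted above, but we fix the number of mixture components instead of selecting it by MDL, and all names and structure are our illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

class ActionClassifier:
    """One Gaussian mixture per action class; classify by maximum posterior."""

    def __init__(self, n_components=3, n_dims=5):
        self.k = n_components                   # fixed here; MDL in the paper
        self.pca = PCA(n_components=n_dims)     # 14-D moment vector -> 5-D
        self.models, self.priors = {}, {}

    def fit(self, features, labels):
        X = self.pca.fit_transform(np.asarray(features))
        labels = np.asarray(labels)
        for c in set(labels.tolist()):
            Xc = X[labels == c]
            self.models[c] = GaussianMixture(
                n_components=self.k, covariance_type="full").fit(Xc)
            self.priors[c] = len(Xc) / len(X)

    def predict(self, feature):
        x = self.pca.transform(np.asarray(feature).reshape(1, -1))
        # Bayes rule: log p(x|c) + log p(c), maximized over classes.
        scores = {c: m.score_samples(x)[0] + np.log(self.priors[c])
                  for c, m in self.models.items()}
        return max(scores, key=scores.get)
```

In TGR, one such classifier would be trained per trajectory-direction partition, with the EKF direction estimate selecting which classifier to apply.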
6.2 Sensitivity Experiments
In order to provide a more comprehensive analysis of the 3D trajectory estimation technique, we tested its sensitivity with synthesized data sets. We conducted Monte Carlo experiments to evaluate the stability of trajectory estimation and prediction. In our experiments, two types of noise effects were tested: noise in the 2D image measurements, and noise in the 3D motion model.
Test sequences were generated using a synthetic planar bounding box moving along varying directions in 3D from a given starting position. The 3D box was then projected onto the image plane using our camera model (Eq. 1) with unit focal length. Each synthetic image sequence was 100 frames long. The set of directions was sampled over the azimuth and the rotation around the corresponding direction vector, yielding 576 different directions. For each experiment, each of the 576 possible trajectory directions was tested using 15 sequences with randomly perturbed inputs.
All synthetic video sequences were sampled at a fixed pixel resolution, mapped to a physical viewport measured in world units, so that one pixel width is equivalent to a fixed extent in world units. The depth of the object from the image plane ranged over a fixed scale interval. This resulted in a projected bounding box that occupied approximately 3% of the image on average.
For all of our experiments we define the error in our estimate at a given time to be measured in the image plane. The mean squared error (MSE) in estimating the bounding-box corner positions was computed over the 100 frames within each test sequence. To better understand the effect of error due to differences in the projected size of the object from frame to frame, a second error measure, normalized MSE, was also computed. In this measure, the error at each frame was normalized by the length of the projected bounding-box diagonal.
To test the sensitivity of the system to noise in the measurements, time-varying white noise of a given variance was added to the measured bounding-box corner positions at each frame. This was meant to simulate sensor noise and variations in the bounding box due to non-rigid deformations of the object. To test the sensitivity of the model formulation to perturbations in the actual 3D object trajectory, time-varying white noise of a given variance was added to perturb the 3D position of the object at each frame. The resulting trajectories were therefore not purely linear.
Our experiments have consistently shown that the mean-square error depends almost exclusively on the azimuth angle of the trajectory. An illustrative example, Fig. 6, shows error surfaces for a representative noise setting. The plot shows the mean-square error and normalized mean-square error over all possible directions. As can be seen, the mean-square error is relatively invariant to the rotation component of the direction.

Figure 6: Example of performance with significant measurement noise. The plots show error as a function of trajectory direction. Normalized mean-square error (upper surface) and non-normalized mean-square error surfaces (lower surface) are shown. Note that mean-square error varies almost exclusively with the azimuth.

Figure 7: Graphs showing the sensitivity with respect to varying levels of measurement noise. The first graph shows the normalized mean-square error in the state estimate over various trajectory directions. The second graph shows the normalized mean-square error in the state predicted for the future frame.
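The Monte Carlo protocol above can be paraphrased in a few lines of Python; the sketch below perturbs one synthetic run and accumulates the normalized MSE of the corner estimates, assuming an estimator like the `ekf_step` sketched earlier and illustrative noise levels.

```python
import numpy as np

def perturbed_measurements(true_corners, sigma_meas, rng):
    """Add time-varying white noise to the measured corner positions.
    true_corners: list of arrays [u1, v1, u2, v2], one per frame."""
    return [c + rng.normal(0.0, sigma_meas, size=c.shape) for c in true_corners]

def normalized_mse(est_corners, true_corners):
    """Squared corner error per frame, normalized by the projected
    bounding-box diagonal, averaged over the sequence."""
    err, diag = [], []
    for e, t in zip(est_corners, true_corners):
        err.append(np.sum((e - t) ** 2))
        diag.append(np.linalg.norm(t[2:] - t[:2]) + 1e-9)
    return float(np.mean(np.array(err) / np.square(diag)))

# Repeating this over a grid of directions and 15 perturbed runs per
# direction reproduces the structure of the experiments described above.
```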
Figure 8: Sensitivity with respect to varying levels of noise in the 3D motion trajectory. The first graph shows the normalized mean-square error in the state estimate over various trajectory directions. The second graph shows the normalized mean-square error in the predicted state.

Due to this result, our graphs drop the rotation dimension by averaging over it, so that the complexity of visualization is reduced.
Fig. 7 shows results of experiments designed to test the effect of increasing the measurement noise. We set the process noise relatively low with respect to the real 3D dimensions, and varied the measurement noise. As expected, the error is lower at lower noise levels. The first graph shows the normalized mean-square error in the state estimate over various trajectory directions. The second graph shows the normalized mean-square error in the state predicted for the future frame (ten frames ahead). This error is due to the linearization step in the EKF. This error affects the accuracy of the "look ahead" needed to foresee an occlusion, and to predict the state during occlusions.
The graph in Fig. 8 shows the results of experiments designed to test the effect of increasing noise in the 3D motion trajectory. This corresponds to noise added to the model. Here, the model noise is varied as shown, and the measurement noise is kept relatively small. Note that the assumed process-noise level is set to very high values with respect to the expected model noise. The normalized mean-square error in the position estimates is relatively constant over trajectory direction for each different level of noise and is also higher than the prediction error. The main reason for this is that the expected low measurement error tends to pull the estimates towards highly noisy measurements. The prediction error in general increases with the noise level, showing the higher error incurred in linearizing around the current state-space point.

Figure 9: Sensitivity with respect to the number of frames ahead used to calculate the prediction. Normalized mean-square error is shown; the error decreases beyond a certain direction value, due to the normalization effect and the smaller effect that such directions have on changes in the image plane.

In a final set of experiments, we tested the accuracy of the trajectory prediction at varying numbers of frames into the future. Results are shown in Fig. 9. Notice that, as expected, the 3D trajectory prediction is more accurate in short time windows. The uncertainty increases with the number of frames into the future for which we want to make our prediction. This is mainly due to the fact that the EKF computes an approximation linearizing around the current point in state space, and, as we pointed out, the underlying process is non-linear. The prediction error in general increases with the prediction horizon, showing the higher error as a consequence of linearization.

7 Conclusion
We have shown how to integrate low-level and mid-level mechanisms to achieve more robust and general tracking. 3D trajectories can be successfully estimated, given some constraints on non-rigid objects. Prediction and estimation based on a 3D model gives improved performance over 2D image-plane-based prediction. This trajectory estimation can be used to predict and correct for occlusion.
We utilized EKF 3D trajectory estimates in a new framework: Trajectory-Guided Recognition (TGR). This general method significantly reduces the complexity of action classification, and could be used with other techniques (e.g., [3, 19]). Our tracking approach allows the construction of an object-centered representation. The resulting translation/scale-stabilized images of the object are then fed to the TGR action recognition module that selects
the appropriate classifier based on trajectory direction.
The system was tested in classifying four basic actions in a large number of video sequences collected in unconstrained, outdoor scenes. The noise stability properties of the trajectory estimation subsystem were also tested using synthetic data sequences. The results of the experiments are encouraging. Classification performance was quite good, considering the complexity of the task.

References
[1] A. Azarbayejani and A. Pentland. Recursive estimation of motion, structure, and focal length. PAMI, 17(6), 1995.
[2] A. Baumberg and D. Hogg. Learning flexible models from image sequences. ECCV, 1994.
[3] M. Black and Y. Yacoob. Tracking and recognizing rigid and non-rigid facial motion using local parametric models of image motion. ICCV, 1995.
[4] A. Bobick and J. Davis. An appearance-based representation of action. ICPR, 1996.
[5] K. Bradshaw, I. Reid, and D. Murray. The active recovery of 3d motion trajectories and their use in prediction. PAMI, 19(3), 1997.
[6] C. Bregler. Tracking people with twists and exponential maps. CVPR98, 1998.
[7] T. Broida and R. Chellappa. Estimating the kinematics and structure of a rigid object from a sequence of monocular images. PAMI, 13(6):497-513, 1991.
[8] T. Darrell and A. Pentland. Classifying hand gestures with a view-based distributed representation. NIPS, 1994.
[9] J. Davis and A. F. Bobick. The representation and recognition of human movement using temporal templates. CVPR, 1997.
[10] A. Dempster, N. Laird, and D. Rubin. Maximum likelihood estimation from incomplete data. J. of the Royal Stat. Soc. (B), 39(1), 1977.
[11] K. Fukunaga. Introduction to Statistical Pattern Recognition. Academic Press, 1972.
[12] D. Gavrila and L. Davis. Tracking of humans in action: a 3-d model-based approach. Proc. ARPA IUE Workshop, 1996.
[13] I. Haritaoglu, D. Harwood, and L. Davis. W4S: A realtime system for detecting and tracking people in 2.5d. ECCV, 1998.
[14] S. Intille and A. F. Bobick. Real time closed-world tracking. CVPR, 1997.
[15] S. Ju, M. Black, and Y. Yacoob. Cardboard people: A parameterized model of articulated image motion. Proc. Gesture Recognition, 1996.
[16] I. Kakadiaris, D. Metaxas, and R. Bajcsy. Active part-decomposition, shape and motion estimation of articulated objects: A physics-based approach. CVPR, 1994.
[17] T. Kohonen. Self-organized formation of topologically correct feature maps. Bio. Cybernetics, 43:59-69, 1982.
[18] A. Pentland and B. Horowitz. Recovery of non-rigid motion and structure. PAMI, 13(7):730-742, 1991.
[19] R. Polana and R. Nelson. Low level recognition of human motion. Proc. IEEE Workshop on Nonrigid and Articulate Motion, 1994.
[20] B. Rao, H. Durrant-Whyte, and J. Sheen. A fully decentralized multi-sensor system for tracking and surveillance. IJRR, 12(1), 1993.
[21] J. M. Rehg and T. Kanade. Model-based tracking of self-occluding articulated objects. ICCV, 1995.
[22] D. Reynard, A. Wildenberg, A. Blake, and J. Marchant. Learning dynamics of complex motions from image sequences. ECCV, 1996.
[23] K. Rohr. Towards model-based recognition of human movements in image sequences. CVGIP: IU, 59(1):94-115, 1994.
[24] R. Rosales and S. Sclaroff. Improved tracking of multiple humans with trajectory prediction and occlusion modeling. IEEE CVPR Workshop on the Interp. of Visual Motion, 1998.
[25] R. Rosales and S. Sclaroff. Trajectory guided tracking and recognition of actions. TR BU-CS-99-002, Boston U., 1999.
[26] M. Shah and R. Jain. Motion-Based Recognition. Kluwer Academic, 1997.
[27] H. W. Sorenson. Least-squares estimation: From Gauss to Kalman. IEEE Spectrum, Vol. 7, pp. 63-68, 1970.
[28] R. Szeliski and S. Kang. Recovering 3d shape and motion from image streams using non-linear least squares. CVPR
, 1993.
[29] G. Welch and G. Bishop. An introduction to the Kalman filter. TR 95-041, Computer Science, UNC Chapel Hill, 1995.
[30] C. Wren, A. Azarbayejani, T. Darrell, and A. Pentland. Pfinder: Real time tracking of the human body. TR 353, MIT Media Lab, 1996.
Monocular Tracking 3D People with Back Constrained Scaled Gaussian Process Latent Variable Models
Junbiao Pang 1,2, Qingming Huang 1,2, Shuqiang Jiang 2
1 Graduate School of Chinese Academy of Sciences, China
2 Institute of Computing Technology, Chinese Academy of Sciences, China
{jbpang, qmhuang, sqjiang}@

Abstract
Tracking 3D people from monocular video is often poorly constrained. To mitigate this problem, prior information can be exploited. In the prior-learning stage, most algorithms treat the representation of the high-dimensional pose space in a low-dimensional space as a pure dimension-reduction procedure, without considering the geometrical relations or time correlations in the pose space. The prior therefore loses physical constraints present in the pose space. In this paper, the back constrained scaled Gaussian process latent variable model (back constrained SGPLVM), a novel dimensionality reduction method for learning human poses, is proposed. The low-dimensional latent space is optimized to preserve the geometrical relations and time correlations of the high-dimensional pose space. The learned latent space is a smooth manifold, which makes it a satisfactory state space for tracking, avoiding a direct search in the high-dimensional pose space. Experiments demonstrate that the back constrained SGPLVM, integrated with a particle filtering framework, can track 3D people accurately and robustly, despite weak and noisy image measurements.
Keywords: particle filtering, Gaussian process, human motion analysis

1 Introduction
Human motion analysis is important for many applications: video surveillance, gesture analysis, advanced human-computer interfaces, content-based video retrieval, etc. Tracking 3D people in monocular video is a fundamental problem for human motion analysis. However, this is a challenging problem because of poor constraints and the high dimensionality of the state space. Features extracted from 2D images are subject to self-occlusions, ambiguities, and loose clothing. These observations contain false detections and ambiguities, and do not provide depth information. The estimation of human poses involves searching in a high-dimensional, complex state space; thus, filtering-based tracking algorithms can suffer the curse of dimensionality. These factors make tracking 3D people difficult. Prior information is essential to resolve the poor constraints. Searching in the low-dimensional state space avoids the curse of dimensionality.

Copyright © 2006, NLPR. This paper appeared at the Asia-Pacific Workshop on Visual Information Processing (VIP06), November 7-9, 2006, Beijing, China. Reproduction for academic, not-for-profit purposes permitted provided this text is included.

Figure 1. An overview of the tracking procedure. First, obtain the training data from motion capture data and apply the back constrained SGPLVM to model the latent space; second, given a video, track the 3D person with particle filtering in the latent space; last, visualize the predicted 3D pose.

We propose to encode the sophisticated human poses in an intrinsically low-dimensional space. The learned prior mitigates the poor constraints, and searching in the low-dimensional space avoids the curse of dimensionality. Recently, the SGPLVM, in Grochow et al (2004) and Urtasun et al (2005), has been used to exploit the sophisticated physical constraints of human motion in a low-dimensional space. Given training sequences, the SGPLVM optimizes the low-dimensional latent-space embedding of the human pose space.
Human poses have strong time correlations and geometrical relations between each other. However, the latent space in the SGPLVM does not preserve these relations. To address this issue, a mechanism is needed to carry these relations into the low-dimensional space.
In this paper, the back constrained SGPLVM we propose keeps geometrical relations and time correlations in the latent space; thus, more physical constraints are carried into the latent space as a prior, and these constraints make the latent space a smooth and continuous manifold. During tracking, the latent space serves as the state space: the human pose propagates in the latent space, and the corresponding 3D human pose is projected into the image plane for measurement. Experiments demonstrate that tracking with the back constrained SGPLVM prior is accurate and robust. Figure 1 describes the whole tracking procedure.
The rest of the paper is organized as follows. Related work on 3D pose estimation is discussed in Section 2. The back constrained scaled Gaussian process latent variable model is described in Section 3. In Section 4, we present the tracking algorithm with the learned prior. In Section 5, experimental results are given. Conclusions are drawn in Section 6.

2 Related work
Estimating 3D pose is challenging because some degrees of freedom of the articulated model representing the human body are not observed. A large number of research papers address the topic of 3D human pose estimation. Gavrila (1999) provides a review of earlier work in this area, but, in recent years, many other techniques have been proposed, and we discuss them in the following.

2.1 Model-Based Tracking
These approaches presuppose an explicitly known parametric body model. Earlier work on tracking human body poses in image sequences used gradient-based methods, as in Bregler et al (1998) and Demirdjian et al (2001), to track short sequences of human movement. Recently, sampling-based algorithms have been adopted for human motion tracking, representing the pose estimate with a set of samples, as in Sminchisescu et al (2001), Deutscher et al (2000), Cham et al (1999), Sidenbladh et al (2000) and Sigal et al (2004). These statistical approaches can overcome some degree of pose ambiguity and can track for longer durations. However, due to the high dimensionality of the state space, the problem of particle degeneracy may still exist, as much of the state space is depleted of state samples. In the presence of significant pose ambiguity, these methods may also fail unless other data (such as multi-view data) are available.

2.2 Learning-Based Methods
These approaches estimate the 3D pose by matching features of an input image to those available in a large set of labeled image features covering all of the possible body poses. In these methods, different image features have been used, including edges in Grauman et al (2003), silhouette outlines in Rosales et al (2004), and shape context descriptors in Agarwal et al (2004). In Agarwal et al (2004), a regression technique known as the relevance vector machine was proposed. These methods do not require a human body model or initialization.
However, these methods require clean background subtraction and a large amount of training data, either realistically captured or virtually synthesized from multiple views.

2.3 Prior-Based Tracking
A learned model used as a prior aids tracking because the prior encodes complex physical constraints on poses and mitigates ambiguity.
Recent works, such as Urtasun et al (2005), Li et al (2006) and Tian et al (2005), pay more attention to the representation of high-dimensional human poses and motion. Urtasun et al (2005) use the scaled Gaussian process latent variable model as a prior together with a second-order Markov model for tracking. Locally linear coordination (LLC) in Li et al (2006) is learned to solve the embedding of the high-dimensional space, and the learned model is utilized in a multiple hypothesis tracker for tracking 3D people in a low-dimensional space. Tian et al (2005) employ the GPLVM and a modified CONDENSATION algorithm, where samples are drawn from the low-dimensional latent space, for tracking 2D people. Tracking in a low-dimensional space can avoid the curse of dimensionality. However, high-dimensional human poses have strong time correlations, and the learned priors above are only dimension-reduction procedures that do not consider the relations between poses, such as time correlation or geometrical relation.
Our work is motivated, in part, by prior-based tracking. Geometrical relations of the high-dimensional space are preserved in the latent space. The back constrained SGPLVM has an advantage over other dimensionality reduction techniques in keeping more of the geometrical constraints and time correlations of the high-dimensional space.

3 Back Constrained Scaled Gaussian Process Latent Variable Model
The back constrained SGPLVM is a latent variable model which preserves the geometric relations of the pose space in the latent space. It is a version of the Gaussian process latent variable model (GPLVM) in Lawrence (2004). The GPLVM is the dual problem of probabilistic PCA (PPCA) in Tipping et al (1999). The GPLVM models the prior knowledge as a Gaussian process in the form of a kernel matrix, marginalizes the parameters of PPCA, and optimizes the latent variables with maximum-likelihood methods.

3.1 SGPLVM
The SGPLVM is a GPLVM with a scale for each data dimension. Different data dimensions have different intrinsic scales, so a small change in some dimensions affects the result much more than in others. For instance, a small change in the global rotation of the human model affects the pose much more than a small change in the wrist angle. We only describe the basic mechanism of the SGPLVM here.
Let Y ≡ [y_1, ..., y_N]^T denote an N × D matrix, where y_i, i = 1, ..., N, is one of the training poses from MoCap data, with dimension D. Under the Gaussian process model, the conditional density is (Grochow et al 2004):

p(Y | X, S) = \frac{|W|^N}{\sqrt{(2\pi)^{ND} |K|^{D}}} \exp\left(-\frac{1}{2}\,\mathrm{tr}\left(K^{-1} Y W^2 Y^{T}\right)\right)   (1)

where S = {α, β, γ, W} is the set of unknown kernel parameters. X = [x_1, ..., x_N]^T denotes an N × q matrix, where x_i, i = 1, ..., N, are points in the latent space of dimension q, usually q ≪ D. Each x_i represents one pose y_i in the low-dimensional space. W ≡ diag(w_1, ..., w_D) is the diagonal matrix containing a scaling factor for each data dimension, and K is the kernel matrix. The elements of the kernel matrix are given by a kernel function, K_{ij} = k(x_i, x_j). The kernel matrix is the core of the Gaussian process: the prior knowledge is modeled in the kernel matrix. The radial basis function (RBF) kernel is usually used for a nonlinear mapping:

k(x_i, x_j) = \alpha \exp\left(-\frac{\gamma}{2}\,\|x_i - x_j\|^2\right) + \delta_{ij}\,\beta^{-1}   (2)

where δ_{ij} is the Kronecker delta function.
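The following numpy sketch evaluates the RBF kernel of Eq. 2 and the data-fit terms of the negative log of Eq. 1; it is a didactic reconstruction under the notation above, not the authors' training code, and omits the latent-point and hyperparameter penalty terms.

```python
import numpy as np

def rbf_kernel(X, alpha, beta, gamma):
    """K_ij = alpha * exp(-gamma/2 ||x_i - x_j||^2) + delta_ij / beta  (Eq. 2)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return alpha * np.exp(-0.5 * gamma * np.clip(d2, 0.0, None)) \
        + np.eye(len(X)) / beta

def sgplvm_neg_log_lik(X, Y, W, alpha, beta, gamma):
    """Negative log of Eq. 1 up to an additive constant:
    (D/2) ln|K| + 1/2 tr(K^{-1} Y W^2 Y^T) - N ln|W|."""
    N, D = Y.shape
    K = rbf_kernel(X, alpha, beta, gamma)
    _, logdetK = np.linalg.slogdet(K)
    w = np.diag(W)                            # per-dimension scales w_k
    YW = Y * w[None, :]                       # scale each data dimension
    fit = 0.5 * np.trace(np.linalg.solve(K, YW) @ YW.T)
    return 0.5 * D * logdetK + fit - N * np.sum(np.log(w))
```

Minimizing this quantity jointly over the latent points and hyperparameters (with the extra penalty terms of Eq. 7 below) is what the training procedure in Section 3.3 does.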
Figure 2. Visualization of the latent space for walking-cycle sequences. (a) shows 2D latent points modeled by the SGPLVM. (b) shows 2D latent points modeled by the back constrained SGPLVM. (c, d) show the 3D latent points modeled by the back constrained SGPLVM from side and top views. (e, f) show 3D latent points modeled by the SGPLVM from side and top views.

The model simultaneously provides a smooth mapping from the latent space to the pose space and represents poses in the latent space.

3.2 Back Constraints for Data Representation
In prior-based tracking, the prior knowledge should encode the relations between poses. We exploit the geometrical relations in the pose data and represent these relations in the latent space. In dimension reduction, it is common to preserve the geometrical relations between points of the original data. For instance, LLE (locally linear embedding) in Roweis et al (2000), Isomap, and the spectral embedding techniques in Belkin et al (2001) all have this characteristic. Lawrence (2006) proposes back constraints from the high-dimensional space to the low-dimensional space for keeping geometrical relations. The basic idea of back constraints is to encode relations of the high-dimensional space in the latent space. A back-constraint mapping g: y → x is introduced to add geometrical relations to the latent space:

x_{nj} = g_j(y_n; w)   (3)

where y_n is the pose data, x_{nj} is the j-th coordinate of latent point x_n, and w denotes the function parameters. This function "passes" the relations between poses, such as time correlation or geometrical relation, into the latent space.
For computational convenience, we select the functions g_j to be kernel-based regression functions:

g_j(y_n; A) = \sum_{m=1}^{N} a_{jm}\, k(y_n, y_m)   (4)

where A = {a_{jn}}, 1 ≤ j ≤ q, 1 ≤ n ≤ N, is the back-constraint matrix, and the kernel is

k(y_n, y_m) = \exp\left(-\frac{\gamma_2}{2}\,\|y_n - y_m\|^2\right)   (5)

where γ_2 is the inverse width parameter, which we determine empirically.
The relation between the back-constraint matrix A and the latent space X is determined by the following equations:

x_n = [x_{n1}, ..., x_{nq}] = [k(y_n, y_1), ..., k(y_n, y_N)] \cdot A^{T},  n = 1, ..., N   (6)

3.3 Learning the Model Parameters
Learning the back constrained GPLVM entails estimating the kernel parameters {α, β, γ}, the back-constraint matrix A = {a_{ij}}, and the scaling matrix W = {w_{ij}}. Following Lawrence (2006), we maximize the posterior p({a_{ij}}, α, β, γ, {w_{ij}} | {y_i}), which corresponds to minimizing the following negative log-likelihood (up to an additive constant) with scaled conjugate gradients. The initial latent points can be obtained by probabilistic PCA.

L = \frac{D}{2}\ln|K| + \frac{1}{2}\sum_{k} w_k^2\, Y_k^{T} K^{-1} Y_k - N \ln \prod_{k} w_k + \frac{1}{2}\sum_{i} \|x_i\|^2 + \ln(\alpha\beta\gamma)   (7)

The derivatives of L with respect to the parameters are given in Appendix A.
To speed up training, the kernel matrix K is learned only on a subset of the training data. This selected subset is called the active set, and can be considered a sparse representation of the training data. The process of selecting the active set is described in Lawrence (2003). The training algorithm is also given in Appendix A.

3.4 Model Results
Figure 2 shows results learned from motion capture data (from the CMU MoCap database). The latent space is learned for a walking-cycle sequence. The human model was parameterized with 10 selected joint angles, using 459 training poses. In each case, before minimizing L, the global translation of the human model is removed. Probabilistic PCA is used to obtain the initial latent points for the iterative optimization. The hyperparameters are initially set to 1, with γ_2 = 1 × 10^{-3}. The negative log posterior L was minimized using scaled conjugate gradients.
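In executable terms, the back constraint of Eqs. 4-6 is just kernel regression from pose space to latent space. This short sketch, our illustration with an arbitrary γ_2, shows how the latent points are generated from the back-constraint matrix A, which is what gets optimized in place of free latent coordinates.

```python
import numpy as np

def back_constraint_kernel(Y, Yprime, gamma2=1e-3):
    """k(y_n, y_m) = exp(-gamma2/2 ||y_n - y_m||^2)  (Eq. 5)."""
    d2 = np.sum((Y[:, None, :] - Yprime[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * gamma2 * d2)

def latent_points(A, Y):
    """x_n = [k(y_n, y_1), ..., k(y_n, y_N)] . A^T  (Eq. 6).
    A has shape (q, N); the result is the N x q latent matrix X."""
    Kyy = back_constraint_kernel(Y, Y)
    return Kyy @ A.T

# Because X is a deterministic, smooth function of the poses Y, nearby
# poses are forced to nearby latent points: the smoothness property the
# tracker relies on.
```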
4 Monocular Tracking
The basic idea of tracking 3D people is to integrate the learned model into a particle filtering framework. Let Z_t ≡ (z_1, ..., z_t) denote the measurements during tracking. The complete body pose is controlled by a state vector s_t = (P_t, x_t), where P_t represents the global position of the body and x_t is the point in the latent space. We manually initialize the 3D position P_t and the initial 3D pose y_0. Given y_0, the initial latent point x_0 can be obtained by minimizing the following likelihood function from Grochow et al (2004) and MacKay (1998):

L(x_i, y_i) = \frac{\|W\,(y_i - f(x_i))\|^2}{2\,\sigma^2(x_i)} + \frac{D}{2}\ln \sigma^2(x_i) + \frac{1}{2}\|x_i\|^2   (8)

where

f(x_i) = Y^{T} K^{-1} k(x_i)   (9)

\sigma^2(x_i) = k(x_i, x_i) - k(x_i)^{T} K^{-1} k(x_i)   (10)

K is the kernel matrix for the active set, and k(x_i) is the vector whose j-th entry contains k(x_i, x_j). During the optimization, the \|W(y_i - f(x_i))\|^2 term tries to keep y_i close to the corresponding prediction f(x_i), the \ln \sigma^2(x_i) term tries to keep x_i close to the training data, and the \frac{1}{2}\|x_i\|^2 term is a penalty.
During tracking, the learned model provides the pose prior, and the latent space is the state space instead of the pose space. We model the state dynamics as first-order auto-regressive dynamics:

x_{t+1} = A x_t + C v_t,  v_t ~ N(0, Σ)   (11)

We select A, C and Σ empirically: A is 0.01, C is 0.01, and Σ is 0.01. The tracking algorithm is formulated in the particle filtering framework of Arulampalam et al (2000). The detailed tracking algorithm is described in Figure 3.

Figure 3. The detailed algorithm for monocular tracking of 3D people.

To compute the measurement likelihood for the current prediction and the input video frame, we use the same method proposed by Sigal et al (2004). First, the silhouette of the current video frame is extracted through background subtraction, and the chamfer matching cost between the projected model and image silhouettes is taken to be proportional to the negative log-likelihood.

5 Experiments
In this section, we describe the experimental setting and the tracking results.

5.1 Test Setting and Ground Truth
The proposed algorithm has been tested in tracking walking humans. The test set, calibration information, and ground truth are obtained from Balan et al (2005). The test set has four image sequences, captured by four synchronized cameras from different viewpoints. The image sequences show a person walking in a cycle at variable speed. The challenges of the test set for tracking are large motion, motion blur, loose clothing, and self-occlusions. The image sequences have been widely utilized as a test set in Li et al (2006), Balan et al (2005) and Sigal et al (2004). We use only one of the sequences for testing, and the calibration information is utilized to project the 3D model into the image plane. In the experiments, we maintain a 2-dimensional latent space and 200 particles for our algorithm. We use the same human model proposed by Sigal et al (2004), which consists of a group of cylinders. The training data from motion capture consist of 1900 frames with 28 degrees of freedom. The training data are used for learning the prior by the SGPLVM and the back constrained SGPLVM, respectively. We compare our algorithm with the annealed particle filtering of Deutscher et al (2003), because annealed particle filtering has good performance in tracking 3D people. We set the annealed particle filtering with 5 layers and 200 particles per layer, using four synchronized views simultaneously for measurement.
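Before turning to results, here is the tracking loop of Section 4 in executable form: a generic bootstrap particle filter over the latent state with the AR(1) dynamics of Eq. 11. The GPLVM decoding f and the silhouette-based likelihood are passed in as callables, and the constants mirror the empirical settings quoted above; this is a sketch of the framework, not the authors' code.

```python
import numpy as np

def particle_filter_track(z_seq, x0, f, log_lik, n_particles=200,
                          A=0.01, C=0.01, sigma=0.01, seed=0):
    """Bootstrap particle filter in the latent space.
    z_seq:   per-frame image evidence (e.g., silhouettes)
    x0:      initial latent point (from minimizing Eq. 8)
    f:       latent point -> pose, Eq. 9
    log_lik: (pose, evidence) -> log-likelihood (e.g., negative chamfer cost)."""
    rng = np.random.default_rng(seed)
    q = len(x0)
    X = np.tile(x0, (n_particles, 1))
    estimates = []
    for z in z_seq:
        # Propagate with the AR(1) dynamics of Eq. 11.
        X = A * X + C * rng.normal(0.0, np.sqrt(sigma), size=(n_particles, q))
        # Weight by the image likelihood of the decoded poses.
        logw = np.array([log_lik(f(x), z) for x in X])
        w = np.exp(logw - logw.max())
        w /= w.sum()
        estimates.append(w @ X)                  # posterior-mean estimate
        # Multinomial resampling.
        X = X[rng.choice(n_particles, n_particles, p=w)]
    return np.array(estimates)
```

Because the particles live in the 2-dimensional latent space rather than the 28-dimensional pose space, 200 particles suffice where a pose-space filter would be starved of samples.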
5.2 Tracking Results and Discussion
We compare the above tracking algorithms on a 3.0 GHz machine with 512 MB RAM. We developed the system in Matlab and did not specially optimize the code. The SGPLVM and back constrained SGPLVM tracking algorithms take approximately 11 seconds per frame, because we sample the same number of particles. The annealed particle filtering takes approximately 40 seconds per frame.

Figure 4. Selected frames of tracking results with different priors. The 3D cylinder human model is projected into the image plane. Upper row: tracking result with the back constrained SGPLVM. Lower row: tracking result with the SGPLVM.

Figure 4 shows selected frames for comparing the tracking results. The upper row shows the tracking result with the back constrained SGPLVM as prior, and the lower row depicts the result with the SGPLVM as prior. At frame 102 in Figure 4, the SGPLVM prior underestimates the left wrist position. At frame 104 in Figure 4, the SGPLVM prior underestimates the right wrist and right leg positions. The SGPLVM prior does not preserve the pose relations, or time correlation, and this defect may cause the above errors in tracking. At frame 200 in Figure 4, the SGPLVM prior loses track of the left wrist. The broken trajectory in the SGPLVM latent space may be its weakness: large motion and continuous poses cannot be represented by a broken trajectory. In contrast, the back constrained SGPLVM has a continuous manifold in the latent space, which leads to temporal smoothness for tracking in the latent space; hence, tracking with the back constrained SGPLVM prior is more robust and reliable.
A quantitative comparison was also carried out. Tracking with the back constrained SGPLVM prior is compared with the annealed particle filtering of Deutscher et al (2003). We compute the RMSE, i.e., the square root of the mean of squared deviations from the ground truth, for every frame. More specifically,

RMSE = \sqrt{\frac{1}{M}\sum_{i=1}^{M}\left(\hat{j}_i - j_i\right)^2}

where M is the number of degrees of freedom, \hat{j}_i is the estimated i-th degree of freedom, and j_i is the corresponding degree of freedom in the ground truth.

Figure 5. RMSE of the two algorithms.

Figure 5 shows tracking results for the two algorithms. As can be seen in the figure, tracking with the back constrained SGPLVM obtains results as good as annealed particle filtering, even though the annealed particle filtering uses 4 synchronized views for measurement. At some frames our algorithm has better performance, for example from frame 100 to frame 140, and from frame 180 to frame 200. Elbows and knees undergo large motion, self-occlusion, and motion blur; therefore, we select the right elbow joint and the right knee joint to test the accuracy of our algorithm. Figure 6 shows the accuracy of these representative joints. The maximum absolute deviation from the ground truth is 0.38 degrees in the knee joint and 0.50 degrees in the elbow joint, respectively.

Figure 6. Comparing the tracking results with the ground truth. Left: right elbow joint; right: right knee joint.

Figure 7. Some tracking results with the back constrained SGPLVM prior. Visit /en/project/mrhomepage/En_demo.htm#humanpose for video.

Figure 7 illustrates some tracking results with the back constrained SGPLVM prior. The tracking algorithm can reliably track more than 200 frames. In some frames, such as the third frame from left to right in the lower row of Figure 7, the projection of the 3D cylinder human model cannot match the person perfectly.
Global translation error may cause this. Training data covering more varied speeds may mitigate this situation.

6 Conclusions
We have presented the back constrained SGPLVM to learn a prior for tracking 3D people, and have shown that the prior can be used effectively for monocular 3D tracking. The experiments demonstrate that a smooth and continuous trajectory in the latent space is suitable for tracking 3D people. It is shown that our algorithm is capable of handling self-occlusion and motion blur. Further work will focus on dynamically updating the learned model and on easier initialization for tracking.

Acknowledgement
This work was supported by NEC Research China and the "Science 100 Plan" of the Chinese Academy of Sciences, as well as by the Beijing Natural Science Foundation: 4063041. We thank A. Balan, L. Sigal and M. Black for providing the motion capture data set and code for performance comparison, and thank N. Lawrence for providing the prototype code of the GPLVM.

Appendix
The gradients of L can be computed with the help of the following derivatives, along with the chain rule:

\partial L / \partial K = \frac{D}{2} K^{-1} - \frac{1}{2} K^{-1} Y W^2 Y^{T} K^{-1}

\partial k(x_i, x_j) / \partial x_i = -\gamma (x_i - x_j)\, k(x_i, x_j)

\partial x_i / \partial a_{j\cdot} = [k(y_i, y_1), ..., k(y_i, y_N)]^{T}

\partial K_{ij} / \partial \alpha = \exp\left(-\frac{\gamma}{2}\|x_i - x_j\|^2\right)

\partial K_{ij} / \partial \beta^{-1} = \delta_{ij}

\partial K_{ij} / \partial \gamma = -\frac{1}{2}\|x_i - x_j\|^2\, k(x_i, x_j)

\partial L / \partial x = -\left(\frac{\partial f(x)}{\partial x}\right)^{T} \frac{W^{T} W\,(y - f(x))}{\sigma^2(x)} + \frac{D\,\sigma^2(x) - \|W(y - f(x))\|^2}{2\,\sigma^4(x)}\,\frac{\partial \sigma^2(x)}{\partial x} + x

\partial f(x) / \partial x = Y^{T} K^{-1}\, \partial k(x) / \partial x

The following algorithm summarizes the training steps:

Algorithm: training the back constrained SGPLVM
Initialize: latent points X through probabilistic PCA; an active-set size d; a number of iterations T.
For T iterations do
  Select a new active set.
  Optimize (7) with respect to the parameters of K using scaled conjugate gradients.
  Select a new active set.
  For each point j not in the active set do
    Optimize (8) with respect to a_{i,j} using scaled conjugate gradients.
  End
End

References
Gavrila, D.M. (1999): The visual analysis of human movement: a survey. Computer Vision and Image Understanding 73(1): 82-98.
Sminchisescu, C. and Triggs, B. (2001): Covariance scaled sampling for monocular 3D body tracking. In: Proc. IEEE Conf. on Computer Vision and Pattern Recognition: 447-454.
Deutscher, J., Blake, A. and Reid, I. (2000): Articulated body motion capture by annealed particle filtering. In: Proc. IEEE Conf. on Computer Vision and Pattern Recognition: 126-133.
Cham, T. J. and Rehg, J. M. (1999): A multiple hypothesis approach to figure tracking. In: Proc. IEEE Conf. on Computer Vision and Pattern Recognition: 239-245.
Sidenbladh, H., Black, M. J. and Fleet, D. J. (2000): Stochastic tracking of 3D human figures using 2D image motion. In: Proc. European Conf. on Computer Vision: 702-718.
Tipping, M. E. and Bishop, C. M. (1999): Probabilistic principal component analysis. Journal of the Royal Statistical Society B: 611-622.
Lawrence, N. D., Seeger, M. and Herbrich, R. (2003): Fast sparse Gaussian process methods: the informative vector machine. In NIPS.
Grochow, K., Martin, S. L., Hertzmann, A. and Popovic, Z. (2004): Style-based inverse kinematics. In: ACM Computer Graphics (SIGGRAPH): 522-531.
Balan, A., Sigal, L. and Black, M. (2005): A quantitative evaluation of video-based 3D person tracking. In: IEEE Workshop on VS-PETS: 349-356.
Urtasun, R., Fleet, D. J., Hertzmann, A. and Fua, P. (2005): Priors for people tracking from small training sets. In: Proc. IEEE International Conf. on Computer Vision: 403-410.
Li, R., Yang, M. H., Sclaroff, S. and Tian, T. P. (2006): Monocular tracking of 3D human motion with a coordinated mixture of factor analyzers. In: Proc.
European Conf. on Computer Vision.
Lawrence, N. (2004): Gaussian process latent variable models for visualisation of high dimensional data. In NIPS.
MacKay, D. (1998): Introduction to Gaussian processes. In C. Bishop, editor, Neural Networks and Machine Learning, NATO ASI Series: 133-166.
Lawrence, N. and Quinonero Candela, J. (2006): Local distance preservation in the GP-LVM through back constraints. In: International Conf. on Machine Learning.
Arulampalam, S., Maskell, S., Gordon, N. and Clapp, T. (2000): A tutorial on particle filters for on-line non-linear/non-Gaussian Bayesian tracking.
Sigal, L., Bhatia, S., Roth, S., Black, M. J. and Isard, M. (2004): Tracking loose-limbed people. In: Proc. IEEE Conf. on Computer Vision and Pattern Recognition: 421-428.
Tian, T., Li, R. and Sclaroff, S. (2005): Articulated pose estimation in a learned smooth space of feasible solutions. In: CVPR Learning Workshop.
Roweis, S. and Saul, L. (2000): Nonlinear dimensionality reduction by locally linear embedding. Science, 290: 2323-2326.
Belkin, M. and Niyogi, P. (2001): Laplacian eigenmaps and spectral techniques for embedding and clustering. In NIPS: 585-591.
Bregler, C. and Malik, J. (1998): Tracking people with twists and exponential maps. CVPR: 8-15.
Demirdjian, Q. and Faugeras, O. (2001): 3D articulated models and multi-view tracking with physical forces. CVIU, (81): 328-357.
Grauman, K., Shakhnarovich, G. and Darrell, T. (2003): Inferring 3D structure with a statistical image-based shape model. ICCV: 641-648.
Rosales, R., Siddiqui, M., Alon, J. and Sclaroff, S. (2001): Estimating 3D body pose using uncalibrated cameras. CVPR: 821-827.
Agarwal, A. and Triggs, B. (2004): 3D human pose from silhouettes by relevance vector regression. CVPR: 882-888.