
Gesture Recognition with OpenCV

The chains model for detecting parts by their context
Leonid Karlinsky, Michael Dinerstein, Daniel Harari, Shimon Ullman
Weizmann Institute of Science, Rehovot 76100, Israel
{leonid.karlinsky,michael.dinerstein,danny.harari,shimon.ullman}@weizmann.ac.il
Abstract

Detecting an object part relies on two sources of information: the appearance of the part itself, and the context supplied by surrounding parts. In this paper we consider problems in which a target part cannot be recognized reliably using its own appearance, such as detecting low-resolution hands, and must be recognized using the context of surrounding parts. We develop the 'chains model', which can locate parts of interest in a robust and precise manner, even when the surrounding context is highly variable and deformable. In the proposed model, the relation between context features and the target part is modeled in a non-parametric manner using an ensemble of feature chains leading from parts in the context to the detection target. The method uses the configuration of the features in the image directly, rather than fitting an articulated 3-D model of the object. In addition, the chains are composable, meaning that new chains observed in the test image can be composed of sub-chains seen during training. Consequently, the model is capable of handling object poses which are infrequent, even non-existent, during training. We test the approach in different settings, including object part detection as well as complete object detection. The results show the advantages of the chains model for detecting and localizing parts of complex deformable objects.
Figure 1. Overview of the proposed approach. (a) The goal is to detect object parts (e.g., hand, hoof) of deformable objects (e.g., person, horse) in still images, when the parts of interest are not recognizable on their own. (b) During testing, the inference graph of the chains model is constructed for each test image and used to compute the chains model posterior for the detected part. The edges are color-coded according to their weights (red = high). (c) A star marks the global maximum of the posterior.
1. Introduction

Two main sources of information can be used for detecting a part of interest of an object: the appearance of the part itself, and the context supplied by surrounding parts. In the current paper we focus on detecting the location of a part of a deformable object using the context supplied by other parts of the object (denoted 'internal context' below). We also show that the same approach can be used to detect complete objects using the internal context of their parts. The detection of object parts and of objects are interrelated tasks, since object parts can be used for detecting the objects, and the internal context of the object can be used to detect parts. Below, we briefly review existing recognition systems from the point of view of how they specify the relative locations of object parts, and their ability to detect a part of interest specified to the system using the surrounding context.

Existing approaches can be divided into several groups depending on the way they model geometric relationships between parts. The simplest methods do not explicitly model feature geometry; for example, the so-called bag-of-features approaches, such as [25, 3]. Another popular approach models feature geometry by a star model that allows each object part to have a region of tolerance relative to some object reference point (the star center) [16, 8, 13]. The constellation model [9] allows more flexible joint Gaussian relationships between object parts. Finally, there are models that try to capture more complex feature geometry, such as feature hierarchies [6, 23], articulated spring models [7, 21, 10, 2, 14, 5] and level-set methods [22]. In terms of object and part localization, the bag-of-features methods provide only a rough estimate of object location as a cluster of relevant features. The star and the constellation models provide more specific object and part localization, but they are particularly suitable for semi-rigid objects and do not focus on the detection of specific parts of interest. Feature hierarchies provide more flexible geometry, but are much more complex to learn, and in current approaches [6, 23] are still restricted to semi-rigid objects. Finally, articulated models and level-set methods assume a known a-priori model of object deformation (with the exception of [21]). Such models may encounter problems when the deformations are complex and only partially covered by the training examples. Moreover, most existing approaches (except [3]) train a relatively small feature codebook (e.g., obtained through quantization of randomly sampled patch descriptors), and all geometric approaches train a fixed model for the spatial arrangement of these features. One problem with using a feature codebook and a fixed spatial model is that they are trained to fit well the majority of the training examples, while infrequent object poses in the training set are usually underrepresented. This is reflected both in the lack of features for these poses and in a spatial model that does not fit the infrequent poses well. The problem becomes particularly significant when the task is the detection of parts of a highly deformable object, such as hand detection, since many of the possible object poses are infrequent in the training set.

The goal of the proposed approach is to obtain precise localization of parts of highly deformable objects. An example goal would be detecting the hand location in images of people captured from a distance or in a pose that does not allow detecting the hand reliably based on its own appearance. The approach differs from past approaches in several respects. First, the model focuses on detecting parts of interest, whereas many recognition models focus on the detection of the object itself, and ambiguous parts may be missed during recognition. Second, the way it captures geometric relations between features is neither through a semi-rigid model, such as a star or a constellation, nor through an articulated model fitted to the majority of the training set. Instead, we propose a new non-parametric probabilistic model (the chains model) and its associated inference algorithm, which can efficiently handle complex non-rigid feature configurations. This is obtained by using multiple feature chains leading from potential source features to the target part. Third, instead of using a relatively small feature codebook, all of the relevant features extracted from the training images are retained and efficiently used during testing. Consequently, the model is able to represent information from all training images, including images containing rare poses. Fourth, all relative pair-wise spatial configurations of the relevant features extracted from the training images are retained and used to build the geometric model for the features on the test image. As a result, even rare poses are not under-represented, both in terms of the features and in terms of their geometric model. An overview of the approach is given in figure 1.

The remainder of the paper is organized as follows. Section 2 describes the proposed approach and its implementation details. Section 3 describes the experimental validation. Summary and discussion are provided in section 4.

2. Method

This section describes the proposed 'chains model', the associated feature selection approach, and the implementation details of the method. For clarity, the hand detection task will be used for illustrating the proposed method. In addition, we describe the use of the method for full object detection in section 2.4.

Intuitively, the chains model describes an assembly of feature chains of arbitrary length that start at some known reference point on the object, such as an automatically detected face, and terminate at the detection target, such as the hand. The chains model is a generative model for the set of features extracted from the test image. It generates some of the features by creating a feature chain between the source (face) and the target (hand) parts. The rest of the features are generated independently from the 'world' distribution, i.e., the marginal distribution over all images. During inference, we marginalize over all possible feature chains between the face and the hand. This is shown to be equivalent to summing over simple paths in a graph connecting the features extracted from the test image, with edge weights that are related to the so-called suspicious coincidence between neighboring features. We show how to approximate the solution of the inference problem by a simple computation using the adjacency matrix of the constructed graph. In the algorithm, the method for estimating the different empirical probabilities of the features is similar to the one used by [3], and is based on Kernel Density Estimation (KDE) over neighbors computed using the Approximate Nearest Neighbor (ANN) technique. The details of this computation are described in section 2.5.

2.1. Feature extraction

For each image $I_n$ (training or test), the face location is detected using an off-the-shelf method (we used [13]). Denote the location of the detected face by $f_n = (x_n^f, y_n^f)$ and its horizontal size by $s_n$. All other distances, offsets and patch sizes are computed relative to $s_n$, eliminating the dependence on the scale of the person. We extract a dense set of features centered on all edge points of $I_n$ (detected by [18]), sampled with a spacing of $\min(0.2\,s_n, 4)$ pixels. Denote the set of extracted features by $F_n = \{f_n^j = (SIFT_n^j, X_n^j) \mid j = 1, 2, \ldots, k_n\}$, where $k_n$ is the total number of features extracted from image $I_n$, and $SIFT_n^j$ is the SIFT descriptor (128-D row vector) [17] of feature $j$ from image $I_n$, computed for an $s_n \times s_n$ sized patch centered at location $X_n^j$ (2-D row vector). For training images only, we also denote the marked hand location in image $I_n$ by $h_n = (x_n^h, y_n^h)$. For training and quantitative evaluation, ground truth hand locations were semi-automatically identified in some of the hand detection experiments. In these experiments a colored glove was used for fast and simple hand tracking, using a color blob tracker with manual initialization. Although color was used to obtain the ground truth, both training and testing of the chains model were performed on grayscale images in all of the experiments. In addition, experiments on external datasets (of [5] and [10]) verified the proposed approach on people without gloves.
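The following Python sketch illustrates this extraction step under stated assumptions: OpenCV's Canny detector stands in for the boundary detector of [18], SIFT descriptors come from cv2.SIFT_create(), the face center and size are assumed to be supplied by the face detector, and the edge-point subsampling is a crude stand-in for the paper's fixed spatial spacing. It is an illustration, not the authors' implementation.

```python
# Minimal sketch of the feature extraction of section 2.1 (assumptions above).
import cv2
import numpy as np

def extract_features(gray, face):
    """gray: grayscale image; face: (x, y, s) center and horizontal size."""
    xf, yf, s = face
    step = max(1, int(min(0.2 * s, 4)))          # sampling spacing
    edges = cv2.Canny(gray, 50, 150)             # stand-in for the detector of [18]
    ys, xs = np.nonzero(edges)
    pts = list(zip(xs, ys))[::step]              # crude subsampling of edge points
    # one keypoint per sampled edge point, patch size proportional to face size
    kps = [cv2.KeyPoint(float(x), float(y), float(s)) for (x, y) in pts]
    sift = cv2.SIFT_create()
    kps, desc = sift.compute(gray, kps)          # 128-D SIFT descriptor per point
    locs = np.array([kp.pt for kp in kps])       # feature locations X_n^j
    return desc, locs
```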

2.2. Feature selection for target (hand) detection

The training stage proceeds as follows. The algorithm receives a set of training frames in which a person performs various hand movements. In each frame, both the hand and the face locations are marked by a single point each, as described in section 2.1. No motion information is used. The feature selection procedure then retains the 100 features with the highest likelihood ratio from each training image. The ratio $\Lambda_n^j$ for feature $f_n^j$ in image $I_n$ is computed as

$\Lambda_n^j = \frac{P(L^h = h_n,\, F^j = f_n^j,\, L^f = f_n)}{P(L^h = h_n)\; P(F^j = f_n^j,\, L^f = f_n)}$,

where the random variables $L^h$ and $L^f$ designate the hand and face locations respectively, and $F^j$ is a random variable for the $j$-th observed feature. Throughout the paper we use a shorthand form of variable assignments, e.g., $P(f_n)$ instead of $P(L^f = f_n)$. As we found in our experiments, this feature selection criterion favors features whose presence correlates with the target part location (e.g., many arm features were automatically selected for hand detection). The probabilities used to compute $\Lambda_n^j$ are estimated as described in section 2.5.
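As an illustration of this selection rule, the sketch below ranks the features of one training image by the estimated likelihood ratio. It is an assumed implementation, not the authors' code: kde_joint and kde_feat are hypothetical helpers standing in for the KDE estimates of section 2.5, and the constant $P(L^h = h_n)$ is dropped since it does not change the ranking within an image.

```python
# Hedged sketch of the feature-selection score of section 2.2.
import numpy as np

def selection_scores(feats, hand_off, kde_joint, kde_feat):
    """feats: (k, d) descriptor(+offset) vectors of one training image;
    hand_off: offset of the marked hand relative to the face;
    kde_joint(v): KDE estimate of P(L^h, F^j, L^f) at v;
    kde_feat(v):  KDE estimate of P(F^j, L^f) at v."""
    scores = []
    for f in feats:
        v = np.concatenate([f, hand_off])
        scores.append(kde_joint(v) / max(kde_feat(f), 1e-12))
    order = np.argsort(scores)[::-1]
    return order[:100]            # keep the 100 highest-ratio features
```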
2.3. The chains model

Model definition. The chains model is a generative model for the features extracted from the test image $I_N$. The observed variables of the model are the extracted features $F_N = \{F^j = f_N^j = (SIFT_N^j, X_N^j)\}_{j=1,\ldots,k_N}$ and the detected face (source part) location $L^f = f_N$. The upper index $j$ of the features is according to some arbitrary order of the feature extractor (the order is immaterial for the subsequent computation). The unobserved variables are the location of the hand (target part), denoted by $L^h$, an arbitrary length $M$, and an ordered list of feature indices denoted by $T = (T(1), \ldots, T(M))$, where $1 \le T(i) \le k_N$ and $k_N$ is the number of features extracted from image $I_N$. The list $T$ defines a simple chain of features (i.e., without repetitions). The joint distribution of all these variables is taken to be:

$P_{CH}(M, T, L^h, f_N, \{f_N^j\}) = P(M,T)\; P(L^h \mid f_N^{T(M)}) \left[\prod_{m=1}^{M-1} P(f_N^{T(m+1)} \mid f_N^{T(m)})\right] P(f_N^{T(1)} \mid f_N) \prod_{j \notin T} P_W(f_N^j)$   (1)
In this expression, along the chain we use the conditional probability of each feature given its chain predecessor, and for the features outside the chain we take the independent product of their 'world' probabilities. $P_W(f_N^j)$ is the 'world' probability for generating feature $f_N^j$, i.e., the probability to observe $f_N^j$ either on the object or in the background; it generates all of the features that lie outside the chain $T$. $P(f_N^{T(1)} \mid f_N)$ is the conditional probability for the transition from the face to the first feature on the chain, and $P(L^h \mid f_N^{T(M)})$ is the probability of the transition from the last feature on the chain to the hand. All of these probabilities are explicitly defined in section 2.5. The distribution $P(M,T)$ is taken to be uniform over simple chains of length up to $M = 4$ such that the locations of neighboring chain features are neither too close nor too far apart (i.e., for any $1 \le m < M$: $\frac{1}{2} s_N \le \|X_N^{T(m+1)} - X_N^{T(m)}\| \le \frac{3}{2} s_N$).

We next divide the joint $P_{CH}$ by the term $\prod_{f_N^j \in F_N} P_W(f_N^j)$, which is fixed for a given image $I_N$, thus rewriting the joint in an equivalent form:

$P_{CH}(M, T, L^h, \ldots) \propto P(M,T)\; P(L^h \mid f_N^{T(M)}) \left[\prod_{m=1}^{M-1} \frac{P(f_N^{T(m+1)}, f_N^{T(m)})}{P_W(f_N^{T(m+1)})\; P(f_N^{T(m)})}\right] \frac{P(f_N^{T(1)}, f_N)}{P_W(f_N^{T(1)})\; P(f_N)}$   (2)

Here, $P(f_N^{T(m)})$ is the marginal of the pairwise probability $P(f_N^{T(m+1)}, f_N^{T(m)})$. The chains model is schematically illustrated in figure 2a.

Figure 2. (a) Chains model applied to part detection. The unobserved variable $L^h$ is the location of the target part (e.g., hand). $L^f = f_N$ is the observed location of the reference part (e.g., face) in image $I_N$. The edges symbolize the chains 'feature graph' constructed over the set of observed features $\{F^j = f_N^j\}$. Unobserved variables $T$ and $M$ (affecting all the nodes) select a simple path $T$ (red) of length $M$ on the graph. Features not on this path are generated from their respective 'world' distributions. During inference we marginalize over $T$ and $M$, summing over paths on the graph going from the reference part to each candidate location of the target part. (b) Chains model applied to full object detection as an extended star model. (c) Star model. Unobserved binary variables $A^j$ (one per feature) control whether the feature is generated with respect to the star center or from its 'world' distribution.
Inference. The goal of the inference is to compute the posterior distribution over the hand locations:

$P_{CH}(L^h \mid f_N, \{f_N^j\}) \propto \sum_{M,T} P_{CH}(M, T, L^h, f_N, \{f_N^j\})$

To compute this posterior, we define a directed weighted 'feature graph' $G_N$. The nodes of $G_N$ are $\{f_N^j\}$, and edges connect all pairs of features $f_N^j$ and $f_N^k$ for which $\frac{1}{2} s_N \le \|X_N^j - X_N^k\| \le \frac{3}{2} s_N$. The weight on the directed edge from node $f_N^j$ to node $f_N^k$ is $w_{jk} = \frac{P(f_N^j, f_N^k)}{P_W(f_N^j)\, P(f_N^k)}$ (a ratio similar to 'suspicious coincidence'). The adjacency matrix $A_N$ of $G_N$ is a matrix in which entry $jk$ is equal to $w_{jk}$ if the directed edge from $f_N^j$ to $f_N^k$ exists in $G_N$, and is zero otherwise. Using this notation, the marginal over chains of length smaller than or equal to $M$ can be computed as:

$P_{CH}(L^h \mid f_N, \{f_N^j\}) \propto \sum_{M,T} P_{CH}(M, T, L^h, \ldots) = D\, C^{*} S \approx D\, C\, S$, where $C = \sum_{m=0}^{M-1} (A_N)^m$;   (3)

$S = \left[\frac{P(f_N^1, f_N)}{P_W(f_N^1)}, \ldots, \frac{P(f_N^{k_N}, f_N)}{P_W(f_N^{k_N})}\right]^{\top}$,  $D = \left[P(L^h \mid f_N^1), \ldots, P(L^h \mid f_N^{k_N})\right]$,

where rectangular brackets denote row vectors. Here $S$ is the source vector, containing the edge weights between the source part and its neighbors; $C^{*}$ is the chains matrix, in which entry $jk$ is the sum of products of weights on all simple paths going from node $k$ to node $j$ in the graph $G_N$; and $D$ is the target vector, containing the edge weights between the target part and its neighbors. The matrix $C$ is used as an easily computable approximation of $C^{*}$, obtained by summing over all paths of length $\le M$, and not only over simple paths. The reason to use an approximation is that computing the $k$ best simple paths is costly [12], and in our experiments we have found the simple approximation to be sufficient. Entry $jk$ of the matrix power $(A_N)^t$ is the sum of the weights of all directed paths of length $t$ going from node $k$ to node $j$, where the weight of a path is the product of the weights of the edges on it. Consequently, finding the $L^h$ that maximizes eq. 3 is equivalent to max-marginal inference over the posterior of the chains model. Thus the inference in the model is simple and straightforward, involving only a few matrix multiplications.

Interestingly, it is also possible to compute in a similar manner the Maximum A-Posteriori (MAP) inference for the chains model, using the observation that $\sqrt[r]{a^r + b^r} \to \max(a, b)$ as $r \to \infty$. Raising each entry of $D$, $A_N$ and $S$ to a sufficiently large power $r$, obtaining the respective $D_r$, $A_r$, $S_r$ and $C_r = \sum_{m=0}^{M-1} (A_r)^m$, and maximizing $\sqrt[r]{D_r C_r S_r}$ becomes equivalent to MAP inference for the chains model. We have experimentally observed that a combination of MAP and max-marginal inference attains the best performance: we maximize $D\, \sqrt[r]{C_r S_r}$, where the $r$-th root is taken entry-wise. This is equivalent to maximizing a sum over all of the features in terms of their votes for possible hand locations, where each feature's vote is determined by the single maximal chain reaching it from the source part. We use this method with $r = 10$ in all our experiments. Finally, it is also possible to use the same set of selected features for detecting multiple object parts by replacing the vector $D$ with a matrix in which each row corresponds to a part being detected.

Star as a special case. Notably, setting $C = I$ in eq. 3 reduces the chains model to a variant of the popular star model, illustrated in figure 2c. This variant allows each feature to independently choose to be generated either by the object or by the world distribution. The proof of this reduction and the detailed description of the star model variant are beyond the scope of the current paper and are provided in the supplementary material. In section 3 we quantitatively show the advantage of the chains model over the star model for highly deformable part detection.
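The following numpy sketch illustrates the matrix form of the approximate inference of eq. 3, and the entry-wise-power variant used for the combined MAP/max-marginal score. It is a sketch under stated assumptions: $A$, $S$ and $D$ are taken as precomputed from the KDE estimates of section 2.5, and the function names are ours, not the paper's.

```python
# Approximate chains inference D·C·S with C = sum_{m=0}^{M-1} A^m (eq. 3).
import numpy as np

def chains_posterior(A, S, D, M=4):
    """A: (k, k) weighted adjacency matrix; S: (k,) source vector;
    D: (p, k) target weights, one row per candidate target location."""
    C_S, term = np.zeros_like(S), S.copy()
    for _ in range(M):            # accumulate A^m S for m = 0 .. M-1
        C_S += term
        term = A @ term
    return D @ C_S                # max-marginal score per candidate

def chains_map_score(A, S, D, M=4, r=10):
    """Combined score D · (C_r S_r)^(1/r) with entry-wise powers and roots."""
    Ar, Sr = A ** r, S ** r
    C_S, term = np.zeros_like(Sr), Sr.copy()
    for _ in range(M):
        C_S += term
        term = Ar @ term
    return D @ (C_S ** (1.0 / r))
```

Setting M = 1 in chains_posterior (so that C reduces to the identity) gives the star-model variant discussed above.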
2.4. Full object detection

The chains model can also be applied, with some modifications, to the task of full object detection. Similar to part detection, the model can use the test image features to point at the object location, either directly or via intermediate feature chains (see figure 2b). In this section we describe how to apply the chains model to object detection. In the final discussion we consider the integration of object and object-part detection within the chains model.

For detection at the object level, we eliminate the detection of the chain's source part, thereby allowing an arbitrary source location. As we no longer have a reference scale (obtained from the source size), the features' SIFT descriptors are computed for patches of a fixed predefined size (20 × 20 pixels). Instead, multi-scale support is achieved by testing each image at multiple scales. The training stage runs on a set of positive images (with objects roughly aligned in location and scale) and negative background images. In the feature selection stage, features are selected by their log-likelihood ratio, defined as $\Lambda_n^j = \log \frac{P(O = \mathrm{true},\, f_n^j)}{P(O = \mathrm{false},\, f_n^j)}$, where $O$ is a binary variable designating the presence of the object in an image. We retain up to 100 features with the highest positive log-likelihood ratio from each positive training image. By allowing chains to originate at any node (not just at $f_N$), i.e., substituting $P(f_N^k, f_N)$ with $P(f_N^k)$ in eq. 3, detection of the full object and its parts can be obtained without requiring a pre-detected reference part.
2.5. Probability estimates by ANN-based KDE

We next show how to efficiently compute, from the training data, all of the probabilities used in the model and during feature selection. These probabilities are computed using Kernel Density Estimation (KDE). Given a set of samples $\{Y_1, \ldots, Y_R\}$ from a distribution of interest, a Gaussian KDE estimate $P(Y)$ for the probability of a new sample $Y$ can be approximated as:

$P(Y) \propto \frac{1}{R} \sum_{r=1}^{R} \exp\left(-\frac{\|Y - Y_r\|^2}{2\sigma^2}\right) \approx \frac{1}{R} \sum_{Y_r \in NN} \exp\left(-\frac{\|Y - Y_r\|^2}{2\sigma^2}\right)$   (4)

where $NN$ is the set of nearest neighbors of $Y$ within the given set of samples. When the number of samples $R$ is large, brute-force search for the $NN$ set becomes infeasible. Therefore, we use Approximate Nearest Neighbor (ANN) search (in the implementation of [19]) to compute the KDE. For a given training image $I_n$ or test image $I_N$, table 1 describes the computation of all the required model probabilities. In each case, the computation is based on an ANN query; the query returns a set of $K$ nearest neighbors, and the Gaussian KDE is applied to this set to obtain the estimated probability. The accuracy of all of the computations depends on the size of the $NN$ set; in our experiments we found it sufficient to use $K = 25$ (which was used throughout). We used $\sigma = 0.5$ for $P(f_n^j, f_n^k)$ and $\sigma = 0.35$ otherwise; the different value for $P(f_n^j, f_n^k)$ is due to the higher dimensionality of the associated ANN query, which leads to accuracy loss in the ANN implementation of [19].

Table 1 lists the computation of all the required model probabilities using ANN queries: a query for a vector $V$ in a set $S$ returns a set of $K$ neighbors, from which the estimated probability is computed by Gaussian Kernel Density Estimation $KDE(V, S)$. In the notation of the table, $\{I_{t_1}, \ldots, I_{t_R}\}$ denotes the set of training images; 'all $k$' refers to all of the features extracted from the training images, 'selected $k$' to the selected features only, and 'selected $m$ and $l$' to pairs of selected features; $\alpha = 1/s$ is a scale factor, where $s$ is the horizontal size of the face in the corresponding image; $O$ is binary, and $O_{t_i} = \mathrm{true}$ means that $I_{t_i}$ is a positive training image. For both $P_W(f_N^j)$ and $P(f_N^j)$, the location component $X_N^j$ of the feature is assumed to be distributed uniformly. For speed, we query using only the SIFT component and add the offset component directly to the KDE. A selected feature is a valid neighbor of a query feature only if it lies within a threshold distance in descriptor space; features without any valid neighbors are not considered, and the probabilities of features lying too far from the source or the target part are set to zero.
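As an illustration of eq. (4), the sketch below implements a nearest-neighbor Gaussian KDE. It is an assumed stand-in, not the paper's code: scipy's cKDTree (exact search) is substituted for the ANN library of [19], and the class name is ours.

```python
# Nearest-neighbor Gaussian KDE in the spirit of eq. (4).
import numpy as np
from scipy.spatial import cKDTree

class NnKde:
    def __init__(self, samples, sigma=0.35, k=25):
        self.tree = cKDTree(samples)      # samples: (R, d) training vectors
        self.sigma, self.k = sigma, k

    def __call__(self, query):
        # distances to the K nearest neighbors of the query vector
        dist, _ = self.tree.query(query, k=self.k)
        # Gaussian kernel over the neighbor set approximates the full sum
        return np.exp(-dist**2 / (2 * self.sigma**2)).sum() / self.tree.n
```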
3. Results

For the task of hand detection, the chains model was compared with two types of models: a star model (described in the supplementary material) and state-of-the-art methods specialized for human pose estimation [10, 2, 14, 5]. The star model we used is a high-performance method which gives state-of-the-art results on several standard classification datasets. The comparisons were performed on several datasets, including three collected by us, two 'sign language' datasets collected by [5], and the 'Buffy' dataset collected by [10]. Details of the datasets are described in section 3.1. All the compared approaches, except [5], were applied to each frame independently, without using any tracking or motion information. Our method was applied to grayscale versions of the images, while most of the compared approaches [10, 14, 5] use color information. In all of the experiments, the chains model detection of the hand was considered correct if it was within one hand width (1/2 face width) of the ground truth location. On our datasets the hand detection accuracy was evaluated for the right hand only; on all of the external datasets (not created by us), it was evaluated for both right and left hands. For the external datasets, the compared methods' results were taken from the original publications. We also compared with [2] on our own dataset, using the original code kindly provided by the authors. The hand detection experiments are described in section 3.1 and the results are summarized in tables 2 and 3. Evaluation of the full object detection was conducted on two datasets: UIUC cars [1] and Weizmann Horses [4]. We compare our performance to the star model (supplementary) and to state-of-the-art methods [20, 16, 15, 11, 24] on these datasets. The experiments are described in section 3.2 and the results are summarized in table 4. Qualitative results for both hand and object detection are shown in figure 3 and in the supplementary material.

Table 2. Comparison of hand detection results of the chains model and the star model applied on the ST, PCG and Horse datasets in various test scenarios (self-trained (ST), person cross-generalization (PCG), full generalization (ST), and self-trained (Horse)). The columns titled '1 max', '2 max', etc., measure the detection performance within the top k local maxima after non-maximal suppression with a radius of 0.25 face width. In all of the scenarios, the standard deviation of the chains performance is about 4%. In the full generalization scenario the performance of [2] is 38.6%. In the horse hoof detection experiment, the reason for the 63.1% in '1 max' vs. 90.3% in '2 max' is that the system had a hard time distinguishing between the front-left and front-right hooves; this distinction was confusing for humans as well.

Table 3. Comparison of hand detection results of the chains model and past methods applied on the sign language and Buffy datasets. Chains results (1 max / 2 max / 3 max / 4 max): BBC news 96.8 / 99 / 99.5 / 99.6; 5 signers 84.9 / 92.8 / 95.4 / 96.7; Buffy 47 / 53.9 / 56.7 / 59.3. At 1 max, [5] reports 95.6 on BBC news, and [2] reports 46.1 on Buffy. (1) [10, 14] reported the average of detections of all 6 upper body parts; we believe this can be considered an upper bound on the hand detection percentage, since the method of Andriluka et al. [2], which achieves 46.1% correct lower arm detection, obtained a 73.5% average detection for all 6 body parts. (2) The result of [10] was obtained on 88% of the frames in the dataset using foreground segmentation, and on 41% without foreground segmentation.

3.1. Hand Detection

Our datasets. For our experiments, we collected three datasets denoted 'ST', 'PCG' and 'Horse'. The ST dataset consists of four movies (about 3000 frames each) of four different people performing various hand movements. The PCG dataset has 12 movies of 12 different people (about 600 frames each). The Horse dataset consists of a single movie of a running horse (1331 frames). For the Horse dataset the ground truth was obtained by manually marking the front left hoof and the head locations. The experiment on the Horse dataset demonstrates that, unlike methods tailored to the human body, exactly the same chains model algorithm (using the same parameters) can be used to detect deformable parts of both humans and animals.

The experiments on our datasets were divided into three test scenarios. The ST and the Horse datasets were used for testing in the 'self-trained' scenario: the first 1000-1500 frames of each movie sequence in the dataset were used for training and the rest were used for testing. The PCG dataset was used to test cross-generalization between different people in different clothing. The experiment was performed in a leave-one-out manner: each of the twelve movie sequences in the dataset was used for testing while the remaining sequences were used for training. Finally, the 'full-generalization' scenario included training on all frames of the PCG dataset and testing on the ST dataset training frames (which have the widest variety of poses). This scenario tested generalization to new people and clothes, as well as to new, unseen backgrounds. Two of the four people in the ST dataset were present (in different clothing) in the PCG dataset and were excluded from the PCG training in their respective tests. The method of [2] attained a 38.5% hand detection rate on the ST dataset. Its performance is to be compared with the results of the 'full-generalization' scenario of our method, since it was not trained on any of our training data. Although [2] was not given the detected face locations, it detected the face correctly in all but a very few test frames.

Sign language datasets. These two datasets include a 'BBC news' dataset containing a full BBC news sequence with a single signer, and a '5-signers' dataset, a collection of frames from 5 news sequences with different signers. Both datasets were collected by [5]. On the BBC news dataset the experiment was performed in a 'self-trained' fashion. We manually marked the ground truth hand locations for the first 3115 frames; the first 600 frames were used for training and the remaining 2515 frames were used for testing. This is compatible with the experiment of [5], who also trained and tested on the same sequence. Notably, [5] used color and tracking cues, and quantitatively tested only on 296 frames. On the 5-signers dataset, the experiment was in the 'full-generalization' mode: all 195 frames of the 5-signers dataset were used for testing the model trained on the first 600 frames of the BBC news dataset. We consider our setting to be more challenging than the one used in [14], who trained and tested on the same signers in their 5-signers experiment.

Buffy dataset. The 'Buffy' dataset was collected by [10]. It contains frames selected from 4 episodes of the popular series 'Buffy the Vampire Slayer'. The methods we compared with used frames from the first 3 episodes for testing (308 images), and the 4th episode is reserved for training. In our experiment on this dataset, we trained the chains model on our PCG dataset plus 5 additional short movie sequences of standing people that we shot in our lab. The reason for using additional movies is that the PCG dataset contained only sitting people, while the 'Buffy' dataset consists mostly of standing people. Ground truth face locations were used. We do not consider this to be a significant advantage, since the top competing method [2] achieved 96% correct head detection and thereby mostly had the correct head locations as well. Since most of the Buffy frames are very dark and of low contrast, histogram equalization was applied to all of the frames.

3.2. Full object detection

Car detection. The UIUC car dataset contains side-view images with one or more cars. Using 550 positive and 500 negative training images, and 200 cars in 170 test images of the UIUC single-scale dataset, the chains model achieved a recall of 99.5% at the precision-recall Equal Error Rate (EER). In addition, to test cross-dataset generalization, we replaced the original UIUC training set with Caltech-101 cars, still achieving a competitive recall of 93.4% at EER.

Horse detection. The Weizmann Horses dataset consists of 323 side-view horse images in natural environments. The horses vary in breed, texture and articulation. The chains model achieved 97.5 ± 1% recall at EER, computed by 3-fold cross-validation (with 500 negative images).

Table 4. Comparison between different methods by precision-recall at EER. (a) UIUC car test set: [20] 99.9%; [11, 15] 98.5%; [16] 97.5%; chains model (UIUC / Caltech-101 trained) 99.5% / 93.4%; star model (UIUC / Caltech-101 trained) 98% / 89.2%. (b) Weizmann Horses dataset: [24] 95.7%; [11] 93.9%; chains model 97.5 ± 1%; star model 87.8 ± 3%. The larger advantage of the chains model on horses may be related to the fact that horses are somewhat more flexible than rigid 3-D objects such as cars.

Figure 3. Qualitative examples; see the supplementary material for more examples and movies. The figure demonstrates hand detection for people in various environments, poses and clothing. Object detection is illustrated by horse detection; the chains posterior is overlaid on the image (red = high posterior), and a star marks the global maximum.
4. Discussion

The chains model described in this paper locates parts of interest in a robust and precise manner, even when the surrounding context is highly variable and deformable. Three properties of the chains model make it particularly useful for this task. First, evidence is combined in a flexible and comprehensive manner: for a given part, information from all other parts which can supply evidence regarding the part's location, either directly or indirectly via other parts, is used. Second, the computation efficiently combines evidence in parallel from a large number of chains and sub-chains. Finally, it is a non-parametric model, in which the required probabilities are derived for a new image using all the information stored during training. Moreover, the probabilities are estimated using nearest neighbors, which can effectively handle feature configurations that are relatively infrequent in the training data. Another useful property of the model is that it uses image features directly, without relying on domain-specific parametric models, such as models that use articulated skeletons of humans and animals.

Recognition at the full object level is naturally included in the scheme, since chains in the model lead to a target, which can be either a part or the full object. At the object level, the chains model can be viewed as an extended star model, and consequently its performance in object detection is competitive with similar top-performing schemes in the domains tested. The chains model can then use partial detections obtained during the first recognition phase as sources for the recognition of additional, difficult-to-detect parts embedded in deformable context. The detection of ambiguous parts by the chains model can also be naturally combined with other recognition schemes: partial results obtained by other schemes can be augmented by serving as sources for disambiguating chains.

The training of the chains model for the part detection task requires training images with labeled reference and target parts. In practice, these can often be obtained automatically; for example, for hand detection, we used a face detector and colored-glove tracking. More generally, the model can be trained to generalize from cases in which the target part is recognizable on its own to cases in which it is not. Regarding future directions, the chains model may be extended to provide a useful component for scene recognition at multiple levels: chains can connect parts and objects, as well as related objects in a scene, providing a vehicle for the use of context in the recognition of scenes, objects and parts. Finally, it would be interesting to extend the chains model to deal with video input by developing spatio-temporal chains, traversing features within a frame as well as across nearby frames in a video sequence.
References
[1] S. Agarwal, A. Awan, and D. Roth. Learning to detect objects in images via a sparse, part-based representation. PAMI, 2004.
[2] M. Andriluka, S. Roth, and B. Schiele. Pictorial structures revisited: People detection and articulated pose estimation. CVPR, 2009.
[3] O. Boiman, E. Shechtman, and M. Irani. In defense of nearest-neighbor based image classification. CVPR, 2008.
[4] E. Borenstein and S. Ullman. Class-specific, top-down segmentation. ECCV, 2002.
[5] P. Buehler, M. Everingham, D. Huttenlocher, and A. Zisserman. Long term arm and hand tracking for continuous sign language TV broadcasts. BMVC, 2008.
[6] B. Epshtein and S. Ullman. Semantic hierarchies for recognizing objects and parts. CVPR, 2007.
[7] P. Felzenszwalb and D. Huttenlocher. Efficient matching of pictorial structures. CVPR, 2000.
[8] P. Felzenszwalb, D. McAllester, and D. Ramanan. A discriminatively trained, multiscale, deformable part model. CVPR, 2008.
[9] R. Fergus, P. Perona, and A. Zisserman. Object class recognition by unsupervised scale-invariant learning. CVPR, 2003.
[10] V. Ferrari, M. Marin, and A. Zisserman. Progressive search space reduction for human pose estimation. CVPR, 2008.
[11] J. Gall and V. Lempitsky. Class-specific Hough forests for object detection. CVPR, 2009.
[12] J. Hershberger, M. Maxely, and S. Suri. Finding the k shortest simple paths: A new algorithm and its implementation. ACM Trans. Algorithms, 2007.
[13] L. Karlinsky, M. Dinerstein, D. Levi, and S. Ullman. Unsupervised classification and part localization by consistency amplification. ECCV, 2008.
[14] M. Kumar, A. Zisserman, and P. Torr. Efficient discriminative learning of parts-based models. ICCV, 2009.
[15] C. Lampert, M. Blaschko, and T. Hofmann. Beyond sliding windows: Object localization by efficient subwindow search. CVPR, 2008.
[16] B. Leibe, A. Leonardis, and B. Schiele. Robust object detection with interleaved categorization and segmentation. IJCV, 2008.
[17] D. Lowe. Distinctive image features from scale-invariant keypoints. IJCV, 2004.
[18] D. R. Martin, C. C. Fowlkes, and J. Malik. Learning to detect natural image boundaries using local brightness, color, and texture cues. PAMI, 2004.
[19] D. Mount and S. Arya. ANN: A library for approximate nearest neighbor searching. CGC 2nd Annual Workshop on Comp. Geometry, 1997.
[20] J. Mutch and D. Lowe. Multiclass object recognition with sparse, localized features. CVPR, 2006.
[21] D. Ramanan, D. A. Forsyth, and K. Barnard. Building models of animals from video. PAMI, 2006.
[22] T. Schoenemann and D. Cremers. Matching non-rigidly deformable shapes across images: A globally optimal solution. CVPR, 2008.
[23] T. Serre, L. Wolf, and T. Poggio. Object recognition with features inspired by visual cortex. CVPR, 2005.
[24] J. Shotton, A. Blake, and R. Cipolla. Multiscale categorical object recognition using contour fragments. PAMI, 2008.
[25] J. Sivic, B. C. Russell, A. A. Efros, A. Zisserman, and W. T. Freeman. Discovering objects and their localization in images. ICCV, 2005.

基于OpenCV识别库的面部图像识别系统的设计

This system uses J2EE technology, with the OpenCV open-source computer vision library as its foundation, to implement a facial image recognition information management system with identity verification. The system uses a MySQL database for data support; relying on the stability of J2EE and the portability of the Java platform, it can run on all major operating systems, and it provides a fairly complete solution for applying facial recognition technology on the Internet.

Keywords: OpenCV; face recognition; biological features

Introduction. With the rapid development of information technology and the wide spread of the Internet, more and more industries and fields use information technology products to improve work efficiency and management. However, weak awareness of protecting private information has led to many information security problems. As people pay increasing attention to information security, many technologies have been applied to this field; among the more advanced are iris recognition, genetic identification and fingerprint recognition. This paper adopts the currently popular technology of facial image recognition.

1 System algorithms and functional analysis

1.1 Building a biological feature model of facial images. The system distinguishes different people by the biological features of their facial images. Since every face has its own characteristics yet also shares common properties, relevant biological knowledge is needed. Existing biometric measurement methods and algorithms can be used to construct a biological feature model of the human face (the 'face model' for short) and apply it in the system; the face model provides the basis for implementing the facial image recognition functionality.

1.2 Building the knowledge feature base and the face recognition engine. After the face model is built, a corresponding knowledge base and a face recognition engine are required before identity recognition can be performed. The knowledge base can be built after collecting and analyzing a large amount of data, and the recognition engine is built according to the characteristics of the knowledge base. The recognition engine is open to the outside: the system provides a calling interface for external programs, so that other systems can invoke the engine through this interface to recognize facial images, making the engine reusable. Where technically feasible, intelligent training and semi-automatic construction of the knowledge base are supported.

1.3 Facial image capture and preprocessing. The system reserves an API interface and captures image data with a USB image capture device. The captured image data are processed with JMF (Java Media Framework) to perform face detection and real-time localization and tracking on the captured images.

Image Recognition Based on OpenCV

Research on a Face Recognition Algorithm Based on 2DPCA

Abstract: Face recognition is a pattern recognition technology for detecting and locating human faces in images and video, covering all the information of the face image, such as position, size, number and shape. The rapid development of computer technology in recent years has made the wide application of face recognition possible, so image processing technology has been applied in many fields. The technology has broad prospects, and a large number of researchers are now devoted to its development. The main work of this paper is as follows:

1) Introduces the basics of face recognition, including the applications, background, research directions and current difficulties of the technology, with a brief description of the operation process and running platform of a face recognition system.

2) Preprocessing is performed on the original ORL face database. The image preprocessing stage consists of four steps: color processing, geometric normalization, histogram equalization and gray-scale normalization. After all face images pass through these steps, the adverse effects of external factors such as illumination and background are reduced to some extent.

3) Introduces the current mainstream face detection algorithms; this paper adopts and describes in detail the Adaboost face detection algorithm. Adaboost first builds training samples of face images; training on these samples yields a cascade classifier that can detect faces.

4) Introduces facial feature extraction based on the PCA algorithm, applies the improved 2DPCA algorithm on the basis of PCA, and compares the performance of the two, concluding that the latter exceeds the former in both accuracy and speed. Finally, the Adaboost face detection algorithm and 2DPCA are combined, which not only greatly reduces recognition time but also lets the two complement each other, effectively improving the recognition rate.

Keywords: face recognition, 2DPCA, feature extraction, face detection

Research on a 2DPCA-Based Face Recognition Algorithm

Abstract: Face recognition is a technology to detect and locate human faces in an image or video stream, including the location, size, shape, number and other information of the faces. The rapid development of computer processing speed has allowed image processing technology to be widely applied in many fields in recent years. This paper's work has the following aspects: 1) Explains the background, research scope and methods of face recognition, and introduces the theoretical methods of the face recognition field in general. 2) The preprocessing work is based on the original ORL face database. The image preprocessing stage has four parts: color processing, geometric normalization, equalization and gray-scale normalization. After this unified processing, the face images are standardized, which eliminates the adverse effects of some external factors. 3) Various face detection algorithms are introduced, with a detailed description of the Adaboost face detection algorithm: training samples of face images are created and trained, and the resulting cascade classifier is used to detect faces. 4) The paper introduces facial feature point extraction based on PCA, with 2DPCA used as an improved algorithm on the basis of PCA. Comparing the performance of the two, it is concluded that the speed and accuracy of the latter are greater than those of the former. Finally, the Adaboost face detection algorithm and 2DPCA are combined, which not only greatly reduces recognition time but also makes the two complement each other, effectively improving the recognition rate.

Key words: Face recognition, 2DPCA, Feature extraction, Face detection
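To make the 2DPCA step concrete, here is a minimal numpy sketch of standard 2DPCA training and projection. It illustrates the published algorithm, not the thesis code; the number of retained eigenvectors d is a placeholder.

```python
# 2DPCA: build the image scatter matrix from 2-D faces directly and
# project each image onto its top eigenvectors.
import numpy as np

def train_2dpca(images, d=8):
    """images: (n, h, w) training faces; returns a (w, d) projection matrix."""
    mean = images.mean(axis=0)
    G = np.zeros((images.shape[2], images.shape[2]))
    for A in images:
        B = A - mean
        G += B.T @ B                  # accumulate the image scatter matrix
    G /= len(images)
    vals, vecs = np.linalg.eigh(G)    # eigenvalues in ascending order
    return vecs[:, -d:]               # top-d eigenvectors as columns

def project(A, X):
    return A @ X                      # (h, d) feature matrix for one image
```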

Image Preprocessing Based on OpenCV

1. Problem description

This design implements image preprocessing based on OpenCV and C++. OpenCV is used to develop real-time image processing, computer vision and pattern recognition programs; image preprocessing here means simple editing operations on an image using OpenCV, such as adjusting contrast, brightness and saturation, as well as scaling and rotating the image. The program first loads and displays a source image with OpenCV, then sets up five trackbars; dragging a trackbar changes the contrast, brightness or saturation, and in the same way the scaling factor and rotation angle can be adjusted, so that the image meeting the adjustment is displayed and can be compared with the original.

2. Module division

The design is divided into five modules: the trackbar control module, the contrast and brightness adjustment module, the saturation adjustment module, the scaling adjustment module and the rotation adjustment module. The trackbar control module calls each of the other four; their relations are shown in Figure 1, the module relation diagram.

The trackbar control module resides in the main function and is the core of the whole design; five trackbars are created with createTrackbar, and each calls the corresponding module to apply its adjustment to the image.

3. Algorithm design

(1) Trackbar control: trackbar control is the core of the design. Dragging a trackbar changes the corresponding value and invokes a callback that edits the image. It is implemented with createTrackbar(), whose arguments include the trackbar name, the window it is displayed in, the address of the slider variable and its maximum range, and the callback each trackbar should invoke to perform its function.

(2) Contrast and brightness adjustment: the adjustment follows the linear transform g(x) = a * f(x) + b, where f(x) is a source image pixel and g(x) the output pixel; the parameter a (required to satisfy a > 0) is called the gain and is used to control contrast, while the parameter b is called the bias and is used to control brightness. A sketch of this callback pattern is given after this section.

(3) Saturation adjustment: cvCvtColor(src_image, dst_image, CV_BGR2HSV) converts the RGB color space to HSV, where H (hue) is the tone, S the saturation and V the value; adjusting saturation therefore only requires changing S, leaving H and V untouched.

(4) Rotation adjustment: rotation turns each image point (x, y) counterclockwise by an angle θ around a reference point, giving new coordinates (x1, y1) with x1 = r cos(α + θ) and y1 = r sin(α + θ), where r is the polar radius of the point and α is its angle relative to the horizontal axis.

(5) Scaling adjustment: given the source image width x and height y, the transformed width and height are x1 = x * f and y1 = y * f, where f is the scale factor.

4. Function descriptions

(1) The main function main() sets up the trackbars; dragging a trackbar with the mouse obtains the corresponding value, providing manual control. When the contrast and brightness trackbars are dragged, the main function calls the contrast and brightness adjustment module.
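A minimal Python/OpenCV sketch of the contrast and brightness trackbar described above (assumed code, not the original C++ design; the file name and slider ranges are placeholders):

```python
import cv2

img = cv2.imread("input.jpg")        # hypothetical input path
win = "preprocess"
cv2.namedWindow(win)

def on_change(_=0):
    a = cv2.getTrackbarPos("contrast", win) / 50.0   # gain a > 0
    b = cv2.getTrackbarPos("brightness", win) - 100  # bias b
    out = cv2.convertScaleAbs(img, alpha=a, beta=b)  # g(x) = a*f(x) + b
    cv2.imshow(win, out)

cv2.createTrackbar("contrast", win, 50, 100, on_change)
cv2.createTrackbar("brightness", win, 100, 200, on_change)
on_change()
cv2.waitKey(0)
```

The saturation, scaling and rotation modules plug into the same callback pattern, each reading its own slider value.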

Object Image Recognition Based on OpenCV and a Deep Learning Framework

Object Image Recognition Based on OpenCV and the Deep Learning Framework Caffe

Abstract: This paper introduces the theory and techniques of convolutional neural networks within deep neural networks. Object recognition is performed with the OpenCV deep learning module (DNN) and the deep learning framework Caffe: the DNN module in OpenCV loads a Caffe model file and recognizes object images. Experimental results show that convolutional neural networks achieve high accuracy in object recognition.

1. Overview

1.1 About OpenCV. OpenCV was established by Intel in 1999 and is now supported by Willow Garage. It is a cross-platform computer vision library released under the BSD (open-source) license and runs on Linux, Windows and Mac OS. It is lightweight and efficient, consisting of a series of C functions and a small number of C++ classes, and provides interfaces for Python, Ruby, MATLAB and other languages, implementing many general algorithms in image processing and computer vision. Its latest version, 3.2, was released on December 23, 2016. OpenCV is dedicated to real-time, real-world applications; optimized C code gives it considerable execution speed, and faster processing can be obtained by purchasing Intel's high-performance IPP (Integrated Performance Primitives) multimedia library. The latest 3.2 version added a deep neural network module and supports Caffe framework models.

1.2 About the deep learning framework Caffe. Caffe (Convolutional Architecture for Fast Feature Embedding) is a clear and efficient deep learning framework. Its author, Yangqing Jia, received his PhD from UC Berkeley, worked at Google, and is now a research scientist at Facebook. Caffe is a pure C++/CUDA architecture supporting command-line, Python and MATLAB interfaces, and can switch seamlessly between CPU and GPU. Caffe's advantages
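As a sketch of the recognition pipeline the abstract describes, the following loads a Caffe model through OpenCV's dnn module; the prototxt/caffemodel file names, input size and mean values are placeholders for whatever classification network is actually used.

```python
import cv2

net = cv2.dnn.readNetFromCaffe("deploy.prototxt", "model.caffemodel")
img = cv2.imread("object.jpg")
blob = cv2.dnn.blobFromImage(img, scalefactor=1.0, size=(224, 224),
                             mean=(104, 117, 123))   # typical ImageNet mean
net.setInput(blob)
scores = net.forward()            # class scores; argmax gives the label index
print(int(scores.argmax()))
```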

Graduation Project: A Face Recognition Algorithm Based on OpenCV

Industrial and Commercial College, Anhui University of Technology
Bachelor's Thesis
A Face Recognition Algorithm Based on OpenCV
Author: Chen Tao. Degree sought: Bachelor. Major: Measurement and Control Technology and Instruments. Supervisor: Fang Ting

Abstract

The face plays a very important role in social interaction and is the most common biometric feature used by humans to establish a person's identity; research on face tracking, recognition and related technologies has great theoretical and practical value. Face detection, tracking and recognition in color image sequences has gradually become a research hotspot in recent years, driven by the rapid development of computer technology and the needs of applications such as video surveillance. This paper focuses on building a face tracking and recognition system dedicated to accurate, real-time detection and tracking of face images in color video, with tracked face images transmitted to a recognition end for identification. The system is divided into a client and a server. On the client, an improved version of the traditional Camshift tracking algorithm, with added morphological processing and multiple assigned trackers, is applied to track multiple faces. The server first partitions the face image into blocks according to its main features, then runs the Eigenface algorithm on the blocks to identify the person. The system achieves multi-face tracking and can be widely applied in security systems such as ATM surveillance and access control systems.

Abstract

The human face is our primary focus of attention in social intercourse, playing a major role in conveying identity and emotion. Research on face tracking and recognition technology has great theoretical and practical value. This paper focuses on building a face recognition and tracking system committed to accurate and real-time processing of color video images, which can transmit the tracked face image to the recognition part to identify the person's status. The system is divided into client and server parts. A tracking algorithm that adds morphology processing to the traditional Camshift algorithm and assigns several tracking devices is applied on the client for multi-face tracking. The server side first divides the face image into blocks according to its chief features, then applies the Eigenface algorithm to the blocks separately to realize identification of the person's status. The system's implementation of multi-face tracking can be widely used among various security systems, such as ATM machine monitoring systems and access control systems.

Keywords: Face Detection, Face Tracking, Face Recognition, Eigenface, Camshift
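For illustration, here is a minimal sketch of the client-side Camshift tracking step named in the abstract, using OpenCV's built-in CamShift. It is assumed code, not the thesis implementation; the hue histogram and initial window are taken as given, e.g. from face detection or manual initialization.

```python
import cv2

def track(frame, hist, window):
    """frame: BGR frame; hist: hue histogram of the tracked face;
    window: (x, y, w, h) search window from the previous frame."""
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    prob = cv2.calcBackProject([hsv], [0], hist, [0, 180], 1)
    crit = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)
    box, window = cv2.CamShift(prob, window, crit)
    return box, window          # rotated box and updated search window
```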

Image Processing Based on OpenCV

Image Processing Based on OpenCV
Yin Faming, Nanjing College of Information Technology

[Abstract] OpenCV is an open-source, free computer vision library released by Intel; with OpenCV, image and video processing can be implemented very conveniently. This paper introduces the history, features, structure and applications of OpenCV, and uses face detection as an example of its application to image processing.
[Keywords] OpenCV, face detection

OpenCV, short for Open Source Computer Vision Library, is a cross-platform computer vision library initiated and co-developed by Intel and released under the BSD license; it can be used free of charge in commercial and research fields, and its source code is open. OpenCV consists of a series of C functions and a small number of C++ classes, implements many general algorithms in image processing and computer vision, and can be used to develop real-time image processing, computer vision and pattern recognition programs, greatly facilitating secondary development by image and video processing researchers.

1. History and features of OpenCV

OpenCV's main creators were Intel's Performance Library Team and a number of experts at Intel Russia. The first beta version of OpenCV was announced at the IEEE Conference on Machine Vision and Pattern Recognition in 2000, and five further beta versions were released between 2001 and 2005. Version 1.0 was officially released in 2006; version 2.0, released in October 2009, greatly enhanced its functionality. OpenCV's wide adoption in the computer vision field is inseparable from its outstanding advantages:
●Cross-platform and highly portable: it runs on Windows, Linux and Mac OS alike;
●Supports most C/C++ compilers, such as VC6.0 and C++Builder, so programs can easily be ported between platforms;
●Free for personal and commercial development, with open source code;
●Provides convenient and flexible user interfaces, written in C and C++ with more than 300 functions, efficient code, and middle- and high-level APIs; it can be used with external libraries or on its own;
●Offers powerful image and matrix computation, reducing developers' workload and effectively improving development efficiency and program reliability;
●Optimized for Intel processors.

2. Structure and applications of OpenCV

OpenCV forms a complete system with its own data structures. It currently contains the following sub-libraries:
●CxCore: all of the basic data structures used by OpenCV at run time, including matrices, basic array operations, and basic error-handling functions;
●Cv: image processing and computer vision functionality (image processing, structural analysis, motion analysis, object tracking, pattern recognition, camera calibration);
●Machine learning: classes and functions for classification, regression and data clustering;
●CvAux: functions including 3-D tracking, PCA and HMMs;
●HighGUI: user interaction (GUI, image and video input/output).

OpenCV is aimed mainly at image and video processing. Its main applications include 2D and 3D feature toolkits, egomotion estimation, face recognition, gesture recognition, human-computer interaction, mobile robotics, motion understanding, object identification, segmentation and recognition, stereopsis (depth perception from two cameras), structure from motion, and motion tracking. In addition, OpenCV contains machine learning libraries covering boosting, decision trees, the expectation-maximization algorithm, the k-nearest-neighbor algorithm, naive Bayes classifiers, artificial neural networks, random forests and support vector machines.

3. Face detection with OpenCV

Taking face detection as an example of applying OpenCV: face detection is the first step of face recognition, and face detection methods can roughly be divided into neural-network-based, feature-based and color-based detection. Neural-network-based detection requires a large number of true and false face images to train the network; feature-based methods use facial features such as the eyes and nose; color-based methods detect by facial colors such as yellow and brown skin tones. OpenCV already provides trained Haar cascade classifiers, which make face detection very easy to implement. The Haar feature cascade tables in OpenCV contain boosted classifiers: a Haar cascade classifier is a large classifier composed of several simple boosted classifiers in cascade; candidate regions pass through each classifier in turn, and a region that passes all of them is judged to be a face region. Face detection in OpenCV consists of four main steps. Experiments with the Haar cascade classifier show that face detection with OpenCV's Haar classifier is fast and efficient; although the detection rate does not reach 100% and false positives occur, the results are already quite good.

4. Summary

Besides OpenCV, representative computer vision libraries and software currently include Intel's IPP, Microsoft's VisDSK, and MathWorks' Matlab. IPP is Intel's integrated development library for image and signal processing; it is commercial and its source is closed. Microsoft's VisDSK image processing library is free and open, with functionality similar to OpenCV. Matlab is feature-rich, with simple and convenient algorithm implementation, and is suitable for research, simulation and demonstration, but it is inefficient for product development and the software is expensive. By comparison, the cross-platform OpenCV image processing library is free, open-source and powerful, with a large user base and abundant resources, and is bound to become a strong tool in the field of image and video processing.

References
[1] Li Zhenwei et al. Moving object tracking based on OpenCV and its implementation. Scientific Computing and Information Processing, 2008(20).
[2] Chen Jian et al. A color image face detection method based on the Haar wavelet transform. Microcomputer Information, 2005(1).
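A short sketch of the Haar-cascade face detection described in section 3 above, using the pretrained classifier shipped with OpenCV (the image path is a placeholder):

```python
import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
gray = cv2.cvtColor(cv2.imread("photo.jpg"), cv2.COLOR_BGR2GRAY)
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
for (x, y, w, h) in faces:      # one rectangle per detected face
    print(x, y, w, h)
```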

Implementation of Image Contour Extraction Based on OpenCV

Implementation of Image Contour Extraction Based on OpenCV

[Abstract] OpenCV is an open-source, free computer vision library released in recent years. Its goal is to build a simple and easy-to-use computer vision framework whose functions help developers implement image and video processing quickly and conveniently. The contour is a basic feature of an image: it carries most of the information in an image and is often used in higher-level image applications. It is widely applied in fields such as image segmentation, image recognition and image compression, and is also a foundation of image processing. This paper first describes the features and structure of OpenCV, then uses a series of feasible algorithms to obtain image feature parameters, and analyzes and adjusts the image gray levels with several operators (the Sobel, Laplace and Canny operators) to implement image edge detection and contour extraction.
[Keywords] OpenCV, image, contour extraction

The Realization of Image Contour Extraction Based on OpenCV

[Abstract] OpenCV is an open-source, free computer vision library launched in recent years. OpenCV's goal is to build a simple and easy-to-use computer vision framework whose functions help developers realize image processing and video processing quickly and easily. The contour of an image is a basic feature of the image: carrying most of an image's information, it is often applied at higher levels of image applications. It has a wide range of applications in fields such as image separation, image recognition and image compression, and is also the basis of image processing. This paper first expounds the characteristics and structure of OpenCV; then a series of feasible algorithms are used to obtain image feature parameters, and several operators (the Sobel operator, Laplace operator and Canny operator) are used to analyze and adjust the image gray level, realizing image edge detection and contour extraction.
[Key words] OpenCV, Image, Contour extraction
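As a minimal sketch of the pipeline the abstract describes, the following applies the Canny operator and extracts contours with OpenCV (an illustration, not the paper's code; the input path and thresholds are placeholders):

```python
import cv2

gray = cv2.cvtColor(cv2.imread("shape.png"), cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, 80, 160)                   # binary edge map
contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL,
                               cv2.CHAIN_APPROX_SIMPLE)
print(len(contours))            # number of extracted outer contours
```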
