

Eyes Segmentation Applied to Gaze Direction and Vigilance Estimation

Zakia Hammal, Corentin Massot, Guillermo Bedoya, Alice Caplier

Laboratory of Images and Signals

Grenoble, 38031 Cedex, France

zakia hammal@yahoo.fr, …@lis.inpg.fr

Abstract. An efficient algorithm for iris segmentation and its application to automatic and non-intrusive gaze tracking and vigilance estimation is presented and discussed. A luminance gradient technique is used to fit the irises in face images. A robust preprocessing stage which mimics the human retina is used, so that the system is robust to luminance variations and contrast enhancement is achieved. The validation of the proposed algorithm is demonstrated experimentally using three well-known test databases: the FERET database, the Yale database and the Cohn-Kanade database. Experimental results confirm the effectiveness and the robustness of the proposed approach for gaze direction and vigilance estimation.

1 Introduction

During the past two decades, a considerable scientific effort has been devoted to understanding human vision. Since early work on the study of the visual process ([1]), many promising applications have used eye movements as well as vigilance characteristics as behavioral information ([2], [3], [4], [5]) in order to develop sophisticated human-machine interfaces. In recent years, the evolution of user interfaces for computer systems has had a significant impact, since they are oriented towards the development of intelligent multi-modal interfaces. The key idea of those systems is to make the communication and interaction with machines more intuitive.

The improvement of human-machine interaction can be defined from two perspectives: at the communication level, with the analysis of gaze direction, and at the interpretation level of the user state, with the analysis of vigilance. In this paper, an efficient algorithm is proposed to achieve the automatic segmentation of the iris boundaries, which is then used for gaze direction and vigilance estimation. In order to evaluate the performance of the proposed system, the gaze direction estimator has been compared to an existing system ([13]).

Although the automatic segmentation of iris boundaries is a well-known problem, we propose a new and efficient solution as an alternative to the usual methods. Since many approaches fail when luminance conditions are variable, a robust preprocessing filter which mimics the human retina is used, so that a system tolerant to variable illumination conditions is obtained. A luminance gradient-based technique is used for iris segmentation, yielding a robust fitting of the irises in face images. Additionally, the developed system requires only one single digital camera, contrary to most existing systems, which have to use more sophisticated and expensive equipment, such as infrared cameras, multiple digital cameras or optical zooming ([3], [4], [5]). Finally, our approach needs neither manual initialization nor a learning step, so that a fully automatic system is presented. The developed eye segmentation algorithm has been intensively tested using three different test databases: the FERET database [9] (static images of faces), the Yale database [10] (static images of expressive faces) and the Cohn-Kanade database [11] (facial expression sequences).

The paper is organized as follows: Section 2 introduces the robust prefiltering and the proposed segmentation approach; some results are described and discussed. In Section 3, the use of the developed algorithm for gaze direction and vigilance estimation is presented. The interest of our approach for real applications is also demonstrated.

2 Iris Segmentation

Many real-world applications require accurate and real-time iris segmentation, and a lot of scientific effort has been dedicated to this field. In this work, we are interested in iris segmentation in frames acquired with a single digital camera. We start with the localization of the face in the first frame of a given sequence. Face extraction is beyond the scope of this paper; we use the MPISearch algorithm [6] based on the work of Viola and Jones [7]. This algorithm extracts a square bounding box around the face (Figure 2 left). The box is then automatically tracked by block matching for the rest of the sequence.
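As an illustration of the tracking step, the sketch below (Python with OpenCV; the function and parameter names are ours, this is not the MPISearch code) matches the face patch of the previous frame against a search window in the current frame:

```python
import cv2

def track_face(prev_frame, frame, box, search_margin=16):
    """Block-matching tracking of the face bounding box between two
    consecutive grayscale frames. box is (x, y, w, h)."""
    x, y, w, h = box
    template = prev_frame[y:y + h, x:x + w]

    # Search window around the previous position, clipped to the frame.
    x0, y0 = max(0, x - search_margin), max(0, y - search_margin)
    x1 = min(frame.shape[1], x + w + search_margin)
    y1 = min(frame.shape[0], y + h + search_margin)
    window = frame[y0:y1, x0:x1]

    # The best normalized cross-correlation peak gives the new position.
    scores = cv2.matchTemplate(window, template, cv2.TM_CCOEFF_NORMED)
    _, _, _, best = cv2.minMaxLoc(scores)
    return (x0 + best[0], y0 + best[1], w, h)
```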

2.1 Retinal Prefiltering

The video sequences are acquired in unconstrained illumination conditions. In order to avoid problems with luminance variations, a prefiltering stage is applied to each frame using a model of the retinal processing [8]. The variations of illumination are smoothed by a filter that uses a non-linear adaptation stage and a multi-stage combination of low and high spatial frequencies. This yields an enhancement of the contours and, at the same time, a local correction of the luminance variations (Figure 1).


Fig. 1. Retinal preprocessing. A luminance compression is first applied to the frame. Then a Gaussian filter (size = 15 pixels, σ = 2 pixels) realizes a local averaging which is combined and compressed, leading to a contrast enhancement. A final combination gives the prefiltered frame.
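A minimal sketch of this retina-like prefiltering, assuming NumPy/SciPy, is given below. The exact multi-stage combination of [8] is more elaborate; this version keeps only the two key ingredients, a non-linear luminance compression driven by a local average and a band-pass combination that enhances contours:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def retinal_prefilter(img, sigma=2.0):
    """Retina-like prefiltering sketch: luminance compression followed by
    a local contrast enhancement (band-pass combination)."""
    x = img.astype(np.float64) / 255.0

    # Non-linear compression: the local mean sets the adaptation level,
    # so dark regions are lifted more than bright ones (local correction
    # of the luminance variations).
    x0 = gaussian_filter(x, sigma) + 1e-3
    compressed = (1.0 + x0) * x / (x + x0)

    # Subtracting a second local average approximates the band-pass
    # behaviour of the retina and enhances the contours.
    enhanced = compressed - gaussian_filter(compressed, sigma) + compressed.mean()
    return np.clip(enhanced, 0.0, 1.0)
```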


Fig. 2. a) Image with an asymmetrical illumination (the light comes from the right of the face) and example of face extraction; b) image filtered by the retinal filter; c) evolution of the NFLG during the scanning of the search area of the iris corresponding to the upper left square of the rectangle surrounding the face.

2.2 Iris Segmentation Algorithm

The iris contour is the frontier between the dark area of the iris and the eye white. This contour is assumed to be a circle made of points of maximum luminance gradient. Since the eyes can be slightly closed, the upper part of the iris may be occluded. Therefore, for each iris, we look for the lower part of the iris circle. Each iris semi-circle maximizes the normalized flow of the luminance gradient ($NFLG_t$), defined at time $t$ as:

$$NFLG_t = \frac{1}{\mathrm{length}(C_{sc})} \sum_{p \in C_{sc}} \vec{\nabla} I_t(p) \cdot \vec{n}(p) \qquad (1)$$

where $I_t(p)$ is the luminance at point $p$ and at time $t$, $\vec{n}(p)$ is the normal to the boundary at point $p$, and $C_{sc}$ is the boundary of the lower semi-circle. The $NFLG_t$ is normalized by the length of $C_{sc}$.

In order to select the semi-circle which maximizes the $NFLG_t$, several candidates scanning the search area of each iris are tested. The size of the iris is taken proportional to the head size given by the face detection algorithm ([6]). The search area for each iris is limited to the upper right or upper left part of the face bounding box (Figure 2 right). This division of the face has been determined after a study of 400 face images (ORL database [12]). Even in the case of a rotated face, if both irises are visible, the proportions remain valid.
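The candidate scan can be sketched as follows (Python/NumPy, with names of our own choosing). Since the radius is fixed by the head size, normalizing by the number of samples is equivalent, up to a constant factor, to the arc-length normalization of Eq. (1):

```python
import numpy as np

def nflg(grad_y, grad_x, cx, cy, r, n_points=64):
    """Normalized flow of the luminance gradient along the lower
    semi-circle of center (cx, cy) and radius r (Eq. 1)."""
    # Angles in [0, pi] span the lower half-circle (image y axis points down).
    angles = np.linspace(0.0, np.pi, n_points)
    xs = np.clip((cx + r * np.cos(angles)).astype(int), 0, grad_x.shape[1] - 1)
    ys = np.clip((cy + r * np.sin(angles)).astype(int), 0, grad_x.shape[0] - 1)
    nx, ny = np.cos(angles), np.sin(angles)   # outward normal of the circle
    flow = grad_x[ys, xs] * nx + grad_y[ys, xs] * ny
    return flow.sum() / n_points

def fit_iris(img, search_box, r):
    """Scan every candidate center inside the search area and keep the
    semi-circle maximizing the NFLG."""
    grad_y, grad_x = np.gradient(img.astype(np.float64))
    x_min, y_min, x_max, y_max = search_box
    best_score, best_center = -np.inf, None
    for cy in range(y_min, y_max):
        for cx in range(x_min, x_max):
            score = nflg(grad_y, grad_x, cx, cy, r)
            if score > best_score:
                best_score, best_center = score, (cx, cy)
    return best_center, best_score
```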

Figure 3 shows iris segmentation results obtained on the different test databases: the first row shows the robustness of our algorithm to different facial expressions, while the second row shows the detection in the case of spectacles, of bad illumination conditions, of an inclination of the face and of a vertical rotation, respectively.

3 Automatic System for Gaze Direction and Vigilance Estimation

3.1 Gaze direction estimation

We present an automatic approach to estimate the gaze direction of a user in front of a computer screen. The proposed method achieves an accurate detection of the position of the iris in the face. It emerges as a promising alternative to existing systems ([13]), which usually require a dedicated device and/or impose acquisition and calibration constraints that make them awkward to use. An additional advantage of our method is that it uses a commercially available video acquisition system made of a single camera (e.g., a webcam) placed above or below the screen (Figure 4). Our system makes the assumption that the head is kept fixed, but this assumption can be removed using a head pose tracking system.

Fig. 3. Results of the segmentation of the iris on the Yale database, the Cohn-Kanade database and on sequences acquired at our laboratory.

Geometrical Model and Approximation of the Projection Function. We have to define the projection function, which establishes the relationship between the position of the iris center in an image and its projection on the screen.

Fig. 4. Left: overview of the system and geometrical model; middle: calibration configuration; right: scale configuration.

We define the following geometrical model (Figure 4 left). O is the center of the screen; (A, B) represents the height of the screen (a similar model can be derived for the width); C is the center of the iris; H is the orthogonal projection of C and is the center of the reference screen plane; $\alpha$ is the angle between CH and CO and represents the position of the user relative to the screen; x is the coordinate on AB of the point fixed by the user; the angle between CO and Cx is noted $\theta_x$, the angle between CO and CB is noted $\theta_1$ and the angle between CO and CA is noted $\theta_2$. In order to find the analytical formulation of the projection function, we express the angles $\alpha$ and $\theta_x$, obtaining the following set of equations:

$$\begin{aligned}
AO &= OB = AH + HO = 1 \quad \text{(convention)}\\
\cos(\alpha) &= CH/CO\\
\cos(\alpha+\theta_x) &= CH/CX\\
1 &= CO^2 + CA^2 - 2\cos(\theta_2)\,CO \cdot CA\\
1 &= CO^2 + CB^2 - 2\cos(\theta_1)\,CO \cdot CB\\
4 &= CA^2 + CB^2 - 2\cos(\theta_1+\theta_2)\,CA \cdot CB
\end{aligned} \qquad (2)$$

The resolution of the set of equations (2) leads to:

$$HX = CH \cdot \tan(\alpha + \theta_x) \qquad (3)$$

Equation (3) shows that the estimation of the gaze direction depends on the distance to the screen CH and on the angle $\alpha$. Figure 5 left presents the variation of the distance HX for a fixed $\alpha$ and different values of CH. The analysis of these curves points out that for CH ≥ CH_min the projection function tends to be linear. Figure 5 right presents the comparison between the projection function and its equivalent linear approximation. We estimate that for CH_min = 3·OA, i.e., ≈ 30 cm, and $\alpha$ ≤ $\alpha_{max}$ = 10°, the projection function is linear. These values correspond to usual work conditions, justifying the use of a linear approximation for the projection function (Figure 5 right).
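The linearity claim is easy to verify numerically. A small check under the convention AO = OB = 1 of Eq. (2) (variable names are ours):

```python
import numpy as np

def projected_position(CH, alpha, theta_x):
    """Position HX on the screen axis fixed by the user (Eq. 3)."""
    return CH * np.tan(alpha + theta_x)

# Usual work conditions: CH >= 3*OA and alpha <= 10 degrees.
CH, alpha = 3.0, np.radians(10.0)
theta = np.linspace(-np.radians(10.0), np.radians(10.0), 201)
hx = projected_position(CH, alpha, theta)

# The residual of a least-squares line fit stays small, justifying the
# linear approximation of the projection function.
coeffs = np.polyfit(theta, hx, 1)
residual = np.max(np.abs(hx - np.polyval(coeffs, theta)))
print(f"max deviation from the linear fit: {residual:.4f} (half-screen = 1)")
```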

Fig. 5. Evolution of the projection function in relation to CH (left) and linear approximation (right).

Implementation. The screen is divided into four parts around the screen center O (Figure 4 middle). Since CO is not necessarily perpendicular to the screen (Figure 4 right), a better precision can be obtained if we allow the scales of the parts to differ at each coordinate, according to the position C of the user in front of the screen. A specific scale is thus associated to each part. The scale step corresponds to the distance covered on the screen for one unit of movement of the iris (e.g., 1 unit = 1 pixel). For example, scale 1 of Figure 4 right has to be greater than scale 2, because a displacement of one pixel of the iris center in the image plane covers a greater distance in the top right part of the screen than in the top left part.

Gaze Direction Estimation. We calculate the gaze direction using the projection of the iris center position on the screen. The system only takes into account the spatial displacement of the iris in the face, not any velocity information. The process is divided into two steps: the calibration, which automatically initializes the system, and the detection itself.

a) Calibration. The calibration consists in automatically defining the x and y scales for each part of the screen. For this, five points are necessary: the center and the four corners of the screen (Figure 4 middle). During the calibration stage, the user has to be in front of the screen at a normal distance (50 cm to 80 cm), while the camera is located above or under the screen. When the calibration procedure begins, five points appear dynamically on the screen one by one, beginning with the center and finishing at each corner. The user only has to follow them with his eyes. The major difficulty is that a human cannot precisely fix a given position without any dispersion. In order to overcome this problem, each calibration point appears on the screen during 1 second. The histogram of the positions is analyzed, and the four extreme positions (two along the x coordinate and two along the y coordinate) that have been fixed the most often are used for the calibration.
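A sketch of this calibration analysis (Python/NumPy; the function names and the per-point mode computation are our assumptions, the paper only specifies a histogram analysis of the recorded positions):

```python
import numpy as np

def most_fixed_position(positions):
    """Most often fixed iris position among the samples recorded during
    the 1 s display of one calibration point (mode of the histogram),
    which absorbs the natural dispersion of human fixations."""
    vals, counts = np.unique(np.asarray(positions, dtype=int), axis=0,
                             return_counts=True)
    return tuple(vals[np.argmax(counts)])

def calibration_scales(center_px, corners_px, screen_w=1024, screen_h=768):
    """Per-quadrant x/y scales: screen distance covered per unit of iris
    displacement, derived from the center and the four corner positions."""
    cx, cy = center_px
    half_w, half_h = screen_w / 2.0, screen_h / 2.0
    scales = {}
    for name, (ix, iy) in corners_px.items():
        scales[name] = (half_w / abs(ix - cx),   # x scale of this quadrant
                        half_h / abs(iy - cy))   # y scale of this quadrant
    return scales
```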

b) Detection of Gaze Direction. For each image, the right and left iris center positions are extracted. Each center is related to one point of the screen by the projection process. To eliminate transitory positions (occurring during the movement of the eye towards its new position), we introduce the notion of fixation, which is a region fixed during several frames (6 frames at 25 frames/second). At the end of the sequence, we obtain what we call a "fixation map", which represents the whole set of fixations. The fixations can overlap, so we associate to each position a value corresponding to the number of times that this position has been fixed.

The gaze position of the user on the screen corresponds to the barycenter of the set of points that belong to the same fixation region obtained in the fixation map. In addition to the spatial position, the system automatically computes the chronological order of the fixations. The final result consists in recovering the whole set of barycenters of the fixation regions and the order in which they appeared during the sequence (associated number) (Figure 6 right, Figure 7).
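The fixation extraction can be sketched as follows (Python/NumPy; the dispersion radius is an assumed parameter, the paper only fixes the minimal duration of 6 frames at 25 frames/second):

```python
import numpy as np

def fixation_barycenters(gaze_points, min_frames=6, radius=20):
    """Group the per-frame gaze positions into fixations (regions fixed
    during at least min_frames consecutive frames) and return the
    barycenter of each fixation with its chronological rank."""
    barycenters, run = [], [gaze_points[0]]
    for p in gaze_points[1:]:
        if np.hypot(p[0] - run[-1][0], p[1] - run[-1][1]) <= radius:
            run.append(p)            # still inside the same fixation region
        else:
            if len(run) >= min_frames:
                barycenters.append(tuple(np.mean(run, axis=0)))
            run = [p]                # transitory position: start a new run
    if len(run) >= min_frames:
        barycenters.append(tuple(np.mean(run, axis=0)))
    return [(rank + 1, b) for rank, b in enumerate(barycenters)]
```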

Precision Measures to Determine the System Performance. In order to evaluate the precision of our gaze direction estimation, an experimental setup consisting of a grid of 18 black points plotted on the screen (Figure 6 left) is used to estimate the user gaze position during the fixation of these points. The camera is placed under the screen (1024x768), considering that the usual work conditions in front of a computer screen vary in a range between 50 cm and 80 cm. The subject was asked to sequentially fix the black points, and this experiment was carried out 10 times. We obtain a mean precision of 0.8°, which means that the fixations are accurately detected. Figure 6 left presents the result of the estimation of the user gaze direction on the grid defined above. The white circles represent the estimated mean fixation position of the user for each fixed black point. These results are very satisfactory in the context of human-computer interaction.

One of the potential applications is the study of the strategy of a document exploration, as in Figure 6 right, which presents the analysis of a geographical map. The system is able to detect the user's attention during the exploration. Each point represents the position of the gaze on the map and its temporal order of apparition.

Fig. 6. Left: grid made of black points corresponding to the different positions used to estimate the precision (the size is 1024x768 pixels and corresponds to the whole screen); white circles represent the results of the user gaze detection on the grid. Right: analysis of the exploration strategy of a geographical map; the points are situated on the countries actually observed, with their chronological order.

Comparison with a Commercial System. We present two kinds of comparison results: a fixation map and a trajectory map. For the fixation map, the user has to fix each icon presented in the image, in a free order. In Figure 7 left, each point represents one fixation (barycenter of a fixation region). The associated number indicates the order in which the different icons have been looked at by the user. The video sequences are acquired by a camera with a frame rate of 25 frames per second, in usual work conditions in front of a computer screen, with the head kept fixed.

In order to evaluate the performance of our detection system, we have made the same experiments with the Eye-Link system, a commercial infrared eye tracker ([13]) (Figure 7 right). The comparison of the detections obtained by both systems points out a similar quality of our results.

In Figure 8, we aim at rebuilding the ocular trajectory of a user who has to follow the edges of a drawn house. Our results (Figure 8 left) have been compared with those obtained with the Eye-Link (Figure 8 right). Our trajectory is composed of the whole set of points corresponding to the gaze locations, contrary to the Eye-Link one, which only connects some fixation points. This explains why the Eye-Link trajectory is more rectilinear.

Fig. 7. Left: fixation map with our system; right: fixation map with Eye-Link.

Fig. 8. Left: trajectory map with our system; right: trajectory map with Eye-Link.

3.2 Vigilance estimation

To analyze the vigilance, different sequences have been acquired in our laboratory. The video sequences are acquired by a standard camera at 25 frames per second, during 10 minutes. The subjects were asked to simulate three states: normal blinking, fast blinking and drowsiness, respectively.

Blink detection. A blink corresponds to the transition of the state of each eye from open to closed. The open or closed state of each eye is related to the presence or absence of an iris. The automatic detection of the eye state is based on the analysis of the $NFLG_t$ and on the study of the normalized quantity of luminance ($NQL_t$), which is defined by the relation:

$$NQL_t = \frac{\sum_{p \in S_{sc}} I_t(p)/nbr}{\sup_{p \in S_{sc}} I_t(p)} \qquad (4)$$

where $I_t(p)$ is the luminance at pixel $p$ at time $t$, $S_{sc}$ is the surface of the candidate semi-circle and $nbr$ the number of its points. Let $NFLG_m$ and $NQL_m$ be the mean values of $NFLG_t$ and $NQL_t$ for the open eyes. These values are computed on a temporal window of width $\Delta t$ situated between $t-\Delta t$ and $t$, where $\Delta t$ corresponds to the time needed for one blink at normal blinking frequency. The evaluation of $NFLG_m$ and $NQL_m$ at different times $t$ allows the system to re-adapt itself to varying acquisition conditions (like a change in the illumination conditions). At time $t$, the maximum of $NFLG_m$ and the minimum of $NQL_m$ over all the already estimated values are computed. Indeed, $NFLG_t$ depends on the degree of opening of the eye (Figure 9 right), and taking the maximum value ensures that the $NFLG_t$ of the widest opening of the eye is considered. On the contrary, as the iris is a dark area, $NQL_t$ decreases with the opening of the eye (Figure 9 left), and taking the minimum value ensures that the $NQL_t$ of the widest opening of the eye is considered. At frame $t$, the eyes are detected as open if the following relations are satisfied:

$$NFLG_t \ge \max(NFLG_m) \cdot c_{NFLG} \quad \text{and} \quad NQL_t \le \min(NQL_m) \cdot c_{NQL} \qquad (5)$$

When the eyes start closing, the iris semi-circle is less and less visible, so that $NFLG_t$ decreases. Once the eyes are closed, the value of $NFLG_t$ is inferior to the maximum of $NFLG_m$ multiplied by the coefficient $c_{NFLG}$ (first condition for closed eyes). Sometimes the value of $NFLG_t$ along the selected semi-circle does not satisfy this first condition when the eyes are closed, because of the lashes: the semi-circle coincides with the lashes, which are made of points of maximum luminance gradient (frontier between the skin and the lashes). As a result, the $NFLG_t$ along the selected semi-circle is superior or equal to the defined threshold. For this reason, we add the normalized mean value of the luminance of the surface of the semi-circle. If the eyes are closed, $NQL_t$ is higher than or equal to the minimum of $NQL_m$ multiplied by the coefficient $c_{NQL}$ (second condition for closed eyes), because the surface of the selected semi-circle corresponds to a clear area (area of the eyelids) in the case of a closed eye, instead of a dark area (area of the iris) in the case of an open eye. The coefficients $c_{NFLG}$ and $c_{NQL}$ are chosen so that an eye is considered open if more than 1/3 of the semi-circle is visible (Figure 10, first row, second column) and considered closed otherwise. These coefficients are computed as the mean, over ten subjects, of the ratio between their maximum and minimum values of $NFLG_t$ and of $NQL_t$.
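A minimal sketch of this open/closed decision (Python; the coefficient values below are placeholders, the actual $c_{NFLG}$ and $c_{NQL}$ come from the ten-subject calibration described above):

```python
def eye_state(nflg_t, nql_t, nflg_m_history, nql_m_history,
              c_nflg=0.33, c_nql=1.5):
    """Open/closed decision of Eq. (5). The histories hold the NFLG_m and
    NQL_m values estimated so far on sliding windows of open-eye frames;
    their running max/min track the widest eye opening under the current
    illumination, so the thresholds adapt to the acquisition conditions."""
    open_by_gradient = nflg_t >= max(nflg_m_history) * c_nflg
    open_by_luminance = nql_t <= min(nql_m_history) * c_nql
    return "open" if (open_by_gradient and open_by_luminance) else "closed"
```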

Figure 9 shows the temporal evolution of $NFLG_t$ and $NQL_t$ and the results after thresholding. On these curves, two blinks appear: the first one is very quick, while the second one lasts several frames (it might correspond to a short sleep).

Fig. 9. Top left: temporal evolution of $NQL_t$ and threshold (dashed line); top right: temporal evolution of $NFLG_t$ and threshold (dashed line); bottom left: eye state after $NQL_t$ thresholding; bottom right: eye state after $NFLG_t$ thresholding (0 stands for closed eye and 1 stands for open eye).

Estimation of the Vigilance. We estimate the vigilance or interest level of a user by evaluating the blinking frequency. In the case of a "normal" vigilance level, the blinking frequency evaluated on our sequences is on average 18 blinks per minute. These values are coherent with the values found in the medical literature (12 to 20 per minute). To estimate the level of vigilance, the blink frequency is computed at each time $t$ inside a temporal window $\Delta t$ covering the last 6 seconds (period of one blink at a normal blinking frequency). The detection of one or two blinks corresponds to a normal blinking frequency. If the frequency is higher, the ratio between the duration of the open-eye states and the closed-eye states indicates whether it is a case of fast blinking (the eyes are found open two times longer) or a case of drowsiness (the eyes are found closed two times longer). If the frequency is lower, then the same ratio indicates whether it is a case of normal blinking (the eyes are kept open during the whole temporal window) or a case of drowsiness (the eyes are kept closed).
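These rules translate directly into a small classifier (Python sketch; the state sequence is the per-frame output of the blink detector above):

```python
def vigilance_level(states, fps=25, window_s=6.0):
    """Vigilance level from the eye states ('open'/'closed') of the last
    6 s window: 1-2 blinks -> normal; more blinks -> fast blinking if the
    eyes stay open about twice as long as closed, drowsiness otherwise;
    fewer blinks -> normal if the eyes stayed open, drowsiness if closed."""
    recent = states[-int(fps * window_s):]
    blinks = sum(1 for a, b in zip(recent, recent[1:])
                 if a == "open" and b == "closed")
    n_open = recent.count("open")
    n_closed = len(recent) - n_open
    if 1 <= blinks <= 2:
        return "normal"
    if blinks > 2:
        return "fast blinking" if n_open >= 2 * n_closed else "drowsiness"
    return "normal" if n_closed == 0 else "drowsiness"
```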

Figure 10 shows four frames taken from one of the sequences described above, together with examples of the detection of the eye state and of the vigilance level.

Fig. 10. Estimation of the vigilance. For each figure: left: current image; middle, top: past evolution of the eye states; middle, bottom: past evolution of the blinking frequency; right, top: currently detected eye state; right, bottom: current vigilance level.

4 Conclusion

In this paper, we have presented an efficient algorithm for iris segmentation. The results are accurate and robust to luminance variations. We have presented two associated applications in the human-machine interaction domain: first, an accurate and weakly constrained gaze direction estimation system working with a single commercially available video acquisition system; second, an analysis of the blinking frequency, which can be used for the detection of the vigilance level of the user. Preliminary results and comparative experiments with existing systems show the interest and the robustness of the proposed approach.

References

1. Buswell, G.T.: How People Look at Pictures. University of Chicago Press (1920)
2. Zhu, Z., Ji, Q.: Robust Real-Time Eye Detection and Tracking Under Variable Lighting Conditions and Various Face Orientations. Computer Vision and Image Understanding 98 (2005) 124–154
3. Hansen, D.W., Pece, A.E.C.: Eye Tracking in the Wild. Computer Vision and Image Understanding 98 (2005) 236–274
4. Noureddin, B., Lawrence, P.D., Man, C.F.: A Non-Contact Device for Tracking Gaze in a Human-Computer Interface. Computer Vision and Image Understanding 98 (2005) 52–82
5. Wang, J., Sung, E., Venkateswarlu, R.: Estimating the Eye Gaze from One Eye. Computer Vision and Image Understanding 98 (2005) 83–103
6. The MPISearch algorithm for face detection: …/grants/project1/free-software/MPTWebSite/introductionframe.html
7. Viola, P., Jones, M.: Robust Real-Time Face Detection. International Journal of Computer Vision 57(2) (2004) 137–154
8. Beaudot, W.: The Neural Information Processing in the Vertebrate Retina: A Melting Pot of Ideas for Artificial Vision. PhD thesis, TIRF Laboratory, Grenoble, France (1994)
9. The FERET database: http://www.itl.nist.gov/iad/humanid/feret/feret_master.html
10. The Yale database: http://cvc.yale.edu/projects/yalefaces/yalefaces.html
11. The Cohn-Kanade database: http://vasc.ri.cmu.edu/idb/html/face/facial_expression/
12. The ORL face database: …:pub/data/att_faces.zip
13. Eye-Link commercial website.
