Human Detection and Gesture Recognition Based on Ambient Intelligence
13
Human Detection and Gesture Recognition Based on Ambient Intelligence
Naoyuki Kubota
Tokyo Metropolitan University, Japan

Source: Face Recognition, edited by Kresimir Delac and Mislav Grgic, ISBN 978-3-902613-03-5, pp. 558, I-Tech, Vienna, Austria, June 2007.

1. Introduction
Recently, various types of human-friendly robots, such as pet robots, amusement robots, and partner robots, have been developed for the next-generation society. Human-friendly robots should perform human recognition, voice recognition, and gesture recognition in order to realize natural communication with a human. Furthermore, the robots should coexist with people in human environments based on learning and adaptation. However, it is very difficult for a robot to realize these capabilities and functions under real-world conditions. Two different approaches have been discussed to improve the capabilities and functions of such robots. One approach is to use conventional intelligent technologies based on the various sensors equipped on the robot; as a result, the size of the robot becomes large. The other approach is to use ambient intelligence technologies of environmental systems based on the structured information available to the robot: the robot directly receives the environmental information through a local area network without measuring it by itself. In the following, we explain the current status of research on sensor networks and interactive behavior acquisition from the viewpoint of ambient intelligence.

1.1 Ambient Intelligence
For the development of sensor networks and ubiquitous computing, we should discuss the intelligence technologies of the whole system of robots and environmental systems. Here, the intelligence technologies related to the measurement, transmission, modeling, and control of environmental information are called ambient intelligence. The concept of ambient intelligence was discussed by Hagras et al. (Doctor et al., 2005). Their main aim is to improve the quality of life based on computational artifacts, whereas we focus on the technologies for the coexistence of humans and robots in the same space. From the sensing point of view, a robot is considered a movable sensing device, while an environmental system is considered a fixed sensing device. If environmental information is available from the environmental system, flexible and dynamic perception can be realized by integrating that information.

Research on wireless sensor networks combines the three components of sensing, processing, and communicating into a single tiny device (Khemapech et al., 2005). The main roles of sensor networks are (1) environmental data gathering, (2) security monitoring, and (3) object tracking. In environmental data gathering, the data measured at each node are periodically transmitted to a database server. While synchronization of the measurements is very important for improving the accuracy of the data in environmental data gathering, an immediate and reliable emergency alert system is essential in security monitoring. Furthermore, a security monitoring system does not need to transmit data to an emergency alert system, but the information on features or situations should be transmitted as fast as possible. Therefore, the basic network architecture differs between data gathering and security monitoring.
On the other hand, object tracking is performed throughout a region monitored by a sensor network. Basically, objects can be tracked by tagging them with a small sensor node. Radio frequency identification (RFID) tags are often used for tracking systems owing to their low cost and small size.

Sensor networks and ubiquitous computing have been incorporated into robotics; these research areas are called network robotics and ubiquitous robotics, respectively (Kim et al., 2004). Ubiquitous computing integrates computation into the environment (Satoh, 2006). It is conceptually different from sensor networks, but both aim in the same research direction. If a robot can receive environmental data through the network without measuring them by its own sensors, the size of the robot can easily be reduced, and the received environmental data are more precise, because the sensors installed in the environment are designed to suit the environmental conditions. Network robots are divided into three types: visible robots, unconscious robots, and virtual robots (Kemmotsu, 2005). The role of visible robots is to act on users with their physical body. The role of unconscious robots is mainly to gather environmental data; this kind of robot is invisible to users. A virtual robot is a software agent in a cyber world. A visible robot can easily perceive objects by receiving object information from RFID tags, and this technology has been applied to robot navigation and self-localization (Kulyukin et al., 2004). Hagras et al. developed iDorm as a multi-function space (Doctor et al., 2005). Furthermore, Hashimoto et al. proposed Intelligent Space (iSpace) to achieve human-centered services, and developed distributed intelligent network devices composed of color CCD cameras with processing and networking units (Morioka & Hashimoto, 2004). A robot can be used not only as a human-friendly life-support system (Mori et al., 2005), but also as an interface connecting the physical world with the cyber world.

1.2 Interactive Behavioral Acquisition
In general, the behavioral acquisition used in robotics can be classified into supervised learning and self-learning (Figure 1). Self-learning is defined here as unsupervised learning performed by trial and error without exact target teaching signals for motion reproduction. Supervised learning in behavior acquisition is divided into social learning and error-based learning. For example, least-mean-square algorithms are applied to behavioral learning when exact target teaching signals are given to a robot, whereas observed data, instead of exact target teaching signals, are used in social learning. Social learning is performed between two or more agents, and is basically divided into imitative learning, instructive learning, and collaborative learning (Morikawa et al., 2001). Imitation (Billard, 2002) is a powerful tool for gestural interaction among children and for parents to teach children how to behave. Furthermore, imitation is often used for communication among children, and gestures are useful for understanding intentions and emotional expressions. Basically, imitation is defined as the ability to recognize and reproduce others' actions. The concept of imitative learning has been applied to robotics.
In traditional research on learning by observation, the motion trajectories of a human arm assembling or handling objects are measured, and the obtained data are analyzed and transformed for the motion control of a robotic manipulator. Furthermore, various neural networks have been applied to imitative learning for robots. The discovery of mirror neurons is especially important here (Rizzolatti et al., 1996): each mirror neuron is activated not only by performing a task, but also by observing somebody else performing the same task. Rao and Meltzoff classified imitative abilities into a four-stage progression: (i) body babbling, (ii) imitation of body movements, (iii) imitation of actions on objects, and (iv) imitation based on inferring the intentions of others (Rao & Meltzoff, 2003). If a robot could perform all stages of imitation, the robot might develop in a way similar to humans. While imitative learning is basically unidirectional, from a demonstrator to a learner, instructive learning is bidirectional between an instructor and a learner: the instructor assesses the learning state of the learner and then shows additional, suitable demonstrations. Collaborative learning is slightly different from imitative and instructive learning, because neither exact teaching data nor a target demonstration is given to the agents beforehand; the solution is found or searched for through interaction among multiple agents. Therefore, collaborative learning may be classified under the category of self-learning.

Figure 1. Learning methods in robotics

1.3 Human Detection and Gesture Recognition
Human interaction based on gestures is very important for realizing natural communication. The meaning of a gesture can be understood through actual interaction and imitation. Therefore, we focus on human detection and gesture recognition for the imitative learning of human-friendly network robots. Basically, imitative learning is composed of model observation and model reproduction. Furthermore, model learning is required to memorize and generalize motion patterns as gestures. In addition, model clustering is required to distinguish a specific gesture from others, and model selection as a result of the human interaction is also performed. In this way, imitative learning simultaneously requires various learning capabilities: model observation, model clustering, model selection, model reproduction, and model learning. First of all, the robot detects a human based on image processing with a steady-state genetic algorithm (SSGA) (Syswerda, 1991). Next, a series of movements of the human hand obtained by image processing as model observation, i.e., the hand motion pattern, is extracted by a spiking neural network (Gerstner, 1999). Furthermore, SSGA is used for generating a trajectory similar to the human hand motion pattern as model reproduction (Kubota, Nojima et al., 2006). In the following, we explain the methods for human detection and gesture recognition based on ambient intelligence.

2. Partner Robots and Environmental System
We developed two different types of partner robots, a human-like robot called Hubot (Kubota, Nojima et al., 2006) and a mobile PC called MOBiMac (Kubota, Tomioka et al., 2006), in order to realize social communication with humans. Hubot is composed of a mobile base, a body, two arms with grippers, and a head with a pan-tilt structure.
The robot has various sensors, such as a color CCD camera, two infrared line sensors, a microphone, ultrasonic sensors, and touch sensors (Figure 2(a)). The color CCD camera captures images within a range of -30° to 30° in front of the robot. Two CPUs are used for sensing, motion control, and wireless network communication, and the robot can perform various human-like behaviors. MOBiMac is also composed of two CPUs, used for PC functions and for robotic behaviors (Figure 2(b)); it has two servo motors, four ultrasonic sensors, four light sensors, a microphone, and a CCD camera. The basic behaviors of these robots are visual tracking, map building, imitative learning (Kubota, 2005), human classification, gesture recognition, and voice recognition. The robots are networked and share environmental data with each other. Furthermore, the environmental system based on a sensor network provides a robot with the environmental data measured by the sensors installed in the environment.

Human detection is one of the most important functions in the ambient intelligence space. The visual angle of the robot is very limited, while the environmental system is designed to observe a wide range of the environment. Therefore, human detection can easily be performed by the monitoring of the environmental system or by the cooperative search of several robots based on ambient intelligence (Figure 3). In the following sections, we explain how to detect a human based on image processing.

Figure 2. Hardware architecture of partner robots: (a) Hubot, (b) MOBiMac
Figure 3. The sensor network among robots and their environmental system

3. Human Detection and Gesture Recognition Based on Ambient Intelligence
3.1 Human Detection
Human detection is performed by both the robots and the environmental system. Pattern matching has been performed by various methods, such as template matching, cellular neural networks, the neocognitron, and dynamic programming (DP) matching (Fukushima, 2003; Mori et al., 2006). In general, pattern matching is composed of the two steps of target detection and target recognition: the aim of target detection is to extract a target from an image, while the aim of target recognition is to identify the target among classification candidates. Since image processing takes much time and computational cost, full-size processing of every image is not practical. Therefore, we use a reduced image size to detect a moving object for fast human detection. First, the robot calculates the center of gravity (COG) of the pixels that differ from the previous image (differential extraction). The size of the image used in the differential extraction is updated according to the previous human detection result; the maximum and minimum image sizes are 640×480 and 80×60, respectively. The differential extraction counts the pixels that differ between the previous and current images. If the robot does not move, the COG of the difference represents the location of the moving object, so the main search area for fast human detection can be formed around the COG.
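As a rough illustration of the differential extraction just described, the sketch below computes the COG of the pixels that changed between two consecutive frames and forms a square search area around it. This is only a minimal sketch: it assumes 8-bit grayscale frames held as NumPy arrays, and the threshold, the window size, and the function names are illustrative choices rather than values taken from the chapter.

```python
import numpy as np

def differential_extraction(prev_frame, curr_frame, diff_threshold=30):
    """Return the centre of gravity (COG) of the pixels that changed between
    two consecutive grayscale frames, or None if no pixel changed.
    Both frames are 2-D uint8 arrays of the same (possibly reduced) size."""
    diff = np.abs(curr_frame.astype(np.int16) - prev_frame.astype(np.int16))
    ys, xs = np.nonzero(diff > diff_threshold)      # pixels that differ
    if xs.size == 0:
        return None                                 # nothing moved
    return float(xs.mean()), float(ys.mean())       # COG in (x, y) image coordinates

def search_area(cog, frame_shape, half_size=60):
    """Form a square search window around the COG, clipped to the image,
    to restrict the subsequent template search for fast human detection."""
    cx, cy = cog
    h, w = frame_shape
    return (max(0, int(cx) - half_size), max(0, int(cy) - half_size),
            min(w, int(cx) + half_size), min(h, int(cy) + half_size))
```

The adaptive switching of the working resolution between 80×60 and 640×480 described above would be layered on top of this sketch by resizing both frames before the difference is taken.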
We use a steady-state genetic algorithm (SSGA) as the search method for both human detection and object detection, because SSGA can easily obtain feasible solutions under environmental changes at low computational cost. SSGA simulates a continuous model of the generation, which eliminates and generates only a few individuals per generation (iteration) (Syswerda, 1999). Here, the SSGA used for human detection is called SSGA-H, while the SSGA used for object detection (applied to human hand detection) is called SSGA-O.

Human skin and hair colors are extracted by SSGA-H based on template matching. Figure 4 shows a candidate solution of a template used for detecting a target. A template is composed of the numerical parameters g_{i,1}^H, g_{i,2}^H, g_{i,3}^H, and g_{i,4}^H, and the number of individuals is G. One iteration is composed of selection, crossover, and mutation. The worst candidate solution is eliminated ("delete least fitness" selection) and is replaced by the candidate solution generated by crossover and mutation. We use elitist crossover and adaptive mutation. The elitist crossover randomly selects one individual and generates a new individual by combining genetic information from the randomly selected individual and the best individual. Next, the following adaptive mutation is applied to the generated individual:

g_{i,j}^{H} \leftarrow g_{i,j}^{H} + \left( \alpha_j^{H} \cdot \frac{f_{max}^{H} - f_i^{H}}{f_{max}^{H} - f_{min}^{H}} + \beta_j^{H} \right) \cdot N(0,1)   (1)

where f_i^H is the fitness value of the i-th individual; f_max^H and f_min^H are the maximum and minimum fitness values in the population; N(0,1) is a normal random variable with a mean of zero and a variance of one; and α_j^H and β_j^H are a coefficient (0 < α_j^H < 1.0) and an offset (β_j^H > 0), respectively. In the adaptive mutation, the variance of the normal random perturbation is changed relative to the fitness values of the population. The fitness value is calculated by the following equation:

f_i^{H} = C_{Skin}^{H} + C_{Hair}^{H} + \eta_1^{H} \cdot C_{Skin}^{H} \cdot C_{Hair}^{H} - \eta_2^{H} \cdot C_{Other}^{H}   (2)

where C_Skin^H, C_Hair^H, and C_Other^H indicate the numbers of pixels of the colors corresponding to human skin, human hair, and other colors, respectively, and η_1^H and η_2^H are coefficients (η_1^H, η_2^H > 0). Therefore, this problem results in a maximization problem. The iteration of SSGA-H is repeated until the termination condition is satisfied.

Figure 4. A template used for human detection in SSGA-H
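To make the SSGA-H iteration of equations (1) and (2) concrete, a small sketch is given below. It assumes that a template is a list of the four parameters g_{i,1}^H, ..., g_{i,4}^H, that a user-supplied routine count_colors(image, template) returns the skin, hair, and other pixel counts, and that the coefficient values and the interpretation of f_i as the fitness of the crossover parent are illustrative assumptions; none of these names or numbers come from the original implementation.

```python
import random

ALPHA, BETA = 0.3, 0.1   # illustrative mutation coefficient and offset (alpha_j^H, beta_j^H)
ETA1, ETA2 = 0.01, 1.0   # illustrative fitness coefficients (eta_1^H, eta_2^H)

def fitness_h(template, image, count_colors):
    """Equation (2): reward skin and hair pixels inside the template, penalise others."""
    c_skin, c_hair, c_other = count_colors(image, template)
    return c_skin + c_hair + ETA1 * c_skin * c_hair - ETA2 * c_other

def ssga_h_step(population, fitnesses, image, count_colors):
    """One SSGA-H iteration: delete-least-fitness selection, elitist crossover,
    adaptive mutation (equation (1)), and evaluation of the new individual."""
    worst = min(range(len(population)), key=lambda i: fitnesses[i])
    best = max(range(len(population)), key=lambda i: fitnesses[i])
    parent = random.randrange(len(population))
    # elitist crossover: mix genes of a randomly chosen individual and the best one
    child = [random.choice(genes) for genes in zip(population[parent], population[best])]
    # adaptive mutation: the worse the (relative) fitness, the larger the perturbation;
    # the parent's fitness is used here for f_i, which is one possible reading of eq. (1)
    f_max, f_min = fitnesses[best], fitnesses[worst]
    scale = ALPHA * (f_max - fitnesses[parent]) / (f_max - f_min + 1e-9) + BETA
    child = [g + scale * random.gauss(0.0, 1.0) for g in child]
    # the new individual replaces the eliminated worst one
    population[worst] = child
    fitnesses[worst] = fitness_h(child, image, count_colors)
```

The SSGA-O used for hand/object detection in the next subsection would follow the same loop, with the octagonal template encoding and the fitness of equation (3) substituted.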
3.2 Human Hand Detection
We previously proposed a method for human hand detection based on finger color and edges (Kubota & Abe, 2006), but here we assume that the human uses objects such as balls and blocks when performing a gesture toward the robot, because the main focus of this chapter is gesture recognition based on human hand motion, and exact human hand detection is out of its scope. Therefore, we focus on color-based object detection with SSGA-O, again based on template matching, where the shape of a candidate template is generated by SSGA-O. Figure 5 shows a candidate template used for detecting a target, where the j-th point of the i-th template is represented by (g_{i,1}^O + g_{i,j}^O cos(g_{i,j+1}^O), g_{i,2}^O + g_{i,j}^O sin(g_{i,j+1}^O)), with i = 1, 2, ..., n and j = 3, ..., 2m+2; O_i = (g_{i,1}^O, g_{i,2}^O) is the center of a candidate template on the image; and n and m are the number of candidate templates and the number of search points used in a template, respectively. Therefore, a candidate template is composed of the numerical parameters (g_{i,1}^O, g_{i,2}^O, ..., g_{i,2m+2}^O). We used an octagonal template (m = 8). The fitness value of the i-th candidate template is calculated as follows:

f_i^{O} = C_{Target}^{O} - \eta^{O} \cdot C_{Other}^{O}   (3)

where η^O is a coefficient for the penalty (η^O > 0), and C_Target^O and C_Other^O indicate the numbers of pixels of the target color and of other colors included in the template, respectively. The target color is selected according to the pixel color that occupies most of the template candidate. Therefore, this problem also results in a maximization problem. The robot extracts the human hand motion from a series of images by using SSGA-O, where the maximal number of images is T_G. The sequence of hand positions is represented by G(t) = (G_x(t), G_y(t)), where t = 1, 2, ..., T_G.

Figure 5. A template used for object detection in SSGA-O

3.3 Human Hand Motion Extraction
Various types of artificial neural networks have been proposed to realize clustering, classification, nonlinear mapping, and control (Jang et al., 1997; Kuniyoshi & Shimozaki, 2003; Rumelhart et al., 1986). Basically, artificial neural networks are classified into pulse-coded and rate-coded neural networks from the viewpoint of the abstraction level (Gerstner, 1999). A pulse-coded neural network approximates the firing dynamics of a neuron and the mechanism of pulse propagation between neurons. The Hodgkin-Huxley model is one of the classic neuronal spiking models, described by four differential equations. The integrate-and-fire model, with a first-order linear differential equation, is a neuron model at a higher level of abstraction. The spike response model is slightly more general than the integrate-and-fire model, because it can choose its kernels arbitrarily. Rate-coded neural networks, on the other hand, neglect the pulse structure and are therefore considered neuronal models at the highest level of abstraction; the McCulloch-Pitts neuron and the perceptron are well-known rate-coding models (Anderson & Rosenfeld, 1988). One important feature of pulse-coded neural networks is the capability of temporal coding; in fact, various types of spiking neural networks (SNNs) have been applied to memorizing spatial and temporal contexts. Therefore, we apply an SNN to memorizing several motion patterns of a human hand, because human hand motion has specific dynamics.

We use a simple spike response model to reduce the computational cost. First of all, the internal state h_i(t) is calculated as follows:

h_i(t) = \tanh\left( h_i^{syn}(t) + h_i^{ext}(t) + h_i^{ref}(t) \right)   (4)

Here, the hyperbolic tangent is used to avoid the bursting of neuronal firing; h_i^ext(t) is the input to the i-th neuron from the external environment, and h_i^syn(t), which includes the output pulses from other neurons, is calculated by

h_i^{syn}(t) = \gamma^{syn} \cdot h_i(t-1) + \sum_{j=1, j \neq i}^{N} w_{j,i} \cdot h_j^{EPSP}(t)   (5)

Furthermore, h_i^ref(t) indicates the refractoriness factor of the neuron; w_{j,i} is the weight coefficient from the j-th to the i-th neuron; h_j^EPSP(t) is the excitatory postsynaptic potential (EPSP) of the j-th neuron at discrete time t; N is the number of neurons; and γ^syn is a temporal discount rate. A presynaptic spike output is transmitted to the connected neuron according to the EPSP, which is calculated as follows:

h_i^{EPSP}(t) = \sum_{n=0}^{T} \kappa^{n} \cdot p_i(t-n)   (6)

where κ is a discount rate (0 < κ < 1.0), p_i(t) is the output of the i-th neuron at discrete time t, and T is the length of the time sequence to be considered.
If the neuron fires, R is subtracted from the refractoriness value as follows:

h_i^{ref}(t) = \begin{cases} \gamma^{ref} \cdot h_i^{ref}(t-1) - R & \text{if } p_i(t-1) = 1 \\ \gamma^{ref} \cdot h_i^{ref}(t-1) & \text{otherwise} \end{cases}   (7)

where γ^ref is a discount rate. When the internal potential of the i-th neuron exceeds the predefined threshold, a pulse is output as follows:

p_i(t) = \begin{cases} 1 & \text{if } h_i(t) \geq q_i \\ 0 & \text{otherwise} \end{cases}   (8)

where q_i is the firing threshold. The spiking neurons are arranged on a planar grid (Figure 6), and N = 25. Using the human hand position, the input to the i-th neuron is calculated by the following Gaussian membership function:

h_i^{ext}(t) = \exp\left( -\frac{\| c_i - G(t) \|^2}{2\sigma^2} \right)   (9)

where c_i = (c_{x,i}, c_{y,i}) is the position of the i-th spiking neuron on the image and σ is a standard deviation. The sequence of pulse outputs p_i(t) is obtained from the human hand positions G(t). The weight parameters are trained by the following temporal Hebbian learning rule:

w_{j,i} \leftarrow \tanh\left( \gamma^{wgt} \cdot w_{j,i} + \xi^{wgt} \cdot h_j^{EPSP}(t-1) \cdot h_i^{EPSP}(t) \right)   (10)

where γ^wgt is a discount rate and ξ^wgt is a learning rate. Because the neurons adjacent along the trajectory of the human hand position fire easily as a result of the temporal Hebbian learning, the SNN can memorize the temporal firing patterns of various gestures.

Figure 6. Spiking neurons for gesture recognition

3.4 Gesture Recognition and Learning
This subsection explains a method for clustering human hand motions. Cluster analysis is used for grouping or segmenting observations into subsets or clusters based on similarity. The self-organizing map (SOM), the K-means algorithm, and the Gaussian mixture model are often applied as clustering algorithms (Hastie et al., 2001; Kohonen, 2001). A SOM can be used for incremental learning, while the K-means algorithm and the Gaussian mixture model use all observed data in the learning phase (batch learning). In this chapter, we apply a SOM to clustering the spatio-temporal patterns of pulse outputs from the SNN, because the robot observes one human hand motion at a time. Furthermore, the neighboring structure of the units can be used in further discussion of the similarity of clusters.

A SOM is often applied to extracting relationships among observed data, since it can learn the hidden topological structure of the data. The input to the SOM is given by the weighted sums of the pulse outputs of the neurons,

v = (v_1, v_2, \dots, v_N)   (11)

where v_i is the state of the i-th neuron. In order to consider the temporal pattern, we use h_i^EPSP(t) as v_i, although the EPSP is otherwise used when a presynaptic spike output is transmitted. When the i-th reference vector of the SOM is represented by r_i, the Euclidean distance between an input vector and the i-th reference vector is defined as

d_i = \| v - r_i \|   (12)

where r_i = (r_{1,i}, r_{2,i}, \dots, r_{N,i}) and the number of reference vectors (output units) is M. Next, the k-th output unit minimizing the distance d_i is selected by

k = \arg\min_i \{ \| v - r_i \| \}   (13)

Furthermore, the reference vector of the i-th output unit is trained by

r_i \leftarrow r_i + \xi^{SOM} \cdot \zeta_{k,i}^{SOM} \cdot (v - r_i)   (14)

where ξ^SOM is a learning rate (0 < ξ^SOM < 1.0) and ζ_{k,i}^SOM is a neighborhood function (0 < ζ_{k,i}^SOM < 1.0). Accordingly, the selected output unit corresponds to the nearest pattern among the previously learned human hand motion patterns.
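The following sketch shows one way to put equations (4)-(14) together in update order, assuming a 5×5 grid of neurons over the image plane as in Figure 6. The parameter values, the class and function names, and the 320×240 image size are illustrative assumptions rather than values from the chapter, and the EPSP of the previous step is used in equation (5) because p_i(t) is not yet available when h_i(t) is computed.

```python
import numpy as np

class SpikeResponseGrid:
    """Minimal spike response model on a rows x cols planar grid (Section 3.3)."""

    def __init__(self, rows=5, cols=5, img_w=320, img_h=240, sigma=30.0,
                 gamma_syn=0.5, gamma_ref=0.8, kappa=0.7, R=1.0, q=0.5, T=10):
        self.N = rows * cols
        xs, ys = np.meshgrid(np.linspace(0, img_w, cols), np.linspace(0, img_h, rows))
        self.c = np.stack([xs.ravel(), ys.ravel()], axis=1)   # neuron positions c_i
        self.w = np.zeros((self.N, self.N))                   # weights w_{j,i}
        self.sigma, self.gamma_syn, self.gamma_ref = sigma, gamma_syn, gamma_ref
        self.kappa, self.R, self.q, self.T = kappa, R, q, T
        self.h = np.zeros(self.N)        # internal states h_i(t)
        self.h_ref = np.zeros(self.N)    # refractoriness h_i^ref(t)
        self.spikes = []                 # pulse history p_i(t)

    def _epsp(self):
        """Equation (6): discounted sum of the last T pulses of every neuron."""
        e = np.zeros(self.N)
        for n, p in enumerate(reversed(self.spikes[-self.T:])):
            e += (self.kappa ** n) * p
        return e

    def step(self, hand_pos, gamma_wgt=0.9, xi_wgt=0.1, learn=True):
        """One discrete-time update driven by a hand position G(t) = (x, y)."""
        e_prev = self._epsp()                                          # h_j^EPSP(t-1)
        h_ext = np.exp(-np.sum((self.c - np.asarray(hand_pos)) ** 2, axis=1)
                       / (2.0 * self.sigma ** 2))                      # eq. (9)
        h_syn = self.gamma_syn * self.h + self.w.T @ e_prev            # eq. (5)
        self.h = np.tanh(h_syn + h_ext + self.h_ref)                   # eq. (4)
        p = (self.h >= self.q).astype(float)                           # eq. (8)
        self.h_ref = self.gamma_ref * self.h_ref - self.R * p          # eq. (7), acts at the next step
        self.spikes.append(p)
        e_now = self._epsp()                                           # h_i^EPSP(t)
        if learn:                                                      # eq. (10), temporal Hebbian rule
            self.w = np.tanh(gamma_wgt * self.w + xi_wgt * np.outer(e_prev, e_now))
            np.fill_diagonal(self.w, 0.0)                              # no self-connections (j != i)
        return e_now                                                   # used as the SOM input v, eq. (11)

def som_recognize(v, refs, xi_som=0.2, zeta=None, learn=True):
    """Equations (12)-(14): find the winning unit for an EPSP vector v and,
    optionally, update the (M, N) array of reference vectors in place."""
    d = np.linalg.norm(refs - v, axis=1)                       # eq. (12)
    k = int(np.argmin(d))                                      # eq. (13)
    if learn:
        zeta = np.eye(len(refs))[k] if zeta is None else zeta  # crude winner-only neighbourhood
        refs += xi_som * zeta[:, None] * (v - refs)             # eq. (14)
    return k
```

A gesture observation would then be processed as `for G in hand_positions: v = grid.step(G)` followed by `unit = som_recognize(v, refs)`; in the chapter a proper neighborhood function ζ_{k,i}^SOM over the output units would be used instead of the winner-only default assumed here.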
4. Experiments
This section presents several experimental results of human detection and gesture recognition. We conducted several experiments on human detection by the environmental system and by the robot (Kubota & Nishida, 2006; Sasaki & Kubota, 2006). Figures 7 and 8 show human detection results obtained by SSGA-H from the ceiling view and from the robot view, respectively. In Figure 7, the environmental system detects two people against a complicated background. In Figure 8(a), the robot first used high-resolution images to detect a walking human; afterward, as the human gradually came toward the front of the robot, the robot used lower-resolution images to reduce the computational cost and still detected the human (Figure 8(b)).

Figure 7. Human detection results from the ceiling camera of the environmental system by SSGA-H
Figure 8. Human detection results from the robot by SSGA-H

We also conducted several experiments on gesture recognition. The number of units used in the SOM is 8. Figures 9 and 10 show examples of a human hand motion and of the learning of the SNN, respectively. The human moves his hand from the upper left to the lower right via the upper right, as seen from the human's position. The EPSP based on the spike outputs does not cover the human hand motion at the first trial (Figure 10(a)), but after learning, the EPSP successfully covers the human hand motion thanks to the trained weight connections, where the depth of color indicates the strength of the weight connection between two neurons (Figure 10(b)).

Figure 11 shows the learning results for various human hand motion patterns. The weight connections are trained according to the frequency of local movements between two neurons in the human hand motion patterns. A different unit of the SOM is selected when a different human hand motion is shown to the robot. The selected unit is depicted as dark boxes, where each small box indicates the magnitude of a value in the reference vector. In the two examples, the 8th and 5th units are selected as the gestures corresponding to the respective human hand motions. Furthermore, a comparison between the SNN and an RNN, together with a more detailed analysis, is given in (Kubota, Tomioka et al., 2006).

Figure 9. A human hand motion
Figure 10. Learning of the SNN with the human hand motion of Figure 9 (panels: original image, weight connections, selected unit of SOM, detected hand position, human color extraction, EPSP): (a) the initial state of the EPSP at the first trial; (b) the state of the EPSP after learning
Figure 11. Gesture recognition by the SNN and SOM after learning: (a) a recognition result for the figure-eight hand motion (the 8th unit is selected); (b) a recognition result for the hand motion from left to right (the 5th unit is selected)

5. Summary
This chapter introduced methods for human detection and gesture recognition based on ambient intelligence. The experimental results show the effectiveness of the methods, but various types of sensory information should be used in addition to visual images. We have proposed the concept of evolutionary robot vision: the main sensory information is vision, but various types of sensory information can be integrated and synthesized for image processing. One possible type of sensory information for improving the performance is distance; we developed a 3D modeling method based on CCD cameras and a laser range finder with a small and compact pan-tilt mechanism.
We intend to develop a gesture recognition method based on various types of sensory information. Furthermore, network robots based on ambient intelligence are an attractive approach to realizing sophisticated services in the next-generation society. One such approach is the SICE City project, whose aim is to build up the methodology and concepts of city design that realize sophisticated services for residents based on measurement technology, network technology, control technology, and systems theory. We will discuss the applicability of human detection, human recognition, gesture recognition, and motion recognition based on ambient intelligence to human-friendly functions in city design.