Semantic Web
- 格式:pdf
- 大小:611.08 KB
- 文档页数:18
1.1什么是语义网微软公司董事长比尔·盖茨那幢坐落在西雅图,被喻为未来生活预言的科技住宅无疑是当今世界上最现代化的豪华住宅,堪称是智能建筑的经典之作。
豪宅内共铺设各种电缆52英里,房子内所有的电器设备相互连接成了一个智能网络。
主人在回家途中便可在车内利用计算机遥控家中的浴缸自动放水并调温,作好一切迎接准备。
房屋装有气象感知器,可以根据各项气象指标,控制室内的温度和通风情况。
走进大厅时,空调系统会将室温调整至你感觉最舒适的温度,音响系统也会针对你的喜好播放音乐,灯光系统自动调整照明颜色与强度,就连墙上的LCD显示屏,也会自动显示你喜爱的世界名画或播放你上次只看到一半的影片。
在住宅各处随意走动时,地板能在6英寸的范围内跟踪到人的足迹,在有人经过时自动打开照明,离去时自动关闭。
每个房间的温度、照明、音响等等都将随不同的设定自动调整。
就算是在水池中,也会从池底“冒”出如影随形的音乐。
尤其有意思的是,比尔·盖茨非常喜欢车道旁边一棵140岁的老枫树,所以就通过专门的监视系统对其进行24小时的全方位监控﹐一旦监视系统发现它有任何干燥的迹象,灌溉系统就会启动。
人们不禁要问,是什么尖端的科技系统使得盖茨先生的豪宅拥有如此高的智能化水平?要等到哪一个世纪才能让这样现代化的住宅走入寻常百姓家?你可能不曾想到,通过扩展今天的万维网(即WWW,是World Wide Web的简称)就完全可以使这一梦想变为现实。
这种扩展后的万维网称为语义网(Semantic Web)。
语义网的概念由万维网的发明者、现任万维网联盟(即W3C,是World Wide Web Consortium的简称)主任提姆·伯纳丝·李(Tim Berners-Lee)于1998年首次提出。
在他看来,“语义网(Semantic Web)并非是另外一个独立的Web,而是现在的Web的一个延伸。
在其中,所有的信息都具有定义完好的含义,更利于人与机器之间的合作。
语义网中的本体构建与推理研究随着互联网技术的不断发展,人们在网络上获取信息变得越来越容易,然而,这些信息往往是海量的、杂乱无章的,并不便于机器自动处理。
因此,我们需要一种能够理解信息含义的方式,来帮助我们更好地处理这些信息。
这就是语义网的基本思想。
语义网(Semantic Web)的核心是充分地使用信息的语义,通过构建本体(Ontology)、推理等手段来实现Web资源的高效利用和共享。
本体是语义网的基石本体是语义网中的核心概念。
顾名思义,本体就是用于描述实体及其关联关系的模型。
它是对某一领域中实体、概念、属性和关系等的描述,以及这些描述之间的约束、规则等。
本体的目的是消除不同人、不同组织、不同机器对同一概念的不同解释,为不同使用者提供一个一致的、标准的基础。
因此,本体的构建关系到语义网的推广和应用。
本体构建的方法本体构建的方法可以大致分为三大类:手工构建法、半自动化构建和自动化构建。
手工构建是最早出现的一种本体构建方式。
其优点在于可以高度抽象地描述概念,缺点在于速度慢、成本高。
半自动化构建则是在手工构建的基础上,在人工干预的情况下涉及到自动化工具,优点在于缩短了构建时间。
自动化构建是一种基于机器学习的方法,具有时间成本低、可扩展性好等优点。
本体推理的方法本体推理是指通过基于本体知识的逻辑推断,从本体中出发,再结合外部实例数据,推导出新的知识或结论,从而完善和扩展本体的过程。
本体推理的方法可以大致分为逻辑推理和规则推理。
逻辑推理是利用逻辑形式化地表示本体知识,然后进行逻辑推理的过程。
逻辑推理需要对本体进行形式化表示,从而使推理结果是形式化规则所允许的。
规则推理是指利用基于规则或规则表示的推理方法,利用规则的强特定性来完成推理任务。
本体构建和推理的应用完善的本体和推理技术可以帮助我们更好地利用和共享网络信息。
下面分别介绍几个应用。
1. 语义搜索语义搜索可以从网络数据中精确提取用户所需信息。
在语义搜索中,可以利用本体中的概念间关系,由搜索关键词推断出更适合用户需求的结果,从而不必对搜索结果进行手工筛选。
黄智生博士谈语义网与Web 3.0近两年来,“语义网(Semantic Web)”或“Web 3.0”越来越频繁地出现在IT报道中,这表明语义网技术经过近10年的研究与发展,已经走出实验室进入工程实践阶段。
PowerSet、Twine、SearchMonkey、Hakia等一批语义网产品的陆续推出,预示着语义网即将在现实世界中改变人们的生活与工作方式。
在Web 3.0时代即将揭开序幕之际,正确理解、掌握语义网的概念与技术,对IT人士与时俱进和增加优势是必不可少的。
为此,InfoQ中文站特地邀请到来自著名语义网研究机构荷兰阿姆斯特丹自由大学的黄智生博士,请他为我们谈一谈工业界人士感兴趣的语义网话题,包括什么是语义网、语义网与Web 3.0的关系以及语义网如何给商业公司带来效益等。
InfoQ中文站:您是语义网方面的权威专家,能否先请您为我们消除概念上的困惑。
现在有一个说法,即Web 3.0就是语义网。
但是除了W3C定义的语义网以外,关于Web 3.0还有许多种其他说法,您认为谁才真正代表了Web 3.0?为什么?黄智生博士(以下称黄博士):首先需要说明的是:我不认为自己是所谓的“权威”。
纵观万维网的发展,总是年轻人在创造历史,他们给人类社会带来了一次又一次的惊奇。
且不说万维网之父Tim Berners-Lee在1989年构想万维网的时候仅仅三十出头。
Web 1.0产生的雅虎和谷歌等国际大公司的创始人大多是年轻的博士生。
Web 2.0产生的Facebook等公司创始人的情况也大体如此。
Web 3.0的情况也可能如此。
我们甚至都不能完全指望通过现有的IT大公司的巨大投入来发展语义网。
这些大公司往往受着过去成功经验的束缚,而且新技术采用的是与以往完全不同的思路,从而会加深大公司对新技术的怀疑。
当然,这也为年轻人书写历史创造辉煌提供了发展空间。
由于Web 1.0和Web 2.0技术的成熟,Web 3.0的想法实际上表达了现在人们对下一代万维网技术的种种期待。
语义网概念及技术综述语义网(Semantic Web)是一种由 W3C(World Wide Web Consortium)推广的,基于 XML(eXtensible Markup Language)和 RDF(Resource Description Framework)等技术的网络,它旨在增强网络信息的语义表达和机器可读性,从而使得计算机能够更好地理解和处理网络信息。
一、语义网的概念语义网是一种以“数据”为中心的网络,它通过使用 XML、RDF 等技术,将网络信息以机器可读的方式进行组织和表达。
与传统的 Web 相比,语义网更加强调信息的结构和含义,而不是简单的文本表现形式。
因此,语义网被认为是 Web 的一个重要发展阶段,是实现智能 Web 的关键步骤。
二、语义网的技术1.XMLXML 是构建语义网的基础技术之一,它是一种用于描述数据的标记语言。
XML 可以用来表示数据结构,并且可以很好地与 HTML、HTTP 等现有网络技术集成。
通过 XML,我们可以将数据以机器可读的方式进行组织和表达,从而使得计算机可以更好地处理和理解数据。
2.RDFRDF 是另一种构建语义网的关键技术,它是一种用于描述资源及其关系的模型。
RDF 将每个资源视为一个三元组,包括主体、属性和值三个部分。
通过这种方式,我们可以将网络信息以一种通用的、机器可读的方式进行描述和组织,从而实现数据的共享和重用。
3.RDFSRDFS 是 RDF 的扩展,它增加了一些新的概念和规则,例如类、子类关系、属性限制等。
这些概念和规则可以帮助我们更好地描述和组织数据,并且可以用于构建更加复杂的语义网应用。
4.OWLOWL 是另一种基于 RDF 的语言,它提供了更加丰富的概念和规则,例如类、属性、关系等。
OWL 提供了三种不同的表达层次,分别是 OWL Lite、OWL DL 和OWL Full,以满足不同应用场景的需求。
OWL 可以用于描述更加复杂的概念和关系,并且可以用于构建更加高级的语义网应用。
swc分类SWC分类(Semantic Web Challenge Classification)是用于描述语义网挑战赛中的分类方法。
语义网挑战赛是一个旨在推动语义网技术发展的国际性比赛,每年举办一次,参赛者通过设计和实现语义网应用来展示他们的创新和技术实力。
SWC分类主要是为了方便对参赛作品进行评价和比较。
不同的参赛作品可能涉及不同的语义网技术和应用领域,因此需要将它们进行分类,以便更好地进行评估和评判。
SWC分类主要根据参赛作品的应用领域、数据来源、技术方法等因素进行分类。
下面将介绍几个常见的SWC分类。
1. Ontology-based Applications(基于本体的应用)这类参赛作品主要使用本体来描述领域知识,并利用本体进行数据的整合、推理和查询等操作。
常见的应用包括本体构建、本体匹配和本体推理等。
2. Linked Data Applications(链接数据应用)这类参赛作品主要利用链接数据的方式来整合和共享分布在不同数据源上的数据。
参赛者需要设计和实现链接数据的模型、查询和可视化等功能。
3. Semantic Web Services(语义网服务)这类参赛作品主要关注语义网服务的发现、组合和执行等问题。
参赛者需要设计和实现语义网服务的描述、匹配和调用等功能。
4. Semantic Web and Machine Learning(语义网与机器学习)这类参赛作品主要结合语义网和机器学习的技术,用于解决数据挖掘、信息检索和推荐等问题。
参赛者需要设计和实现语义网与机器学习的集成方法和算法。
5. Semantic Web and Natural Language Processing(语义网与自然语言处理)这类参赛作品主要关注语义网和自然语言处理的结合应用。
参赛者需要设计和实现基于语义网的自然语言理解、问答系统和文本分析等功能。
6. Semantic Web and Internet of Things(语义网与物联网)这类参赛作品主要关注语义网和物联网的融合应用。
语义网技术的发展与应用随着互联网的普及与数据的爆炸式增长,我们越来越需要一种更加高效、准确、智能的方式来处理和利用数据。
而语义网技术就是能够满足这种需求的一种新型数据处理技术。
本文将从语义网技术的定义、发展历程以及其应用前景三个方面来展开论述。
一、语义网技术的定义语义网技术,即语义网(Semantic Web),是一种基于网络的、带有语义的数据处理技术。
它能让机器理解文字和语言,并对其进行推理和应用,从而赋予数据更多的深层次的含义和价值。
语义网技术的核心是对于不同类型的信息进行统一整合、归纳和处理,以达到复杂、多样性数据间的自动化共享和交流。
二、语义网技术的发展历程语义网技术的历史可以追溯到英国人蒂姆·伯纳斯·李(Tim Berners-Lee)在1989年提出“万维网”(World Wide Web)的想法。
他最初创意是为了方便科学研究者之间的信息交流,而在此基础上,李提出了语义网的概念,即将现有的万维网变成一个更加智能化的平台,以减少数据匮乏、信息无效的情况。
20世纪90年代,随着万维网上的信息爆炸式增长,语义网技术逐渐得到了人们的重视。
在2001年,万维网联盟(W3C)发布了语义网指导方针,正式确立了语义网技术的标准化。
此后,每年W3C都会发布新的语义网推荐规范,不断完善和拓展语义网的功能和应用范畴。
三、语义网技术的应用前景语义网技术的应用前景非常广泛,可以用于企业管理、电子商务、智能家居、医疗健康、金融投资、灾害预警等多个领域。
以下是具体的几个应用实例:1.企业管理:语义网技术可以建立起一个完整、集成的企业数据体系,实现对企业内部数据和知识的有效管理与共享。
2.电子商务:语义网技术可以将产品和服务的信息进行语义化,方便消费者搜索和比较,提高电子商务的效率。
3.智能家居:语义网技术可以将家居设备和服务进行互联化,实现智能化的管理和控制,提升家庭生活质量和安全性。
4.医疗健康:语义网技术可以整合医学知识和患者数据,实现个性化的医疗服务和健康管理。
An Ontology-Based Multimedia Annotator for the Semantic Web of Language EngineeringArtem Chebotko, Wayne State University, USAYu Deng, Wayne State University, USAShiyong Lu, Wayne State University, USAFarshad Fotouhi, Wayne State University, USAAnthony Aristar, Wayne State University, USAABSTRACTThe development of the Semantic Web, the next-generation Web, greatly relies on the availability of ontologies and powerful annotation tools. However, there is a lack of ontology-based annotation tools for linguistic multimedia data. Existing tools either lack ontology support or provide limited support for multimedia. To fill the gap, we present an ontology-based linguistic multimedia annotation tool, OntoELAN, which features: (1) the support for OWL ontologies;(2) the management of language profiles, which allow the user to choose a subset of ontological terms for annotation; (3) the management of ontological tiers, which can be annotated with language profile terms and, therefore, corresponding ontological terms; and (4) storing OntoELAN annotation documents in XML format based on multimedia and domain ontologies. To our best knowledge, OntoELAN is the first audio/video annotation tool in the linguistic domain that provides support for ontology-based annotation. It is expected that the availability of such a tool will greatly facilitate the creation of linguistic multimedia repositories as islands of the Semantic Web of language engineering.Keywords:annotation; general multimedia ontology; GOLD; multimedia; ontology; OWL;Semantic WebINTRODUCTIONThe Semantic Web (Lu, Dong, & Fotouhi, 2002; Berners-Lee, Hendler, & Lassila, 2001) is the next-generation Web, in which information is structured with well-defined semantics, enabling better coopera-tion of machine and human effort. The Semantic Web is not a replacement, but an extension of the current Web, and its de-velopment greatly relies on the availability of ontologies and powerful annotation tools.Ontology development and annotation management are two challenges of the development of the Semantic Web, as we discussed in Chebotko, Lu, and Fotouhi(2004). In this article, although we use our developed General Multimedia Ontology as the framework and the GOLD ontology developed at the University of Arizona as an ontology example for ontology-based annotation of linguistic multimedia data, our focus will be on addressing the second chal-lenge—the development of an ontology-based multimedia annotator OntoELAN for the Semantic Web of language engineering.Recently, there is an increasing inter-est and effort for preserving and document-ing endangered languages (Lu et al., 2004; The National Science Foundation, 2004). Many languages are in serious danger of being lost, and if nothing is done to prevent it, half of the world’s approximately 6,500 languages will disappear in the next 100 years. The death of a language entails the loss of a community’s traditional culture, for the language is a unique vehicle for its traditions and culture.In the linguistic domain, many lan-guage data are collected as audio and video recordings, which impose a challenge to document indexing and retrieval. Annota-tion of multimedia data provides an oppor-tunity for making the semantics explicit and facilitates the searching of multimedia docu-ments. However, different annotators might use different vocabulary to annotate multi-media, which causes low recall and preci-sion in search and retrieval. In this article, we propose an ontology-based annotation approach, in which a linguistic ontology is used so that the terms and their relation-ships are formally defined. In this way, an-notators will use the same vocabulary to annotate multimedia, so that ontology-driven search engines will retrieve multimedia data with greater recall and precision. We be-lieve that even though in a particular do-main, it can be very difficult to enforce a uniform ontology that is agreed on by the whole community, ontology-driven annota-tion will benefit the community once ontol-ogy-aware federated retrieval systems are developed based on ontology techniques such as ontology mapping, alignment, and merging (Klein, 2001). In this article, we present an ontology-based linguistic multimedia annotation tool, OntoELAN—a successor of EUDICO Lin-guistic Annotator (ELAN) (Hellwig & Uytvanck, 2004), developed at the Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands, with the aim to provide a sound technological basis for the annotation and exploitation of multime-dia recordings. Although ELAN is designed specifically for linguistic domain (analysis of language, sign language, and gesture), it can be used for annotation, analysis, and documentation purposes in other multime-dia domains. We briefly describe the fea-tures of ELAN in the section, “An Over-view of OntoELAN,” and refer the reader to Hellwig and Uytvanck (2004) for de-tails. OntoELAN inherits all ELAN’s fea-tures and extends the tool with an ontol-ogy-based annotation approach. In particu-lar, our main contributions are:•OntoELAN can open and display ontolo-gies, specified in OWL Web Ontology Language (Bechhofer et al., 2004).•OntoELAN allows the creation of a lan-guage profile, which enables a user to choose a subset of terms from a linguis-tic ontology and conveniently rename them if needed.•OntoELAN allows the creation of onto-logical tiers, which can be annotated with profile terms and, therefore, their corresponding ontological terms.•OntoELAN saves annotations in XML (Bray, Paoli, Sperberg-McQueen, Maler, & Yergeau, 2004) format as class in-stances of the General Multimedia On-tology, which is designed based on the XML Schema (Fallside, 2001) for ELAN annotation files.•OntoELAN, while annotating ontologi-cal tiers, creates class instances of cor-responding ontologies linked to annota-tion tiers and relates them to instances of the General Multimedia Ontology classes.This paper extends the presentation of OntoELAN in Chebotko et al. (in press), with more details on ontological and archi-tectural aspects of OntoELAN and with a premier on OWL. Since OntoELAN is de-veloped to fulfill annotation requirements for the linguistic domain, it is natural that, in this article, we use linguistic annotation examples and link the General Ontology for Linguistic Description (GOLD) (Farrar & Langendoen, 2003) to an ontological tier. To our best knowledge, OntoELAN is the first audio/video annotation tool in the lin-guistic domain that provides support for ontology-based annotation. It is expected that the availability of such a tool will greatly facilitate the creation of linguistic multime-dia repositories as islands of the Semantic Web of language engineering. RELATED WORKIn the following, first we identify the requirements for linguistic multimedia an-notation, then we review existing annota-tion tools with respect to these require-ments. We conclude that these tools do not fully satisfy our requirements, and this mo-tivates our development of OntoELAN.Linguistic domain places some mini-mum requirements on multimedia annota-tion tools. While semantics-based contents such as speeches, gestures, signs, and scenes are important, color and shape are not of interest. To annotate semantics-based content, a tool should provide a time axis and the capability of its subdivision into time slots/segments, multiple tiers for different semantic content. Obviously, there should be some multimedia resource metadata such as title, authors, date, and time. Addi-tionally, a tool should provide ontology-based annotation features to enable the same an-notation vocabulary for a particular domain.As related work, we give a brief de-scription of the following tools: Protégé(Stanford University, 2004), IBM MPEG-7 Annotation Tool (International Business Machines Corporation, 2004), and ELAN (Hellwig & Uytvanck, 2004).Protégé is a popular ontology con-struction and annotation tool developed at Stanford University. Protégé supports the Web Ontology Language through the OWL plug-in, which allows a user to load OWL ontologies, annotate data, and save anno-tation markup. Unfortunately, Protégé pro-vides only simple multimedia support through the Media Slot Widget. The Media Slot Widget allows the inclusion and dis-play of video and audio files in Protégé, which may be enough for general descrip-tion of multimedia files like metadata en-tries, but not sufficient for annotation of a speech, where the multimedia time axis and its subdivision into segments are cru-cial.The IBM MPEG-7 Annotation Tool was developed by IBM to assist annotat-ing video sequences with MPEG-7 (Martínez, 2003) metadata based on the shots of the video. It does not support any ontology language and uses an editable lexi-con from which a user can choose key-words to annotate shots. A shot is defined as a time period in video in which the frames have similar scenes. Annotations are savedbased on MPEG-7 XML Schema (Martínez, 2003). Although the IBM MPEG-7 Annotation Tool was specially designed to annotate video, shot and lexi-con-based annotation does not provide enough flexibility for linguistic multimedia annotation. In particular, the shot approach is good for the annotation of content-based features like color and texture, but not for time alignment and time segmentation re-quired for semantics-based content anno-tation.ELAN (EUDICO Linguistic Annota-tor), developed at the Max Planck Institute for Psycholinguistics, Nijmegen, The Neth-erlands, is designed specifically for linguis-tic domain (analysis of language, sign lan-guage, and gesture) to provide a sound technological basis for the annotation and exploitation of multimedia recordings. ELAN provides many important features for linguistic data annotation such as time segmentation and multiple annotation lay-ers, but not the support of an ontology. Annotation files are saved in the XML for-mat based on ELAN XML Schema.As a summary, existing annotation tools such as Protégé and the IBM MPEG-7 Annotation Tool are not suitable for our purpose since they do not support many multimedia annotation operations such as multiple tiers, time transcription, and trans-lation of linguistic audio and video data. ELAN is the best candidate for becoming a widely accepted linguistic multimedia an-notator, and it is already used by linguists throughout the world. ELAN provides most of the required features for linguistic multi-media annotation, which motivates us to use it as the basis for the development of OntoELAN to add ontology-based annota-tion features such as the support of an on-tology and a language profile.AN OVERVIEW OF ONTOELANOntoELAN is an ontology-based lin-guistic multimedia annotator, developed on the top of ELAN annotator. It was partially sponsored and developed as a part of Elec-tronic Metastructure for Endangered Lan-guages Data (E-MELD) project. Currently, OntoELAN source code contains more than 60,000 lines of Java code and has sev-eral years of development history started by the Max Planck Institute for Psycholinguistics team and continued by the Wayne State University team. Both development teams will continue their col-laboration on ELAN and OntoELAN.OntoELAN has a long list of detailed descriptions of all its technical features, in-cluding the following features that are in-herited from ELAN:•display a speech and/or video signals, together with their annotations;•time linking of annotations to media streams;•linking of annotations to other annota-tions;•unlimited number of annotation tiers as defined by a user;•different character sets; and •basic search facilities.OntoELAN implements the following ad-ditional features:•loading of OWL ontologies;•language profile creation;•ontology-based annotation; and •storing annotations in the XML format based on the General Multimedia On-tology and domain ontologies.The main window of OntoELAN is shown in Figure 1. OntoELAN has the video viewer, the annotation density viewer, the waveform viewer, the grid viewer, the sub-title viewer, the text viewer, the timeline viewer, the interlinear viewer, and associ-ated with them controls and menus. All viewers are synchronized so that whenever a user accesses a point in time in one viewer, all the other viewers move to the corresponding point in time automatically. The video viewer displays video in “mpg”and “mov” formats, and can be resized or detached to play video in a separate win-dow. The annotation density viewer is use-ful for navigation through the media file and analysis of annotations concentration. The waveform viewer displays the waveform of the audio file in “wav” format; in case of video files, there should be an additional “wav” file present to display waveform. The grid viewer displays annotations and associated time segments for a selected annotation tier. The subtitle viewer displays annotations on selected annotation tiers at the current point in time. The text viewer displays annotations of a selected annota-tion tier as a continuous text. The timeline viewer and the interlinear viewer are in-terchangeable, and both display all tiers and all their annotations; only one viewer can be used at a time. In this article, we will mostly work with the timeline viewer (see Figure 1), which allows a user to perform of operations on tiers and annotations. Be-cause a significant part of the OntoELAN interface is inherited from ELAN, the reader can refer to Hellwig and Uytvanck (2004) for detailed description. OntoELAN uses and manages several data sources:•General Multimedia Ontology (OWL): ontological terms for multimedia anno-tations.Figure 1. A snapshot of the OntoELAN main window•Linguistic domain ontologies (OWL): ontological terms for linguistic annota-tions.•Language profiles (XML): a selected subset of domain ontology terms for lin-guistic annotations.•OntoELAN annotation documents (XML): storage for linguistic multime-dia annotations.A data flow diagram for OntoELAN is shown in Figure 2. We do not specify names of most data flows, as they are too general to give any additional information. Two data flows from a user are user-de-fined terms for language profiles and lin-guistic multimedia annotations.In the following sections, we will give more details on OntoELAN data sources and data flows. We focus more on the de-scription of features that make OntoELAN an ontology-based multimedia annotator, like OWL support, linguistic domain ontol-ogy and the General Multimedia Ontology, a language profile, ontological annotation tiers, and so forth.SUPPORT OF OWLOWL Web Ontology Language (Bechhofer et al., 2004) is recently recom-mended as the semantic markup language for publishing and sharing ontologies on the World Wide Web. It is developed as a revi-sion of DAML+OIL language and has more expressive power than XML, RDF, and RDF Schema (RDF-S). OWL provides constructs to define ontologies, classes, properties, individuals, data types, and their relationships. In the following, we present a brief overview of the major constructs and refer the reader to Bechhofer et al. (2004) for more details.ClassesA class defines a group of individuals that share some properties. A class is de-fined by owl:Class, and different classes can be related by rdfs:subClassOf into a class hierarchy. Other relationships be-tween classes can be specified by owl:equivalentClass, owl:disjointWith,Figure 2. OntoELAN data flow diagramand so forth. The extension of a class can be specified by owl:oneOf with a list of class members or by owl:intersectionOf, owl:unionOf and owl:complementOf with a list of other classes.PropertiesA property states relationships be-tween individuals or from individuals to data values. The former is called ObjectProperty and specified by owl:ObjectProperty. The latter is called DatatypeProperty and specified by owl:DatatypeProperty. Similarly to classes, different properties can be related by rdfs:subPropertyOf into a property hi-erarchy. The domain and range of a prop-erty are specified by rdfs:domain and rdfs:range, respectively. Two properties might be asserted to be equivalent by owl:equivalentProperty. In addition, dif-ferent characteristics of a property can be specified by owl:FunctionalProperty, o w l:I n v e r s e F u n c t i o n a l P ro p e r t y, owl:TransitiveProperty, and owl: SymmetricProperty.Property RestrictionsA property restriction is a special kind of a class description. It defines an anony-mous class, namely the set of individuals that satisfy the restriction. There are two kinds of property restrictions: value con-straints and cardinality constraints. Value constraints restrict the values that a prop-erty can take within a particular class, and they are specified by owl:allValuesFrom, owl:someValuesFrom, and owl:hasValue. Cardinality constraints restrict the number of values that a property can take within a particular class, and they are specified by owl:minCardinality, owl:maxCardinality, owl:cardinality, and so forth.OWL is subdivided into three species (in increasingly-expressive order): OWL Lite, OWL DL, and OWL Full. OWL Lite places some limitations on the usage of constructs and is primarily suitable for ex-pressing taxonomies. For example, owl:unionOf and owl:complementOf are not part of OWL Lite, and cardinality con-straints may only have a 0 or 1 value. OWL DL provides more expressivity and still guarantees computational completeness and decidability. In particular, OWL DL supports all OWL constructs, but places some restrictions (e.g., class cannot be treated as an individual). Finally, OWL Full gives maximum expressiveness, but not computational guarantee.OntoELAN uses the Jena 2 (Hewlett-Packard Labs, 2004) Java framework for writing Semantic Web applications to pro-vide OWL DL support. On the language profile creation stage, OntoELAN basically uses class hierarchy information based on rdfs:subClassOf construct. However, while annotating data with ontological terms (by means of a language profile), OntoELAN generates dynamic interface for creating instances, assigning property val-ues, and so forth.LINGUISTIC DOMAIN ONTOLOGYAs a linguistic domain ontology ex-ample, we use the General Ontology for Linguistic Description (GOLD) (Farrar & Langendoen, 2003). To make things clear from the beginning, OntoELAN does not have GOLD as a component; both are in-dependent. The user can load any other linguistic domain ontology, thereforeOntoELAN can be used as a multimedia annotator in other domains that require simi-lar features. Moreover, the user can load several different ontologies for distinct an-notation tiers to provide multi-ontological or even multi-domain annotation ap-proaches. For example, a gesture ontology can be used for linguistic multimedia anno-tation, as a speaker’s gestures help the audience understand the meaning of a speech better. Therefore, linguists can use GOLD in one tier and the gesture ontol-ogy in another tier to capture more se-mantics.The General Ontology for Linguistic Description is an ongoing research effort led by the University of Arizona to define linguistic domain-specific terms using OWL. GOLD is constantly under revision, and the ontology changes with introduction of new classes, properties, and relations; its structure also changes. Current infor-mation about GOLD is available at , and the ontology is also downloadable from / ~farrar/gold.owl. We briefly describe GOLD content in the next few paragraphs and refer the reader to Farrar and Langendoen (2003) and also to Farrar (2004) for more details.GOLD provides a semantic frame-work for the representation of linguistic knowledge and organizes knowledge into four major categories:•Expressions: Physically accessible as-pects of a language. Linguistic expres-sions include the actual printed words or sounds produced when someone speaks. For example, Orthographic Expression, Utterance, Signed Ex-pression, Word, WordPart, Prefix.•Grammar: The abstract properties and relations of a language. For example,Tense, Number, Agreement, PartOf Speech.•Data Structures: Constructs that are used by linguists to analyze language data. A linguistic data structure can be viewed as a structuring mechanism for linguistic data content. For example, a lexical entry is a data structure used to organize lexical content. Other examples are a phoneme table and a syntactic tree.•Metaconcepts: The most basic concepts of linguistic analysis. The example of a metaconcept is a language itself. Through the article we will use only simple GOLD concepts like Noun, Verb, Parti-ciple, Preverb. They are subclasses of PartOfSpeech, and their meaning is easy to understand without special training. Ad-ditionally, we will use the concepts Animate (living things, including humans, animals, spirits, trees, and most plants) and Inani-mate (non-living things, such as objects of manufacture and natural “non-living”things), which are two grammatical gen-ders or classes of nouns. GENERAL MULTIMEDIA ONTOLOGYAlthough OntoELAN is an ontology-based annotator, a user may not use onto-logical terms for annotation. In fact, for lin-guistic multimedia annotation there should usually be several annotation tiers whose annotation is not based on ontological terms. For example, a speech transcription and a speech translation into another language do not use an ontology. Consequently, OntoELAN needs to save not only instances of classes created for ontology-based an-notations, but also other text data created without ontologies. One solution is to use XML Schema definitions to save an anno-tation file in the XML format—this is what ELAN does. Being consistent in using an ontological approach and, therefore, build-ing the Semantic Web, we provide another solution—the multimedia ontology.We have developed the multimedia ontology that we called General Multime-dia Ontology and that serves as a semantic framework for multimedia annotation. In contrast to domain ontologies, the General Multimedia Ontology is a crucial compo-nent of the system. OntoELAN saves its annotations in the XML format as class in-stances of the General Multimedia Ontol-ogy and class instances of linguistic domain ontologies that are used in ontological tiers.The General Multimedia Ontology is expressed in Web Ontology Language and is designed based on ELAN XML Schema for annotation. The General Multimedia Ontology contains the following classes:•AnnotationDocument, which repre-sents the whole annotation document.•Tier, which represents a single annota-tion tier/layer. There are several types of tiers that a user can choose.•TimeSlot, which represents a concept of a time segment that may subdivide tiers.•Annotation, which can be either AlignableAnnotation or Referring Annotation.•AlignableAnnotation, which links di-rectly to a time slot.•ReferringAnnotation, which can ref-erence an existing Alignable Annota-tion.•AnnotationValue, which has two sub-classes StringAnnotation and Ontol-ogy Annotation that represent two dif-ferent ways of annotating.•MediaDescriptor, TimeUnit and others.Relationships among some important Gen-eral Multimedia Ontology classes are pre-sented in Figure 3. In general, AnnotationDocument may have zero or many Tiers, which, in turn, may have zero or many Annotations. Annotation can be either AlignableAnnotation or ReferringAnnotation, where Alignable Annotation can be divided by TimeSlots, and ReferringAnnotation can refer to another annotation. ReferringAnnotation may refer to AlignableAnnotation, as well as to ReferringAnnotation, but the root of the referenced annotations must be an AlignableAnnotation. Each Annotation has one AnnotationValue, which can be either a StringAnnotation or an OntologyAnnotation. StringAnnotation represents any string that a user can input as an annotation value, but values, repre-sented by OntologyAnnotation, come from a language profile and, consequently, from an ontology. Note that the General Multimedia Ontology allows Ontology Annotation to be used only with ReferringAnnotation. In other words, tiers with AlignableAnnotations do not support an ontology-based approach. This limita-tion is due to software development is-sues—OntoELAN does not support anno-tation with ontological terms in alignable tiers. We intentionally emphasize this con-straint in the ontology, although conceptu-ally it should not be the case. Among our contributions is the introduc-tion of the OWL class Ontology Annota-tion, which serves as an annotation unit for an ontology-based annotation. OntologyAnnotation has restrictions on the following properties:•hasOntAnnotationId: The ID of the annotation. The property cardinality equals one (owl:cardinality = 1).•hasUserDefinedTerm, which relates OntologyAnnotation to a term in a lan-guage profile (described in the next sec-tion). The property cardinality equals one (owl:cardinality = 1).•hasInstances, which relates Ontology Annotation to a term (represented as an instance) in an ontology used for an-notation. The property cardinality is greater than zero (owl:minCardinality = 1).•hasOntAnnotationDescription: De-scriptions/comments on the annotation.The property cardinality is not restricted. The General Multimedia Ontology is avail-able at /proj/ OntoELAN/multimedia.owl. We will add new concepts to the ontology in case if OntoELAN needs them for annotation. We have developed the General Multimedia Ontology especially for OntoELAN and have not included most concepts in multi-media domain. In particular, we did not in-clude multimedia concepts such as those related to shapes, colors, motions, audio spectrum, and so forth. Our small ontology focuses on high-level multimedia annota-tion features and can be used for similar annotation tasks.LANGUAGE PROFILEA language profile is a subset of on-tological terms, possibly renamed, that are used in the annotation of a particular multi-media resource. The idea of a language profile comes from the following practicalFigure 3. Relationships among some General Multimedia Ontology classes (UML class diagram)issues related to an ontology-based anno-tation.A domain ontology defines all terms related to a particular domain, and the num-ber of terms is usually considerably large. However, to annotate a concrete data re-source, an annotator usually does not need all terms from an ontology. Moreover, an experienced annotator can identify a sub-set of ontological terms that will be useful for a given resource. Speaking in terms of a linguistic domain, an annotator will only use a subset of GOLD to annotate a par-ticular language and may need a different subset for another language.Linguists have been annotating mul-timedia data for years without standard-ized terms from an ontology. They have their individual sets of terms that they are accustomed to using for annotation. It will be difficult to come to a consensus about class names in GOLD so that every lin-guist is satisfied with it. Additionally, lin-guists widely use abbreviations like “n” for “noun” which is concise and convenient. Finally, linguists whose native language is, for example, Ukrainian may prefer to use annotation terms in Ukrainian rather than in English.More formally, a language profile is defined as a quadruple: ontological terms; user-defined terms; a mapping between ontological terms and user-defined terms; and a reference to an ontology, which con-tains the structural information about terms (like subclass relationship). In summary, a language profile in OntoELAN provides convenience and flexibility for a user to:•select a subset of ontological terms use-ful for a particular resource annotation;•rename ontological terms, for example, use another language, give an abbrevia-tion or a synonym;•combine the meaning of two or many ontological terms in one user-defined term (e.g., ontological terms “Inanimate”and “Noun” may be conveniently re-named as “NI”).OntoELAN allows ontology-based annotation by means of a language profile.A user opens an ontology, creates a pro-file, and links it to an ontological tier. Anno-tation values for an ontological tier can only be selected from a language profile.A language profile in OntoELAN is repre-sented as a simple XML document (see Figure 4) with a specified schema, which basically maps ontological terms to user-defined terms, and has a link to the original ontology and some metadata. A user can easily create, open, edit, and save profiles with OntoELAN.Figure 4 presents an example lan-guage profile, created by the author Artem and linked to GOLD ontology at URI /~farrar/gold.owl. InFigure 4. An example of the language profile XML document。