当前位置:文档之家› Semantic Web

Semantic Web

Semantic Web
Semantic Web

An Ontology-Based Multimedia Annotator for the Semantic Web of Language Engineering

Artem Chebotko, Wayne State University, USA

Yu Deng, Wayne State University, USA

Shiyong Lu, Wayne State University, USA

Farshad Fotouhi, Wayne State University, USA

Anthony Aristar, Wayne State University, USA

ABSTRACT

The development of the Semantic Web, the next-generation Web, greatly relies on the availability of ontologies and powerful annotation tools. However, there is a lack of ontology-based annotation tools for linguistic multimedia data. Existing tools either lack ontology support or provide limited support for multimedia. To fill the gap, we present an ontology-based linguistic multimedia annotation tool, OntoELAN, which features: (1) the support for OWL ontologies;

(2) the management of language profiles, which allow the user to choose a subset of ontological terms for annotation; (3) the management of ontological tiers, which can be annotated with language profile terms and, therefore, corresponding ontological terms; and (4) storing OntoELAN annotation documents in XML format based on multimedia and domain ontologies. To our best knowledge, OntoELAN is the first audio/video annotation tool in the linguistic domain that provides support for ontology-based annotation. It is expected that the availability of such a tool will greatly facilitate the creation of linguistic multimedia repositories as islands of the Semantic Web of language engineering.

Keywords:annotation; general multimedia ontology; GOLD; multimedia; ontology; OWL;

Semantic Web

INTRODUCTION

The Semantic Web (Lu, Dong, & Fotouhi, 2002; Berners-Lee, Hendler, & Lassila, 2001) is the next-generation Web, in which information is structured with well-defined semantics, enabling better coopera-tion of machine and human effort. The Semantic Web is not a replacement, but an extension of the current Web, and its de-velopment greatly relies on the availability of ontologies and powerful annotation tools.

Ontology development and annotation management are two challenges of the development of the Semantic Web, as we discussed in Chebotko, Lu, and Fotouhi

(2004). In this article, although we use our developed General Multimedia Ontology as the framework and the GOLD ontology developed at the University of Arizona as an ontology example for ontology-based annotation of linguistic multimedia data, our focus will be on addressing the second chal-lenge—the development of an ontology-based multimedia annotator OntoELAN for the Semantic Web of language engineering.

Recently, there is an increasing inter-est and effort for preserving and document-ing endangered languages (Lu et al., 2004; The National Science Foundation, 2004). Many languages are in serious danger of being lost, and if nothing is done to prevent it, half of the world’s approximately 6,500 languages will disappear in the next 100 years. The death of a language entails the loss of a community’s traditional culture, for the language is a unique vehicle for its traditions and culture.

In the linguistic domain, many lan-guage data are collected as audio and video recordings, which impose a challenge to document indexing and retrieval. Annota-tion of multimedia data provides an oppor-tunity for making the semantics explicit and facilitates the searching of multimedia docu-ments. However, different annotators might use different vocabulary to annotate multi-media, which causes low recall and preci-sion in search and retrieval. In this article, we propose an ontology-based annotation approach, in which a linguistic ontology is used so that the terms and their relation-ships are formally defined. In this way, an-notators will use the same vocabulary to annotate multimedia, so that ontology-driven search engines will retrieve multimedia data with greater recall and precision. We be-lieve that even though in a particular do-main, it can be very difficult to enforce a uniform ontology that is agreed on by the whole community, ontology-driven annota-tion will benefit the community once ontol-ogy-aware federated retrieval systems are developed based on ontology techniques such as ontology mapping, alignment, and merging (Klein, 2001). In this article, we present an ontology-based linguistic multimedia annotation tool, OntoELAN—a successor of EUDICO Lin-guistic Annotator (ELAN) (Hellwig & Uytvanck, 2004), developed at the Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands, with the aim to provide a sound technological basis for the annotation and exploitation of multime-dia recordings. Although ELAN is designed specifically for linguistic domain (analysis of language, sign language, and gesture), it can be used for annotation, analysis, and documentation purposes in other multime-dia domains. We briefly describe the fea-tures of ELAN in the section, “An Over-view of OntoELAN,” and refer the reader to Hellwig and Uytvanck (2004) for de-tails. OntoELAN inherits all ELAN’s fea-tures and extends the tool with an ontol-ogy-based annotation approach. In particu-lar, our main contributions are:?OntoELAN can open and display ontolo-gies, specified in OWL Web Ontology Language (Bechhofer et al., 2004).?OntoELAN allows the creation of a lan-guage profile, which enables a user to choose a subset of terms from a linguis-tic ontology and conveniently rename them if needed.

?OntoELAN allows the creation of onto-logical tiers, which can be annotated with profile terms and, therefore, their corresponding ontological terms.?OntoELAN saves annotations in XML (Bray, Paoli, Sperberg-McQueen, Maler, & Yergeau, 2004) format as class in-

stances of the General Multimedia On-tology, which is designed based on the XML Schema (Fallside, 2001) for ELAN annotation files.

?OntoELAN, while annotating ontologi-cal tiers, creates class instances of cor-responding ontologies linked to annota-tion tiers and relates them to instances of the General Multimedia Ontology classes.

This paper extends the presentation of OntoELAN in Chebotko et al. (in press), with more details on ontological and archi-tectural aspects of OntoELAN and with a premier on OWL. Since OntoELAN is de-veloped to fulfill annotation requirements for the linguistic domain, it is natural that, in this article, we use linguistic annotation examples and link the General Ontology for Linguistic Description (GOLD) (Farrar & Langendoen, 2003) to an ontological tier. To our best knowledge, OntoELAN is the first audio/video annotation tool in the lin-guistic domain that provides support for ontology-based annotation. It is expected that the availability of such a tool will greatly facilitate the creation of linguistic multime-dia repositories as islands of the Semantic Web of language engineering. RELATED WORK

In the following, first we identify the requirements for linguistic multimedia an-notation, then we review existing annota-tion tools with respect to these require-ments. We conclude that these tools do not fully satisfy our requirements, and this mo-tivates our development of OntoELAN.

Linguistic domain places some mini-mum requirements on multimedia annota-tion tools. While semantics-based contents such as speeches, gestures, signs, and scenes are important, color and shape are not of interest. To annotate semantics-based content, a tool should provide a time axis and the capability of its subdivision into time slots/segments, multiple tiers for different semantic content. Obviously, there should be some multimedia resource metadata such as title, authors, date, and time. Addi-tionally, a tool should provide ontology-based annotation features to enable the same an-notation vocabulary for a particular domain.

As related work, we give a brief de-scription of the following tools: Protégé(Stanford University, 2004), IBM MPEG-7 Annotation Tool (International Business Machines Corporation, 2004), and ELAN (Hellwig & Uytvanck, 2004).

Protégé is a popular ontology con-struction and annotation tool developed at Stanford University. Protégé supports the Web Ontology Language through the OWL plug-in, which allows a user to load OWL ontologies, annotate data, and save anno-tation markup. Unfortunately, Protégé pro-vides only simple multimedia support through the Media Slot Widget. The Media Slot Widget allows the inclusion and dis-play of video and audio files in Protégé, which may be enough for general descrip-tion of multimedia files like metadata en-tries, but not sufficient for annotation of a speech, where the multimedia time axis and its subdivision into segments are cru-cial.

The IBM MPEG-7 Annotation Tool was developed by IBM to assist annotat-ing video sequences with MPEG-7 (Martínez, 2003) metadata based on the shots of the video. It does not support any ontology language and uses an editable lexi-con from which a user can choose key-words to annotate shots. A shot is defined as a time period in video in which the frames have similar scenes. Annotations are saved

based on MPEG-7 XML Schema (Martínez, 2003). Although the IBM MPEG-7 Annotation Tool was specially designed to annotate video, shot and lexi-con-based annotation does not provide enough flexibility for linguistic multimedia annotation. In particular, the shot approach is good for the annotation of content-based features like color and texture, but not for time alignment and time segmentation re-quired for semantics-based content anno-tation.

ELAN (EUDICO Linguistic Annota-tor), developed at the Max Planck Institute for Psycholinguistics, Nijmegen, The Neth-erlands, is designed specifically for linguis-tic domain (analysis of language, sign lan-guage, and gesture) to provide a sound technological basis for the annotation and exploitation of multimedia recordings. ELAN provides many important features for linguistic data annotation such as time segmentation and multiple annotation lay-ers, but not the support of an ontology. Annotation files are saved in the XML for-mat based on ELAN XML Schema.

As a summary, existing annotation tools such as Protégé and the IBM MPEG-7 Annotation Tool are not suitable for our purpose since they do not support many multimedia annotation operations such as multiple tiers, time transcription, and trans-lation of linguistic audio and video data. ELAN is the best candidate for becoming a widely accepted linguistic multimedia an-notator, and it is already used by linguists throughout the world. ELAN provides most of the required features for linguistic multi-media annotation, which motivates us to use it as the basis for the development of OntoELAN to add ontology-based annota-tion features such as the support of an on-tology and a language profile.AN OVERVIEW OF ONTOELAN

OntoELAN is an ontology-based lin-guistic multimedia annotator, developed on the top of ELAN annotator. It was partially sponsored and developed as a part of Elec-tronic Metastructure for Endangered Lan-guages Data (E-MELD) project. Currently, OntoELAN source code contains more than 60,000 lines of Java code and has sev-eral years of development history started by the Max Planck Institute for Psycholinguistics team and continued by the Wayne State University team. Both development teams will continue their col-laboration on ELAN and OntoELAN.

OntoELAN has a long list of detailed descriptions of all its technical features, in-cluding the following features that are in-herited from ELAN:

?display a speech and/or video signals, together with their annotations;?time linking of annotations to media streams;

?linking of annotations to other annota-tions;

?unlimited number of annotation tiers as defined by a user;

?different character sets; and ?basic search facilities.

OntoELAN implements the following ad-ditional features:

?loading of OWL ontologies;?language profile creation;?ontology-based annotation; and ?storing annotations in the XML format based on the General Multimedia On-tology and domain ontologies.

The main window of OntoELAN is shown in Figure 1. OntoELAN has the video viewer, the annotation density viewer, the waveform viewer, the grid viewer, the sub-title viewer, the text viewer, the timeline viewer, the interlinear viewer, and associ-ated with them controls and menus. All viewers are synchronized so that whenever a user accesses a point in time in one viewer, all the other viewers move to the corresponding point in time automatically. The video viewer displays video in “mpg”and “mov” formats, and can be resized or detached to play video in a separate win-dow. The annotation density viewer is use-ful for navigation through the media file and analysis of annotations concentration. The waveform viewer displays the waveform of the audio file in “wav” format; in case of video files, there should be an additional “wav” file present to display waveform. The grid viewer displays annotations and associated time segments for a selected annotation tier. The subtitle viewer displays annotations on selected annotation tiers at the current point in time. The text viewer displays annotations of a selected annota-tion tier as a continuous text. The timeline viewer and the interlinear viewer are in-terchangeable, and both display all tiers and all their annotations; only one viewer can be used at a time. In this article, we will mostly work with the timeline viewer (see Figure 1), which allows a user to perform of operations on tiers and annotations. Be-cause a significant part of the OntoELAN interface is inherited from ELAN, the reader can refer to Hellwig and Uytvanck (2004) for detailed description. OntoELAN uses and manages several data sources:

?General Multimedia Ontology (OWL): ontological terms for multimedia anno-tations.

Figure 1. A snapshot of the OntoELAN main window

?Linguistic domain ontologies (OWL): ontological terms for linguistic annota-tions.

?Language profiles (XML): a selected subset of domain ontology terms for lin-guistic annotations.

?OntoELAN annotation documents (XML): storage for linguistic multime-dia annotations.

A data flow diagram for OntoELAN is shown in Figure 2. We do not specify names of most data flows, as they are too general to give any additional information. Two data flows from a user are user-de-fined terms for language profiles and lin-guistic multimedia annotations.

In the following sections, we will give more details on OntoELAN data sources and data flows. We focus more on the de-scription of features that make OntoELAN an ontology-based multimedia annotator, like OWL support, linguistic domain ontol-ogy and the General Multimedia Ontology, a language profile, ontological annotation tiers, and so forth.SUPPORT OF OWL

OWL Web Ontology Language (Bechhofer et al., 2004) is recently recom-mended as the semantic markup language for publishing and sharing ontologies on the World Wide Web. It is developed as a revi-sion of DAML+OIL language and has more expressive power than XML, RDF, and RDF Schema (RDF-S). OWL provides constructs to define ontologies, classes, properties, individuals, data types, and their relationships. In the following, we present a brief overview of the major constructs and refer the reader to Bechhofer et al. (2004) for more details.

Classes

A class defines a group of individuals that share some properties. A class is de-fined by owl:Class, and different classes can be related by rdfs:subClassOf into a class hierarchy. Other relationships be-tween classes can be specified by owl:equivalentClass, owl:disjointWith,

Figure 2. OntoELAN data flow diagram

and so forth. The extension of a class can be specified by owl:oneOf with a list of class members or by owl:intersectionOf, owl:unionOf and owl:complementOf with a list of other classes.

Properties

A property states relationships be-tween individuals or from individuals to data values. The former is called ObjectProperty and specified by owl:ObjectProperty. The latter is called DatatypeProperty and specified by owl:DatatypeProperty. Similarly to classes, different properties can be related by rdfs:subPropertyOf into a property hi-erarchy. The domain and range of a prop-erty are specified by rdfs:domain and rdfs:range, respectively. Two properties might be asserted to be equivalent by owl:equivalentProperty. In addition, dif-ferent characteristics of a property can be specified by owl:FunctionalProperty, o w l:I n v e r s e F u n c t i o n a l P ro p e r t y, owl:TransitiveProperty, and owl: SymmetricProperty.

Property Restrictions

A property restriction is a special kind of a class description. It defines an anony-mous class, namely the set of individuals that satisfy the restriction. There are two kinds of property restrictions: value con-straints and cardinality constraints. Value constraints restrict the values that a prop-erty can take within a particular class, and they are specified by owl:allValuesFrom, owl:someValuesFrom, and owl:hasValue. Cardinality constraints restrict the number of values that a property can take within a particular class, and they are specified by owl:minCardinality, owl:maxCardinality, owl:cardinality, and so forth.

OWL is subdivided into three species (in increasingly-expressive order): OWL Lite, OWL DL, and OWL Full. OWL Lite places some limitations on the usage of constructs and is primarily suitable for ex-pressing taxonomies. For example, owl:unionOf and owl:complementOf are not part of OWL Lite, and cardinality con-straints may only have a 0 or 1 value. OWL DL provides more expressivity and still guarantees computational completeness and decidability. In particular, OWL DL supports all OWL constructs, but places some restrictions (e.g., class cannot be treated as an individual). Finally, OWL Full gives maximum expressiveness, but not computational guarantee.

OntoELAN uses the Jena 2 (Hewlett-Packard Labs, 2004) Java framework for writing Semantic Web applications to pro-vide OWL DL support. On the language profile creation stage, OntoELAN basically uses class hierarchy information based on rdfs:subClassOf construct. However, while annotating data with ontological terms (by means of a language profile), OntoELAN generates dynamic interface for creating instances, assigning property val-ues, and so forth.

LINGUISTIC DOMAIN ONTOLOGY

As a linguistic domain ontology ex-ample, we use the General Ontology for Linguistic Description (GOLD) (Farrar & Langendoen, 2003). To make things clear from the beginning, OntoELAN does not have GOLD as a component; both are in-dependent. The user can load any other linguistic domain ontology, therefore

OntoELAN can be used as a multimedia annotator in other domains that require simi-lar features. Moreover, the user can load several different ontologies for distinct an-notation tiers to provide multi-ontological or even multi-domain annotation ap-proaches. For example, a gesture ontology can be used for linguistic multimedia anno-tation, as a speaker’s gestures help the audience understand the meaning of a speech better. Therefore, linguists can use GOLD in one tier and the gesture ontol-ogy in another tier to capture more se-mantics.

The General Ontology for Linguistic Description is an ongoing research effort led by the University of Arizona to define linguistic domain-specific terms using OWL. GOLD is constantly under revision, and the ontology changes with introduction of new classes, properties, and relations; its structure also changes. Current infor-mation about GOLD is available at https://www.doczj.com/doc/4c7937622.html,, and the ontology is also downloadable from https://www.doczj.com/doc/4c7937622.html,/ ~farrar/gold.owl. We briefly describe GOLD content in the next few paragraphs and refer the reader to Farrar and Langendoen (2003) and also to Farrar (2004) for more details.

GOLD provides a semantic frame-work for the representation of linguistic knowledge and organizes knowledge into four major categories:?Expressions: Physically accessible as-pects of a language. Linguistic expres-sions include the actual printed words or sounds produced when someone speaks. For example, Orthographic Expression, Utterance, Signed Ex-pression, Word, WordPart, Prefix.?Grammar: The abstract properties and relations of a language. For example,

Tense, Number, Agreement, PartOf Speech.

?Data Structures: Constructs that are used by linguists to analyze language data. A linguistic data structure can be viewed as a structuring mechanism for linguistic data content. For example, a lexical entry is a data structure used to organize lexical content. Other examples are a phoneme table and a syntactic tree.?Metaconcepts: The most basic concepts of linguistic analysis. The example of a metaconcept is a language itself. Through the article we will use only simple GOLD concepts like Noun, Verb, Parti-ciple, Preverb. They are subclasses of PartOfSpeech, and their meaning is easy to understand without special training. Ad-ditionally, we will use the concepts Animate (living things, including humans, animals, spirits, trees, and most plants) and Inani-mate (non-living things, such as objects of manufacture and natural “non-living”things), which are two grammatical gen-ders or classes of nouns. GENERAL MULTIMEDIA ONTOLOGY

Although OntoELAN is an ontology-based annotator, a user may not use onto-logical terms for annotation. In fact, for lin-guistic multimedia annotation there should usually be several annotation tiers whose annotation is not based on ontological terms. For example, a speech transcription and a speech translation into another language do not use an ontology. Consequently, OntoELAN needs to save not only instances of classes created for ontology-based an-notations, but also other text data created without ontologies. One solution is to use XML Schema definitions to save an anno-

tation file in the XML format—this is what ELAN does. Being consistent in using an ontological approach and, therefore, build-ing the Semantic Web, we provide another solution—the multimedia ontology.

We have developed the multimedia ontology that we called General Multime-dia Ontology and that serves as a semantic framework for multimedia annotation. In contrast to domain ontologies, the General Multimedia Ontology is a crucial compo-nent of the system. OntoELAN saves its annotations in the XML format as class in-stances of the General Multimedia Ontol-ogy and class instances of linguistic domain ontologies that are used in ontological tiers.

The General Multimedia Ontology is expressed in Web Ontology Language and is designed based on ELAN XML Schema for annotation. The General Multimedia Ontology contains the following classes:?AnnotationDocument, which repre-sents the whole annotation document.?Tier, which represents a single annota-tion tier/layer. There are several types of tiers that a user can choose.?TimeSlot, which represents a concept of a time segment that may subdivide tiers.

?Annotation, which can be either AlignableAnnotation or Referring Annotation.?AlignableAnnotation, which links di-rectly to a time slot.?ReferringAnnotation, which can ref-erence an existing Alignable Annota-tion.

?AnnotationValue, which has two sub-classes StringAnnotation and Ontol-ogy Annotation that represent two dif-ferent ways of annotating.?MediaDescriptor, TimeUnit and others.Relationships among some important Gen-eral Multimedia Ontology classes are pre-sented in Figure 3. In general, AnnotationDocument may have zero or many Tiers, which, in turn, may have zero or many Annotations. Annotation can be either AlignableAnnotation or ReferringAnnotation, where Alignable Annotation can be divided by TimeSlots, and ReferringAnnotation can refer to another annotation. ReferringAnnotation may refer to AlignableAnnotation, as well as to ReferringAnnotation, but the root of the referenced annotations must be an AlignableAnnotation. Each Annotation has one AnnotationValue, which can be either a StringAnnotation or an OntologyAnnotation. StringAnnotation represents any string that a user can input as an annotation value, but values, repre-sented by OntologyAnnotation, come from a language profile and, consequently, from an ontology. Note that the General Multimedia Ontology allows Ontology Annotation to be used only with ReferringAnnotation. In other words, tiers with AlignableAnnotations do not support an ontology-based approach. This limita-tion is due to software development is-sues—OntoELAN does not support anno-tation with ontological terms in alignable tiers. We intentionally emphasize this con-straint in the ontology, although conceptu-ally it should not be the case. Among our contributions is the introduc-tion of the OWL class Ontology Annota-tion, which serves as an annotation unit for an ontology-based annotation. OntologyAnnotation has restrictions on the following properties:?hasOntAnnotationId: The ID of the annotation. The property cardinality equals one (owl:cardinality = 1).

?hasUserDefinedTerm, which relates OntologyAnnotation to a term in a lan-guage profile (described in the next sec-tion). The property cardinality equals one (owl:cardinality = 1).?hasInstances, which relates Ontology Annotation to a term (represented as an instance) in an ontology used for an-notation. The property cardinality is greater than zero (owl:minCardinality = 1).?hasOntAnnotationDescription: De-scriptions/comments on the annotation.

The property cardinality is not restricted. The General Multimedia Ontology is avail-able at https://www.doczj.com/doc/4c7937622.html,/proj/ OntoELAN/multimedia.owl. We will add new concepts to the ontology in case if OntoELAN needs them for annotation. We have developed the General Multimedia Ontology especially for OntoELAN and have not included most concepts in multi-media domain. In particular, we did not in-clude multimedia concepts such as those related to shapes, colors, motions, audio spectrum, and so forth. Our small ontology focuses on high-level multimedia annota-tion features and can be used for similar annotation tasks.

LANGUAGE PROFILE

A language profile is a subset of on-tological terms, possibly renamed, that are used in the annotation of a particular multi-media resource. The idea of a language profile comes from the following practical

Figure 3. Relationships among some General Multimedia Ontology classes (UML class diagram)

issues related to an ontology-based anno-tation.

A domain ontology defines all terms related to a particular domain, and the num-ber of terms is usually considerably large. However, to annotate a concrete data re-source, an annotator usually does not need all terms from an ontology. Moreover, an experienced annotator can identify a sub-set of ontological terms that will be useful for a given resource. Speaking in terms of a linguistic domain, an annotator will only use a subset of GOLD to annotate a par-ticular language and may need a different subset for another language.

Linguists have been annotating mul-timedia data for years without standard-ized terms from an ontology. They have their individual sets of terms that they are accustomed to using for annotation. It will be difficult to come to a consensus about class names in GOLD so that every lin-guist is satisfied with it. Additionally, lin-guists widely use abbreviations like “n” for “noun” which is concise and convenient. Finally, linguists whose native language is, for example, Ukrainian may prefer to use annotation terms in Ukrainian rather than in English.

More formally, a language profile is defined as a quadruple: ontological terms; user-defined terms; a mapping between ontological terms and user-defined terms; and a reference to an ontology, which con-tains the structural information about terms (like subclass relationship). In summary, a language profile in OntoELAN provides convenience and flexibility for a user to:?select a subset of ontological terms use-ful for a particular resource annotation;?rename ontological terms, for example, use another language, give an abbrevia-tion or a synonym;

?combine the meaning of two or many ontological terms in one user-defined term (e.g., ontological terms “Inanimate”

and “Noun” may be conveniently re-named as “NI”).

OntoELAN allows ontology-based annotation by means of a language profile.

A user opens an ontology, creates a pro-file, and links it to an ontological tier. Anno-tation values for an ontological tier can only be selected from a language profile.

A language profile in OntoELAN is repre-sented as a simple XML document (see Figure 4) with a specified schema, which basically maps ontological terms to user-defined terms, and has a link to the original ontology and some metadata. A user can easily create, open, edit, and save profiles with OntoELAN.

Figure 4 presents an example lan-guage profile, created by the author Artem and linked to GOLD ontology at URI https://www.doczj.com/doc/4c7937622.html,/~farrar/gold.owl. In

Figure 4. An example of the language profile XML document

this example, there is only one user-defined term “NI” that maps to ontological terms “Noun” and “Inanimate.” This is a one-to-many mapping, but a mapping can be many-to-many as well. For example, we can add another user-defined term “IN” that maps to the same ontological terms “Noun” and “Inanimate.” In general, a mapping can be one-to-one, one-to-many, many-to-one, or many-to-many.

ANNOTATION TIERS AND LINGUISTIC TYPES

OntoELAN allows a user to create an unlimited number of annotation tiers. Multiple-tier feature is a must for linguistic multimedia annotation. For example, while annotating an audio monolog, a linguist may choose separate tiers to write a monolog transcription, a translation, a part of speech annotation, a phonetic transcription, and so forth.

An annotation tier can be either alignable or referring. Alignable tiers are directly linked to the time axis of an audio/ video clip and can be divided into segments (time slots); referring tiers contain annota-tions that are linked to annotation on an-other tier, which is also called a parent tier and can be alignable or referring. Thus, tiers form a hierarchy, where its root must be an alignable tier. Following the previous example, the speech transcription could be an independent time-alignable tier that is divided into time slots of the speaker’s ut-terances. On the other hand, the transla-tion-referring tier could refer to the tran-scription tier, so that the translation tier in-herits its time alignment from the transcrip-tion tier.

After a tier hierarchy is established, changes in one tier may influence other tiers. Deletion of a parent tier is cascaded: all its child tiers are automatically deleted. Similarly, this is true about annotations on a tier: deletion of an annotation on a parent tier causes the deletion of all correspond-ing annotations on its child tiers. Alteration of the time slot on a parent tier influences all child tiers as well.

Each annotation tier has associated with it linguistic type. There are five pre-defined linguistic types in OntoELAN which put some constraints on tiers assigned to them. The first four of them are described in Hellwig and Uytvanck (2004), and we also give their definitions here:?None: The annotation on the tier is linked directly to the time axis. This is the only type that alignable tiers can have.?Time Subdivision: The annotation on the parent tier can be subdivided into smaller units, which, in turn, can be linked to time slots. They differ from annotations on alignable tiers in that they are assigned to a slot that is contained within the slot of their parent annotation.?Symbolic Subdivision: Similar to the previous type, but the smaller units can-not be linked to the time slots.?Symbolic Association: The annotation on the parent tier cannot be subdivided further, so there is a one-to-one corre-spondence between the parent annota-tion and its referring annotation.?Ontological Type: The annotation on such a tier is linked to a language pro-file. This is not an independent type, as it can be used only in combination with referring tier types such as Time Subdi-vision, Symbolic Subdivision, or Sym-bolic Association. To emphasize that a referring tier allows ontology-based an-notation, we call it an ontological tier.

Only ontological tiers allow annota-tion based on language profile terms; other types of tiers allow annotation with any string value.

LINGUISTIC MULTIMEDIA ANNOTATION WITH ONTOELAN

In this section, we describe an anno-tation process in OntoELAN using a lin-guistic multimedia resource annotation ex-ample. In general, an annotation process in OntoELAN consists of three major steps: (1) language profile creation, (2) creation of tiers, and (3) creation of annotations. The first step is unnecessary if ontological tiers will not be defined. The second step can be completed partially for non-ontological tiers before the creation of a language pro-file. It is also possible to have multiple pro-files for multiple ontological tiers, but there is always one-to-one correspondence be-tween a profile and an ontological tier.

As an example, we annotate the au-dio file, which contains a sentence in Potawatomi, one of the North American native languages.

We first load GOLD ontology and create the Potawatomi language profile. Figure 5 presents a snapshot of the profile creation window. The tabs “Index” and “Ontology Tree” on the left provide two views of an ontology: a list view, which dis-plays all the terms of an ontology alpha-betically as a list, and a hierarchical view, which displays all the terms of an ontology in a hierarchical fashion to illustrate par-ent-child relationships between terms. From any of these two views, a user can select required terms and add them to the “Onto-

Figure 5. A snapshot of creating a language profile

logical Terms” list, and rename ontological terms as shown in the “User-Defined Terms” list. In Figure 5, we selected the ontological terms “Inanimate” and “Noun”and combine them under one user-defined term “NI.”

After the language profile is ready, we define six tiers in the OntoELAN main window (see Figure 6):?Orthographic of type “None” (linked to the time axis)

?Translation of type “Symbolic Associa-tion” (referring to Orthographic)?Words of type “Symbolic Subdivision”

(referring to Orthographic)?Parse of type “Symbolic Subdivision”

(referring to Words)

?Gloss of type “Symbolic Association”

(referring to Parse)

?Ontology of type “Symbolic Associa-tion” and “Ontological Type” (referring to Gloss)

The created tier hierarchy is shown in Figure 7.

Finally, we specify annotation values on all six tiers (see Figure 6). We annotate the Orthographic tier first, because it is the root of the tier hierarchy, and its time alignment is inherited by other tiers. We do not divide the Orthographic tier into time slots, and its time axis contains the whole sentence in Potawatomi. The Translation tier inherits time alignment from its parent and cannot subdivide it any further (type “Symbolic Association”). The Words tier also inherits Orthographic time alignment, but in this case we subdivide it into seg-ments that correspond to words in the sen-tence. Similarly, we subdivide the Parse tier alignment inherited from Words. The Gloss tier inherits alignment from Parse, and the Ontology tier inherits alignment from Gloss; both Gloss and Ontology do

Figure 6. A snapshot of annotation tiers in the OntoELAN main window

Figure 7. A snapshot of the tier hierarchy

not allow further subdivision. Correct align-ment inheritance is important, because there is a semantic correspondence between seg-ments of different tiers. For example, if we look at a Potawatomi word “neko” in the Words tier, we can find its gloss “used to”in the Gloss tier and part of speech “PC”(maps to GOLD Participle concept) in the Ontology tier.

Except for the annotations on the Ontology tier, which is defined as an onto-logical tier, all the annotations are annotated by a string value. Unlike the text annota-tion, the user annotates the ontological tier by selecting a user-defined term from the profile. Once the term is selected, the next step is creating individuals of the corre-sponding ontological term(s). The user needs to do nothing if the ontological term is defined as an instance in the ontology, to input an instance name if the ontological term is defined as a class with no restric-tions, or to provide all information based on the definition of the ontological class, prop-erties, and so forth.

The annotation is saved in the XML format as instances of the General Multi-media Ontology and, in our case, GOLD. The example of the XML markup for the Ontology tier instance and referring an-notation instance with ID “a42” on that tier is shown in Figure 8. For the Ontology tier, several properties are defined such as ID, parent tier, profile, linguistic type, and so forth. For the referring annotation, OntoELAN has defined ID, reference to another annotation, and annotation value that includes an OntologyAnnotation class instance with ID, user-defined term “PV,”and reference to GOLD concept Preverb, which is defined as an instance. The markup in Figure 8 is based on the General Multimedia Ontology, except the reference to a GOLD instance mentioned above.

Figure 8. An example of the XML markup for the OntoELAN annotation

CONCLUSIONS AND FUTURE WORK

In this article, we address the chal-lenge of annotation management for the Semantic Web of language engineering. Our contribution is the development of OntoELAN, a linguistic multimedia anno-tation tool that features an ontology-based annotation approach. OntoELAN is the first attempt at annotating linguistic multimedia data with a linguistic ontology. Meanwhile, the ontological annotations share the data on the linguistic ontologies. Future work will improve the system and provide more chan-nels for sharing data on the Web, such as the multimedia descriptions, the language words, and so forth. Also, a future version will improve the current searching system, which supports text searching and retrieval in one annotation document, to search, re-trieve, and compare the linguistic multime-dia annotation data on the Web. Addition-ally, we plan to integrate a text document annotation into OntoELAN and include semi-automatic annotation support, similar to Shoebox (SIL International, 2000). ACKNOWLEDGMENTS

Developers of ELAN from Max Planck Institute for Psycholinguistics, Hennie Brugman, Alexander Klassmann, Han Sloetjes, Albert Russel, and Peter Wittenburg, provided us with ELAN’s source code and documentation. Also, we would like to thank Dr. Laura Buszard-Welcher and Andrea Berez from the E-MELD (Electronic Metastructure for En-dangered Languages Data) project for their constructive comments on OntoELAN.REFERENCES

Bechhofer, S., Harmelen, F., Hendler, J., Horrocks, I., McGuinness, D., Patel-Schneider, et al. (2004). OWL Web On-tology Language reference. W3C Rec-ommendation. Retrieved from https://www.doczj.com/doc/4c7937622.html,/TR/owl-ref/

Berners-Lee, T., Hendler, J., & Lassila, O.

(2001). The Semantic Web. Scientific American. Retrieved from https://www.doczj.com/doc/4c7937622.html,/article.cfm?article

I D=00048144-10D2-1C70-

84A9809EC588EF21

Bray, T., Paoli, J., Sperberg-McQueen, C., Maler, E., & Yergeau, F. (2004). Exten-sible Markup Language (XML) 1.0 (Third Edition). W3C Recommenda-tion. Retrieved from https://www.doczj.com/doc/4c7937622.html,/TR/ REC-xml/

Chebotko, A., Deng, Y., Lu, S., Fotouhi, F., Aristar, A., Brugman, H., et al. (in press).

OntoELAN: An ontology-based linguis-tic multimedia annotator. Proceedings of the IEEE 6th International Sympo-sium on Multimedia Software Engi-neering (IEEE-MSE 2004), Miami, FL, USA.

Chebotko, A., Lu, S., & Fotouhi, F. (2004, April). Challenges for information sys-tems towards the Semantic Web. AIS SIGSEMIS Semantic Web and Infor-mation Systems Newsletter, 1. Re-trieved from https://www.doczj.com/doc/4c7937622.html,/news-l e t t e r/n e w s l e t t e r/A p r i l2004/ FINAL_AIS_SIGSEMIS_Bulletin_ 1_1_04_1_.pdf

Fallside, D.C. (2001). XML Schema part 0: Primer. W3C Recommendation.

Retrieved from https://www.doczj.com/doc/4c7937622.html,/TR/ xmlschema-0/

Farrar, S. (2004). GOLD: A progress re-port. Retrieved from www.u.

https://www.doczj.com/doc/4c7937622.html,/~farrar/gold-status-report.pdf

Farrar, S., & Langendoen, D.T. (2003). A linguistic ontology for the Semantic Web.

GLOT International, 7(3), 97-100. Hellwig, B., & Uytvanck, D. (2004).

EUDICO Linguistic Annotator (ELAN) Version 2.0.2 manual [soft-ware manual]. Retrieved from w w w.m p i.n l/t o o l s/E L A N/E L A N _Manual-04-04-08.pdf

Hewlett-Packard Labs. (2004). Jena 2—

a Semantic We

b framework [computer

software]. Retrieved from www.hpl.

https://www.doczj.com/doc/4c7937622.html,/semWeb/jena2.htm International Business Machines Corpora-tion. (2004). IBM MPEG-7 Annotation Tool [computer software]. Retrieved from https://www.doczj.com/doc/4c7937622.html,/tech/ videoannex

Klein, M. (2001). Combining and relating ontologies: An analysis of problems and solutions. Proceedings of the IJCAI-2001 Workshop on Ontologies and Information Sharing, Seattle, WA, USA.

Lu, S., Dong, M., & Fotouhi, F. (2002). The Semantic Web: Opportunities and chal-lenges for next-generation Web appli-cations. International Journal of In-

formation Research, 7(4). Retrieved from https://www.doczj.com/doc/4c7937622.html,/ir/7-4/paper 134.html

Lu, S., Liu, D., Fotouhi, F., Dong, M., Reynolds, R., Aristar, A., et al. (2004).

Language engineering for the Semantic Web: A digital library for endangered languages. International Journal of Information Research, 9(3). Retrieved from https://www.doczj.com/doc/4c7937622.html,/ir/9-3/paper 176.html

Martínez, J.M. (2003). MPEG-7 overview (version 9). International Organisation for Standardisation.

Retrieved from https://www.doczj.com/doc/4c7937622.html,/ mpeg/standards/mpeg-7/mpeg-7.htm SIL International. (2000). The linguist’s Shoebox: Tutorial and user’s guide [software manual]. Retrieved from https://www.doczj.com/doc/4c7937622.html,/computing/shoebox/ ShTUG.pdf

Stanford University. (2004). The ProtégéProject [computer software]. Re-trieved from https://www.doczj.com/doc/4c7937622.html,. The National Science Foundation. (2004).

NSF 04-605. Documenting endan-gered languages (DEL). Retrieved from https://www.doczj.com/doc/4c7937622.html,/pubs/2004/ nsf04605/nsf04605.htm

Artem Chebotko is a PhD student in the Department of Computer Science at Wayne State University. His research interests include databases and the Semantic Web. He is a student member of the IEEE.

Yu Deng has graduated from Wayne State University with an MS in computer science (2004). Her research interests include databases and the Semantic Web.

Shiyong Lu received his PhD in computer science from the State University of New York at Stony Brook (2002). He is currently an assistant professor of the Department of Computer Science, Wayne State University (USA). His research interests include databases, the Semantic Web and bioinformatics. He has published more than 30 papers in top international conferences and journals in the above areas. He is a member of the IEEE.

Dr. Fotouhi received his PhD in computer science from Michigan State University in 1988. He joined the Faculty of Computer Science at Wayne State University in August 1988 where he is

currently professor and chair of the department. Dr. Fotouhi major area of research is databases, including object-relational databases, multimedia information systems, bioinformatics, medical image databases and web-enabled databases. He has published more than 80 papers in refereed journals and conference proceedings, served as a program committee member of various database related conferences.

Dr. Aristar received his PhD in linguistics from University of Texas in 1984. He was a researcher at Microelectronics & Computer Technology Corporation, 1984-1989; assistant professor, University of Australia, 1990-1991; assistant professor, Texas A&M University, 1991-1995; and associate professor, Texas A&M University, 1996-1998. He joined the Department of English at Wayne State University in 1998 where he is currently an associate professor. Dr. Aristar is the chairman of OLAC Working Group on Language Codes; moderator & founder of LINGUIST List; and organizer of several endangered languages related workshops.

最受欢迎的十大WEB应用安全评估系统教学教材

最受欢迎的十大WEB应用安全评估系统 在国内一些网站上经常看到文章说某某WEB应用安全评估工具排名,但是很可惜,绝大多数都是国外人搞的,界面是英文,操作也不方便,那游侠就在这里综合下,列举下国内WEB安全评估人员常用的一些工具。当然,毫无疑问的,几乎都是商业软件,并且为了描述更准确,游侠尽量摘取其官方网站的说明: 1.IBM Rational AppScan IBM公司推出的IBM Rational AppScan产品是业界领先的应用安全测试工具,曾以Watchfire AppScan 的名称享誉业界。Rational AppScan 可自动化Web 应用的安全漏洞评估工作,能扫描和检测所有常见的Web 应用安全漏洞,例如SQL 注入(SQL-injection)、跨站点脚本攻击(cross-site scripting)及缓冲溢出(buffer overflow)等方面安全漏洞的扫描。 游侠标注:AppScan不但可以对WEB进行安全评估,更重要的是导出的报表相当实用,也是国外产品中唯一可以导出中文报告的产品,并且可以生成各种法规遵从报告,如ISO 27001、OWASP 2007等。 2.HP WebInspect

目前,许多复杂的Web 应用程序全都基于新兴的Web 2.0 技术,HP WebInspect 可以对这些应用程序执行Web 应用程序安全测试和评估。HP WebInspect 可提供快速扫描功能、广泛的安全评估范围及准确的Web 应用程序安全扫描结果。 它可以识别很多传统扫描程序检测不到的安全漏洞。利用创新的评估技术,例如同步扫描和审核(simultaneous crawl and audit, SCA)及并发应用程序扫描,您可以快速而准确地自动执行Web 应用程序安全测试和Web 服务安全测试。 主要功能: ·利用创新的评估技术检查Web 服务及Web 应用程序的安全 ·自动执行Web 应用程序安全测试和评估 ·在整个生命周期中执行应用程序安全测试和协作 ·通过最先进的用户界面轻松运行交互式扫描 ·满足法律和规章符合性要求 ·利用高级工具(HP Security Toolkit)执行渗透测试 ·配置以支持任何Web 应用程序环境 游侠标注:毫无疑问的,WebInspect的扫描速度相当让人满意。 3.Acunetix Web Vulnerability Scanner

趋势分析之语义网

趋势分析之语义网 近几年来,语义网越来越频繁地出现在IT报道中,PowerSet、Twine、SearchMonkey、Hakia等一批语义网产品也陆续推出。早在2010年,Google就已经收购了语义网公司Metaweb。对于这次收购Google产品管理主管杰克·门泽尔(Jack Menzel)发文称,该公司可以处理许多搜索请求,但Metaweb的信息可以使其处理更多搜索请求,“通过推出搜索答案等功能,我们才刚刚开始将我们对互联网的理解用于改进搜索体验”,但对于部分搜索仍然无能为力,“例如,‘美国西海岸地区学费低于3万美元的大学’或‘年龄超过40岁且获得过至少一次奥斯卡奖的演员’,这些问题都很难回答。我们之所以收购Metaweb,是因为我们相信,整合Metaweb的技术将使我们能提供更好的答案”。这表明语义网技术经过近10年的研究与发展,已经走出实验室进入工程实践阶段。 语义网热度变化图 语义网(Semantic Web)是一种智能网络,它不但能够理解词语和概念,而且还能够理解它们之间的逻辑关系,可以使交流变得更有效率和价值。语义网实际上是对未来网络的一个设想,现在与Web 3.0这一概念结合在一起,作为3.0网络时代的特征之一。 语义网这一概念是由万维网联盟的蒂姆·伯纳斯-李(Tim Berners-Lee)在1998年提出的一个概念,实际上是基于很多现有技术的,也依赖于后来和text-and-markup与知识表现的综合。其渊源甚至可以追溯到20世纪60年代末期的Collins、Quillian、Loftus等人的研究,还有之后70年代初Simon、Schamk、Minsky等人陆续提出的一些理论上的成果。其中Simon在进行自然语言理解的应用研究时提出了语义网络(Semantic Network,不是现在的Semantic Web)的概念。 下面我们用Trend analysis分析语义网领域内的研究热点。(点击链接即可进入https://https://www.doczj.com/doc/4c7937622.html,/topic/trend?query=Semantic%20Web)

最新web系统与技术复习题教程文件

复习资料 选择题 HTTP哪个请求方式,请求参数会出现在网址列上? (A) GET (B)POST Web容器在收到浏览器请求时,会如何处理请求? (A)使用单一执行绪处理所有请求 (B)一个请求就建立一个执行绪来处理请求 (C)一个请求就建立一个行程来处理请求 (D)一个请求就执行一个容器来处理请求 Java EE中各技术标准最后将由什么文件明订规范? (A) JCP (B)JSR(C)JDK 在JSP中,要定义一个方法,需要用到以下()写法。 A. <%= %> B. <% %> C. <%! %> D. <%@ %> 在J2EE中,在web.xml中,有如下代码: 30

上述代码定义了默认的会话超时时长,时长为30()。 A. 毫秒 B. 秒 C. 分钟 D. 小时 JavaWeb 中()类的()方法用于创建对话。 A. HttpServletRequest、getSession B. HttpServletResponse、newSession C. HtttpSession、newInstance D. HttpSession、getSession 给定一个Servlet 的doGet方法中的代码片段,如下: request.setAttribute(“name”,”zhang”); response.sendRedirect(“http://localhost:8080/servlet/MyServlt”); 那么在Servlet 中可以使用()方法把属性name的值取出来。 A. String str=request.getAttribute(“name”); B. String str=(String)request.getAttribute(“name”); C. Object str=request.getAttribute(“name”); D. 无法取出来 下边哪个不是JSP的内置对象?()

语义网和语义网格中的本体研究综述

语义网和语义网格中的本体研究综述 余一娇1,2 (1 华中师范大学语言学系,武汉,430079) (2 华中科技大学计算机学院 武汉 430074) E-mail: yjyu@https://www.doczj.com/doc/4c7937622.html, 摘要:本体是语义网和语义网格研究中的一种重要方法。文中首先介绍本体的定义、本体的四元素表示法和六元组表示方法,以及本体的设计分析生命周期;然后回顾语义网研究中曾产生过巨大影响的七种本体语言。通过分析众多文献的观点,文中提出在将来我们应重点针对 DAML+OIL 和OWL两种本体语言进行深入研究。文中还列举出了本体在生物信息计算和网络管理领域应用的两个实例。最后根据语义网格和本体研究现状,提出了利用本体研究语义网格服务质量的基本思路和研究方法。 关键词:本体 本体语言 DAML+OIL OWL 语义网 语义网格 服务质量 1.前 言 Ontology在哲学领域常译为“存在论”,是指关于事物是否存在思考的学科。在计算机科学和人工智能领域则译为“本体”,其词义与哲学中的“存在论”大相径邻。1993年美国Stanford大学知识系统实验室的Gruber博士在文献[1]中定义:本体是用来帮助程序和人共享知识的概念的规范描述 (An ontology is the specification of conceptualizations, used to help programs and humans share knowledge.),后来该定义得到了进一步发展和完善[2]。文献[1]还指出:概念化是关于世界上的实体,如:事物、事物之间的关系和约束条件的知识表达。而规范一词是强调这种表达是用一种固定的形式来描述。从我们已经阅读的多篇相关文献来看,几乎所有论文都接受了上述关于本体的定义。 迅速增加的Web页面数量、丰富的页面内容和时新的消息,为知识工程领域的科学家实现面向终端用户的应用研究、开发带来了极好的机会。在Internet上实现基于语义的信息检索和情报收集,无疑是广大因特网用户的迫切需求。2001年5月,Web之父Tim Berners-Lee和合作者在《Scientific American》杂志上发表了“The Semantic Web”一文。文中正式提出了语义网的概念,鉴于Tim Berners-Lee在Web领域的巨大影响,该文后来一直被公认为是开辟语义网研究的源头文献。为了实现知识的共享和重用,语义网研究中引入本体技术是最近几年来的发展趋势,且正在被不断的实践。知识工程和人工智能学科针对本体技术进行研究已有多年历史,其中最有影响的科学研究组织是美国Stanford大学的知识系统实验室。该实验室的Gruber博士以及Deborah L. McGuiinness博士都对本体和语义网本体研究作出了巨大的贡献。 本文的结构安排如下:第二部分介绍本体的表示方法和本体开发的生命周期;第三部分介绍语义网研究中的本体语言发展过程以及多种本体语言之间的关系;第四部分介绍本体在语义网研究中的应用实例;第五部分讨论我们今后一年的研究思路和研究目标。 2. 本体的表示与本体开发 关于本体的定义如今在计算机科学领域已比较统一,但在具体的应用环境中如何规范化描述本体至今还缺乏统一的标准。目前有两种本体表示方法应用比较广泛,第一是传统的四元素表示方法、第二是较新的六元组表示法。前者源于Gruber博士的观点,后者则是2002年由新加坡南洋理工大学的Myo Myo Naing博士在一篇国际会议论文中提出。前者在世界范围内得到了比较高的认同,但

语义网技术

语义网技术是当前互联网技术研究的热点之一。目前大多数页面中的使用的文字信息不便于机器自动处理,只适合人们自己阅读理解,解决可自动处理的数据和信息方面发展较慢的问题,在网络上信息量剧增、人们迫切需要计算机分担知识整理这一压力的今天,成为信息检索的一个难题。本文首先建构了一种形式化的本体描述方法,并给出了标准化的定义,主要针对在本体层定义的基础上对逻辑层展开了基础研究,对于本体概念进行逻辑推理,通过本体中关系的属性,推理出隐含在本体概念间的关系。在本文的定义中本体包含五个基本的建模元语,概念,关系,函数,公理,实例,通过本体的五个建模元语构建本体,给出本体的形式化的规范定义,本体描述中的四种特殊关系有继承关系,部分关系,实例关系和属性关系,关系的各种属性是进行本体推理的逻辑依据,有传递性属性,关系继承性,反向关系继承性,逆属性,对称性属性,反身性属性,等价性属性等等,依据这些属性的逻辑性,可以推理出所要的查找。本文利用属性的逻辑推理机制采用树搜索的查找检索方式查找出隐含在概念之间的逻辑关系是本文所要进行的主要工作,这样可以判断出概念之间是否存在一些给定判断的关系,或者一个概念和什么概念存在给定的关系,再或者两个概念间都存在什么关系等等都是我们用推理检索所要实现的判断。摘要语义网技术是当前互联网技术研究的热点之一。目前大多数页面中所使用的文字信息不便于机器自动处理,只适合人们自己阅读理解,解决可自动处理的数据和信息方面发展较慢的问题,在网络上信息量剧增、人们迫切需要计算机分担知识整理这一压力的今

天,成为信息检索的一个难题,本文中对本体层概念的推理就是为了探索计算机理解语义所做的一个尝试。语义网的体系结构向我们说明了语义网中各个层次的功能和特征,语义网的研究是阶段性的,首先解决syntax(语法)层面的问题,也就是xml,然后是解决(数据层)基本资源描述问题,也就是rdf,然后是(本体层)对资源间关系的形式化描述,就是owl,damloil,这三步已经基本告罄,当然,基于rdf 或者owl的数据挖掘和ontology管理(如合并,映射,进化)按TIMBERNERS-LEE的构想,这个工作大概到2008左右可以完成,在商业上,很快就会在知识管理,数据挖掘,数据集成方面出现一些企业。目前亟待发展的是LogicLayer(逻辑层),这方面在国内外的期刊著作中还少有提到,接下来的工作就应该是对于owlbased的数据进行推理和查询了,当前的推理方法主要是针对本体而言的,而本体的概念是在某个特定领域范围内的,而且在知识库中推理和查询是紧密的结合在一起的,相辅相成的,查询的同时必然存在着推理,而这里的推理就必须要建立在一定的逻辑模型的基础上,所以推理的方法就是基于逻辑模型的逻辑推理,可采用逻辑推理的方法。本体中推理的重点在于推理结论的正确性、完备性,若是不能保证推理的正确性,则语义网的引入就不但没有给网络资源的查询带来便利,反而阻碍了网络的发展,而且还要保证推理的完备,不遗漏应有的推理结果。本体推理的难点在于推理的高效性、资源利用率,若推理虽能达到正确性,完备性的目的而浪费了大量的时间和资源,则语义网也不能达到预期的效果,所以推理方法的使用及其效果是语义网成功的关

Web应用系统架构演进过程

1 系统架构演化历程 -- 初始阶段架构 初始阶段的小型系统,其特征表现为应用程序、数据库、文件等所有的资源都部署在一台服务器上,我们通俗称为LAMP架构。 特征: 应用程序、数据库、文件等所有的资源都在一台服务器上。 描述: 通常服务器操作系统使用Linux,应用程序使用PHP开发,然后部署在Apache上,数据库使用MySQL,汇集各种免费开源软件以及一台廉价服务器就可以开始系统的发展之路了。 2 系统架构演化历程 -- 应用服务和数据服务分离

好景不长,发现随着系统访问量的增加,Web应用服务器器的压力在高峰期会上升到比较高,这个时候开始考虑增加一台Web应用服务器提供系统的访问效率。 特征: 应用程序、数据库、文件分别部署在独立的资源上。 描述: 数据量增加,单台服务器性能及存储空间不足,需要将应用和数据分离,并发处理能力和数据存储空间得到了很大改善。 3 系统架构演化历程 -- 使用缓存改善性能标题

特征: 数据库中访问较集中的一小部分数据存储在缓存服务器中,减少数据库的访问次数,降低数据库的访问压力。 描述: 1. 系统访问特点遵循二八定律,即80%的业务访问集中在20%的数据上; 2. 缓存分为本地缓存和远程分布式缓存,本地缓存访问速度更快但缓存数据量有限,同时存在与应用程序争用内存的情况。 4 系统架构演化历程 -- 使用应用服务器集群

在对应用系统做完分库分表工作后,数据库上的压力已经降到比较低了,又开始过着每天看着访问量暴增的幸福生活了,突然有一天,发现系统的访问又开始有变慢的趋势了,这个时候首先查看数据库,压力一切正常,之后查看Web Server,发现Apache阻塞了很多的请求,而应用服务器对每个请求也是比较快的,发现问题是由于请求数太高导致需要排队等待,响应速度变慢。 特征: 多台服务器通过负载均衡同时向外部提供服务,解决单台服务器处理能力和存储空间上限的问题。 描述: 1. 使用集群是系统解决高并发、海量数据问题的常用手段。 2. 通过向集群中追加资源,提升系统的并发处理能力,使得服务器的负载压力不再成为整个系统的瓶颈。 5 系统架构演化历程 -- 数据库读写分离

语义网本体

Part2:创建本体 本次所创建的本体是一个植物(plant)本体,所用的工具是Protege4.3。首先根据植物的分类来建立本体的Schema层,按照不同的分类方式可以有不同的分类例如可以分为花(flower)、草(grass)和树(tree)三类。花又可以分为蔷薇科(Rosaceae )、十字花科(cruciferae)、百合科(liliaceae)。草又可以分为草坪草(turfgrass)、孔雀草(maidenhair)、千日草(One thousand days grass)。树又可以分为乔木(arbor)、灌木(shrub)。所建的Schema层如下图1所示。 图1 植物本体的Schema层构建图 2、添加属性,属性包括对象属性和数据属性。所添加的对象属性有:颜色、枯萎季节、茂盛季节开花时间、开花时长,其定义域均设置为Plant。添加的数据属性有:根茎的长度。具体的添加如下图2所示。 (1)对象属性添加图(2)数据属性添加图 图2 植物本体的属性构建图

3、添加相应的实例。为百合科添加实例:百合花(greenish lily flower )为乔木添加实例:雪松和杨树,为草坪草添加实例:马蹄金草(The horseshoe golden grass )具体的实例图如下图3所示。 图3 具体实例添加图 4、定义公理,例如可以对其定义灌木为丛生状态比较矮小。则需要添加对象属性丛生状态(Cluster_State)和子属性主要丛生状态(Main_Cluster_State),然后添加分类:Type,包括short and small和tall。对草坪草定义为:主要丛生状态是short and small。对乔木添加定义:主要丛生状态是tall。在Plant类下面添加叶子(leaf),然后添加对象属性is_part_of,给leaf定义为:叶子是树叶的一部分。对草坪草的具体的定义效果如下图4所示。 图4 草坪草定义效果图

《Web系统与技术》期末考试题A

西安财经学院试题(卷)纸命题教师刘通学期2012 —2013学年第1 学期使用班级计本10级考核方式大作业 课程名称Web系统与技术阅卷教师签名 题号一二三四五六七八 九 十总分得分 注意事项: 命题教师1.出题用五号字、宋体输入,打印用正规A4纸张。 2.装订线以外的各项均由命题教师填写,不得漏填。 考生1.装订线内的“班级”、“学号”、“姓名”、“时间”等栏由考生本人填写。 2.一律用黑色的签字笔答题,否则试卷无效。 动态网站设计(100分) 一.基本要求及总体效果(40分): 1.设计一个基于web的管理信息系统,网站内容自定,可以是企业人事管理系统、学生管理系统、课程管理系统、教务管理系统、图书管理系统、客户管理系统、超市商品管理系统、库存管理系统、汽车租赁系统、网上商店等等、也可以自拟题目,内容不限,但要求是基于web的信息管理系统,主题思想明确、结构清晰、形式新颖、内容充实、浏览方便、网页文字及相关链接无错误。(10分) 2.网页整体设计思路清晰,网页布局合理,风格明快。主题页和其它各子页之间协调,主题分明、重点突出。栏目及版面设计,层次结构及链接结构明确。内容布局合理,图画运用得当,效果生动。(20分) 3.网页上各主题和附加图片、背景的色彩选配方案要注意做到:色彩柔和、搭配美观,朴素大方,不应过分夸张,使视觉疲劳。(10分)。 二、具体功能模块内容要求:(60分) 1.用户登录模块 输入的用户名和密码都正确,才能登录,否则给出错误提示,重新登录。(5分) 2.用户注册模块。 输入的信息要有有效性验证,还可以根据实际情况设置所需注册信息内容,注册成功后可用该账号登录网站。(10分) 3.用户留言模块 来访用户能够在空间留言,管理员或其他登录用户可以回复留言,用户的留言能够在网站中显示出来。(10分) 4.导航清晰,网站内各页面可以方便地相互跳转。 5.其他具体内容自己根据实际情况设计。要求内容新颖、有创意,能够完整地实现系统的主要功能,系统运行正常。(5分) 提交要求: 1.每人独立一题,独立完成,不得盗用他人作品,设计雷同者成绩均按零分计。 2.请做完之后,用RAR或ZIP压缩格式,文件名采用如下格式:班级+姓名+学号。(计本1001班的01张三,则文件名为计本1001张三01)3.站点名称建议用英文或者数字,所有设计到的文件最好用英文或数字命名,把主页放在站点文件夹的根目录下,保存为index.htm或default.aspx 第一题 得 分 1

语义网主要应用技术与研究趋势_吴玥

2012年第2期 Computer CD Software and Applications 信息技术应用研究 — 41 — 语义网主要应用技术与研究趋势 吴 玥 (苏州大学计算机科学与技术学院,江苏苏州 215006) 摘 要:我国企业多数已经实现了网络办公自动化,为企业的经营管理创造了优越的环境。但随着销售业务的增长,企业经营管理的范围逐渐扩大,其内部网络面临的运营难题更加明显,网络知识管理是当前企业存在的最大困难。语义网络技术的运用方便了知识管理系统的构建与操控,促进了企业知识管理效率的提升。针对这一点,本文主要分析了语义网应用的相关技术,对未来研究趋势进行总结。 关键词:语义网;应用技术;知识管理;趋势 中图分类号:TP391.1 文献标识码:A 文章编号:1007-9599(2012)02-0041-02 The Main Application Technology and Research Trends of Semantic Web Wu Yue (School of Computer Science&Technology,Soochow University,Suzhou 215006,China) Abstract:Our country enterprise majority already realize the network office automation,enterprise management to create a favorable environment.But as the sales growth,gradually expanding the scope of business management of enterprise,its internal network operator facing the problem is more apparent,network knowledge management is the current enterprise is the most difficult.Semantic network technology is convenient to use the knowledge management system's construction and operation,promote the enterprise to improve the efficiency of knowledge management.In view of this,this article mainly analyzes the semantic web technologies,the future research trends are summarized. Keywords:Semantic network;Application technology;Knowledge management;Trend 语义网是对未来计算机网络的一种假设,通过相匹配的网络 语言对文件信息详细描述,最终判断不同文档之间的内在关系。 简言之,语义网就是能参照语义完成判断的网络。企业在经营管 理中引进语义网有助于数据信息的挖掘,对数据库潜在的信息资 源充分利用,以创造更大的经济收益。 一、传统互联网知识管理的不足 互联网用于企业经营管理初期,加快了国内行业经济的改革进 步,促进了企业自动化操控模式的升级。然而,当企业经营范围不 断扩大之后,企业面临的网络管理问题也更加显著。如:业务增多、产品增多、客户增多等, 企业网络每天需要处理的文件信息不计其 数,基于传统互联网的知识管理系统也会遇到多种问题。 (一)检索问题。互联网检索是十分重要的功能,如图一。用 户在互联网上检索某一项资源时,常用的方法是通过关键词搜寻, 未能考虑到语义对资源搜索的重要性。这种检索模式下则会遇到许 多难题,如:对同义词检索会出现多余的无关资源,尽管用户在互 联网上可以查找到许多与关键词相关的信息,但多数是无用的。 图一 互联网信息检索 (二)集成问题。信息集成是网络系统按照统一的标准、编码、程序等,对整个系统存储的资源集成处理,然后实现信息资源的共享。企业互联网信息集成依旧采用人工处理,这是由于网络的自动代理软件不能处理文本代表的常识知识,信息集成问题将制约着互联网功能的持续发挥。 (三)维护问题。对于企业知识管理系统而言,其采用的文档大部分是半结构化数据,这种数据的维护管理难度较大。现有的互联网在文档维护方面缺乏先进的软件工具,对于文档信息的处理也会遇到不少错误。知识管理中的数据库资源错误会给企业经营造成误导,且带来巨大的经济损失。 二、语义网应用的相关技术 互联网研发对语义网应用研究的最终目标是“开发各种各样计算机可理解和处理的表达语义信息的语言和技术,让语义网络的功能得到最大发挥” 。因此,结合语义网络的功能特点、结构形式、信息储存等情况,用户需掌握各种语义网应用技术。就目前而言,语义网主要的应用技术包括: (一)编码技术。编码是计算机网络运行的重要元素,通过编码之后才能让程序信号及时传递。语义网编码技术就是通过编码处理将知识内容表达出来,这一过程能够把不同的知识编码为某个数据结构,从而方便了用户对数据的检索。编码技术要用到各种知识表达方法,如:一阶谓词逻辑表示法、产生式表示法、框表示法、语义网络表示法等等。 (二)框架技术。框架技术本质上就是对语义网进行层次划分,将网络结构分层不同的层面。语义网框架技术应用要借助语义 Web 模型,经过长期研究,我们把语义网体系结构分为7个层面,如图二。每个层面在语义网运行时都可发挥对应的功能,促进了语义网程序操控的稳定进行。层面框架的分析,可以掌握语义网体系中各层的功能强弱。 图二 语义网的体系结构

太原理工大学WEB系统和技术试题(卷)2016年0425

6、JavaWeb 中()类的()方法用于创建对话。 A. HttpServletRequest、getSession B. HttpServletResponse、newSession C. HtttpSession、newInstance D. HttpSession、getSession 7、给定一个Servlet 的doGet方法中的代码片段,如下: request.setAttribute(“name”,”zhang”); response.sendRedirect(“http://localhost:8080/servlet/MyServlt”); 那么在Servlet 中可以使用()方法把属性name的值取出来。 A. String str=request.getAttribute(“name”); B. String str=(String)request.getAttribute(“name”); C. Object str=request.getAttribute(“name”); D. 无法取出来 8、下边哪个不是JSP的内置对象?() A. session B. request C. cookie D. out 9、关于get和post两种请求,下列说法正确的是?() A. Form表单默认请求是get请求。 B. get请求处理的数据量大小不受到限制。 C. post请求地址栏里是能看到数据的,所以传送用户信息尽量避免使用。 D. post请求可以由doGet方法处理。 10、下面哪一个是正确使用JavaBean的方式?() A. 第 2 页共15 页

黄智生博士谈语义网与Web 3

黄智生博士谈语义网与Web 3.0 作者徐涵发布于 2009年3月26日下午6时0分 社区 Architecture, SOA 主题 语义网 标签 Web 2.0, 采访, 元数据, 语义网 近两年来,“语义网(Semantic Web)”或“Web 3.0”越来越频繁地出现在IT 报道中,这表明语义网技术经过近10年的研究与发展,已经走出实验室进入工程实践阶段。PowerSet、Twine、 SearchMonkey、Hakia等一批语义网产品的陆续推出,预示着语义网即将在现实世界中改变人们的生活与工作方式。在Web 3.0时代即将揭开序幕之际,正确理解、掌握语义网的概念与技术,对IT人士与时俱进和增加优势是必不可少的。为此,InfoQ中文站特地邀请到来自著名语义网研究机构荷兰阿姆斯特丹自由大学的黄智生博士,请他为我们谈一谈工业界人士感兴趣的语义网话题,包括什么是语义网、语义网与Web 3.0的关系以及语义网如何给商业公司带来效益等。 InfoQ中文站:您是语义网方面的权威专家,能否先请您为我们消除概念上的困惑。现在有一个说法,即Web 3.0就是语义网。但是除了W3C定义的语义网以外,关于Web 3.0还有许多种其他说法,您认为谁才真正代表了Web 3.0?为什么? 黄智生博士(以下称黄博士):首先需要说明的是:我不认为自己是所谓的“权威”。纵观万维网的发展,总是年轻人在创造历史,他们给人类社会带来了一次又一次的惊奇。且不说万维网之父Tim Berners-Lee在1989年构想万维网的时候仅仅三十出头。Web 1.0产生的雅虎和谷歌等国际大公司的创始人大多是年轻的博士生。Web 2.0产生的Facebook等公司创始人的情况也大体如此。Web 3.0的情况也可能如此。我们甚至都不能完全指望通过现有的IT大公司的巨大投入来发展语义网。这些大公司往往受着过去成功经验的束缚,而且新技术采用的是与以往完全不同的思路,从而会加深大公司对新技术的怀疑。当然,这也为年轻人书写历史创造辉煌提供了发展空间。 由于Web 1.0和Web 2.0技术的成熟,Web 3.0的想法实际上表达了现在人们对下一代万维网技术的种种期待。从这个意义上讲,Web 3.0并不等同于语义网。网络上对Web 3.0众说纷纭,都有一定的道理。但我有一定的理由相信,语义网技术是Web 3.0的重要技术基础。我于2008年底在国内一些大学巡回讲学报告

语义搜索的分类

语义搜索的分类 一.按语义搜索引擎服务内容的分类 语义搜索引擎从人们头脑中的概念到在搜索领域占据一席之地经历不少坎坷。语义网出现后,语义搜索迎来了高速发展的机遇期。虽然语义搜索服务内容主要集中在传统搜索引擎不擅长的语义网搜索方面。不过语义搜索引擎也试图拓展服务范围,提供比传统搜索引擎更全面的服务。语义搜索引擎的服务内容主要包括以下几个方面:知识型搜索服务、生活型搜索服务、语义工具服务等。 (1)知识型搜索方面,主要针对语义网知识信息资源。其中包括: ①词典型搜索服务。一种形式是如同使用电子词典一样,通过关键词直接查询与关键词对应的概念。这些概念由语义搜索引擎索引的本体文件中提取。另一种形式则是对在线百科全书的搜索服务,如PowerSet,这一点与传统搜索引擎近似,但语义搜索引擎在信息的组织上远胜于传统搜索引擎。 ②语义网文档(SWD)的查询服务。用户可以通过语义搜索引擎查询所需的语义网文档和相关的语义网文档。Falcons 为统一资源标识符(URI)定义的语义网对象和内容提供基于关键词的检索方式。Swoogle 从互联网上抽取由RDF 格式编制的语义网文档(SWDs),并提供搜索语义网本体、语义网例证数据和语义网术语等服务。 ③领域知识查询。部分语义搜索引擎提供了针对某个或某几个专业门类的信息检索服务,用户可以选择自己所需相关信息。Cognition 以搜索法律、卫生和宗教领域为主。个别语义搜索引擎提供针对特定领域的多媒体语义搜索服务,如Falcon-S 对足球图片的搜索服务。不过多媒体语义搜索面临与传统多媒体搜索相似的困境,缺乏有效的语义标注。对多媒体信息的辨别和分类能力仍有待提高。 (2)生活型搜索方面,语义搜索引擎在传统搜索引擎力所不及的诸方面发展迅速。 ①社会网络搜索。部分语义搜索引擎提供社会网络搜索功能,这种功能可以实现通过姓名、著作、所在单位等信息中的一条或几条,查询与这些信息有关联的更多信息,如我国的ArnetMiner。 ②资讯搜索。目前语义化的网络搜索服务能够更有针对性,更准确地为用户提供新闻资讯。Koru就是这方面的代表。 (3)语义工具服务。 这是语义搜索引擎所属的研究机构的一个较为独特的方面,和传统搜索引擎提供的桌面搜索等工具不同,语义搜索引擎提供的语义工具一般不是对语义搜索功能的直接移植,而是对文档的相似性、标注等进行处理用的。这些工具可以为语义搜索引擎的索引对象进行前期数据加工,同时也供科研使用。 理论上讲语义搜索引擎能够提供包括普通网络文档检索在内的所有类型网络文档搜索服务,但是由于语义搜索引擎对网页的索引方式不同,微处理器需要比传统搜索更长的时间才能分析完一个页面,因此很多语义搜索网站只能扫描到外部网站的二级页面,这样将难以满足用户全网络搜索的需求。 二.按语义搜索引擎服务模式分类 语义搜索引擎高速发展的阶段正值传统搜索引擎发展的平台期,虽然语义搜索引擎暂时尚不具备传统搜索引擎的市场竞争力,但是它们却可以很容易地借鉴传统搜索引擎的成

语义检索

在数字图书馆中,信息检索存在明显不足。在文献的组织与描述上,简单将关键词作为描述文献的基本元素,文献之间没有关联,是相互独立的、无结构的集合。在检索操作上,通常是基于关键词的无结构查询,难以反映词语间各种语义联系, 查询能力有限,误检率和漏检率很高,检索结果的真实相关度较低;计算查询和文档之间的相似度的方法也有局限。在用户交互界面上,用户的检索意图难以被机器理解,采用自然语言输入的检索关键词与机器的交互存在障碍。现有数字图书馆信息资源检索存在资源表示语义贫乏和检索手段语义贫乏、查准率低下等问题,语义网技术的出现,为数字图书馆的发展注入了新的活力,为信息检索质量的提高带来了新的生机。运用语义网技术,使解决信息检索中现存的问题,完善信息检索流程成为了可能。3.1 数字图书馆信息检索模型目前数字图书馆的信息检索主要借助于目录、索引、关键词方法来实现, 或者要求了解检索对象数据结构等, 对用户提供的关键词的准确性要求较高,基于语法结构进行检索, 却不能处理复杂语义关系,常常检索出大量相关度很差的文献。 图3.1 数字图书馆信息检索模型用户通过检索界面,输入关键词,文本操作系统对用户的关键词进行简单的语法层次的处理整合,与数字图书馆资源进行匹配检索,最终将检索的结果,再通过用户界面返回给用户。而数字图书资源,专业数据库等都是数字图书馆信息检索的范畴,这些数字化的知识资源主要以数据库形态分布于全球互联网的数千个站点,这种以数据库形式存放的信息资源,通常是电子化了的一次文献,包括元数据、摘要或者是全文,也可以是全文链接的地址。 24 基于语义网的数字图书馆信息检索模型研究 3.2 基于语义网的数字图书馆信息检索模型的设计思想数字图书馆信息检索系统存在诸多问题。查询服务智能化水平低,无法对用户请求进行语义分析;信息资源的共享程度低,仅仅采用题名、文摘或全文中出现的关键词标识文献内容,难以揭示文献资料所反映的知识信息,易形成信息孤岛;对用户输入的关键词进行句法匹配,查准率不高;片面追求查全率,返回大量无关结果等。这些问题最终造成用户的真正检索意图难以实现。人们希望有突破性的信息检索技术出现,能够支持更为强大的信息检索功能,具备理解语义和自动扩展、联想的能力,并为用户提供个性化服务。在这样的需求下,本节深入探讨了现存问题的解决方法,结合语义网技术,提出了以下基于语义网的数字图书馆信息检索模型的设计思想。3.2.1 机器理解与人机交互人们通过信息的交流和沟通,表达一定的思想、意思和内容,因此,自然语言和表达的信息中蕴含着丰富的语义。尤其是自然语言中,一词多义、一义多词现象十分常见,在不同的语境中,同样的词汇还可以表达出不同的意义。在人与人的交流中,近义词、反义词、词语的词性、语法结构等帮助人们在特定的语言环境中理解语言表达的确切含义,而计算机要做到这点却有难度。随着网络的不断发展,网络信息充斥着人们的视野。如何在浩如烟海的信息资源中,以最短的时间查找出相关资源,成为人们所关注的问题之一。通常,检索系统总会返回相关度不高,甚至完全无关的信息,而有些相关的信息却往往被遗漏了。一方面,检索工具没能把已经存在的、对用户有价值的信息检索出来,另一方面,信息资源没有很好的被归纳,提炼成知识。利用语义网技术,将语义丰富的描述信息和资源关联起来,通过机器理解和人机交互,对信息资源进行深层次的分析和挖掘。从本质上讲,人机交互是认知的过程,主要通过系统建模、形式化语言描述等信息技术,最终实现和应用人机交互系统。3.2.2 语义知识与描述逻辑从语义学的角度讲,语义是语言形式表达的内容,是思维的体现者,是客观事物在人们头脑中的反映[72]。人们在进行信息交流和沟通时,通过词语、符号来表达思想。当人们看到

什么是Web应用程序

什么是Web应用程序? 如果我们要谈论Web应用程序以及如何开发它们,那么我们就需要知道什么是Web应用程序,以及是什么东西使得它们与我们创建的其他应用程序不同。让我们看看一些Web应用程序的定义,以及这些定义的共同点。下面是从互联网上得到的三个定义: 定义一:一个Web应用程序是作为单一实体管理的、逻辑上链接的Web页面的集合。换句话说,一个网站,可以有多个来自不同客户的Web应用。 定义二:一个Web应用程序,是使用Internet技术开发的,符合下面一项或者多项的应用程序:(1)使用数据库(如Oracle或者SQL Server); (2)使用一种应用程序开发工具开发(如Oracle Internet Developer Suite或者Microsoft Visual Studio); (3)需要持续地运行服务器过程(如新闻组和聊天室); (4)从数据输入屏幕或者Web表单储存输入数据。 定义三:在软件工程中,一个Web应用程序是一种经由Internet或Intranet、以Web方式访问的应用程序。它也是一个计算机软件应用程序,这个应用程序用基于浏览器的语言(如HTML、ASP、PHP、Perl、Python等等)编码,依赖于通用的Web浏览器来表现它的执行结果。 在我们看到这些定义时,有几点是比较突出的。首先,在Web应用程序中有某种形式的浏览器或者GUI。其次,所有定义中都隐含或者明确指出需要一台服务器。最后,Web应用程序不同于Internet 应用程序,Internet应用程序增加了额外的技术和能力。 Web应用程序首先是“应用程序”,和用标准的程序语言,如C、C++、C#等编写出来的程序没有什么本质上的不同。然而Web应用程序又有自己独特的地方,就是它是基于Web的,而不是采用传统方法运行的。换句话说,它是典型的浏览器/服务器架构的产物。 浏览器/服务器架构(Browser/Server,简称B/S)能够很好地应用在广域网上,成为越来越多的企业的选择。浏览器/服务器架构相对于其他几种应用程序体系结构,有如下3方面的优点:(1)这种架构采用Internet上标准的通信协议(通常是TCP/IP协议)作为客户机同服务器通信的协议。这样可以使位于Internet任意位置的人都能够正常访问服务器。对于服务器来说,通过相应的Web 服务和数据库服务可以对数据进行处理。对外采用标准的通信协议,以便共享数据。 (2)在服务器上对数据进行处理,并将处理的结果生成网页,以方便客户端直接下载。 (3)在客户机上对数据的处理被进一步简化,将浏览器作为客户端的应用程序,以实现对数据的显示。不再需要为客户端单独编写和安装其他类型的应用程序。这样,在客户端只需要安装一套内置浏览器的操作系统,如Window XP或Windows 2000或直接安装一套浏览器,就可以实现服务器上数据的访问。而浏览器是现在计算机的标准设备。 理解了什么是浏览器/服务器架构,就了解了什么是Web应用程序。常见的计数器、留言版、聊天室和论坛BBS等,都是Web应用程序,不过这些应用相对比较简单,而Web应用程序的真正核心主要是对数据库进行处理,管理信息系统(Management Information System,简称MIS)就是这种架构最典型的应用。MIS可以应用于局域网,也可以应用于广域网。目前基于Internet的MIS系统以其成本低廉、维护简便、覆盖范围广、功能易实现等诸多特性,得到越来越多的应用。 应用程序有两种模式C/S、B/S。C/S是客户端/服务器端程序,也就是说这类程序一般独立运行。而B/S就是浏览器端/服务器端应用程序,这类应用程序一般借助IE等浏览器来运行。WEB应用程序一般是B/S模式。 在本课程中,术语Web应用程序或者Webapp,是指那些用户界面驻留在Web浏览器中的任何应用程序。可以将其想像为一个连续统一体(如下图所示)。这个统一体的一端是呈现静态内容的Web应用程序。大多数Web网站都在此列(图中未画出)。而在另一端,则是行为类似常规桌面应用程序的Web应用程序。Struts就是用来构建位于这个统一体右半边的Web应用程序的框架。

语义网基础教程

第一章概述 1.1万维网现状 万维网改变了人类彼此交流的方式和商业创作的方式。发达社会正在向知识经济和知识社会转型,而万维网处于这场革命的核心位置。 这种发展使得人们对计算机的看法也发生了变化。起初,计算机仅仅用作数值计算,而现在则主要用于信息处理,典型的应用包括数据库,文档处理和游戏等等。眼下,人们对计算机关注的焦点正在经历新的转变,将其视作信息高速公路的入口。 绝大部分现有的网络内容适合于人工处理。即使是从数据库自动生成的网络内容,通常也会丢弃原有的结构信息。目前万维网的典型应用方式是,人们在网上查找和使用信息、搜索和联系其他人、浏览网上商店的目录并且填表格订购商品等等。 现有软件工具没有很好的支持这些应用。除了建立文件间联系的链接之处,最优价值和必不可少的工具是搜索引擎。 基础关键词的搜索引擎,比如Alta Vista、Yahoo,Google等,是使用现有万维网的主要工具。毫无疑问,加入没有这些搜索引擎,万维网不会取得现在这么大的成功。然而,搜索引擎的使用也存在一些严重过的问题: ●高匹配、低精度。即使搜到了主要相关页面,但它们与同时搜到的28758 个低相关或不相关页面混在一起,检索的效果就很差。太多和太少一样令人不满意。 ●低匹配或无匹配。有时用户得不到任何搜索结果,或者漏掉了一些重要的 相关页面。虽然对于现在的搜索引擎来说,这种情况发生的频率不高,但确实会出现。 ●检索结果对词汇高度敏感。使用最初填写的关键词往往不能得到想要的结 果,因为祥光的文档里使用了与检索关键词不一样的术语。这当然令人不满意,因为语义相似的查询理应返回相似的结果。 ●检索结果是单一的网页。如果所需要的信息分布在不同的文档中,则用户 必须给出多个查询来收集相关的页面,然后自己提取这些页面中的相关信息并组织成一个整体 有趣的是,尽管搜索引擎技术在发展,但主要的困难还是上述几条,技术的发展速度似乎落后于网上内容量的增长速度。 此外,即使搜索是成功的,用户仍必须自己浏览搜索到的文档,从中提取所需的信息,也就是说,对极其耗时的信息检索本身,搜索引擎并没有提供更多支持。因此,用信息检索来描述搜索引擎为用户提供的功能,是不确切的;用信息定位可能更加合适。另外,由于现有网络搜索的结果不易直接被其他软件进一步处理,因此搜索引擎的应用往往是孤立的。 目前,为网络用户提供更大支持的主要障碍在于,网上内容的含义不是机器可解读的。当然,有一些工具能够检索文档、把它们分割成更小的部分、检查拼写并统计词频等等。可是,一旦牵涉到解释句子含义和提取对用户有用的信息,现有的软件能力就有限了。举一个简单的例子。对现有技术而言,一下俩个句子的含义是难以区分的: 我是一个计算机科学的教授。 你不妨认为,我是一个计算机科学的教授。

相关主题
文本预览
相关文档 最新文档