
Journal of Intelligent & Fuzzy Systems 21 (2010) 197–207. DOI: 10.3233/IFS-2010-0451

IOS Press

Constructing and mining a semantic-based academic social network

Trong Hai Duong a, Ngoc Thanh Nguyen b,* and Geun Sik Jo a,*

a School of Computer and Information Engineering, Inha University, Korea

b Institute of Informatics and Engineering, Wroclaw University of Technology, Poland

Abstract. A number of studies have focused on how to construct an academic social network. The relational ties among researchers are either co-authorship or shared keywords captured from scientific journals. The problem with such a network is that researchers are limited within their professional social network. In this paper, we propose a novel method for building a social network explicitly based on researchers' knowledge interests. The researcher's profile is automatically generated from the metadata of scientific publications and from his/her homepage. By measuring the similarity between topics of interest, we are able to construct a researcher social network whose relational linkages among authors are produced by matching their corresponding profiles. A directed loop graph-based social network is proposed; the graph naturally represents such a social network. Interestingly, our results showed that a social network based on profile matching is more representative than a network based on publication co-authorship or shared keywords. Researcher mining in the academic social network has been explored via two problems: Researcher Ranking and Expert Finding.

Keywords: Ontology, researcher profile, ontology-based user profile, ontology integration, social network, social network visualization

1. Introduction

1.1. Related work

Social networks have recently attracted considerable interest. There are a number of systems for generating social network visualizations and performing statistical analyses for the purpose of sociological research, including UCINet [30], JUNG [19], and GUESS [1]. With the intention of utilizing social networks for the semantic web, several studies have examined the automatic extraction of social networks and network visualization applications such as TouchGraph's Facebook Browser and Vizster. TouchGraph's Facebook Browser lets users visualize their friends and photos. Vizster is built as a visualization system that end-users of social networking services can use to facilitate discovery and increase awareness of their online community [17]. Both of these tools are used to display the network between a user and his friends in large communities such as Facebook and Friendster. Takama et al. [28] attempted to develop a visualization map that contributed to an interactive blog search; the visualization map was facilitated by linking the keywords of the blogs. Karger and Quan [20], meanwhile, presented a visualization system that displays messages from multiple blogs together as a reply graph, a diagram describing the relationships between a message and all comments related to that message. Their result showed that users were able to understand how the relevant issues are constructed and related.

* Corresponding authors: Ngoc Thanh Nguyen and Geun Sik Jo. E-mail: thanh@pwr.wroc.pl (N.T. Nguyen); gsjo@inha.ac.kr (G.S. Jo).

In the academic area, several researchers have attempted to conduct a thorough investigation of the issue of extraction and mining of an academic social network. Harrison and Stephen [15] described the electronic journal as the heart of an online scholarly community, where academic journals principally function as channels of communication for practicing scholars. Newman [23,24] constructed a scientist collaboration network based on co-authorship data from journal papers. In this collaboration network, two scientists were considered to be connected if they had authored a paper together. The problem with such a network is that researchers are limited within their professional social network.

Expert finding is one of the most important subjects for mining from social networks. The task of expert finding entails identifying persons with relevant expertise or experience for a given topic. There are many methods that focus on expert finding, and they mainly fall into two categories: profile-based methods and document-based methods. In profile-based methods [2,3,11,22], researchers first build a term-based expertise profile (called document reorganization in [11]) for each candidate, and rank the candidate experts based on the relevance scores of their profiles for a given topic by using traditional ad hoc retrieval models. In document-based methods [3,5,25], instead of creating such term-based expertise profiles, researchers use supporting documents as a bridge and rank the candidates based on the co-occurrences of topic and candidate mentions in the supporting documents. However, the number of candidate experts is limited; for example, there are only 1092 candidate experts in TREC [2,5], but in a network everyone can be a candidate, and thus the number of candidate experts can be in the millions. In addition, on the Web, only unstructured data is available, and candidate names and topics are presented as keywords in plain texts, whereas a social network contains not only personal local information but also complex relationships.

1.2. Our approach

From a list of a researcher's publications, it is possible to get to know the knowledge interests of the researcher, his/her collaborators, and even the latest conferences attended by that researcher. Moreover, other personal information of a researcher, such as basic information (e.g. photo, affiliation, and position), contact information (e.g. address, email, and telephone), and educational history (e.g. graduated university and major), may be extracted from his/her personal sites. Therefore, for each researcher, we can create a profile based on an ontology by extracting the profile information from his/her homepage and journal papers. This includes the typical information in the researcher profile (see Fig. 1). By measuring the similarity between the interesting topics, we are able to construct an academic social network. Relational linkages among authors are not only co-authorship but are also produced by matching their corresponding profiles together. In particular, an academic social network is generated by considering a target researcher who plays a central role. The target's co-authors are counted as the second level of the network. The complete network is decided by n levels, where n > 1. Relational linkages among the researchers are produced by matching their corresponding profiles together. Moreover, the number of candidate experts that can be utilized will be in the millions, and we can find experts for a given topic by taking into account the contributions from all the researchers in the academic network and the score of his/her research topics.

Our approach aims at conducting a thorough investigation of the issue of the construction and mining of an academic social network, providing the following main contributions:

• We propose a novel method for building an academic social network based explicitly on researchers' knowledge interests. In particular, it focuses on how to automatically extract the profile of a researcher from the metadata of his/her scientific publications and homepage, and how to identify researchers with interest knowledge that is similar to that of the target. The target profile is further improved by learning interest knowledge from similar profiles.

• A directed loop graph-based social network is proposed. The graph naturally represents such a social network. It is useful to deduce indirect relationships among researchers. The network can be reduced by considering only a specific relation or by combining all relations. Interestingly, our results showed that a social network based on profile matching is more representative than the networks based on publication co-authorship or keywords.

• Researcher mining in the academic social network has been explored. We solved two main problems related to mining: Researcher Ranking and Expert Finding. In researcher ranking, the researchers are ranked based on an importance weight. The importance measurement for each researcher takes into account the contributions from all the other researchers in the academic network and the score of his/her research topics.

Fig. 1. Research profile.


• S_c = (t_1, t_2, \ldots, t_n) is the collection of all key-objects (terms) in the document collection D_c,
• \vec{S_c} = (w_1, w_2, \ldots, w_n) is a feature vector of the concept c, where w_i is the weight distribution of term i in S_c, 0 ≤ w_i ≤ 1.

2.3. Researcher profile generation

This section describes the process of building an ontological user profile based on the techniques described in the previous section. The hierarchical user information in the initial user profile is treated as an ontology; more precisely, we learn and improve the user's profile in the form of an ontology.

2.3.1. Information collection

Homepage retrieval. Given a person's name and possibly an email address, the system finds the person's homepage by the following steps:

• Feature Vector Construction: We collect 100 homepages and extract keywords for each homepage manually. We then make a consensus among them to obtain the most representative keywords for a given homepage. The keywords are called components of the feature vector. The feature vector can be used to distinguish between a homepage and other webpages.

• Query Generation: A set of mentions is generated from a person's name. For example, for the name John Fitzgerald Kennedy, the corresponding set of mentions may contain Kennedy, J.F. Kennedy and John F. Kennedy. The first query uses the person's name or each mention together with Google's site: operator and the most specific domain name appearing in the corresponding email address (Trong Hai Duong site:eslab.inha.ac.kr). If no result is returned, increasingly general queries are issued (Trong Hai Duong site:inha.ac.kr). If the final result is empty, a simple query containing only the person's name is used. A non-empty result is sent to Homepage Identification (a sketch of this step is given after the list).
• Homepage Identification: For each page in the result, the system crawls its internal hyperlinks to collect the user's entire web home directory. We match the feature vector with this result to choose the correct homepage.
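The query-generation step can be illustrated with a minimal sketch. The helper names and the exact way the site: restriction is widened are assumptions for illustration, not the authors' implementation:

import itertools

def name_mentions(full_name):
    """Expand a full name into mentions, e.g. "John Fitzgerald Kennedy"
    -> {"Kennedy", "J. F. Kennedy", "John F. Kennedy", the full name}."""
    parts = full_name.split()
    first, middles, last = parts[0], parts[1:-1], parts[-1]
    initials = " ".join(p[0] + "." for p in [first] + middles)
    mentions = {last, full_name, f"{initials} {last}"}
    if middles:
        mid_initials = " ".join(p[0] + "." for p in middles)
        mentions.add(f"{first} {mid_initials} {last}")
    return mentions

def homepage_queries(full_name, email=None):
    """Yield query strings from most to least specific, as described above."""
    domains = []
    if email:
        labels = email.split("@")[1].split(".")     # e.g. eslab.inha.ac.kr
        # progressively more general domains: eslab.inha.ac.kr, inha.ac.kr
        domains = [".".join(labels[i:]) for i in range(len(labels) - 2)]
    for mention in sorted(name_mentions(full_name), key=len, reverse=True):
        for d in domains:
            yield f'"{mention}" site:{d}'
        yield f'"{mention}"'                        # final fallback: name only

For example, list(itertools.islice(homepage_queries("Trong Hai Duong", "duong@eslab.inha.ac.kr"), 3)) produces the name-plus-domain queries in decreasing specificity.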

Publication Retrieval. Many scholarly publications are currently available on the Internet and in digital libraries. Google Scholar is a sub search engine that has proved to be the most effective in searching such publications. We use the Google search to crawl all of a user's publications by using basic keywords of the user's name and possibly email, as follows:

• Feature Vector Construction: Similar to homepage retrieval, we construct a feature vector to represent a publication site. By using the feature vector we can distinguish between a publication site and other webpages.

• Query Generation: A set of mentions from a person's name is generated as presented in the homepage retrieval section. The query is the longest name of the person or a corresponding mention.

• Publication Collection: Each query is input to Google Scholar to crawl all of the user's publications. The result is filtered by retaining the sites associated with the user's email. The final result is passed to Publication Identification.
• Publication Identification: For each page in the result, we match the feature vector with this result; then, depending on the matching cost, we can select all correct publication sites.

2.3.2. Information extraction

Personal Information Extraction. The method to extract personal information follows [21]:

• Finding Relevant Web Pages: Given a researcher's name and possibly an email, homepage and publication sites are identified as presented in the above section.

• In preprocessing: (A) we separate the text into paragraphs (i.e., sequences of tokens), (B) we determine tokens in the paragraphs, and (C) we assign possible tags to each token.

• In tagging: Given a sequence of units, we determine the most likely corresponding sequence of tags by using a trained tagging model. In this paper, we make use of a CRF as the tagging model. It is a conditional probability distribution of a sequence of labels given a sequence of observations, represented as P(Y|X), where X denotes the observation sequence and Y the label sequence. All components Y_i of Y are assumed to range over a finite label alphabet Y. The conditional probability is formulated as follows:

p(y|x) = \frac{1}{Z(x)} \exp\Big( \sum_{e \in E, j} \lambda_j t_j(e, y|_e, x) + \sum_{v \in V, k} \mu_k s_k(v, y|_v, x) \Big)   (3)

where x is a data sequence, y is a label sequence, and y|_e and y|_v are the sets of components of y associated with edge e and vertex v in the linear chain, respectively; t_j and s_k are feature functions; the parameters \lambda_j and \mu_k are coefficients corresponding to the feature functions t_j and s_k, respectively, and are estimated from the training data; Z(x) is the normalization factor. The model is used to find the sequence of tags Y* with the highest likelihood, Y* = \arg\max_Y P(Y|X), using the Viterbi algorithm (a decoding sketch is given below).
• In training: The CRF model is built with labeled data, by means of an iterative algorithm based on Maximum Likelihood Estimation.
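To make the decoding step concrete, the following self-contained sketch performs Viterbi decoding for a linear-chain model. It assumes the per-position weighted feature sums of Eq. (3) have already been collapsed into log-domain emission and transition score tables; this is an illustration of Y* = argmax_Y P(Y|X), not the authors' code:

import math

def viterbi(obs, labels, emit, trans):
    """Return the most likely label sequence for the observations `obs`.
    emit[y][x]: emission score of label y for observation x (log-domain).
    trans[y0][y]: transition score from label y0 to label y (log-domain)."""
    # best[t][y]: score of the best label sequence ending in y at position t
    best = [{y: emit[y].get(obs[0], -math.inf) for y in labels}]
    back = [{}]
    for t in range(1, len(obs)):
        best.append({})
        back.append({})
        for y in labels:
            # pick the best predecessor label for y at position t
            y0, score = max(((y0, best[t - 1][y0] + trans[y0][y]) for y0 in labels),
                            key=lambda p: p[1])
            best[t][y] = score + emit[y].get(obs[t], -math.inf)
            back[t][y] = y0
    y_t = max(best[-1], key=best[-1].get)   # best final label
    path = [y_t]
    for t in range(len(obs) - 1, 0, -1):    # backtrack through the pointers
        y_t = back[t][y_t]
        path.append(y_t)
    return list(reversed(path))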

Paper Mention Information Extraction. We manually describe the structure of paper resource portals such as the ACM portal, IEEE Xplore, CiteSeer, etc., to obtain exactly the important mentions of paper information, such as the authors' names, affiliations and emails, and the paper's title, abstract, and content. For each publication site, we identify the corresponding paper portal. Then, depending on the structure of the paper portal, we can easily extract useful information from the publication site. If no paper portal is indexed, we apply a method that is similar to Personal Information Extraction.

2.4. Interest knowledge building

2.4.1. Document presentation

The tf-idf weight (term frequency-inverse document frequency) is a weight often used in information retrieval and text mining. This weight is a statistical measure used to evaluate how important a word is to a document in a collection or corpus. Here, we use a traditional vector space model (tf-idf) to define the features of the documents.

Definition 2.3 (Feature Vector of Document). Let T_d = (t_1, t_2, \ldots, t_n) be the collection of all keywords (or terms) of the document d. The term frequency tf(d, t) is defined as the number of occurrences of term t in document d. A set of term-frequency pairs, P_d = {(t, f) | t \in T_d, f > threshold}, is called the pattern of a document. Given a pattern P_d = {(t_1, f_1), (t_2, f_2), \ldots, (t_m, f_m)}, let \vec{d} be the feature vector of document d and let t_d be the collection of terms corresponding to the pattern. Then we have:

\vec{d} = (w_1, w_2, \ldots, w_m)   (4)
t_d = (t_1, t_2, \ldots, t_m)   (5)

where

w_i = \frac{f_i}{\sum_{j=1}^{m} f_j} \cdot \log \frac{|D|}{|\{d : t_i \in d\}|}   (6)

Definition 2.4 (Feature Vector of Set of Documents). Let P_i = {(t_1, f_1), (t_2, f_2), \ldots, (t_m, f_m)} be the pattern of the document i belonging to the set of documents ds, i = 1, \ldots, n. The set of term-frequency pairs P_c = \bigcup_{i=1,\ldots,n} P_i is called the pattern of ds. Let \vec{ds} be the feature vector of ds and let t_{ds} be the collection of terms corresponding to the pattern. Then we have:

\vec{ds} = (w_1, w_2, \ldots, w_k)   (7)
t_{ds} = (t_1, t_2, \ldots, t_k)   (8)

where

w_i = \frac{f_i}{\sum_{j=1}^{m} f_j} \cdot \log \frac{|D|}{|\{d : t_i \in d\}|}   (9)

– |D| is the total number of documents in the corpus.
– |{d : t_i \in d}| is the number of documents in which the term t_i appears. If the term is not in the corpus, this will lead to a division by zero; it is therefore common to use 1 + |{d : t_i \in d}|.
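A compact sketch of Definitions 2.3 and 2.4 follows, assuming tokenized documents; the pattern threshold and the 1 + |{d : t_i ∈ d}| smoothing are applied exactly as noted above:

import math
from collections import Counter

def pattern(tokens, threshold=1):
    """P_d = {(t, f) | f > threshold}: the term-frequency pairs of one document."""
    return {t: f for t, f in Counter(tokens).items() if f > threshold}

def feature_vector(pat, corpus):
    """Eq. (6)/(9): w_i = (f_i / sum_j f_j) * log(|D| / (1 + df(t_i)))."""
    total = sum(pat.values())
    vec = {}
    for t, f in pat.items():
        df = sum(1 for doc in corpus if t in doc)   # |{d : t_i in d}|
        vec[t] = (f / total) * math.log(len(corpus) / (1 + df))
    return vec

corpus = [["ontology", "matching", "ontology"], ["social", "network", "mining"]]
print(feature_vector(pattern(corpus[0], threshold=0), corpus))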

2.4.2. Training

• Building an ontology representing a common user's profile.
• Assigning documents to the ontology.
• Computing the feature vector of each category (concept), via the following method:

– For each leaf concept, the feature vector is calculated as the feature vector of a set of documents:

\vec{S_c} = \vec{ds_c}   (10)

– For each non-leaf concept c, its feature vector is calculated by taking into consideration the contributions from the documents (denoted D_c) that have been assigned to it and the other documents (denoted D_{c'}, for any c' that is a direct sub-concept of c) that have been assigned to the direct sub-concepts of the concept c:

\vec{S_c} = \alpha \vec{ds_c} + \beta \vec{ds_{c'}}   (11)

where \vec{ds_c} and \vec{ds_{c'}} correspond to the feature vectors of the sets of documents D_c and D_{c'}, 0 ≤ \alpha, \beta ≤ 1 and \alpha + \beta = 1.
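The bottom-up computation of Eqs. (10)-(11) can be sketched as follows; the Concept layout, the merging stand-in for Definition 2.4, and the values of α and β are illustrative assumptions:

from dataclasses import dataclass, field

@dataclass
class Concept:
    name: str
    documents: list = field(default_factory=list)   # each document: a feature dict
    children: list = field(default_factory=list)

def doc_set_vector(doc_vectors):
    """ds: merge the documents' feature vectors (a stand-in for Definition 2.4)."""
    merged = {}
    for vec in doc_vectors:
        for t, w in vec.items():
            merged[t] = merged.get(t, 0.0) + w / len(doc_vectors)
    return merged

def concept_vector(c, alpha=0.7, beta=0.3):
    """Eq. (10) for leaf concepts, Eq. (11) for non-leaf concepts."""
    own = doc_set_vector(c.documents) if c.documents else {}
    if not c.children:
        return own                                  # Eq. (10): S_c = ds_c
    sub_docs = [d for ch in c.children for d in ch.documents]
    sub = doc_set_vector(sub_docs) if sub_docs else {}   # ds_c'
    terms = set(own) | set(sub)
    # Eq. (11), term by term (terms missing on one side contribute 0)
    return {t: alpha * own.get(t, 0.0) + beta * sub.get(t, 0.0) for t in terms}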

2.4.3. Automatically building a user's profile

After collecting the relevant documents, a number of features are extracted by the aforementioned equations; these are the general candidate concepts of the personal ontology. With these concepts, a general personal ontology can be automatically built, as shown in Algorithm 1.

Algorithm 1: Automatically building a user's profile
Input: a set of keywords (keyWord) concerning the user, and the pre-defined profile (commonProfile), trained as described above
Output: the corresponding user's profile (userProfile)

  /* Collecting the user's information */
  infoUser ← CollectingInfoUser(keyWord);
  featureVector ← ExtractingFeatureVector(infoUser);
  commonProfile ← AscendingImportantConcept(commonProfile);
  foreach feature vector d of each document belonging to featureVector do
    foreach concept c belonging to commonProfile do
      /* Similarity between a concept and a document */

      sim(d, c) = sim(\vec{d}, \vec{S_c}) = \frac{\sum_{(T_{d_i}, T_{c_i}) \in K} d_i \cdot S_{c_i}}{\sqrt{\sum_{i=1}^{n} d_i^2} \cdot \sqrt{\sum_{i=1}^{n} S_{c_i}^2}}   (12)

      where K = {(T_{d_i}, T_{c_i}) | Sims(T_{d_i}, T_{c_i}) = 1}
      if match ≥ threshold then
        if the concept c does not exist in userProfile then
          create a new concept c for the userProfile;
        end
        add document d into the corresponding concept c belonging to userProfile;
        break;
      end
    end
  end
  Return(userProfile);
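A runnable sketch of Algorithm 1 is given below. Profiles are represented as plain dicts, the matched-term cosine of Eq. (12) is reduced to exact term matching, and the concept ordering is assumed to be pre-sorted by importance; all of these are simplifying assumptions:

import math

def cosine_matched(d, s_c):
    """Eq. (12): cosine similarity over the term pairs judged equivalent
    (here simplified to exact matches, i.e. Sims(t, t') = 1 iff t == t')."""
    num = sum(w * s_c[t] for t, w in d.items() if t in s_c)
    den = (math.sqrt(sum(w * w for w in d.values()))
           * math.sqrt(sum(w * w for w in s_c.values())))
    return num / den if den else 0.0

def build_user_profile(doc_vectors, common_profile, threshold=0.3):
    """doc_vectors: list of (doc_id, feature dict).
    common_profile: {concept: feature dict}, ordered from most important down."""
    user_profile = {}
    for doc_id, d in doc_vectors:
        for concept, s_c in common_profile.items():
            if cosine_matched(d, s_c) >= threshold:
                user_profile.setdefault(concept, []).append(doc_id)
                break        # assign to the first sufficiently similar concept
    return user_profile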

3. Researcher social network visualization

3.1. An enriched social network of researchers

An enriched social network of researchers is generated by considering a target researcher who plays a central role. The target's co-authors and referenced authors (co-operators) collected from the target's publications are counted as the second level of the network. Continuously, each of the target's collaborators is considered as a target researcher; his/her collaborators are supplemented on the network as the third level. The complete network is decided by n levels, where n > 1. The researchers' profiles are generated, and relational linkages among the researchers are produced by matching between the corresponding personal profiles. Notice that there may be numerous edges connecting two researchers; the number of edges is equal to the number of matches between their corresponding profiles. Here, we define the researcher social network as a directed loop graph:

Definition 3.1. A researcher social network is a directed loop graph given by the following quadruple:

G = (C*, R*, N, M)   (13)

where:

• C* is a set of nodes representing researchers.
• R* is a set of arcs representing relations between researchers: Co-author, Reference-author, and Interesting topic. Each arc is associated with a numerical value, being the weight (w) of the relation represented by the arc.

– w^{co}_{ij} = n^{co}_{ij} / N^{co}_{i}, where w^{co}_{ij} is the weight of the co-author relation from author j to author i; n^{co}_{ij} is the number of collaborative instances of author j with i, the first author; and N^{co}_{i} is the number of co-authors of i.
– w^{ref}_{ij} = n^{ref}_{ij} / N^{ref}_{i}, where w^{ref}_{ij} is the weight of the reference-author relation from author j to author i; n^{ref}_{ij} is the number of reference instances of author j in i's papers; and N^{ref}_{i} is the number of reference-authors of i.
– The weight of each interesting-topic relation is equal to the similarity degree between the two feature vectors representing the corresponding topics.

• N is an adjacency matrix of G, denoted as N(G); it is an n-by-n matrix, where n is the number of nodes in G, and each entry is the number of arcs in G with endpoints (v_i, v_j).
• M is an incidence matrix of G, written as M(G); it is an n-by-m matrix, where m is the number of edges (relations) in G. If v_i is the starting point of e_j, the entry m_{ij} is equal to −1; if v_i is the end point of e_j, the entry m_{ij} is equal to w > 0, the weight of the relation represented by the arc e_j.
• If vertex v is a start point of edge e, then v and e are incident.
• The degree of vertex v, written as d(v), is the number of edges incident to v.
• A local matrix of vertex v_i, denoted as L(v_i), is M(G) limited by

left column \sum_{v \in (v_0, \ldots, v_{i-1})} d(v) + 1 and right column \sum_{v \in (v_0, \ldots, v_{i-1})} d(v) + d(v_i)   (14)

where v_i is the vertex at row i of matrix M(G).

The network naturally represents such a social network. It is useful to deduce indirect relationships among authors. The network can be reduced by considering only specific relations or by combining all relations. For example, a co-author social network is generated from the network by taking only co-author relations. A network that combines all the relations (and is thus called a combined social network) is effective for finding a relevant author.
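As an illustration of the relation weights in Definition 3.1, the co-author weight can be computed from simple publication records; the record layout is an assumption:

from collections import Counter

def coauthor_weights(author_i, papers):
    """Definition 3.1: w_co_ij = n_co_ij / N_co_i for each co-author j of i.
    papers: list of dicts, each with an ordered "authors" list."""
    counts = Counter()
    for p in papers:
        authors = p["authors"]
        if authors and authors[0] == author_i:   # i as the first author
            for j in authors[1:]:
                counts[j] += 1                   # one collaborative instance with j
    n_co = len(counts)                           # N_co_i: number of co-authors of i
    return {j: n / n_co for j, n in counts.items()} if n_co else {}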

3.2. A combined social network of researchers

Here, we simply present the combined social network as follows:

Definition 3.2. A combined social network is given by a graph G = (V, E), where V is a set of nodes representing researchers and E is a set of edges (V × V) representing the relationships between the corresponding researchers. The relationships are combined from all the relations: Co-author, Reference-author, and Interesting topic. In particular, we need to estimate the weight of a given relation to blend all the relations together.

Let w(v_i, v_j) be the relation weight, and w_{ij} = w(v_i, v_j) be the weight of all relations from v_i to v_j. The relation weight matrix representation of G, W = (w_{ij}), is an n × n matrix, where

w_{ij} = \alpha \cdot w^c + \beta \cdot w^r + \delta \cdot w^i   (15)

Here w_{ij} is the combined weight of the researcher social network graph, w^c is the weight of a co-author relation, w^r is the weight of a reference-author relation, and w^i is the weight of an interesting-topic relation. To calibrate the coefficients \alpha, \beta, and \delta, we can generate the coefficients by taking a consensus among experts' suggestions.

Definition 3.3 (Forward researchers). For any researcher v_i \in V, the forward researchers of v_i are defined as F_{v_i} = {v_j | v_j \in V, \exists r_{ij} \in E}.

Definition 3.4 (Backward researchers). For any researcher v_i \in V, the backward researchers of v_i are defined as B_{v_i} = {v_j | v_j \in V, \exists r_{ji} \in E}.

4. Social network mining

4.1. Important researchers of an academic network

An importance measurement of a researcher must take into account the contributions from all the other researchers in the academic network and the score of his/her research topics. It consists of two steps, Initialization and Propagation.

In Initialization, first, a combined social network G = (V, E) is composed, where V is a set of nodes representing researchers and E is a set of edges (V × V) representing the relationships between the corresponding researchers. The initial importance score (weight) of each relationship is calculated by Eq. (15). Second, we use the interest knowledge of the researcher's profile to calculate an initial importance score of research topics for each person (the weight of the node). Suppose T = {t_1, t_2, \ldots, t_m} is a collection of interesting topics. Each topic is represented by a description t = {a_1, a_2, \ldots, a_k} and the corresponding feature vector s = {s_1, s_2, \ldots, s_k}, where 0 ≤ s_i ≤ 1 is the weight of the term a_i. We denote by f(t) a function that computes the score of the feature vector s of the corresponding topic t. The initial importance score of research topics for the researcher c is calculated as follows:

W_c = \frac{\sum_{i=1}^{m} f(t_i)}{\sum_{j=1}^{n} \sum_{l=1}^{h_j} f(t_l)}   (16)

where m and h_j are the numbers of feature vectors of the corresponding researchers c and j, respectively, and n is the number of researchers in the social network.

In Propagation, the importance score of each researcher is improved by taking into account the contributions from all the other researchers in the academic network, via a characterization of four features of the potentially important scores of the researchers and the relations, which drive the drifting stream of consciousness:

• A researcher is more important if there are more relations originating from the researcher.
• A researcher is more important if there is a relation originating from the researcher to a more important researcher.
• A researcher is more important if it has a higher relation weight to any other researchers.
• A relation weight is higher if it originates from a more important researcher.

Let r(c_i) be a function of the importance weight of researcher c_i, r_i = r(c_i) be an importance weight value of the researcher c_i, w(c_i, c_j) be a relation weight function, and w_{i,j} = w(c_i, c_j) be the weight of all relations from c_i to c_j. It is possible that there exists more than one relation from researcher c_i to researcher c_j. For example, the researcher Geun Sik Jo may have two relations, Supervisor and Co-author, with the researcher Trong Hai Duong. Therefore, r_j w_{i,j} is the total importance value of all the relations from concept c_i to concept c_j.

In fact, the basic idea underlying Propagation is similar to the idea of [12]; we present a similar recursive formula that computes the weight of a relation starting from researcher c_i to researcher c_j at the (k+1)-th iteration (see Eq. (17)). The weight is proportional to the importance of c_i and is the inverse ratio of the sum of all the importance values of c_j's backward concepts at the (k+1)-th iteration.

w^{k+1}(c_i, c_j) = \frac{r^k(c_i)}{\sum_{t \in B_j} r^k(t)}   (17)

And the recursive formulae are used to calculate the importance of concept c_i at the (k+1)-th iteration. The importance consists of two parts: one is contributed by all the importance values of c_i's forward researchers and the weights of the relations from c_i to the forward researchers, with probability \alpha; the other is contributed by an independent jump probability (here 1/|V|) with probability \lambda. The formulation is then expressed as follows:

r^{k+1}(c_i) = \alpha \sum_{c_j \in F_i} w^{k+1}(c_i, c_j) \, r^k(c_j) + \lambda \frac{1}{|V|}, \quad \alpha + \lambda = 1   (18)
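An illustrative implementation of the propagation step follows: Eq. (17) renormalizes each edge weight by the current importance of the target node's backward neighbors, and Eq. (18) mixes the propagated importance with a uniform jump term. The fixed iteration count is an added assumption:

def propagate(nodes, edges, r0, alpha=0.85, iters=50):
    """nodes: list of researcher ids; edges: set of (i, j) pairs;
    r0: initial importance scores, e.g. W_c from Eq. (16)."""
    lam = 1.0 - alpha                                   # alpha + lambda = 1
    backward = {j: [i for (i, jj) in edges if jj == j] for j in nodes}
    forward = {i: [j for (ii, j) in edges if ii == i] for i in nodes}
    r = dict(r0)
    for _ in range(iters):
        # Eq. (17): w(i, j) = r(i) / sum of r over j's backward researchers
        w = {}
        for (i, j) in edges:
            denom = sum(r[b] for b in backward[j])
            w[(i, j)] = r[i] / denom if denom else 0.0
        # Eq. (18): r(i) = alpha * sum_j w(i, j) * r(j) + lambda / |V|
        r = {i: alpha * sum(w[(i, j)] * r[j] for j in forward[i])
                + lam / len(nodes)
             for i in nodes}
    return r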

4.2. Expert finding

The expert finding task is generally defined as follows [6,10,27]: given a keyword query, a list of experts and a collection of supporting documents, rank those experts based on the information from the data collection. Expert finding is similar to the traditional ad hoc retrieval task, since both tasks target finding relevant information items given a user query. This problem has many real-world applications. For example, the organizers of a conference need to assign submissions to PC members based on their research interests and expertise. As a retrieval task, expert finding has recently attracted much attention, mostly due to the launch of the Enterprise track [6,27] of TREC [31]. The key challenge in expert finding is to infer the association between a person (i.e., a candidate expert) and an expertise area (i.e., a topic) from the supporting document collection. Different from previous research, we find experts via the academic social network, taking into consideration two aspects: if a researcher has authored many papers on a topic, or if the topic is mentioned in many of his/her papers, then he/she is a candidate expert on the topic; and if a researcher has relationships (such as co-author, reference, or similar topic) with many other researchers who are experts on a topic, then he/she is also a candidate expert on the topic. Expert finding is similar to the above Important Researcher problem, since for both tasks the importance measurement of a researcher must take into account the contributions from all the other researchers in the academic network. The major difference is that expert finding ranks researchers by descending importance score with respect to a given topic, rather than a combination of all topics; here the importance reflects the relevance of a candidate expert to the topic. It consists of two steps, Initialization and Propagation.

In Initialization, our strategy for calculating the initial expert score is based on the following probabilistic model. Formally, suppose T = {t_1, t_2, \ldots, t_m} is a collection of interesting topics. Each topic is represented by a description t = {a_1, a_2, \ldots, a_k} and the corresponding feature vector s = {s_1, s_2, \ldots, s_k}, where 0 ≤ s_i ≤ 1 is the weight of the term a_i. Let R be a binary random variable denoting relevance (1 for relevant and 0 for non-relevant). Given a query t and an expert candidate c, we are interested in estimating the conditional probability P(R|c, t), indicating whether the candidate c is relevant to topic t or not. Denote by oc(c, t, s) the weight of the term t occurring in the feature vector s of a corresponding topic in T.

P(R = 1 | c, t) = \frac{\sum_{s} oc(c, t, s)}{\sum_{c_i \in V} \sum_{s} oc(c_i, t, s)} \, P(t)   (19)


In Propagation, we use the relationships between researchers to improve the accuracy of the expert scores, based on Eqs. (17) and (18).
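A sketch of the initialization of Eq. (19), assuming each researcher's profile stores one feature vector per topic of interest and taking P(t) as uniform (both assumptions for illustration):

def initial_expert_scores(profiles, topic_terms, p_t=1.0):
    """profiles: {researcher: {topic: {term: weight}}}.
    oc(c, t, s) is read off as the stored weight of each query term."""
    raw = {}
    for c, topics in profiles.items():
        raw[c] = sum(vec.get(term, 0.0)     # oc(c, t, s), summed over vectors s
                     for vec in topics.values()
                     for term in topic_terms)
    total = sum(raw.values())               # normalizer over all candidates in V
    return {c: (score / total) * p_t if total else 0.0
            for c, score in raw.items()}

The resulting scores can then be refined with the propagate() sketch above.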

5. Experiments

Our system automatically builds the academic social network explicitly based on the researchers' knowledge interests, which are generated from the internet. The system enables users to find experts from the network. We evaluate the proposed system in the following aspects: a description of the dataset that is used for evaluating our proposal, an explanation of how the system is implemented, and an analysis of the experimental results.

5.1. Description of dataset

To evaluate our proposed academic social network construction method and expert finding, we collected 10 topics, each of which consists of a topic and a list of experts with corresponding emails. We performed the collection of the dataset in the following two steps: first, we referred to the PSN dataset page (…/project/PSN/dataset.html) to obtain the sets of experts for a specific topic; second, we extracted the experts' emails from their journal papers by using a search engine with a heuristic extraction. The dataset is published on our website.

5.2. Description of implementation

5.2.1. Information collection and extraction

We use a meta-search engine to collect the user's information. The information is generated from a consensus between two different kinds of search space: home/personal pages and scientific publications. When a user queries his/her profile, the management search creates two different meta search engines and assigns each meta search engine to collect a specific kind of information. Each meta search engine creates many queries concerning its assignment and sends each query to a specific search engine that it selects based on its knowledge base.

In Table 1, we present the average results of the evaluation of retrieving the user's information with the Precision and Recall¹ measurements.

¹ Precision and Recall, two widely used statistical classification measures, were chosen to compare the aforementioned methods. Precision can be seen as a measure of exactness or fidelity, whereas Recall is a measure of completeness.

Table 1
Precision and recall evaluation for information extraction

                              Precision    Recall
Personal information            91.4        78.1
Journal papers' mentions        96.3        80.2

5.2.2. Keyword extraction

We applied the traditional vector space model (tf-idf) to extract features from documents. However, in our proposal, we improved the traditional method by analyzing the semantics between keywords. For example, consider the sentence "A computer is a machine that manipulates data according to a list of instructions such as laptops, notebooks, servers." In this case, instead of keywords with the pattern {(computer,1), (machine,1), (data,1), (laptop,1), (notebook,1), (server,1)} extracted by most existing methods, our approach obtains {(computer,4), (machine,2), (data,1), (laptop,1), (notebook,1), (server,1)}. The reason is that the hypernym relation from computer to laptop, notebook and server, and the hypernym relation from machine to computer, are counted by our method. This means that if a term is a hypernym of n other terms, the term occurs in the text corpus n+1 times. Note that a hypernym relation can be quickly and effectively recognized by the lexico-syntactic patterns described in [7]. While calculating the similarity between categories, it is necessary to determine the correspondence among keywords belonging to each category's feature vector. Thus, a similarity measure between keywords (terms) is necessary for the methods to distinguish between categories. Here, we applied our previous methods [7,8] to measure the similarity.
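A toy version of this hypernym-boosted counting: a hypernym's count is raised by the occurrences of the terms it subsumes, so computer scores 1 + 3 and machine scores 1 + 1 on the example sentence. The hard-coded hypernym map stands in for the lexico-syntactic pattern recognition of [7]:

from collections import Counter

# illustrative hypernym map: term -> terms it directly subsumes
HYPERNYMS = {
    "computer": {"laptop", "notebook", "server"},
    "machine": {"computer"},
}

def boosted_counts(tokens):
    """Term counts where a hypernym gains the base counts of its subsumed terms."""
    base = Counter(tokens)
    boosted = Counter(base)
    for term, subsumed in HYPERNYMS.items():
        if term in base:
            boosted[term] += sum(base[t] for t in subsumed if t in base)
    return boosted

tokens = ["computer", "machine", "data", "laptop", "notebook", "server"]
print(boosted_counts(tokens))    # computer -> 4, machine -> 2, others -> 1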

In Fig. 2, we present the average results of the comparisons between the traditional tf-idf model and our improved model, generated from a large number of test samples.

5.2.3. Profile construction

We created a common profile that consists of two parts: demographic information of the user, and the interesting topics of the user. The interesting topics are constructed based on vocabularies and structures from relevant travel sections of the Open Directory Project (ODP) and Yahoo! Category². In particular, for each category, we downloaded the first 10 web sites. If there were fewer than 10 web sites in the category, we downloaded all available information to generate feature vectors for each category. If a category had no web site available, then its direct sub-concepts were included to construct its feature vector.

² The Open Directory Project aims to build the largest human-edited directory of Internet resources and is maintained by community editors who evaluate sites to classify them in the right directory. Yahoo! Category is maintained by the Yahoo! directory team for the inclusion of web sites into the Yahoo! directory.

Fig. 2. Comparison between tf-idf and the proposed model.

Our proposed system differs from other existing research [13,29] with respect to the approach to building a user profile. In our case, the information is generated from the internet by the user's name and corresponding emails, whereas in other methods the information is collected by watching the user's activities on the internet [29]. Significantly, we improved the profile by interaction among users: we applied ontology integration techniques [7–9] to combine the target profile with existing neighbors to exchange interest knowledge.

5.2.4. Social network construction

The academic social network is constructed via the following steps:

• For each topic, we collect new researchers who are authorities on the topic (we assume that, prior to this, the system has stored numerous researchers for the topic).
• For each researcher (who is considered as a target researcher), we collect his/her new publications and new collaborators; the corresponding researcher's profile is created/updated. Iteration is performed by considering each of his/her collaborators as the target researcher.
• Each researcher is considered as a node of the network. The relational ties among nodes are generated by matching the corresponding profiles.


A comparison of specific problems between the social networks is presented in Table 3.

6. Conclusions

This paper conducted a thorough investigation of the issue of the extraction and mining of an academic social network. Specifically, it focused on how to extract profiles for researchers and on the identification of researchers with similar interest knowledge, to facilitate building an academic social network. Mining in the social network has been explored for two problems, Researcher Ranking and Expert Finding. We defined a dataset for the evaluation of our proposal. The evaluation results show that the relationships can be very useful for expert mining in a social network. In future work, we will apply the proposed methods to a specific application of a user interaction management system.

References

[1] E. Adar, GUESS: The Graph Exploration System, http://www.hpl.hp.com/research/idl/projects/graphs
[2] L. Azzopardi, K. Balog and M. de Rijke, Language modeling approaches for enterprise tasks, In Proceedings of TREC-05, 2006.
[3] K. Balog, L. Azzopardi and M. de Rijke, Formal models for expert finding in enterprise corpora, In Proceedings of SIGIR-06, 2006.
[4] C. Buckley and E.M. Voorhees, Retrieval evaluation with incomplete information, In Proc of SIGIR 4 (2004), 25–32.
[5] Y. Cao, J. Liu, S. Bao and H. Li, Research on expert search at enterprise track of TREC 2005, In Proceedings of TREC-05, 2006.
[6] N. Craswell, A.P. de Vries and I. Soboroff, Overview of the TREC-2005 enterprise track, In Proceedings of TREC-05, 2006.
[7] T.H. Duong, N.T. Nguyen and G.S. Jo, A method for integration across text corpus and WordNet-based ontologies, In IEEE/ACM/WI/IAT 2008 Workshops Proceedings, IEEE Computer Society 3 (2008), 1–4.
[8] T.H. Duong, N.T. Nguyen and G.S. Jo, A method for integration of WordNet-based ontologies using distance measures, Proceedings of KES 2008, Lecture Notes in Artificial Intelligence 5177/2009 (2008), 210–219.
[9] T.H. Duong, N.T. Nguyen and G.S. Jo, A method for integrating multiple ontologies, Cybernetics and Systems 40(2) (2009), 123–145.
[10] Y. Fang, L. Si and A. Mathur, Ranking experts with discriminative probabilistic models, In SIGIR Workshop on Learning to Rank for Information Retrieval, Boston, USA, 2009.
[11] Y. Fu, W. Yu, Y. Li, Y. Liu, M. Zhang and S. Ma, THUIR at TREC 2005: Enterprise track, In Proceedings of TREC-05, 2006.
[12] W. Gang, L. Juanzi, F. Ling and W. Kehong, Identifying potentially important concepts and relations in an ontology, In Proceedings of International Semantic Web Conference 2008, Lecture Notes in Computer Science 5318 (2008), 33–49.
[13] M. Golemati, A. Katifori, C. Vassilakis, G. Lepouras and C. Halatsis, Creating an ontology for the user profile: Methods and applications, In Proceedings of the First International Conference on Research Challenges in Information Science, RCIS (2007), 407–412.
[14] N. Guarino, C. Masolo and G. Vetere, OntoSeek: content-based access to the web, IEEE Intelligent Systems 14(3) (1999), 70–80.
[15] T.M. Harrison and T.D. Stephen, The electronic journal as the heart of an online scholarly community, Networked Scholarly Publishing, Library Trends, 1995.
[16] J. Heflin, J. Hendler and S. Luke, SHOE: A knowledge representation language for Internet applications, Technical Report CS-TR-4078 (UMIACS TR-99-71), University of Maryland at College Park, 1999.
[17] J. Heer and D. Boyd, Vizster: visualizing online social networks, IEEE Symposium on Information Visualization (InfoVis), 2005.
[18] http://…
[19] JUNG, Java Universal Network/Graph Framework, http://jung.sourceforge.net/
[20] D.R. Karger and D. Quan, What would it mean to blog on the semantic web, International Semantic Web Conference 2004, LNCS 3298 (2004), 214–228.
[21] Y. Limin, T. Jie and L. Juanzi, A unified approach to researcher profiling, Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence, IEEE Computer Society (2007), 359–366.
[22] C. Macdonald, B. He, V. Plachouras and I. Ounis, University of Glasgow at TREC 2005: Experiments in terabyte and enterprise tracks with Terrier, In Proceedings of TREC-05, 2006.
[23] M.E.J. Newman, The structure of scientific collaboration networks, Proceedings of the National Academy of Sciences (2001), 404–409.
[24] M.E.J. Newman, Co-authorship networks and patterns of scientific collaboration, Proc Natl Acad Sciences 101 (2004), 5200–5205.
[25] D. Petkova and W.B. Croft, UMass notebook 2006: Enterprise track, In Proceedings of TREC-06, 2007.
[26] A. Sieg, B. Mobasher and R. Burke, Learning ontology-based user profiles: a semantic approach to personalized web search, IEEE Intelligent Informatics Bulletin 8 (November 2007), 1.
[27] I. Soboroff, A.P. de Vries and N. Craswell, Overview of the TREC 2006 enterprise track, In Proceedings of TREC-06, 2007.
[28] Y. Takama, T. Kajinami and A. Matsumura, Blog search with keyword map-based relevance feedback, In Fuzzy Systems and Knowledge Discovery, Springer, Berlin/Heidelberg, 2005, 1208–1215.
[29] J. Trajkova and S. Gauch, Improving ontology-based user profiles, Presented at RIAO 2004, Vaucluse, France, 2004, 380–389.
[30] UCINET, Social Network Analysis Software, http://www.analytictech.com/ucinet/
[31] E. Voorhees and D. Harman, editors, Proceedings of Text Retrieval Conference (TREC 1-9), NIST Special Publications, 2001, http://trec.nist.gov/pubs.html

