Research on Web Service Discovery with Semantics and Clustering

Tao Wen, Guojun Sheng, Yingqiu Li
College of Information Science and Engineering, Northeastern University (NEU), Shenyang, China
e-mail: shengguojun@

Quan Guo
Department of Computer Science and Technology, Dalian Neusoft Institute of Information, Dalian, China

Abstract: Traditional Web service discovery methods return inadequate search results from service registries because Web services lack semantic descriptions. The number of Web services is also growing sharply, and under that growth these methods usually suffer from low efficiency and poor performance. This paper presents a Web service discovery method based on semantics and clustering. The experimental results indicate that the proposed algorithm is better and more efficient than the full-scan search method.

Keywords: Web service discovery; ontology; semantic Web service; service clustering; K-medoids algorithm

I. INTRODUCTION
The increasing use of Web services has drawn great interest from the research community to the area of service discovery [1,2]. Traditional Web service discovery methods return inadequate search results from service registries because Web services lack semantic descriptions. As the number of Web services grows, these methods also tend to suffer from low efficiency and poor performance. In this paper, we present a Web service discovery method based on semantics and clustering. The method uses the semantic representation of services to group similar Web services, which improves the efficiency of service discovery. The method has three parts. First, similarities among services are computed from the semantic information of the service elements. Then, a modified K-medoids method is used to cluster the service set. Finally, a semantics-clustering-based Web service discovery algorithm is presented.
The experimental results indicate that the proposed algorithm is better and more efficient than full-scan search.

II. BASIC CONCEPTS
An ontology is an explicit specification of a conceptualization. The term is borrowed from philosophy, where it refers to the study of existence. Ontologies are commonly used to represent semantic information through the relationships among concepts. In practical modeling, an ontology is often expressed as a labeled tree, in which an edge between two concept nodes stands for a semantic relation. Figure 1 shows an example ontology, where the edge from the concept "Mammal" to "Animal" carries an inheritance relation.

Definition I. An ontology is a 6-tuple $O := \langle C, R, C_A, R_A, C_H, X \rangle$, defined as follows:
- $C$ and $R$ are two disjoint sets whose elements are called concepts and relation identifiers, respectively.
- $C_A$ is the set of attributes of the concepts in $C$.
- $R_A$ is the set of attributes of the relations in $R$.
- $C_H \subseteq C \times C$ is a directed, transitive relation called the concept hierarchy or concept taxonomy. $C_H(c_1, c_2)$ means that $c_1$ is a sub-concept of $c_2$.
- $X$ is the set of axioms, each of which constrains elements of $C_A$ or $R_A$.

Definition II. A semantic Web service (SWS) is described with OWL-S. It can be formalized as a 5-tuple $SWS := \langle Name, Description, Inputs, Outputs, O \rangle$, defined as follows:
- $Name$ is the name of the service and can serve as its identifier.
- $Description$ is a brief textual description of the service.
- $O$ is a domain ontology from which the service references relevant concepts.
- $Inputs = \{in_1, in_2, \ldots, in_m\}$ is the set of input parameters. Each $in_i \in Inputs$ is annotated with a domain concept in $O$.
- $Outputs = \{out_1, out_2, \ldots, out_n\}$ is the set of output parameters. Each $out_j \in Outputs$ is also annotated with a domain concept of $O$.

Figure 1.
A fragment of an animal ontology
978-1-4244-8625-0/11/$26.00 ©2011 IEEE

III. SIMILARITY MEASUREMENT OF SEMANTIC WEB SERVICES
Textual elements of an SWS, such as the name and the description, can be extracted from the corresponding OWL-S file by interpreting the <profile:serviceName> and <profile:textDescription> tags. In this paper, these textual elements are compared with a WordNet-based similarity measurement method. WordNet is a lexical database for the English language. It groups English words into sets of synonyms called synsets and records the various semantic relations between these synsets [3]. The input and output parameters of an SWS are compared differently: we employ a domain-ontology-based method to measure the similarity between two concepts.

A. Service description similarity measurement
A service description is a short, human-readable text that depicts the basic characteristics of a service. Service descriptions may contain rich information for service classification. Here we use a WordNet-based semantic similarity measurement method for service descriptions. Only nouns are considered, because they carry the most valuable information for classification compared with other parts of speech such as verbs and adjectives. The specified stopwords are also filtered out to reduce dimensionality.

Algorithm 1.
Similarity measurement for service descriptions.
Input: WordNet ontology graph $G = \langle V, E \rangle$; service descriptions $sd_i$ and $sd_j$; stopword set $sws = \{$"web", "service", "soap", …$\}$.
Output: $Sim_{sd}(sd_i, sd_j)$, the similarity between the texts $sd_i$ and $sd_j$.
(1) $sd_i \leftarrow pretreat(sd_i)$: make sure adjacent words in $sd_i$ are separated by exactly one space; treat $sd_j$ the same way.
(2) $set_i \leftarrow split(sd_i)$, $set_j \leftarrow split(sd_j)$: split each string into words at the space separator.
(3) $set_i \leftarrow filter(set_i)$, $set_j \leftarrow filter(set_j)$: keep only nouns in $set_i$ and $set_j$, and remove every stopword in $sws$.
(4) $tvs \leftarrow set_i \cup set_j$, where $tvs$ is the text vector space and $|tvs|$ is its dimensionality.
(5) Construct one vector for each description: $\vec{sd_i} = [w_{1i}, w_{2i}, \ldots, w_{ni}]^T$ and $\vec{sd_j} = [w_{1j}, w_{2j}, \ldots, w_{nj}]^T$. Each dimension corresponds to a separate term, $w_{ki}$ and $w_{kj}$ are term weights, $n = |tvs|$, and $k \in [1, n]$.
(6) Compute the similarity between $sd_i$ and $sd_j$:
$$Sim_{sd}(sd_i, sd_j) = \frac{\sum_{k=1}^{n} w_{ki}\, w_{kj}}{\lVert \vec{sd_i} \rVert \times \lVert \vec{sd_j} \rVert}$$
This is the cosine similarity of $\vec{sd_i}$ and $\vec{sd_j}$ under the vector space model (VSM), i.e., the cosine of the angle between the two vectors.

In step (5), the term weights $w_{ki}$ and $w_{kj}$ are often computed with TF-IDF, a term-frequency-based method. However, TF-IDF does not take semantics into account and therefore cannot achieve good precision here. We instead estimate the term weights with a WordNet-based semantic method, as in Algorithm 2.

Algorithm 2.
Construct space vector and weights.
Input: WordNet ontology graph $G = \langle V, E \rangle$; the filtered set $set_i$ of $sd_i$; $tvs$.
Output: vector $\vec{sd_i}$ containing the corresponding term weights.
(1) $n \leftarrow |tvs|$; initialize $\vec{sd_i}$ as an $n$-length array of doubles.
(2) for $term_k \in tvs$, $k \in [1, n]$ do
(3) $w_{ki} \leftarrow \max_{term_t \in set_i} \left( \frac{d(lca(term_k, term_t))}{d(lca(term_k, term_t)) + l(term_k, term_t)} \right)$
(4) $\vec{sd_i}[k] \leftarrow w_{ki}$
(5) endfor
Here $lca(x, y)$ is the lowest common ancestor of terms $x$ and $y$, $d(z)$ is the depth of term $z$, and $l(x, y)$ returns the path length from term $x$ to term $y$.

B. Service name similarity measurement
Service names contain a great deal of useful information. However, naming a service is usually manual work, so the text of a given service name can be idiosyncratic and arbitrary. For example, the service name "car_pricequality_service" can easily be split at the separator '_'. The substring "pricequality", however, combines two words, so it is difficult to tell whether it is one word or two. The service name "NationalGovernmentScholarship Service", by contrast, is separated by a space and by uppercase characters. To handle such cases, we measure service name similarity with a method that combines edit distance (namely, Levenshtein distance) and WordNet.
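The name-splitting problem described above can be illustrated with a small tokenizer. This is a minimal sketch, not the authors' implementation. The function name `split_service_name` and its rules are assumptions chosen to handle the two example names: it splits on underscores and whitespace, and on camelCase boundaries between lowercase and uppercase letters. Concatenated lowercase words such as "pricequality" are deliberately left whole, because splitting them would need a dictionary lookup, which this sketch does not attempt.

```python
import re
from typing import List

# Hypothetical tokenizer rules (an assumption, not taken from the paper):
# an acronym run such as "SOAP", a capitalised or lower-case word, or a digit run.
_WORD = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+")


def split_service_name(name: str) -> List[str]:
    """Return the lower-case word tokens of a service name.

    The name is first split on '_' and whitespace, then each chunk is split
    at camelCase boundaries. Glued lower-case words such as "pricequality"
    stay as one token because no dictionary is consulted.
    """
    tokens: List[str] = []
    for chunk in re.split(r"[_\s]+", name):
        # findall returns the acronyms, words and digit runs in left-to-right order
        tokens.extend(part.lower() for part in _WORD.findall(chunk))
    return tokens


if __name__ == "__main__":
    print(split_service_name("car_pricequality_service"))
    print(split_service_name("NationalGovernmentScholarship Service"))
```

Running the module prints `['car', 'pricequality', 'service']` and `['national', 'government', 'scholarship', 'service']`. The first name keeps "pricequality" as one token, which illustrates the ambiguity the text describes.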
Definition III. Service name similarity measurement:
$$Sim_{sn}(s_i, s_j) = \alpha \times Sim_{edit}(s_i, s_j) + \beta \times Sim_{cosine}(s_i, s_j)$$
where $s_i$ and $s_j$ are two given service names. The function
$$Sim_{edit}(s_i, s_j) = 1 - \frac{d(s_i, s_j)}{|s_i| + |s_j|}$$
computes the edit-based similarity of $s_i$ and $s_j$. Here $|s_i|$ and $|s_j|$ are the string lengths of $s_i$ and $s_j$, and $d(s_i, s_j)$ is their edit distance. $Sim_{cosine}(s_i, s_j)$ is the cosine similarity described as $Sim_{sd}(sd_i, sd_j)$ in Algorithm 1. $\alpha$ and $\beta$ are two adjustable weights satisfying $\alpha + \beta = 1$. In this paper, their values are set empirically to $\alpha = 0.4$ and $\beta = 0.6$.

C. Similarity measurement for input/output parameters based on domain ontology
As described in Definition II, each input/output parameter of an SWS is annotated with a domain ontology concept. The similarity between two ontology concepts depends on their positions in the concept taxonomy. When computing this similarity, we take into account two factors: the distance between the concepts, which captures their different information, and their depth, which captures their common information.

Definition IV. Ontology Concept Similarity (OCS): $OCS(c_i, c_j)$ is the similarity of two concepts $c_i$ and $c_j$ in the concept taxonomy. Here $lca(c_i, c_j)$ computes the Lowest Common Ancestor (LCA) of $c_i$ and $c_j$. $depth(lca(c_i, c_j))$ gives the depth of the LCA, namely the path length from $lca(c_i, c_j)$ to the root node of the tree.
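The tree functions that Definition IV relies on can be sketched on a parent-pointer taxonomy. This is an illustrative sketch only, and it makes two assumptions:
- The concept names follow the paper's animal-ontology examples, but the tree shape is assumed, since Figure 1 itself is not reproduced in the text. In that shape, Animal is the root, Mammal is its child, Herbivore and Carnivore are children of Mammal, Cow is under Herbivore, and Lion and Wolf are under Carnivore.
- Depth counts edges to the root, as in the definition, and distance uses the standard tree identity depth(a) + depth(b) - 2 * depth(lca(a, b)).

The sketch stops at these helpers and does not compute OCS itself.

```python
from typing import Dict, List, Optional

# Hypothetical taxonomy: child -> parent (None marks the root).
# Concept names follow the paper's animal examples; the shape is an assumption.
PARENT: Dict[str, Optional[str]] = {
    "Animal": None,
    "Mammal": "Animal",
    "Herbivore": "Mammal",
    "Carnivore": "Mammal",
    "Cow": "Herbivore",
    "Lion": "Carnivore",
    "Wolf": "Carnivore",
}


def ancestors(concept: str) -> List[str]:
    """Return the path [concept, parent, ..., root]."""
    path: List[str] = []
    node: Optional[str] = concept
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def depth(concept: str) -> int:
    """Number of edges from the concept up to the root (the root has depth 0)."""
    return len(ancestors(concept)) - 1


def lca(a: str, b: str) -> str:
    """Lowest common ancestor: the first ancestor of b that is also an ancestor of a."""
    ancestors_of_a = set(ancestors(a))
    for node in ancestors(b):
        if node in ancestors_of_a:
            return node
    raise ValueError(f"{a!r} and {b!r} do not share a root")


def distance(a: str, b: str) -> int:
    """Shortest path length between two concepts, routed through their LCA."""
    return depth(a) + depth(b) - 2 * depth(lca(a, b))


if __name__ == "__main__":
    for pair in [("Herbivore", "Carnivore"), ("Lion", "Cow"), ("Lion", "Wolf")]:
        print(pair, lca(*pair), depth(lca(*pair)), distance(*pair))
```

Under this assumed tree, Lion and Wolf meet at the deeper ancestor Carnivore, so they have more common information and a shorter distance (2). Lion and Cow meet only at Mammal, at distance 4.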
In Definition IV, the expression $e^{\lambda \cdot depth(lca(c_i, c_j))}$ represents the common information of $c_i$ and $c_j$, and the expression $e^{\lambda \cdot distance(c_i, c_j)}$ represents their different information. $\lambda$ is an adjustable parameter preset to $\lambda = 0.5$. The embedded function $distance(c_i, c_j)$ returns the length of the shortest path from $c_i$ to $c_j$ and satisfies
$$distance(c_i, c_j) = depth(c_i) + depth(c_j) - 2 \times depth(lca(c_i, c_j))$$
For example, in Figure 1, some OCS values are OCS(Herbivore, Carnivore) = 0.4566, OCS(Lion, Cow) = 0.2361, and OCS(Lion, Wolf) = 0.6839. Figure 2 shows the curved surface of OCS. The similarity value increases as the common information between two ontology concepts rises and as their different information decreases.

Algorithm 3. Similarity computation of input parameters.
Input: domain ontology graph $O = \langle V, E \rangle$; $inputs_i$ of $SWS_i$; $inputs_j$ of $SWS_j$.
Output: $Sim_{in}(inputs_i, inputs_j)$, the similarity between $inputs_i$ and $inputs_j$.
(1) $m \leftarrow \max(|inputs_i|, |inputs_j|)$; here we presume $|inputs_i| \ge |inputs_j|$.
(2) Initialize $mset$ as an $m$-length set.
(3) for $c_k \in inputs_i$, $k \in [1, |inputs_i|]$ do
(4) $mset.add\left(\max_{c_t \in inputs_j} OCS(c_k, c_t)\right)$
(5) endfor
(6) return $\frac{1}{m} \sum_{sim_n \in mset} sim_n$
Here $c_k \in inputs_i$ and $c_t \in inputs_j$ are the concepts with which the two compared input parameters are annotated. The similarity of output parameters, $Sim_{out}(outputs_i, outputs_j)$, is computed in the same way as in Algorithm 3.

Combining these measures, the total similarity between two services $SWS_i$ and $SWS_j$ is
$$Sim(SWS_i, SWS_j) = \sum_{r=1}^{4} \omega_r \times Sim_r, \qquad \sum_{r=1}^{4} \omega_r = 1$$
where $\omega_r$ is a weight factor, $Sim_1 = Sim_{sn}(s_i, s_j)$, $Sim_2 = Sim_{sd}(sd_i, sd_j)$, $Sim_3 = Sim_{in}(inputs_i, inputs_j)$, and $Sim_4 = Sim_{out}(outputs_i, outputs_j)$.

Figure 2. The curved surface of OCS

IV. SWS CLUSTERING BASED ON SEMANTIC SIMILARITY
Partitioning-based clustering methods [4,5,6] such as K-means and K-medoids are commonly used to group items into k clusters. However, because the initial cluster centers are selected randomly, the clustering results can differ widely from run to run. To address this problem, we adopt a modified K-medoids method that chooses the initial cluster centers in a human-guided way. The clustering is run in advance, before any query, to improve service discovery performance. Algorithm 4 gives the procedure.

Algorithm 4. Modified K-medoids to cluster the SWS set.
Input:
- the original unclustered service set $set_{sws} = \{SWS_1, SWS_2, \ldots, SWS_n\}$;
- the specified number of clusters $k$;
- a vector set $set_{kv} = \{\vec{V_1}, \vec{V_2}, \ldots, \vec{V_k}\}$ provided by domain experts, where each $\vec{V_c} \in set_{kv}$ is a keyword-based vector and $c \in [1, k]$.

Output: the clustered set $set_{cluster} = \{cluster_1, cluster_2, \ldots, cluster_k\}$ of $set_{sws}$, with $\bigcup_{c=1}^{k} cluster_c = set_{sws}$ and $cluster_a \cap cluster_b = \emptyset$ for $a \ne b$.
(1) Construct a similarity matrix $SimMatrix = (a_{ij})_{n \times n}$, where $n = |set_{sws}|$ and $a_{ij} = Sim(SWS_i, SWS_j)$ is the similarity between services $SWS_i \in set_{sws}$ and $SWS_j \in set_{sws}$. This matrix acts as a data center for the following steps. For a large service set, the matrix may be stored in a disk file or a database table.
(2) Initialize $set_{cluster}$ as a set containing $k$ empty service sets.
(3) for each $\vec{V_c} \in set_{kv}$, $c \in [1, k]$ do
(4) $serviceSet_c \leftarrow doSearch(set_{sws}, \vec{V_c})$: search $set_{sws}$ with the filter condition $\vec{V_c}$ and return the matched result $serviceSet_c$.
(5) From the returned $serviceSet_c$, get the service $SWS_c$ that is most similar to $\vec{V_c}$. Similarity is measured by comparing $\vec{V_c}$ with the name and description of each candidate, using a combination of edit distance and cosine similarity.
Take $SWS_c$ as the initial centroid of the $c$-th cluster and add $SWS_c$ to the corresponding service set $cluster_c$.
(6) endfor
// Each element of $set_{cluster}$ now contains one SWS as its initial cluster centroid.
(7) repeat
(8) for each $SWS_m \in set_{sws}$, $m \in [1, |set_{sws}|]$ do
(9) $cluster_{max} \leftarrow \arg\max_{cluster_c \in set_{cluster}} Sim(SWS_c, SWS_m)$, where $SWS_c$ is the centroid of $cluster_c$.
(10) Add $SWS_m$ to its nearest cluster, $cluster_{max}$.
(11) endfor
(12) for each $cluster_c \in set_{cluster}$, $c \in [1, |set_{cluster}|]$ do
(13) Construct a similarity matrix for $cluster_c$: $SimMatrix_c = (a_{ij})_{d \times d}$, where $d = |cluster_c|$.
(14) Get the row of $SimMatrix_c$ with the maximum average similarity: $row \leftarrow \arg\max_{r \in [1, d]} \left( \frac{1}{d} \sum_{x=1}^{d} SimMatrix_c[r][x] \right)$
(15) Take the SWS corresponding to this row as the new cluster centroid of $cluster_c$.
(16) endfor
(17) Evaluate the clustering result $set_{cluster}$ with the criterion $E$, accumulated over the $k$ clusters. Compute $\Delta E = |E' - E''|$, where $E'$ and $E''$ are computed in the current and the previous iteration, respectively.
(18) until ($\Delta E < \varepsilon$) or (cluster membership no longer changes), where $\varepsilon$ is a small threshold value.

After clustering, services in the same cluster are highly similar to each other, while services in different clusters are less similar.

V. SERVICE DISCOVERY ALGORITHM
Service discovery is the process of matching a user's query condition against the services in the service registry. We first give a formal description of a user service query.

Definition V. A Web Service Query (WSQ) is a 5-tuple $WSQ := \langle Name, Description, Inputs, Outputs, O \rangle$.
A WSQ may be regarded as a virtual SWS that is used to match real services. Its elements have essentially the same meaning as those in the SWS definition, except that they need not be actual elements of a real service.
For example, the Name of a WSQ may be an actual or approximate string used to match real service names. Whether a real service $SWS_i$ matches a user-specified query $WSQ_j$ is then determined by the following condition:
$$\begin{cases} SWS_i \text{ is matched}, & \text{if } Sim(SWS_i, WSQ_j) > \tau \\ SWS_i \text{ is unmatched}, & \text{if } Sim(SWS_i, WSQ_j) < \varepsilon \end{cases}$$
where $\tau$ and $\varepsilon$ are two threshold values with $\varepsilon \in (0, 1]$, $\tau \in (0, 1]$, and $\tau > \varepsilon$.

Suppose a matched $SWS_i$ is found for a given $WSQ_j$. Other services in the same cluster as $SWS_i$ are then likely to match as well, so a follow-up search for more matched services can be carried out in that cluster. Conversely, if an unmatched $SWS_i$ is found, the other services in its cluster can be ignored. The service discovery algorithm is as follows.

Algorithm 5. Web service discovery based on semantics and clustering.
Input: the clustered service set $set_{cluster} = \{cluster_1, cluster_2, \ldots, cluster_k\}$; a user query $WSQ_j$; two threshold values $\tau$ and $\varepsilon$.
Output: the matched service set $set_{matched}$.
(1) Initialize $set_{matched}$ as an empty service set.
(2) for each $cluster_c \in set_{cluster}$, $c \in [1, |set_{cluster}|]$ do
(3) $SWS_{tmp} \leftarrow cluster_c.centroid()$
(4) $counts \leftarrow 0$, $\lambda \leftarrow 10$
(5) while ($\varepsilon \le Sim(SWS_{tmp}, WSQ_j) \le \tau$) do
(6) if ($counts > |cluster_c| / \lambda$) then break endif
(7) $SWS_{tmp} \leftarrow cluster_c.randomSWS()$: randomly select an SWS from $cluster_c$; $counts \leftarrow counts + 1$
(8) endwhile
(9) if ($Sim(SWS_{tmp}, WSQ_j) < \varepsilon$) then
(10) continue to the next iteration (the next cluster)
(11) endif
(12) if ($Sim(SWS_{tmp}, WSQ_j) > \tau$) then
(13) for each $SWS_m \in cluster_c$, $m \in [1, |cluster_c|]$ do
(14) if ($Sim(SWS_m, WSQ_j) > \tau$) then
(15) $set_{matched} \leftarrow set_{matched} \cup \{SWS_m\}$
(16) endif
(17) endfor
(18) endif
(19) endfor
(20) return $set_{matched}$

Consider an $n$-size service set clustered into $k$ clusters. The complexity of Algorithm 5 depends on $n$, $k$, the thresholds $\varepsilon$ and $\tau$, and $p$. Here $p$ is the probability that a cluster falls into the interval $(\varepsilon, \tau)$, i.e., that its sampled similarity to the query lies in that interval.

VI. SIMULATION EXPERIMENT
The experiments run on a computer with an Intel Xeon E5450 3.0 GHz CPU and 2 GB of memory, under Windows Server 2003 Enterprise Edition SP2. Java is the programming language. No standard test collection exists for the algorithms discussed, so we generate the test data randomly with the OWL-S API [7], Protégé [8], and the Jena API [9]. The data are generated in three steps:
1. Prepare a set of nouns for describing Web services.
2. Construct the ontology trees with Protégé, and extract concepts from these ontologies using the Jena API.
3. Randomly create 6500 OWL-S files for testing; these may be regarded as virtual services. Different groups of these services come from different domain ontologies, which yields better clustering quality.

Before clustering, all similarities among services are computed and stored in a file (or a database table). They are loaded into an in-memory matrix when used.

To measure the clustering effectiveness and the performance of Algorithm 5, we propose the following definitions.

Definition VI. Time-Saved Percentage (TSP): $TSP := (t_1 - t_2)/t_1$. Here $t_1$ is the time spent searching the whole service set without clustering (full-scan). $t_2$ is the time spent searching the clusters (clustering-scan) for the matched service set. $TSP > 0$ means that clustering-scan is faster than full-scan.

Definition VII. Recall Ratio (RECALL): $RECALL := |set_2| / |set_1|$, where $set_1$ is the matched service set found by full-scan and $set_2$ is the matched service set found by clustering-scan.

Definition VIII. Total Performance Index (TPI): $TPI := TSP \times RECALL$. The larger the TPI, the better the algorithm performs.
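Definitions VI to VIII amount to a few lines of arithmetic. The sketch below is a minimal illustration. Its function names, timings, and service identifiers are made up and are not results from the paper. It also assumes that $set_2 \subseteq set_1$. This holds when both scans apply the same threshold $\tau$, because clustering-scan only examines services that full-scan also examines.

```python
from typing import Set


def time_saved_percentage(t_full: float, t_cluster: float) -> float:
    """TSP = (t1 - t2) / t1; a positive value means clustering-scan was faster."""
    if t_full <= 0:
        raise ValueError("full-scan time t1 must be positive")
    return (t_full - t_cluster) / t_full


def recall_ratio(full_scan_matches: Set[str], cluster_scan_matches: Set[str]) -> float:
    """RECALL = |set2| / |set1| (set1: full-scan matches, set2: clustering-scan matches)."""
    if not full_scan_matches:
        raise ValueError("full-scan found no matches; RECALL is undefined")
    return len(cluster_scan_matches) / len(full_scan_matches)


def total_performance_index(tsp: float, recall: float) -> float:
    """TPI = TSP x RECALL; larger is better."""
    return tsp * recall


if __name__ == "__main__":
    # Illustrative numbers only, not measurements from the paper.
    set1 = {f"svc{i}" for i in range(8)}  # hypothetical full-scan result
    set2 = {f"svc{i}" for i in range(6)}  # hypothetical clustering-scan result
    tsp = time_saved_percentage(t_full=2.0, t_cluster=0.5)
    rec = recall_ratio(set1, set2)
    print(tsp, rec, total_performance_index(tsp, rec))  # prints: 0.75 0.75 0.5625
```

Because TPI multiplies the two ratios, a fast clustering-scan that misses many matches is penalized just as much as a slow one that finds them all.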
Two experiments were carried out. Their parameter values are listed in Table I and Table II.

As shown in Figure 3, the TPI value of the algorithm remains stable overall as the number of clusters increases. Because TPI > 0, Algorithm 5 performs better and more efficiently than the full-scan method. Figure 4 shows that, as the number of services increases, clustering-scan takes less time than full-scan. Figure 5 shows that, as the number of services increases, the TPI value of Algorithm 5 indicates far better performance and efficiency than full-scan.

VII. CONCLUSION
This paper proposes a Web service discovery method based on semantics and clustering. First, similarities among services are computed from the semantic information of their textual descriptions and ontology concepts. Then, a modified K-medoids method is used to cluster the service set. Finally, the semantics-clustering-based Web service discovery algorithm is presented in detail. The experimental results indicate that the proposed algorithm performs better than full-scan search.

TABLE I. PARAMETER VALUES OF EXPERIMENT 1
| Parameter | Value |
|---|---|
| Number of services | 6500 (fixed) |
| Number of clusters | [5, 25] (varied) |
| $\tau$ (in Algorithm 5) | 0.83 |
| $\varepsilon$ (in Algorithm 5) | 0.30 |
| $WSQ_j$ | a random service from the service set, kept fixed for each query |

TABLE II. PARAMETER VALUES OF EXPERIMENT 2
| Parameter | Value |
|---|---|
| Number of services | [500, 6500] (varied in steps of 500) |
| Number of clusters | 15 (fixed) |
| $\tau$ (in Algorithm 5) | 0.83 |
| $\varepsilon$ (in Algorithm 5) | 0.30 |
| $WSQ_j$ | a random service from the service set, kept fixed for each query |

Figure 3. The TSP-RECALL-TPI values of experiment 1
Figure 4. Time spent by clustering-scan and full-scan
Figure 5. The TSP-RECALL-TPI values of experiment 2

REFERENCES
[1] B. Benatallah, M. Hacid, A. Leger, C. Rey, and F. Toumani, "On automating Web service discovery," VLDB Journal, vol. 14, no. 1, pp. 84-96, 2005.
[2] J. Colgrave, R. Akkiraju, and R. Goodwin, "External matching in UDDI," in Proc. ICWS'04, 2004.
[3] G. A. Miller, R. Beckwith, C. D. Fellbaum, D. Gross, and K. Miller, "WordNet: An online lexical database," Int. J. Lexicography, vol. 3, no. 4, pp. 235-244, 1990.
[4] J. MacQueen, "Some methods for classification and analysis of multivariate observations," in Proc. 5th Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, pp. 281-297, 1967.
[5] L. Kaufman and P. J. Rousseeuw, Finding Groups in Data: An Introduction to Cluster Analysis. John Wiley & Sons, 1990.
[6] R. Ng and J. Han, "Efficient and effective clustering method for spatial data mining," in Proc. 1994 VLDB, 1994, pp. 144-155.
[7] E. Sirin, Mindswap OWL-S Java API [EB/OL], /2004/owl-s/api/, last visited 2011-01-12.
[8] The Stanford Center for Biomedical Informatics Research, Protégé user documentation [EB/OL], /doc/users.html.
[9] B. McBride, Jena: A semantic web framework for Java [EB/OL].