General Terms
- 格式:pdf
- 大小:83.67 KB
- 文档页数:7
A Tagging Approach for Bundling AnnotationsYamin Htun, Joanna McGrenere, Kellogg S. BoothDepartment of Computer Science, University of British Columbia{yhtun, joanna, ksbooth}@cs.ubc.caABSTRACTIn a paper presented at CHI 2006 we introduced structured annotations, called bundles, to support co-authors in the edit-review-comment document lifecycle, and we reported a study showing that bundles facilitate workflow by improving reviewing accuracy and efficiency. Bundles are a “top down” way to organize annotations. We demonstrate an enhanced prototype that also supports “bottom up” organization using tagging techniques, new automated bundle creation options, and the reviewing features and manual bundle creation present in the first prototype. Categories and Subject DescriptorsH.5.3. [Information interfaces and Presentation, HCI] Group and Organizational Interfaces – Asynchronous interaction, Computer-supported collaborative workGeneral TermsDesign, human factorsKeywordsAsynchronous collaboration, collaborative writing, tagging, structured annotationEXTENDED ABSTRACTAsynchronous collaborative writing is common, and annotations play an important role as a central communication medium connecting co-authors with evolving artifacts in the process [7]. However, the lack of support for rich annotations in most word processing systems often forces valuable communication to happen outside the shared document in the bodies of emails, to which the document is an attachment. These messages are separate from the document, making the establishment of a shared reference for discussion difficult [2].Co-authors often copy and paste referenced content of the document into email or type explicit navigation statements such as “Clarify my questions on the third and last paragraphs,” which can be time consuming and error-prone. Significant overhead is required to reconstruct the context of the communication [4]: workflow requires navigating between email messages and the document itself [4] and information is likely to be lost or ignored [1]. At best, in order to keep track of the workflow and progress in the task, collaborators need to maintain not only document files but also the email messages [8]. Information overload and workflow inefficiencies can result with increasing numbers of annotations after only a few reviewing cycles.To facilitate the workflow management involved in collaborative writing, we previously identified user-centered requirements for annotation support and developed a comprehensive model of annotations [8] in which each annotation has a set of attributes such as the creator of the annotation, a timestamp, reviewing status(read/unread and accepted/rejected), and one or more anchors to material in the document. Annotations can have optional attributes such as a list of recipients, a comment, replacements for the anchored material, a name, and substructure.A bundled annotation (or bundle)represents a structured group of annotations with various anchors into the document. There are no restrictions on structuring annotations other than that they be acyclic; an annotation can be associated with more than one bundle. Changes in an annotation’s status will be automatically synchronized across different bundles to which it belongs.We previously described a user study that investigated the effect of structured annotations on reviewing workload and quality [8]. Participants were asked to review a set of annotations with a Simple Editor containing only basic annotations (edits and comments) with high-level communication taking place in a separate email message window, and with a Bundle Editor in which annotations are structured into bundles with high-level communication integrated as generalized annotations. Participants performed faster and more accurately with the Bundle Editor and they found bundles innovative and intuitive. We did not investigate the usability and consequences of bundles in the annotation-creation stage. We are now examining this.In our model, bundles can be created in four ways: (1) manually, (2) automatically, (3) as a result of filtering operations and queries, and (4) as a result of editing commands. While annotating the document, co-authors manually create bundles by explicitly selecting and grouping annotations into bundles. At the end of each reviewing session, a bundle is created automatically with all the new annotations made during the session. Every time a user filters the annotations based on specified attributes, a temporary bundle is created, which can be saved as a permanent bundle with a single click. Moreover, when a user performs normal editing commands such as “Find/Replace” or “Spell Check”, a bundle will be created with all the edits from the command gathered into sub-bundles such as “replaced,” “skipped,” and “ignored”. Although automatic bundle creation does not require extra effort from reviewers, we doubt that automation can fully capture the richness and complexity of the annotations used in discussions. Hence, our goal is to minimize the effort required by reviewers when manually creating bundles and managing annotations. While exploring different approaches we were inspired by recent successes with tagging, in which users assign meta-data or keywords to information resources. Traditionally meta-data is created by professionals (catalogers or authors) [5], but systemsPermission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.CSCW 2006, November 4-8, 2006, Banff, Alberta, Canada.Copyright 2006 ACM …$5.00.like flickr and delicious allow ordinary users to describe and organize content with any vocabulary they choose. Tagging facilitates the organization of information within personal or shared information spaces. Browsing and searching tags attached to information resources by other users encourage collaboration. Compared to traditional folder-based hierarchical information management models, collaborative tagging is believed to reduce the cognitive workload experienced by users [6]. A major drawback for tagging is the ambiguity and imprecision of tags and the lack of control for synonyms and homonyms [3].In our top-down approach, a user associates an annotation with a bundle by manually dragging the annotation into the bundle. When an annotation is in multiple bundles the work increases linearly with the number of associated bundles. Tagging is a bottom-up approach that reduces effort and achieves a more seamless workflow. An annotation can be easily associated with more than one bundle simply by tagging it with appropriate keywords; bundles are created through filtering that recognizes tags as filterable attributes. Because co-authors have their document as a shared context, we believe tags will be consistent and scalable across users, alleviating the ambiguity and imprecision seen in more general contexts while providing flexibility in classifying information into more than one category. Bottom-up tagging captures multiple semantic concepts that are inherent in most information resources through a light-weight and intuitive means of organizing and sharing information in a collaborative setting.The core interface to the “Bundle Editor” prototype consists of a document pane and a reviewing pane(Figure 1). The main component of the document pane is the document editor, which has typical functionality (insert, delete, comment, etc.). The reviewing pane is a multi-tabbed pane with each tab displaying a specific group of annotations. The reviewing pane supports creating new bundles, adding and removing annotations from a specific bundle, and sorting and filtering annotations based on particular attributes.Tagging, which is bottom up, is appropriate for unknown workflows where structure emerges and serendipity needs to be supported. During more precise workflow, top-down structuring through manual or automated bundle creation is likely to be the preferred approach. We will demonstrate both top-down and bottom-up structuring in the Bundle Editor to illustrate the advantages of each. We expect to report results from preliminary studies of how co-authors use these two approaches. The studies will compare ease of use across the two approaches, examine the semantic categories within annotations for a shared document, and investigate the role of bundles in facilitating problem decomposition strategies involved in co-authoring workflow. REFERENCES[1]Cadiz, J., Gupta, A., Grudin, J. (2000). Using webannotations for asynchronous collaboration arounddocuments. ACM CSCW ‘00. pp309-318.[2]Churchill, E., Trevor, J., Bly, S., Nelson, L., and Cubranic,D. (2000). Anchored conversations: chatting in the context ofa document. ACM CHI ’00. pp 454-461.[3]Guy, M., and Tonkin, E. (2006) Folksonomies: Tidying upTags? D-Lib Magazine, 12, 1.[4]Hee-Cheol, K., and Eklundh K. (2001). Reviewing practicesin collaborative writing. In Computer Supported Cooperative Work, 10, 2. pp 247-259.[5]Mathes, A. (2004) Folksonomies – cooperative classificationand communication through shared metadata. http://www.{HYPERLINK "/academic/computer-mediated-communication/folksonomies.html"} (accessed07/2006).[6]Sinha, R. (2005). A cognitive analysis of tagging. In RashmiSinha’s weblog/archives/05_09/ {HYPERLINK"tagging-cognitive.html"}(accessed 07/ 2006).[7]Weng, C., & Gennari, J. (2004). Asynchronous collaborativewriting through annotations. ACM CSCW ’04. pp 578–581. [8]Zheng, Q., Booth, K.S., and McGrenere, J. (2006). Co-authoring with structured annotations. ACM CHI ’06. pp131-140.Figure 1. Bundle Editor with document and reviewing panes.。
General Terms 一般词汇manager 经纪人instructor 教练,技术指导guide 领队trainer 助理教练referee, umpire (网球.棒球)裁判linesman, touch judge (橄榄球)裁判contestant, competitor, player 运动员professional 职业运动员amateur 业余运动员,爱好者enthusiast, fan 迷,爱好者favourite 可望取胜者(美作:favorite) outsider 无取胜希望者championship 冠军赛,锦标赛champion 冠军record 纪录record holder 纪录创造者ace 网球赛中的一分Olympic Games, Olympics 奥林匹克运动会Winter Olympics 冬季奥林匹克运动会Universiade 世界大学生运动会stadium 运动场track 跑道ring 圈ground, field 场地pitch (足球、橄榄球)场地court 网球场team, side 队Football 足球football, soccer, Association football 足球field, pitch 足球场midfied 中场kick-off circle 中圈half-way line 中线football, eleven 足球队football player 足球运动员goalkeeper, goaltender, goalie 守门员back 后卫left 左后卫right back 右后卫centre half back 中卫half back 前卫left half back 左前卫right half back 右前卫forward 前锋centre forward, centre 中锋inside left forward, inside left 左内锋inside right forward, inside right 右内锋outside left forward, outside left 左边锋outside right forward, outside right 右边锋kick-off 开球bicycle kick, overhead kick 倒钩球chest-high ball 平胸球corner ball, corner 角球goal kick 球门球ground ball, grounder 地面球hand ball 手触球header 头球penalty kick 点球spot kick 罚点球free kick 罚任意球throw-in 掷界外球ball handling 控制球block tackle 正面抢截body check 身体阻挡bullt 球门前混战fair charge 合理冲撞chesting 胸部挡球close-marking defence 钉人防守close pass, short pass 短传consecutive passes 连续传球deceptive movement 假动作diving header 鱼跃顶球flying headar 跳起顶球dribbling 盘球finger-tip save (守门员)托救球clean catching (守门员)跳球抓好flank pass 边线传球high lobbing pass 高吊传球scissor pass 交叉传球volley pass 凌空传球triangular pass 三角传球rolling pass, ground pass 滚地传球slide tackle 铲球clearance kick 解除危险的球to shoot 射门grazing shot 贴地射门close-range shot 近射long drive 远射mishit 未射中offside 越位to pass the ball 传球to take a pass 接球spot pass 球传到位to trap 脚底停球to intercept 截球to break through, to beat 带球过人to break loose 摆脱to control the midfield 控制中场to disorganize the defence 破坏防守to fall back 退回to set a wall 筑人墙to set the pace 掌握进攻节奏to ward off an assault 击退一次攻势to break up an attack 破坏一次攻势ball playing skill 控球技术total football 全攻全守足球战术open football 拉开的足球战术off-side trap 越位战术wing play 边锋战术shoot-on-sight tactics 积极的抢射战术time wasting tactics 拖延战术Brazilian formation 巴西阵式,4-2-4 阵式four backs system 四后卫制four-three-three formation 4-3-3 阵式four-two-four formation 4-2-4 阵式red card 红牌(表示判罚出场)yellow card 黄牌(表示警告)Tennis 网球tennis 网球运动lawn tennis 草地网球运动grass court 草地网球场racket 球拍racket press 球拍夹gut, string (球拍的)弦line ball 触线球baseline ball 底线球sideline ball 边线球straight ball 直线球down-the-line shot 边线直线球crosscourt 斜线球high ball, lob 高球low ball 低球long shot 长球short shot 短球cut 削球smash 抽球jump smash 跃起抽球spin 旋转球low drive 抽低球volley 截击空中球low volley 低截球deep ball 深球heavy ball 重球net 落网球flat stroke 平击球flat drive 平抽球let 重发球fluke, set-up, easy 机会球ground stroke 击触地球wide 打出边线的球overhead smash, overhand smash 高球扣杀game 局set 盘fifteen all 一平thirty all 二平forty all 三平deuce 局末平分, 盘末平局love game 一方得零分的一局double fault 双误, 两次发球失误‘not up’, 两跳,还击前球着地两次service line 发球线fore court 前场back court 后场centre mark 中点server 发球员receiver 接球员Athletics 竞技race 跑middle-distance race 中长跑long-distance runner 长跑运动员sprint 短跑(美作:dash)the 400 metre hurdles 400米栏marathon 马拉松decathlon 十项cross-country race 越野跑jump 跳跃jumping 跳跃运动high jump 跳高long jump 跳远(美作:broad jump) triple jump, hop step and jump 三级跳pole vault 撑竿跳throw 投掷throwing 投掷运动putting the shot, shot put 推铅球throwing the discus 掷铁饼throwing the hammer 掷链锤throwing the javelin 掷标枪walk 竞走Individual Sports 体育项目gymnastics 体操gymnastic apparatus 体操器械horizontal bar 单杠parallel bars 双杠rings 吊环trapeze 秋千wall bars 肋木side horse, pommelled horse 鞍马weight-lifting 举重weights 重量级boxing 拳击Greece-Roman wrestling 古典式摔跤hold, lock 揪钮judo 柔道fencing 击剑winter sports 冬季运动skiing 滑雪ski 滑雪板downhill race 速降滑雪赛,滑降slalom 障碍滑雪ski jumping competition 跳高滑雪比赛ski jump 跳高滑雪ice skating 滑冰figure skating 花样滑冰roller skating 滑旱冰bobsleigh, bobsled 雪橇Games and Competitions 球类运动Football 足球football, soccer, Association football 足球field, pitch 足球场midfied 中场kick-off circle 中圈half-way line 中线football, eleven 足球队football player 足球运动员goalkeeper, goaltender, goalie 守门员back 后卫left 左后卫right back 右后卫centre half back 中卫half back 前卫left half back 左前卫right half back 右前卫forward 前锋centre forward, centre 中锋inside left forward, inside left 左内锋inside right forward, inside right 右内锋outside left forward, outside left 左边锋outside right forward, outside right 右边锋kick-off 开球bicycle kick, overhead kick 倒钩球chest-high ball 平胸球corner ball, corner 角球goal kick 球门球ground ball, grounder 地面球hand ball 手触球header 头球penalty kick 点球spot kick 罚点球free kick 罚任意球throw-in 掷界外球ball handling 控制球block tackle 正面抢截body check 身体阻挡bullt 球门前混战fair charge 合理冲撞chesting 胸部挡球close-marking defence 钉人防守close pass, short pass 短传consecutive passes 连续传球deceptive movement 假动作diving header 鱼跃顶球flying headar 跳起顶球dribbling 盘球finger-tip save (守门员)托救球clean catching (守门员)跳球抓好flank pass 边线传球high lobbing pass 高吊传球scissor pass 交叉传球volley pass 凌空传球triangular pass 三角传球rolling pass, ground pass 滚地传球slide tackle 铲球clearance kick 解除危险的球to shoot 射门grazing shot 贴地射门close-range shot 近射long drive 远射mishit 未射中offside 越位to pass the ball 传球to take a pass 接球spot pass 球传到位to trap 脚底停球to intercept 截球to break through, to beat 带球过人to break loose 摆脱to control the midfield 控制中场to disorganize the defence 破坏防守to fall back 退回to set a wall 筑人墙to set the pace 掌握进攻节奏to ward off an assault 击退一次攻势to break up an attack 破坏一次攻势ball playing skill 控球技术total football 全攻全守足球战术open football 拉开的足球战术off-side trap 越位战术wing play 边锋战术shoot-on-sight tactics 积极的抢射战术time wasting tactics 拖延战术Brazilian formation 巴西阵式,4-2-4 阵式four backs system 四后卫制four-three-three formation 4-3-3 阵式four-two-four formation 4-2-4 阵式red card 红牌(表示判罚出场)yellow card 黄牌(表示警告)rugby 橄榄球basketball 篮球volleyball 排球Tennis 网球tennis 网球运动lawn tennis 草地网球运动grass court 草地网球场racket 球拍racket press 球拍夹gut, string (球拍的)弦line ball 触线球baseline ball 底线球sideline ball 边线球straight ball 直线球down-the-line shot 边线直线球crosscourt 斜线球high ball, lob 高球low ball 低球long shot 长球short shot 短球cut 削球smash 抽球jump smash 跃起抽球spin 旋转球low drive 抽低球volley 截击空中球low volley 低截球deep ball 深球heavy ball 重球net 落网球flat stroke 平击球flat drive 平抽球let 重发球fluke, set-up, easy 机会球ground stroke 击触地球wide 打出边线的球overhead smash, overhand smash 高球扣杀game 局set 盘fifteen all 一平thirty all 二平forty all 三平deuce 局末平分, 盘末平局love game 一方得零分的一局double fault 双误, 两次发球失误‘not up’, 两跳,还击前球着地两次service line 发球线fore court 前场back court 后场centre mark 中点server 发球员receiver 接球员baseball 垒球handball 手球hockey 曲棍球golf 高尔夫球cricket 板球ice hockey 冰球goalkeeper 球门员centre kick 中线发球goal kick 球门发球throw in, line-out 边线发球to score a goal 射门得分to convert a try 对方球门线后触地得分batsman 板球运动员batter 击球运动员men's singles 单打运动员in the mixed doubles 混合双打Water Sports 水上运动swimming pool 游泳池swimming 游泳medley relay 混合泳crawl 爬泳breaststroke 蛙式backstroke 仰式freestyle 自由式butterfly (stroke) 蝶泳diving competition 跳水water polo 水球water skiing 水橇rowing 划船canoe 划艇boat race 赛艇yacht 游艇kayak 皮船sailing 帆船运动outboard boat 船外马达Bicycle Motorcycle 自行车,摩托车car 车类运动velodrome, cycling stadium 自行车赛车场road race 公路赛race 计时赛chase 追逐赛motorcycle, motorbike 摩托车racing car 赛车racing driver 赛车驾驶员rally 汽车拉力赛Riding and Horse Riding 赛马riding 骑马racecourse, racetrack 跑马场,赛马场jockey, polo 马球rider 马球运动员show jumping competition 跳跃赛steeplechase 障碍赛fence 障碍trotter 快跑的马其它体育英语词汇和术语之三:拳击Boxing 拳击boxer 拳击运动员boxing glove 拳击手套boxing shoe 拳击鞋infighting 近战straight punch 直拳uppercut 上钩拳right hook 右钩拳foul 犯规punch bag 沙袋punch ball 沙球boxing match 拳击比赛referee 裁判员boxing ring 拳击台rope 围绳winner 胜利者loser by a knockout 被击败出局者timekeeper 计时员boxing weights 拳击体重级别light flyweight 48公斤级, 次特轻量级flyweight 51公斤级, 特轻量级bantamweight 54公斤级, 最轻量级featherweight 57公斤级, 次轻量级lightweight 60公斤级, 轻量级light welterweight 63.5公斤级, 轻中量级welterweight 67公斤级, 次中量级light middleweight 71公斤级, 中量级middleweight 75公斤级, 次重量级light heavyweight 81公斤级, 重量级heavyweight 81以上公斤级, 最重量级acrobatic gymnastics---技巧运动athletics/track & field---田径beach---海滩boat race---赛艇bobsleigh, bobsled---雪橇boxing---拳击canoe slalom---激流划船canoe---赛艇chess---象棋cricket---板球cycling---自行车diving---跳水downhill race---速降滑雪赛,滑降dragon-boat racing---赛龙船dressage---盛装舞步equestrian---骑马fencing---击剑figure skating---花样滑冰football(英语)/soccer(美语)---足球freestyle----自由式gliding; sailplaning---滑翔运动golf----高尔夫球Greece-Roman wrestling----古典式摔跤gymnastic apparatus----体操器械gymnastics----体操handball-----手球hockey----曲棍球hold, lock-----揪钮horizontal bar-----单杠hurdles; hurdle race----跨栏比赛huttlecock kicking---踢毽子ice skating---滑冰indoor---室内item Archery---箭术judo---柔道jumping----障碍kayak----皮划艇mat exercises---垫上运动modern pentathlon---现代五项运动mountain bike---山地车parallel bars---双杠polo---马球qigong; breathing exercises---气功relative work---造型跳伞relay race; relay---接力rings----吊环roller skating----滑旱冰rowing-----划船rugby---橄榄球sailing--帆船shooting---射击side horse, pommelled horse---鞍马ski jump---跳高滑雪ski jumping competition---跳高滑雪比赛ski---滑雪板skiing---滑雪slalom---障碍滑雪softball---垒球surfing---冲浪swimming----游泳table tennis---乒乓球taekwondo---跆拳道tennis----网球toxophily---射箭track---赛道trampoline---蹦床trapeze---秋千triathlon---铁人三项tug-of-war---拔河volleyball---排球badminton---羽毛球baseball---棒球basketball---篮球walking; walking race---竞走wall bars---肋木water polo----水球weightlifting ---举重weights ---重量级winter sports -----冬季运动wrestling --- 摔交yacht --- 游艇Men's 10m Platform 男子10米跳台Women's Taekwondo Over 67kg 女子67公斤级以上跆拳道Women's Athletics 20km Walk 女子20公里竟走Men's Diving Synchronized 3m Springboard 男子3米跳板Women's Diving 3m Springboard 女子3米跳板Women's Diving Synchronized 10m Platform 女子10米跳台Men's Wrestling Greco-Roman 58kg 男子58公斤古典摔交Men's Diving 3m Springboard 男子3米跳板Men's Artistic Gymnastics Parallel Bars 竞技体操男子双杠Women's Artistic Gymnastics Beam 竞技体操女子自由体操Men's Table Tennis Singles 男子乒乓单打Women's Diving 10m Platform 女子10米跳台Women's Artistic Gymnastics Uneven Bars 竞技体操女子跳马Women's Table Tennis Singles 女子乒乓单打Men's Badminton Singles 男子羽毛球单打Women's Badminton Doubles 女子羽毛球双打Men's Diving Synchronized 10m Platform 跳水男子10米跳台Women's Diving Synchronized 3m Springboard 跳水女子3米跳板Men's Table Tennis Doubles 男子乒乓球双打Women's Badminton Singles 女子羽毛球单打Men's Fencing Team Foil 击剑男子团体花剑Women's Judo Heavyweight +78kg 柔道女子重量级78公斤Men's Shooting 10m Running Target 射击男子10米移动靶Women's Shooting 25m Pistol 射击女子25米运动手枪Women's Table Tennis Doubles 女子乒乓球双打Men's Weightlifting 77kg 举重男子77公斤级抓举Women's Weightlifting 75+ kg 举重女子75公斤以上级抓举Mixed Badminton Doubles 羽毛球男子双打Women's Artistic Gymnastics All-Around Finals 竞技体操女子个人全能决赛Women's Judo Half-Heavywt 78kg 女子次重量级78公斤级柔道Men's Artistic Gymnastics All-Around Finals 竞技体操男子个人全能Women's Fencing Team Epee 击剑女子团体重剑Women's Artistic Gymnastics Team Finals 竞技体操女子团体Women's Judo Half-Middlewt 63kg 女子次中量级63公斤级柔道Women's Weightlifting 63kg 女子63公斤级挺举举重Women's Weightlifting 69kg 女子69公斤级抓举举重Men's Artistic Gymnastics Team Finals 男子团体竞技体操Men's Shooting 10m Air Rifle 射击男子10米气步枪Women's Shooting Trap 射击女子多向飞碟Women's Weightlifting 53kg 举重女子53公斤级抓举Women's Judo Half-Lightwt 52kg 女子次轻量级52公斤柔道Women's Shooting 10m Air Pistol 女子10米汽枪Women's Cycling Track 500m Time Trial 运动场自行车赛女子500米计时赛Men's Shooting 10m Air Pistol 男子10米气手枪Women's Shooting 10m Air Rifle 女子10米气步枪Men's Weightlifting 56kg 男子56公斤级挺举1.General Terms 一般词汇manager 经纪人instructor 教练,技术指导guide 领队trainer 助理教练referee, umpire (网球.棒球)裁判linesman, touch judge (橄榄球)裁判contestant, competitor, player 运动员professional 职业运动员amateur 业余运动员,爱好者enthusiast, fan 迷,爱好者favourite 可望取胜者(美作:favorite)outsider 无取胜希望者championship 冠军赛,锦标赛champion 冠军record 纪录record holder 纪录创造者ace 网球赛中的一分Olympic Games, Olympics 奥林匹克运动会Winter Olympics 冬季奥林匹克运动会stadium 运动场track 跑道ring 圈ground, field 场地pitch (足球、橄榄球)场地court 网球场team, side 队2.Athletics 竞技race 跑middle-distance race 中长跑long-distance runner 长跑运动员sprint 短跑(美作:dash)the400 metre hurdles 400米栏marathon 马拉松decathlon 十项cross-country race 越野跑jump 跳跃jumping 跳跃运动high jump 跳高long jump 跳远(美作:broad jump)triple jump, hop step and jump 三级跳pole vault 撑竿跳throw 投掷throwing 投掷运动putting the shot, shot put 推铅球throwing the discus 掷铁饼throwing the hammer 掷链锤throwing the javelin 掷标枪walk 竞走3.Individual Sprots 体育项目gymnastics 体操gymnastic apparatus 体操器械horizontal bar 单杠parallel bars 双杠rings 吊环trapeze 秋千wall bars 肋木side horse, pommelled horse 鞍马weight-lifting 举重weights 重量级boxing 拳击GRE ece-Roman wrestling 古典式摔跤hold, lock 揪钮judo 柔道fencing 击剑winter sports 冬季运动skiing 滑雪ski 滑雪板downhill race 速降滑雪赛,滑降slalom 障碍滑雪ski jumping competition 跳高滑雪比赛ski jump 跳高滑雪ice skating 滑冰figure skating 花样滑冰roller skating 滑旱冰bobsleigh, bobsled 雪橇4.Games and Competitions 球类运动football 足球rugby 橄榄球basketball 篮球volleyball 排球tennis 网球baseball 垒球handball 手球hockey 曲棍球golf 高尔夫球cricket 板球ice hockey 冰球goalkeeper 球门员centre kick 中线发球goal kick 球门发球throw in, line-out 边线发球to score a goal 射门得分to convert a try 对方球门线后触地得分batsman 板球运动员batter 击球运动员men's singles 单打运动员in the mixed doubles 混合双打5.Water Sports 水上运动swimming pool 游泳池swimming 游泳medley relay 混合泳crawl 爬泳breaststroke 蛙式backstroke 仰式freestyle 自由式butterfly (stroke) 蝶泳diving competition 跳水water polo 水球water skiing 水橇rowing 划船canoe 划艇boat race 赛艇yacht 游艇kayak 皮船sailing 帆船运动outboard boat 船外马达。
GENERAL TERMS & CONDITIONS FOR THE PURCHASE OF SERVICESECLIPSE有限公司购买服务的一般条款和条件The original English language version of these Terms and Conditions is the legally-binding version. The translated version is for information only.此合同条款和条件的英文原文版为具法律效应版本。
翻译版本仅供参考。
1Interpretation 解释1.1In these Conditions:在这些条件下:’Company’means Eclipse Translations Limited whose registered office is at European Translation Centre Lionheart Enterprise Park Alnwick Northumberland NE66 2HT (Company Number: 03290358) “公司” 指Eclipse翻译有限公司,公司注册地址:诺森伯兰郡Alnwick,Lionheart企业园,欧洲翻译中心, 邮编:NE66 2HT (European Translation Centre, Lionheart Enterprise Park, Alnwick, Northumberland NE662HT)(公司编号:03290358)‘charges’means the charge for the Services“收费” 指提供服务的收费‘conditions’means the standard conditio ns of purchase set out in this document and (unless the context otherwise requires) includes any special conditions agreed in Writing between the Company and the Translator“条件” 指本文陈述的购买标准条件,以及包括公司和翻译者之间书面同意的任何特殊条件(除非文中另有要求)‘contract’means the agreement con stituted by the acceptance of these Conditions by the Translator“合同” 指翻译者接受这些条件达成的协议’Delivery Address’means that address stated on the Order“交付地址” 指定单上所述的地址’Order’means the Company’s purchase order to which these Conditions are annexed“定单” 指符合这些条件的公司购买定单‘Translator’means the person or company so described in the Order“翻译者” 指定单中所述的人员或公司‘Services’ means the services (if any) described in the Order“服务” 指定单中所描述的服务(如有)’Specification’includes any plans, drawings, data, description or otherinformation relating to the Services“说明” 包括任何规划、设计图、绘图、资料、描述或其它与服务有关的资讯‘terms’ means the standard terms of purchase set out in this document and (unless the context otherwise requires) includes any special terms agreed in Writing between the Company and Translator“条款” 指本文陈述的购买标准条款,以及包括公司和翻译者之间书面同意的任何特殊条款(除非文中另有要求)‘work”means a translation produced by the Translator in the course of performing the Services ——————————————————————————–Page 2“作品” 指翻译者通过提供服务所作的译文‘writing’includes telex, facsimile transmission and comparable means of communica-tion.“书面” 包括电报、传真和类似的通讯手段。
set out general terms and conditions 条款和条件一般条款与条件第一条:定义与解释1.1 在本合同中,除非上下文另有要求,否则以下词语和表达应具有以下含义:"我们"、"我们的":指本条款与条件的提供者,即服务的供应方;"您"、"您的":指接受本条款与条件的个人或实体,即服务的用户;"服务":指我们向您提供的任何产品、服务或功能,包括但不限于在线平台、应用程序、软件、内容或其他相关服务。
1.2 本条款与条件中的标题仅为方便阅读而设,不影响其解释。
第二条:接受条款与条件2.1 通过使用我们的服务,您表示已阅读、理解并同意受本条款与条件的约束。
如果您不同意这些条款与条件,您不得使用我们的服务。
2.2 我们可能随时修改本条款与条件。
任何修改将在发布时生效,并适用于此后对服务的使用。
您应定期查看本条款与条件以了解任何修改。
第三条:服务的使用3.1 您必须遵守所有适用的法律、法规和规章,不得将我们的服务用于任何非法、欺诈或有害的目的。
3.2 您不得干扰或破坏我们的服务,包括但不限于使用任何病毒、恶意软件、蠕虫、特洛伊木马或其他有害代码。
3.3 您有责任保护您的账户安全,包括但不限于保管好您的用户名和密码,防止未经授权的访问和使用。
第四条:知识产权4.1 我们的服务中包含的所有内容,如文本、图形、图像、音频、视频、软件、数据等,均受版权、商标、专利或其他知识产权法律的保护。
4.2 除非本条款与条件或适用法律明确允许,否则您不得复制、分发、修改、展示、公开表演、传输或以其他任何方式使用我们的服务中的任何内容。
第五条:免责声明与责任限制5.1 我们的服务按“现状”和“可用”的基础提供,不附带任何形式的明示或暗示的保证,包括但不限于对适销性、特定用途的适用性或非侵权性的保证。
5.2 在法律允许的范围内,我们对于因使用或无法使用我们的服务而产生的任何直接、间接、偶然、特殊、后果性或惩罚性的损害不承担任何责任,即使我们已被告知这种损害的可能性。
GENERAL TERMS & CONDITIONS FOR THE PURCHASE OF SERVICESECLIPSE有限公司购买服务的一般条款和条件The original English language version of these Terms and Conditions is the legally-binding version. The translated version is for information only.此合同条款和条件的英文原文版为具法律效应版本。
翻译版本仅供参考。
1Interpretation 解释1.1In these Conditions:在这些条件下:‘Company‘means Eclipse Translations Limited whose registered office is at European Translation Centre Lionheart Enterprise Park Alnwick Northumberland NE66 2HT (Company Number: 03290358) ―公司‖ 指Eclipse翻译有限公司,公司注册地址:诺森伯兰郡Alnwick,Lionheart企业园,欧洲翻译中心, 邮编:NE66 2HT (European Translation Centre, Lionheart Enterprise Park, Alnwick, Northumberland NE662HT)(公司编号:03290358)‗charges‘means the charge for the Services―收费‖ 指提供服务的收费‗conditions‘means the standard conditio ns of purchase set out in this document and (unless the context otherwise requires) includes any special conditions agreed in Writing between the Company and the Translator―条件‖ 指本文陈述的购买标准条件,以及包括公司和翻译者之间书面同意的任何特殊条件(除非文中另有要求)‗contract‘means the agreement con stituted by the acceptance of these Conditions by the Translator―合同‖ 指翻译者接受这些条件达成的协议‘Delivery Address‘means that address stated on the Order―交付地址‖ 指定单上所述的地址‘Order‘means the Company‘s purchase order to which these Conditions are annexed―定单‖ 指符合这些条件的公司购买定单‗Translator‘means the person or company so described in the Order―翻译者‖ 指定单中所述的人员或公司‗Services‘ means the services (if any) described in the Order―服务‖ 指定单中所描述的服务(如有)‘Specification‘includes any plans, drawings, data, description or otherinformation relating to the Services―说明‖ 包括任何规划、设计图、绘图、资料、描述或其它与服务有关的资讯‗terms‘ means the standard terms of purchase set out in this document and (unless the context otherwise requires) includes any special terms agreed in Writing between the Company and Translator―条款‖ 指本文陈述的购买标准条款,以及包括公司和翻译者之间书面同意的任何特殊条款(除非文中另有要求)‗work‖means a translation produced by the Translator in the course of performing the Services ——————————————————————————–Page 2―作品‖ 指翻译者通过提供服务所作的译文‗writing‘includes telex, facsimile transmission and comparable means of communica-tion.―书面‖ 包括电报、传真和类似的通讯手段。
Java Bytecode as a Typed Term CalculusTomoyuki HiguchiSchool of Information SicenceJapan Advanced Institute ofScience and Technology Tasunokuchi Ishikawa,923-1292Japan thiguchi@jaist.ac.jpAtsushi OhoriSchool of Information SicenceJapan Advanced Institute ofScience and Technology Tasunokuchi Ishikawa,923-1292Japan ohori@jaist.ac.jpABSTRACTWe propose a type system for the Java bytecode language, prove the type soundness,and develop a type inference al-gorithm.In contrast to the existing proposals,our type system yields a typed term calculus similar to type systems of lambda calculi.This enables us to transfer existing tech-niques and results of type theory to a JVM-style bytecode language.We show that ML-style let polymorphism and recursive types can be used to type JVM subroutines,and that there is an ML-style type inference algorithm.The type inference algorithm has beeen implemented.The ability to verify type soundness is a simple corollary of the existence of type inference algorithm.Moreover,our type theoretical approach opens up various type safe extensions including higher-order methods,flexible polymorphic typing through polymorphic type inference,and type-preserving compila-tion.Categories and Subject DescriptorsD.3.1[Programming Languages]:Formal Definitions and Theory;D.3.2[Programming Languages]:Language Clas-sifications—Macro and assembly languages,Object-oriented languagesGeneral TermsLanguages,Theory,VerificationKeywordsJava bytecode,bytecode verifier,type system,type inference 1.INTRODUCTIONType safety of executable code is becoming increasingly important due to recently emerging network computing,where pieces of executable code are dynamically exchanged over the Internet and used under the user’s own privileges.An Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on thefirst page.To copy otherwise,to republish,to post on servers or to redistribute to lists,requires prior specific permission and/or a fee.PPDP’02,October6-8,2002,Pittsburgh,Pennsylvania,USACopyright2002ACM1-58113-528-9/02/0010...$5.00.important achievement toward this direction is the develop-ment of the Java bytecode language[13],which is the target language of the Java programming language[5].A distin-guishing feature is its typing constraint.The system can ensure type correct execution of a given bytecode by check-ing type constraint before execution.Type verification of JVM bytecode is essentially static type checking,customary done in high-level typed program-ming languages.Since JVM is a powerful and complex sys-tem,the development of a correct and reliable type checking system requires a formal framework for semantics and typ-ing derivation of the JVM bytecode language.This prob-lem has recently attracted the attention of programming language researchers,and several static type systems have been developed.Stata and Abadi[17]propose a static type system for a subset of JVM bytecode language including subroutines.In this work,a type system checks the satisfi-ability of constraints on memory states induced by the set of instructions in the given code.This paradigm has been successfully used in subsequent proposals on type-checking the JVM bytecode language.Freund and Mitchell[4]use this framework to analyze a subtle problem of object initialization and propose a refined type-checking scheme.They further show in[3]that their approach extends to various features of JVM including ob-jects,classes,interfaces,and exceptions.O’Callahan[15] also extends the approach of[17]to allow moreflexible typ-ing for subroutines by giving an explicit return type of a subroutine.In addition to this,he introduces polymorphic typing to achieveflexibility in subroutine usage.Hagiya and Tozawa[6]give an alternative approach for subroutines based on dataflow analysis,which yields a simpler sound-ness proof.Iwama and Kobayashi[7]propose another type system based on[17]to verify the correctness of the usage of lock primitives,where the type of an object has the informa-tion about the order in which the object is locked/unlocked.Leroy[12]presents a light-weight verification method based on dataflow analysis.In those proposals,a type system is designed to check type consistency of memory states imposed by instructions in a given bytecode.While this paradigm is useful in establishing type safety of the JVM bytecode language,some important questions remain.For example,how does this paradigm ex-tend to other useful programming features,or how does this paradigm relates to existing theory of type systems?As we comment later in the next section,O’Callahan’s system uses the notion of continuation which represents more informa-1tion of the behavior of a program than set of memory usage constraints,but its relationship to existing notions in type theory is not entirely clear.One obstacle in answering these questions appears to be the fact that in this paradigm,types do not represent be-havior of bytecode.This is in contrast with type systems of programming languages where types provide a static view of the behavior of a program–a typingΓ£M:τimplies that M yields a value of typeτwhen it is executed in an envi-ronment of typeΓ.This view yields a clean type soundness theorem and has been the basis for incorporating various advanced features such as polymorphism and type inference in a language.Jones[8]suggests representing each JVM instruction as a function and sequencing as function composition.Although semantics of functions is general enough to represent JVM instructions,it does not seem to reflect machine execution directly.As a result,it is not obvious that this approach could be a basis for establishing type soundness or for de-veloping a feasible type-checking algorithm for bytecode in-cluding subroutines.Our goal is to develop a typed calculus for a JVM-style bytecode language,where types represent behavior of code. We base our development on Curry-Howard isomorphism for low-level code[16],where it is shown that a low-level code language corresponds to a sequent style proof system of the intuitionistic propositional logic.Katsumata and Ohori[9] sketch that this idea can be applied to represent JVM-style bytecode.The purpose of[9]is to present a proof-directed de-compilation method,and its treatment of JVM bytecode is only on syntactical typing and is quite limited;it does not consider subroutine or inheritance,and it does not discuss type soundness.In the present paper,we refine the logical approach of[16]and develop a typed calculus of JVM byte-code including most of its features,establish type soundness, and develop a type inference algorithm.We believe that the proposed calculus provides type-theoretical account for JVM,and serves as a framework for extending JVM-style bytecode languages with various advanced features includ-ing higher-order methods,flexible polymorphic typing,poly-morphic type inference,and type-preserving compilation. The rest of the paper is organized as follows.Section2 outlines our approach.Section3gives a typed calculus for JVM,and establishes type soundness.Section4extends the typed calculus with polymorphism and develops a type infer-ence algorithm.We have implemented a prototype bytecode verifier based on the type inference algorithm,which is de-scribed in Section5.Section6discusses some extensions to the type systems,and Section7concludes the paper.2.TYPE-THEORETICALINTERPRETATION OF BYTECODEWe follow the logical interpretation of low-level code[16] and interpret a bytecode program as a typing derivation.In JVM,a program consists of a collection of labeled blocks ending with a return instruction or a branch instruction. We let I and B range over(non-branching)instructions and code blocks,respectively.A non-branching instruction op-erates on a local environment and a stack.We useΓand ∆for types of local environments and stacks respectively. Code blocks,environment types,and stack types arefinite sequences,for which we use the following notations.e·S isthe sequence obtained by adding an element e at the beg-ging of a sequnce S,and S{n←e}is the sequence obtained by changing the n th element of S to e.The empty sequence is denoted byφ.Applying the idea of[16],we interpret a JVM block B asa judgment of the formΓ,∆£B:τin a sequent style proofsystem.A return instruction corresponds to an initial se-quent in the proof system.For example,Γ,int·∆£ireturn: int indicates that ireturn is a complete program returning the top element of the current stack.A goto(l)refers to an existing block B named l.So we typeΓ,∆£goto(l):τif Γ,∆£B:τ.An ordinary non-branching instruction I that changes the machine state of typeΓ1,∆1to that of type Γ2,∆2,written as I:Γ1,∆1=⇒Γ2,∆2,is interpreted as asa left-rule of the form:Γ2,∆2£B:τΓ1,∆1£I·B:τwhich can be read“backward”saying that the execution of I transforms a bigger proof of I·B to a smaller proof ofB.Most of the non-branching instructions including one formethod invocation can be interpreted in this way.An important exception is jsr for calling a subroutine.A subroutine block(ranged over by SB)does not return avalue but returns a modified environment and a modified stack to be used by the block that follows jsr.Let l be a la-bel of a subroutine block SB that changes a machine state of typeΓ1,∆1to that of typeΓ2,∆2,and let B be a block such thatΓ2,∆2£B:τ.We interpret a subroutine call jsr(l)·B as a function to transform a judgment of typeΓ2,∆2£τto that of typeΓ1,∆1£τ.To represent this intuitive static semantics,we type jsr(l)·B asΓ1,∆1£jsr(l)·B:τand assign to SB a type of the form Γ2,∆2£τ//Γ1,αl·∆1£τ whereαl denotes the type of a return address.The use of continuation in O’Callahan’s work[15]cor-responds to introducing the partΓ1,∆1in the above type.Capturing this part of typing is enough to ensure type safety of a program consisting only of blocks and subroutine blocks.When methods are added,however,some additional ma-chinery seems to be necessary.Our system can serve as a type theoretical framework for representing both methods (or other higher-order objects)and collection of blocks and subroutines.Further refinement is needed for typing ret(i)instruc-tion.This instruction transfers control back to the block whose address is stored in the i th element of a local envi-ronment.This means thatΓ2(i)is the type of the blockB to which it returns,and therefore the equationΓ2(i)=Γ2,∆2£τmust hold.We solve this problem by introduc-ing recursive type equations.For each subroutine identi-fied by its entry label l,we introduce a type variableαl representing the return address with an associated equa-tion,and assign to a subroutine l a type of the form(αl=Γ2,∆2£τin αl//Γ1,∆1£τ ).We treat a recursive type equationαl=Γ,∆£τas a global declaration similar to ML’s datatype.When the equation is irrelevant,we simply write αl//Γ,∆£τ .Figure1shows an example of type derivation for bytecode block using a ter in section4we show that there is an algorithm to infer a polymorphic typing for blocks and subroutine blocks.3.THE TYPED JVM CALCULUS2Bytecode program:L0:jsr L2 L1:iload_1ireturn L2:astore_2iload_1ifeq L3iload_1ireturnL3:iconst_1istore_1ret2Typing results:αL2={c,int,αL2},∆£int in L0:{c,int,τ},∆£intL1:{c,int,αL2},int·∆£intL2: αL2//{c,int,τ},αL2·∆£intL3: αL3//{c,int,αL2},∆£intFigure1:Example of Type DerivationThis section defines a typed calculus,Jvmc,for the JVM bytecode language.Since a set of classes and an associated subclass relation are explicitly declared,we define Jvmc rel-ative to a givenfixed set of class names(ranged over by c), and a givenfixed subclass relation on class names.We write c1<:c2if c1is a subclass of c2.3.1The Syntax of JvmcA JVM program is a collection of classes.We regards a collection of JVM classes as a pair(Θ,Π)of type specifica-tionsΘand method definitionsΠ.Θis a function assigning each class name a set of typedfield names(ranged over by f)and a set of typed method names(ranged over by m). Figure2gives the syntax ofΘ.{τ1,...,τn}⇒τis a methodΘ:={c=spec,...,c=sepc}spec:={methods={m:{τ1,...,τn}⇒τ,···}fields={f:τ,···}}τ:=int|void|c|αl|Figure2:Syntax ofΘtype with argument types{τ1,...,τn}and the return typeτ. is a special type representing unused environment entry. In this abstract syntax,we understand that the subclass re-lation is already incorporated so that the set offield names and the set of method names of a class contain all those defined in its super classes.We also assume that if some commonfiled names are used in super classes,then those names are properly disambiguated.Πassign each class name its method definitions.In an actual JVM classfile,each method body is a sequence of JVM instructions some of which have labels(ranged over by l)for branch instructions.We regard such a sequence as a labeled collection of basic blocks.Furthermore,we divide basic blocks into code blocks(ranged over by B)and sub-routine blocks(ranged over by SB).Subroutine blocks are those that are(originally)invoked by a subroutine call in-struction.We identify each subroutine block with its entry label and write SB(l s)for a subroutine block whose(orig-inal)entry point is l s.With this refinement,the syntax of method definitionsΠis given in given in Figure3,where i and n represent a local variabel and an integer value respec-tively.Some comments are in order.It is straightforward toΠ:={c=methods,...,c=methods} methods:={m=M,...,m=M}M:={l b:B,···|l s:SB(l s),···}B:=return|ireturn|areturn|jsr(l s,l b)|goto(l b)|I·BSB:=return|ireturn|areturn|jsr(l s,l s)|ret(i)|goto(l s)|I·SBI:=iconst(n)|iload(i)|aload(i)|istore(i)|astore(i)|dup|pop|iadd|ifeq(l)|new(c)|invoke(c,m)|getfield(c,f)|putfield(c,f)Figure3:Syntax ofΠconstruct a labeled collection of code blocks.Construction of subroutine blocks associated with an entry label can be done by traversing instruction sequence starting from a label l appearing in some jsr(l,l )and collecting all the reach-able basic blocks.For those basic blocks that are associated with more than one entry labels,we consider that separate copies exist for each entry labels.Since code is not mutable, any potions of B i and SB i in the result of this construction can be shared.So there is no danger of code size explosion.In JVM,subroutine call has the form jsr(l)·B.JVM pushes on the stack the address of B to be used as the return address by the callee.In order to develop a type system,we need to type a return address explicitly.For this reason,we regards jsr(l)·B as shorthand for jsr(l,l )with the introduction of a new labeled block l :B.3.2The Type SystemThe basic typing judgments are those for code blocks and subroutine blocks.As we have outlined in Section2,they are judgments of the form:•Γ,∆£B:τ•SB(l):(αl=Γ2,∆2£τin αl//Γ1,∆1£τ )Since these blocks contain labels for branch instructions, their derivation are defined relative to a label environment L of the formL={l b:Γ,∆£τ,...|l s:(αls=Γ2,∆2£τin αls//Γ1,∆1£τ ),...}specifying a typing of the code blocks and subroutine blocks of a given method.If S is a sequence,we write S.i for thei th element in S.Similarly,if S is a mapping and e is anelement in its domain,then S.e denotes the element assigned to e by S.Using these notations,static semantics of non-branching instructions is given in Figure4,and the typing rules for blocks are given in Figure5.In this definition,we simply assume that new(c)creates a complete object of class c.In an actual JVM,object creation is done in two stages byfirst creating a container by new and then initializing theirfields by the constructors.As observed in[4],there is subtle issues associated with this process.We believe that the mechanism proposed in[4]is orthogonal to3iconst(n):Γ,∆=⇒Γ,int·∆iload(i):Γ,∆=⇒Γ,int·∆(ifΓ(i)=int)aload(i):Γ,∆=⇒Γ,c·∆(ifΓ(i)=c)istore(i):Γ,int·∆=⇒Γ{i←int},∆astore(i):Γ,c·∆=⇒Γ{i←c},∆astore(i):Γ,αl·∆=⇒Γ{i←αl},∆dup:Γ,τ·∆=⇒Γ,τ·τ·∆iadd:Γ,int·int·∆=⇒Γ,int·∆pop:Γ,τ·∆=⇒Γ,∆new(c):Γ,∆=⇒Γ,c·∆getfield(c0,f):Γ,c1·∆=⇒Γ,τ·∆(ifΘ.c0.fields.f=τand c1<:c0)putfield(c0,f):Γ,τ·c1·∆=⇒Γ,∆(ifΘ.c0.fields.f=τand c1<:c0)invoke(c0,m):Γ,τn·...τ1·c1·∆=⇒Γ,τ0·∆(ifΘ.c0.methods.m={τ 1,···,τ n}=⇒τ0,τ0=void and c1<:c0∧τi<:τ i for all1≤i≤n)invoke(c0,m):Γ,τn·...·τ1·c1·∆=⇒Γ,∆(ifΘ.c0.methods.m={τ 1,···,τ n}=⇒voidand c1<:c0∧τi<:τ i for all1≤i≤n)Figure4:Static Semantics of Non-branching In-structionsour approach,and can be adopted in our type system as well.Another simplification we made is that all the neces-sary classfiles are available at the time of verification.This is reflected in the typing rules of getfield,putfield,and invoke,which refer toΘ.In an actual JVM,classes are dy-namically loaded.It is not hard to modify our type system to model dynamic class loading.Since those instructions in-clude the static type of the method orfield,we can simply use the specified type to verify the code block containing these instruction without referring to the classfiles.At the time of executing one of these instructions,which causes a class containing the method to be loaded,we infer the type of the method and verify that it is indeed equal to the one specified in the instruction.This process is easily formalized by using the mechanism of dynamic typing[1].Typing of a method M is then defined as follows.M:L⇔∀l b∈dom(M).L(l b)=Γ,∆£τ∧L Γ,∆£M(l b):τand∀l s∈dom(M).L M(l s):L(l s).We assume that a method M contains a block having a unique special label e indicating its entry point,and define the typing of a method as follows:M:{τ1,...,τn}⇒τ⇔∃L such that M:L andL Γ{0←c,1←τ1,···n←τn},φ£M.e:τ(Γ=Top(max M))where Top(n)is the type of environment of size nfilled with a meaningless type and max M is the number of local vari-ables used in the method M.We can now define the type correctness of a Jvmc programTyping rules for code blocks:L Γ,∆£ireturn:intL Γ,c·∆£areturn:cL Γ,∆£return:voidL Γ,∆£goto(l):τ(if L(l)=Γ,∆£τ)L Γ2,∆2£B:τL Γ1,∆1£I·B:τ(if I:Γ1,∆1=⇒Γ2,∆2)L Γ,∆£B:τL Γ,int·∆£ifeq(l)·B:τ(if L(l)=Γ,∆ τ) L Γ1,∆1£jsr(l1,l2):τ(if L(l1)=(αl1=Γ2,∆2£τin αl1//Γ1,αl1·∆1£τ ) and L(l2)=Γ2,∆2£τ)Typing rules for subroutine blocks:ret(x):(αl=Γ,∆£τin αl//Γ,∆£τ )(ifΓ(x)=αl)goto(l): αl//Γ,∆£τ (if L(l)= αl//Γ,∆£τ )return: αl//Γ,∆£voidireturn: αl//Γ,int·∆£intareturn: αl//Γ,c·∆£cjsr(l1,l2): αl//Γ,∆£τ(if L(l1)=(αl1=Γ ,∆ £τin αl1//Γ,αl1·∆£τ ) and L(l2)= αl//Γ ,∆ £τ )SB: αl//Γ2,∆2£τI·SB: αl//Γ1,∆1£τ(if I:Γ1,∆1=⇒Γ2,∆2) SB: αl//Γ,∆£τifeq(l)·SB: αl//Γ,int·∆£τ(if L(l)= αl//Γ,∆£τ )Figure5:Typing Rules for Blocks(Π,Θ)as the following property:Π:Θ⇔Dom(Π)=Dom(Θ),and∀c∈Dom(Π). Π.c.m:Θ.c.methods.m3.3Operational SemanticsWe establish that our type system is correct by formally proving the type soundness theorem with respect to an op-erational semantics of Jvmc.We let S range over runtime stacks,E range over runtime variable environments,and h range over heaps.Both S and E arefinite sequences of run-time values(ranged over by v).A heap h maps an address (ranged over by r)to a runtime representation of an object of the form f1=v1,...,f n=v n c where c is the class of the object.Possible runtime values are either natural num-bers n,an address r in a heap,or a return address adrs(l) which represents the entry address of a block named l and is used by a subroutine.We writeI:(S,E),h=⇒(S ,E ),hto indicate that I changes the machine state(S,E),h to (S ,E ),h .Fig.6gives this relation.Update(h,r,f,v)in the rule for putfield updates the ffield of a runtime repre-sentation of an object in the heap h pointed by r to v.The object created by new consists of default values⊥τof type τ.We assume that these values behave as ordinary values4of typeτin subsequent operation.This reflects our simpli-fying assumption mentioned earlier that new(c)creates an object of class c and we do not treat the issues of two stage object creation mechanism in JVM.iconst(n):(S,E),h=⇒(n·S,E),hiload(i):(S,E),h=⇒(E(i)·S,E),haload(i):(S,E),h=⇒(E(i)·S,E),histore(i):(n·S,E),h=⇒(S,E{i←n}),hastore(i):(r·S,E),h=⇒(S,E{i←r}),hastore(i):(adrs(l)·S,E),h=⇒(S,E{i←adrs(l)}),h dup:(v·S,E),h=⇒(v·v·S,E),hiadd:(n1·n2·S,E),h=⇒((n1+n2)·S,E),hpop:(v·S,E),h=⇒(S,E),hnew(c):(S,E),h=⇒(r·S,E),h(if h =h{r← f1=⊥τ1,...,f n=⊥τnc}Θ.c.fields={f1:τ1,...,f n:τn},and r/∈dom(h))getfield(c,f):(r·S,E),h=⇒(h(r).f·S,E),hputfield(c,f):(v·r·S,E),h=⇒(S,E),h(if h =h{r←Update(h,r,f,v)})Figure6:Dynamic Semantics of Non-branching In-structionsAn operational semantics of Jvmc is given through a set of rules similar to those of SECD machine[10]of the form (S,E,C,D),h−→(S ,E ,C ,D ),hC is a code of the form M{B}or M{SB}indicating that the machine executes the top instruction of code block B (or subroutine block SB)in a method body M.D is a dump,which is either emptyφ,or a sequence of saved ex-ecution frames of the form(S,E,C)·D.Note that these rules are taken with respect to a given type specificationsΘand method definitionsΠ.Fig.7gives the set of transition rules.In the rule for invoke,TopEnv(max(c,m))denotes an environment of size max(c,m)whose elements are spe-cial constant⊥ of a meaningless value having type ,and max(c,m)is the maximal local variable index used in the method m defined in the class c.3.4Type SoundnessWe are now in the position to prove type soundness the-orem.To do this,we define typing relations for various runtime objects used in Jvmc.Runtime values may form cycles and sharing through object pointers.To define value typing without resorting to co-induction,we follow[11]and define types of values relative to a heap type(ranged over by H)specifying the structure of a heap,which is a function from afinite set of heap addresses to types.Runtime values may also contain return addresses of the form adrs(l)which should be typed with a block type or a subroutine block type.This requires us to define value typing relative to a label environment L as well.We use the following typing relations.•L|=h:H h has a heap type H•L;H|=v:τv has typeτunder H•L;H|=S:∆S has a stack type∆under H(S,E,M{I·B},D),h−→(S ,E ,M{B},D),h (if I:(S,E),h=⇒(S ,E ),h ) (n·S,E,M{ireturn},(S0,E0,M0{B0})·D0)),h−→(n·S0,E0,M0{B0},D0),h(r·S,E,M{areturn},(S0,E0,M0{B0})·D0)),h−→(r·S0,E0,M0{B0},D0),h(S,E,M{return},(S0,E0,M0{B0})·D0)),h−→(S0,E0,M0{B0},D0),h(0·S,E,M{ifeq(l)·B},D),h−→(S,E,M{M(l)},D),h(n·S,E,M{ifeq(l)·B},D),h−→(S,E,M{B},D),h(if n=0)(S,E,M{goto(l)},D),h−→(S,E,M{M(l)},D),h(v n·...·v1·r·S,E,M{invoke(c,m)·B},D),h−→(φ,E ,M {M .entry},(S,E,M{B})·D)),h(if E =TopEnv(max(c,m)){0←c,1←τ1,···n←τn},Θ.c.methods.m={τ1,...,τn}⇒τ,andΠ.c.m=M )(S,E,M{ret(i)},D),h−→(S,E,M{M(E(i))},D),h(S,E,M{jsr(l1,l2)},D),h−→(adrs(l2)·S,E,M{M(l1)},D),hFigure7:Transition Rules of the Jvmc •L;H|=E:ΓE has an environment typeΓunder H These relations are given in Figure8.We note that,in theL;H|=n:intL;H|=r:τ(if H(r)<:τ)L;H|=adrs(l):αl(if L(l)=αlor L(l)= αl //Γ,∆£τ andαl =Γ,∆£τ) L|=h:H⇔dom(h)=dom(H)and∀r∈dom(h).if h(r)= f1=v1,...,f n=v n cthen c<:H(r)∧L;H|=v i:Θ.c.fields.f i for each i.L;H|=S:∆⇔dom(S)=dom(∆)∧L;H|=S.i:∆.i for each i.L;H|=E:Γ⇔dom(E)=dom(Γ)and L;H|=E.i:Γ.i for each i.Figure8:Typing of Runtime Values case of JVM,runtime objects are explicitly typed,and there-fore one can take H h for h such that H h(r)is the runtime tag of h(r).The resulting tying relation is the same as the one defined in[3].A key to establish type soundness with respect to SECD-style operational semantics is to define typing relation on dumps.This technique isfirst used in[16].We write H|= D:τto indicate that D has typeτunder H.Its intuitive meaning is that D accepts a value of typeτand resumes the saved computation.This relation is defined inductively onD in ing this relation,we define well typedness 5•H|=φ:τfor anyτ•H|=(S,E,M{B})·D:τ⇔∃Γ,∆,L,τ . M:L,L;H|=S:∆ ,L;H|=E:Γ, L Γ,∆£B:τ ,and H|=D:τ .where∆ =∆ifτ=void otherwise∆ =τ·∆•H|=(S,E,M{SB})·D:τ⇔∃Γ,∆,L,τ . M:L,L;H|=S:∆ ,L;H|=E:Γ, L SB: αl//Γ;∆£τ and H|=D:τ .where∆ =∆ifτ=void otherwise∆ =τ·∆Figure9:Typing of Dump Dof a machine state including a dump as follows.H (S,E,M{B},D),h⇔∃L,Γ,∆such that M:L,L|=h:H,L;H|=S:∆, L;H|=E:Γ,L Γ,∆£B:τ,and H|=D:τH (S,E,M{SB},D),h⇔∃L,Γ,∆such that M:L,L|=h:H,L;H|=S:∆, L;H|=E:Γ,L SB: αl//Γ,∆£τ ,and H|=D:τWe can now formally state type soundness as the following theorem.Theorem 1.Consider a Jvmc program(Π,Θ)such that Π:Θ.If H (S,E,M{C},D),h then either(1)C is one of return,ireturn,areturn,and D=φ,or(2)there are some S ,E ,M {C },D ,h ,and H such that H is an extension of H,(S,E,M{C},D),h−→(S ,E ,M {C },D ),h ,and H (S ,E ,M {C },D ),h .where C is B or SB.Proof.The proof uses following simmple lemma,which can be proved by simple case analysis.Lemma 1.If H|=v:τand H is an extension of H then H |=v:τ.Wefirst show the cases where C is a block B.Since |=Π:Θ,there is some L such that M:L.If H (S,E,M{B},D),h then there is someΓ,∆such that L|= h:H,L,H|=E:Γ,L,H|=S:∆,L Γ,∆£τand H|=D:τ.The proof proceeds by cases in terms of the first instruction of B.Case B=return.D=φor there is some S1,E1,M1,B1,D1 such that D=(S1,E1,M1{B1})·D1.The case for D=φis trivial.We assume that D=(S1,E1,M1{B1})·D1.By the transition rule,(S,E,M{return},(S1,E1,M1{B1})·D1),h −→(p·S1,E1,M1{B1},D1),h.By the type system,τ= void.By the typing rule for D,there are someΓ1,∆1,L1,τ1 such that M1:L1,L1,H|=S1:∆1,L1 Γ1,∆1£B1:τ1 and H|=D1:τ1.Case B=goto(l).By the transition rule,(S,E,M{goto(l)},D),h−→(S,E,M{M(l)},D),h.Since L(l)=Γ;∆£τby the definition of type system,L Γ,∆£M(l):τ.Case B=jsr(l1,l2).By the transition rule,(S,E,M{jsr(l1,l2)},D),h−→(adrs(l2)·S,E,M{M(l1)},D ),h.By the definition of the type system,since there aresomeΓ1,∆1such that L(l1)=(αl1=Γ1,∆1£τin αl1//Γ,αl1·∆£τ ),L(l2)=Γ1,∆1£τ,we have M(l1)=(αl1=Γ1;∆1£τin αl1//Γ,αl1·∆£τ ).Therefore we have only toshow L;H|=adrs(l2):αl1.Since L;H|=adrs(l2):L(l2)andαl1=L(l2),L;H|=adrs(l2):αl1.Case B=new(c)·B1.By the transition rule,(S,E,M{new(c)·B1},D),h−→(r·S,E,M{B1},D),h{r←f1=⊥τ1,...,f n=⊥τnc},Θ.c.fields={f1:τ1,···,f n:τn}and r/∈h.If we take H =H{r→c}then sincer is fresh H is a extension of H.Since L;H |=⊥τi:Θ.c.fieldes.f i(1≤i≤n),|=h :H .Then the resultfollows from Lemma1.Case B=getfield(c,f)·B1.By the definition of typesystem,there are someτ0,c0,∆ such that∆=τ0·c0·∆ ,c0<:c1,τ0<:Θ.c.fields.f.Consequently,there are somer0,s such that S=r0·S ,L;H|=r0:c0,L;H|=S :∆ .By the transition rule,(r0·S ,E,M{getfield(c,f)B1·},D)−→(v1·S ,E,M{B1},D1),h and v1=h(r0).f.By thedefinition of the type system,there is someτ1such thatL Γ;τ1·∆ £B1:τandτ1=Θ.c.fields.f.Since c0<:c,h(r0).f=Θ.c.fields.f,and therefore L;H|=v1:τ1.Case B=invoke(c,m)·B1.By the definition of thetype system,there are someτ1,···,τn,c0,∆ such that∆=τn·...·τ1·c0·∆ .Also,there are some v n,···,v n,v0,S suchthat L;H|=τn·...·τ1·c0·∆ :v n·...·v1·r·S ∧S=v n·...·v1·r·S Then we have:(v n·...·v1·r·S ,E,M{invoke(c,m)·B1},D),h−→(φ,E1,M1{M1.e},(S ,E,M{B1})·D),h such thatE1=T opEnv(max(c,m)){0→r,1→v1,···,n→v n},andΘ.c.methods.m={τ1,···,τn}⇒τ .We distinguishcases whetherτ is void or not.Here we only show that se forτ =void.The other case is similar.Since L Γ,∆ £B1:τby the definition of type system,H|=(S ,E,M{B1})·D:void.By Π:Θ,there is some L1such that M1:L1,L1T op(max e){0→c0,1→τ1,···,n→τn},φ£M1.entry:void.Due to the definition,L1;H1|=T opEnv(max(c,m)){0→r,1→v1,···,n→v n}:T op(max e){0→c,1→τ1,···,n→τn}.The other cases for blocks are simpler.The cases when C is a subroutine blocks can be shownsimilary by case analysis in terms of thefirst instruction.We only show the case for SB=jsr(l1,l2).By the def-inition of type system,L jsr(l1,l2): αl//Γ;∆£τ andL(l1)=(αl1=Γ1;∆1£τin αl1//Γ;αl1·∆£τ ),L(l2)= αl//Γ1;∆1£τ .If M:L,L|=h:H,L;H|=S:∆,L;H|=E:Γ,and H|=D:τthen we haveH (S,E,M{jsr(l1,l2)},D),h.By transition rule,(S,E,M{jsr(l1,l2)},D),h−→(adrs(l2)·S,E,M{M(l1)},D),h.Since L M(l1):L(l1),we have only to show L;H|=adrs(l2):αl1.This follows from the above equations forL(l2)and L(l1).This theorem implies that a well typed machine state iseither the halting state or a state such that the machinecan execute one step transition and produce another welltyped machine state.This immediately guarantees that awell typed program never goes wrong,and when the machineterminates,the top element of the stack is a value of correcttype specified by the type of the program.4.POLYMORPHISMAND TYPE INFERENCEIn order to use the type system we have developed as aframework for static verification of type safety of Jvmc code,two further extensions are necessary.One is polymorphic6。
CGS-CIMB Securities (Singapore) Pte Ltd – General Terms and Conditions 银河-联昌证券(新加坡)私人有限公司–一般条款和条件THIS DOCUMENT states the terms and conditions which govern the relationship between CGS-CIMB Securities (Singapore) Pte. Ltd. (“CGS-CIMB ”) and the applicant or applicants for the Account (as hereafter defined) (the “Client ”). 本文件阐述了银河-联昌证券(新加坡)私人有限公司(“CGS-CIMB ”)与申请人或账户申请人(如下文所定义)(简称“客户”)的关系的条款和条件。
P art A: Definition A 章:定义 1. Definitions 定义1.1 Unless the context otherwise requires or if specifically defined in the relevant part of these terms and conditions,the following words or expressions in these terms and conditions shall have the following meanings:除非上下文另有规定,或在这些条款和条件的相关部分中明确定义,否则这些术语或条件下的下列单词或表达式应具有以下含义:“Account ” means such account, including any sub-account, as may be necessary and expedient for the performance of Transactional Services, including but not limited to the Cash Trading Account, the Margin Trading Account, the Securities Borrowing Account, the Securities Lending Account, the CFD Account (as defined in Clause 62.1), the Investment Advisory Account, and the Multi-currency Trust Account;“账户”是指此类账户,包括任何子账户,以交易服务的性能是必要的和适当的,包括但不限于现金交易账户,保证金交易账户,证券借入账户,证券借出账户,CFD 账户(如第62.1条文定义)、投资咨询账户、和多币种信托账户;“Affiliate” means (i) a related corporation (as defined in the Companies Act (Cap 50)) of CGS-CIMB; (ii) CGS-CIMB Securities Sdn. Bhd. and its related corporations (as defined in the Companies Act (Cap 50)); (iii) a member of the CGI Group; and/or (iv) a member of the CIMB Group;“关联公司”指的是(i )与CGS-CIMB 相关的企业(如公司法规定(第50章));(ii )CGS-CIMB Securities Sdn. Bhd.及其关联公司(根据公司法令(第50章)的定义);(iii )CGI 集团成员;和/或(iv )CIMB 集团股东;“Amount Financed ” means the amount owed by the Client in the Margin Trading Account and shall include (a) amounts financed by CGS-CIMB in respect of outstanding purchases made for the Margin Trading Account net of the Cash Collateral and sales proceeds receivable from outstanding sales made in the Margin Trading Account of the Client; (b) all commission charges, interest expenses and all other related expenses; and (c) such other amount as CGS-CIMB may include for the purpose of determining the amount financed;“资金数额”是指由客户所欠的金额在保证金交易账户和金额应包括(a )由CGS-CIMB 的购买为保证金交易账户净应收现金抵押品和销售收入的优秀销售在客户的保证金交易账户;(b )中所有佣金,利息费用及其他相关费用;(c )其他金额如CGS-CIMB 可能包括用于确定融资金数额;“Authorised Person ” means a person authorised in writing by the Client to provide instructions to CGS-CIMB in relation to Transactions on behalf of the Client, and whose instructions will be accepted by CGS-CIMB and are binding on the Client;“授权人”指的是委托人在书面授权的情况下,为客户提供有关交易的指示,其指示将由CGS-CIMB 接受,并对客户有约束力;“Authority ” means the Monetary Authority of Singapore; “管理局”指的是新加坡的金融管理局;“Base Currency ” means Singapore Dollars; “基础货币”指的是新加坡元;“Business Day ” means any day on which CGS-CIMB is open for business in Singapore; “营业日”指银河-联昌在新加坡营业的任何一天;“CAR ” means Client Account Review; “CAR ”系指客户账户审核;“Cash Collateral ” means Collateral that takes the form of a deposit of cash; “现金抵押品”是指以保证金形式支付的抵押品;Contents 目录 Part A : A 章 : Definition 定义2 Part B : B 章 : Terms Applicable Generally 一般条款的适用 9 Part C : C 章 : Trading In Securities 证券交易22 Part D : D 章 : Financial Advisory Services 财务咨询服务24 Part E : E 章 : Custodian And Nominee Services 保管人和提名人服务26 Part F : F 章 : Securities Borrowing And Lending 证券借入和借出 29 Part G : G 章 : Margin Trading Account 保证金交易账户39 Part H : H 章 : Contracts For Difference 差价合约44 Part I : I 章 : Multi-currency Trust Account 多币种信托账户70 Part J : J 章 : Transactions In Foreign Exchanges 外汇交易71 Part K : K 章 : Electronic Communications 电子通信 71 Part L : L 章 : Online Services 网上服务73 Part M : M 章 : Electronic Payment For Securities 证券电子支付 79 Part N : N 章 : Personal Data 个人资料80 Part O : O 章: Miscellaneous Provisions 杂项规定83 Schedule I : 附表1 : Risk Disclosure Statement 风险披露声明91 Schedule II : 附表2 : Guide And Caution Note: Applying/Maintaining A Trading Account 指引和注意事项:申请/维持交易账户106“Cash Trading Account” means the Account (other than the CFD Account, the Margin Trading Account, the Securities Borrowing Account and the Securities Lending Account) designated by CGS-CIMB through which the Transactions are to be effected;“现金交易账户”指的是由CGS-CIMB指定的账户(CFD账户,保证金交易账户,证券借入账户和证券借出账户除外);“CDP” means The Central Depository (Pte) Limited;“CDP”指中央存管(私人)有限公司;“CFD” means contracts for difference;“CFD”是指差价合约;“Charged Securities” means the Collateral or marketable Securities provided by the Client (and which CGS-CIMB agrees to accept as security for the availability of or continued availability of the Margin Financing Facility) including, without limitation, all or any securities, rights, moneys and properties whatsoever which may at any time after the date hereof be derived from, accrued on or be offered in respect of, any of the Charged Securities; “抵押证券”系指客户提供的抵押品或有价证券(以及CGS-CIMB作为可用的安全或保证金融资设施的持续可用性),包括但不限于所有或任何证券,权利,款项和任何在该日期之后的任何时间可从任何已抵押证券而衍生,计提或提供的财产;“CGI Group” means China Galaxy International Financial Holdings Limited and its related corporations (as defined in the Companies Act (Cap 50));“CGI集团”指的是中国银河国际金融控股有限公司及其相关公司(如公司法规定(第50章));“CIMB Group” means CIMB Group Sdn. Bhd. and its related corporations (as defined in the Companies Act (Cap 50));“CIMB集团”指的是联昌国集团限公司。
set out general terms andconditions 条款和条件 -回复在这篇1500-2000字的文章中,我们将逐步回答与"条款和条件"相关的问题。
条款和条件是一种协议或合同的一部分,它规定了各方之间的权利和责任。
它们通常在购买商品或使用服务的过程中起着重要的作用。
在以下内容中,我们将介绍什么是条款和条件,为什么它们重要,如何制定和修改它们,以及如何应对违反条款和条件的情况。
一.什么是条款和条件?条款和条件是一份文件,它规定了双方在交易过程中应遵守的规则和规定。
它们通常以书面形式存在,可以在购买合同、网站使用条款、服务协议以及其他商业合同中找到。
条款和条件通常包括以下内容:1.定义和解释:条款和条件中通常会给出一些定义和解释,以确保双方对其中的术语和概念有相同的理解。
2.费用和支付方式:条款和条件通常会规定商品或服务的价格,以及支付方式和期限。
3.权利和责任:条款和条件会明确规定双方的权利和责任,例如商品的所有权归属、服务的提供范围和质量要求等。
4.服务的期限和终止:如果涉及到订阅服务或长期合同,条款和条件会规定服务的期限和条件以及提前终止的方式。
5.违约和争议处理:条款和条件通常会规定违约时的处理措施和争议解决程序,例如适用的法律和法庭管辖等。
二.为什么条款和条件重要?条款和条件在商业交易中起着至关重要的作用,它们有以下几个重要用途:1.确保交易的公平性和透明性:条款和条件可以确保交易的公平性和透明性,明确双方在交易过程中的权利和责任,避免一方对另一方的不公平行为。
2.减少纠纷和争议:条款和条件规定了交易的规则和条件,可以减少纠纷和争议的发生。
当争议发生时,双方可以依照条款和条件中的规定进行解决。
3.保护双方的权益:条款和条件确保双方在交易过程中的权益得到保护,例如商品的质量和服务的提供范围等。
双方可以根据条款和条件中的规定追究对方的责任。
4.提供法律保护:条款和条件通常包含有适用的法律和法律管辖的条款,这样双方在发生争议时可以依法寻求保护。
GENERAL TERMS AND CONDITIONS OF WOOD PULP木浆贸易通用条款1. PREAMBLE导言These General Trade Rules shall apply, except when altered by express agreement accepted in writing by both the seller and the buyer.合同之买卖双方除另有彼此接受的书面协议外,其余皆应以本通用条款为准。
2. QUANTITY: WEIGHT AND MOISTURE数量:重量和水分Unless otherwise stated, the word tonne or ton in this contract shall mean 1,000 kilogrammes air-dry weight gross for net. The term air-dry shall mean ninety per cent (90 %) absolutely dry pulp and ten per cent (10 %) water.除非另有说明,合同中所指之“吨”应为1000千克净空干总重量。
条款“空干”所指应为百分之九十绝干木浆和百分之十的水分。
The pulp shall be packed in bales of declared uniform weight and air-dry content or a specification to be given stating the weight and air-dry content and number of each bale. Each bale shall bear a number or other identification mark to enable the time of manufacture to be determined by the seller in case of need.木浆采用标准打包包装,注明均一重量和空干度。
General Terms一般词汇:manager 经纪人instructor 教练,技术指导guide 领队trainer 助理教练referee, umpire (网球.棒球)裁判linesman, touch judge (橄榄球)裁判contestant, competitor, player 运动员professional 职业运动员amateur 业余运动员,爱好者enthusiast, fan 迷,爱好者favourite 可望取胜者(美作:favorite)outsider 无取胜希望者championship 冠军赛,锦标赛champion 冠军record 纪录record holder 纪录创造者ace 网球赛中的一分Olympic Games, Olympics 奥林匹克运动会Winter Olympics 冬季奥林匹克运动会stadium 运动场track 跑道ring 圈ground, field 场地pitch (足球、橄榄球)场地court 网球场team, side 队竞技性运动competitive sport用粉笔记下(分数等);达到,得到chalk up 出名make one's mark体育项目(尤指重要比赛)event体育PE (Physical Education)体格、体质physique培训groom余的,带零头的odd年少者junior残疾人the handicapped/disabled学龄前儿童preschool全体;普通;一般at large平均寿命life expectancy复兴revitalize使有系统;整理systemize历史悠久的time-honored跳板spring-board秋千swing石弓,弩crossbow(比赛等的)观众spectator取得进展make headway体育大国/强国sporting/sports power与...有关系,加入be affiliated to/with落后lag behind武术martial arts五禽戏five-animal exercises体育运动physical culture and sports增强体质to strengthen one's physique可喜的,令人满意的gratifying称号,绰号label涌现出来to come to the fore源源不断a steady flow of队伍contingent又红又专/思想好,业务精to be both socialist-minded and vocationally proficient 体育界sports circle(s)承担义务to undertake obligation黑马dark horse冷门an unexpected winner; dark horse爆冷门to produce an unexpected winner发展体育运动,增强人民体质Promote physical culture and build up the people's health锻炼身体,保卫祖国Build up a good physique to defend the country为祖国争光to win honors for the motherland胜不骄,败不馁Do not become cocky/be dizzy with success, nor downcast over/discouraged by defeat.体育道德sportsmanship打出水平,打出风格up to one's best level in skill and style of play竞技状态好in good form失常to lose one's usual form比分领先to outscore打成平局to draw/to tie/to play even/to level the score失利to lose中华人民共和国运动委员会(国家体委)Physical Culture and Sports Commission of the PRC (State Physical Culture and Sports Commission)中华全国体育总会All-China Sports Federation国际奥林匹克委员会International Olympic Committee少年业余体育学校youth spare-time sports school, youth amateur athletic school 辅导站coaching center体育中心sports center/complex竞赛信息中心competition information center运动会sports meet; athletic meeting; games全国运动会National Games世界大学生运动会World University Games; Universiade比赛地点competition/sports venue(s)国际比赛international tournament邀请赛invitational/invitational tournament锦标赛championship东道国host country/nation体育场stadium; sports field/ground体育馆gymnasium, gym; indoor stadium比赛场馆competition gymnasiums and stadiums练习场馆training gymnasiums操场playground; sports ground; drill ground体育活动sports/sporting activities体育锻炼physical training体育锻炼标准standard for physical training体育疗法physical exercise therapy; sports therapy广播操setting-up exercises to music课/工间操physical exercises during breaks体育工作者physical culture workers, sports organizer运动爱好者sports fan/enthusiast观众spectator啦啦队cheering-section啦啦队长cheer-leader国家队national team种子队seeded team主队home team客队visiting team教练员coach裁判员referee, umpire裁判长chief referee团体项目team event单项individual event男子项目men's event女子项目women's event冠军champion; gold medalist全能冠军all-round champion亚军running-up; second; silver medalist第三名third; bronze medalist世界纪录保持者world-record holder运动员athlete; sportsman种子选手seeded player; seed优秀选手top-ranking/topnotch athlete田径运动track and field; athletics田赛field events竞赛track events跳高high jump撑杆跳高pole jump; polevault跳远long/broad jump三级跳远hop, step and jump; triple jump标枪javelin throw铅球shot put铁饼discus throw链球hammer throw马拉松赛跑Marathon (race)接力relay race; relay跨栏比赛hurdles; hurdle race竞走walking; walking race体操gymnastics自由体操floor/free exercises技巧运动acrobatic gymnastics垫上运动mat exercises单杠horizontal bar双杠parallel bars高低杠uneven bars; high-low bars吊环rings跳马vaulting horse鞍马pommel horse平衡木balance beam球类运动ball games足球football; soccer足球场field; pitch篮球basketball篮球场basketball court排球volleyball乒乓球table tennis; ping pong乒乓球拍racket; bat羽毛球运动badminton羽毛球shuttlecock; shuttle球拍racket网球tennis棒球baseball垒球softball棒/垒球场baseball(soft ball)field/ground手球handball手球场handball field曲棍球hockey; field hockey冰上运动ice sports冰球运动ice hockey冰球场rink冰球puck; rubber速度滑冰speed skating花样滑冰figure skating冰场skating rink; ice rink人工冰场artificial ice stadium滑雪skiing速度滑雪cross country ski racing高山滑雪alpine skiing水上运动water/acquatic sports水上运动中心aquatic sports center水球(运动)water polo水球场playing pool滑水water-skiing冲浪surfing游泳swimming游泳池swimming pool游泳馆natatorium自由泳freestyle; crawl (stroke)蛙泳breaststroke侧泳sidestroke蝶泳butterfly (stroke)海豚式dolphin stroke/kick蹼泳fin swimming跳水diving跳台跳水platform diving跳板跳水springboard diving赛艇运动rowing滑艇/皮艇canoeing帆船运动yachting; sailing赛龙船dragon-boat racing室内运动indoor sports举重weightlifting重量级heavyweight中量级middleweight轻量级lightweight拳击boxing摔交wresting击剑fencing射击shooting靶场shooting range射箭archery拳术quanshu; barehanded exercise; Chinese boxing气功qigong; breathing exercises自行车运动cycling; cycle racing赛车场(自行车等的)倾斜赛车场cycling track室内自行车赛场indoor velodrome摩托运动motorcycling登山运动mountaineering; mountain-climbing骑术horsemanship赛马场equestrian park国际象棋(international) chess特级大师grandmaster象棋xiangqi; Chinese chess。
戈洛博翻译-英语词汇库英语专题分类词汇(地理Geography)1.General Terms 一般词汇:physical geography自然地理;economic geography经济地理;geopolitics地理政治论;geology地理学;ethnography民族志;cosmography宇宙志;cosmology宇宙论;toponymy 地名学;oceanography海洋学;meteorology气象学;orography山志学;hydroaraphy水文学;vegetation植被;relief地形/地貌;climate气候;earth地球/大地;Universe/cosmos 宇宙;world世界;globe地球仪;earth/globe地壳;continent大陆;terra firma陆地;coast海岸;archipelago群岛;peninsula半岛;island岛;plain平原;valley谷地;meadow(小)草原;prairie(大)草原;lake湖泊;pond池塘;marsh/bog/swamp沼泽;small lake小湖;lagoon泻湖;moor/moorland荒原;desert沙漠;dune沙丘;oasis绿洲;savanna(南美)大草原;virgin forest 原始森林;steppe大草原;tundra冻原。
2.Cartography 地图绘制法:map 地图;map of the world 世界地图;planisphere 平面地形图;wall map/skeleton map 挂图;skeleton map 示意图;map/plan/chart 平面图;atlas 地图集;chart 海图;cadastre 地籍薄;topography 测绘学;photogrammetry 摄影测量学;world political 世界政区;world physical 世界地形;world communication 世界交通;world time zone 世界时区;geomorphy地貌;scale 比例尺;legend 图例;city map 城市图;populated localitied 居民点;capital 首都/首府;capital of first-order political unit 一级行政中心;main city/major city 主要城市;common city 一般城市;village村;town 镇/乡;city boundary市界;boundary境界;continental boundary洲界;international boundary 国界;undefined international boundary 未定国界;regional boundary 地区界;boundary or first-order political unit一级政区界;communication 交通;railroad/railway 铁路;expressway/motorway 高速公路;highway 公路;shipping route 航海线;airport 机场;port 港口;stream system 水系;river 河流;waterfall 瀑布;salt lake咸水湖/盐湖;lake 湖泊;seasonal lake 时令湖;seasonal river 时令河;reservoir 水库;dam水坝;water pipe line 输水管;well 井;spring 泉;hot spring 温泉;swamp 沼泽;desert 沙漠;peak 山峰;volcano 火山;coral reefs 珊瑚礁;historic site 古迹;pass山口/关隘;the great wall 长城;pyramid 金字塔;magnetic pole 地磁极;city blocks城市街区;main street/major street主要街道;secondary street次要街道;important building 重要建筑物;building 独立建筑;college or university 高等院校;national park 国家公园;military installation 军用设施;space centre or scientificresearch centre 航天或科研中心;lighthouse 灯塔;monument 纪念碑;stadium 体育场;gymnasium 体育馆;park公园;bridge 桥梁;green land 绿地;cemetery 墓地。
General Terms & Conditions一般条款和条件1.Interpretation解释― Acceptance ‖ means the process whereby Seller demonstrates to Buyer that the Supplies meet the requirements of the Agreement by successful completion of the Acceptance Tests ;―验收‖是指卖方向通过成功完成验收测试向买方展示供应品符合本协议要求的过程。
“Acceptance Date” means the date on which Buyer agrees that the Supplies have successfully completed the Acceptance Tests as defined in "Acceptance Tests and Forms" Appendix. The Supplies shall be deemed installed and accepted upon successful completion of the Acceptance Tests, as notified in writing by Buyer to Seller, or upon first use of the Supplies by Buyer, whichever occurs first ;―验收日期‖是指买方同意供应品成功完成附件“验收测试和表格”中定义的验收测试的日期。
买方书面通知卖方供应品验收测试成功完成时,或者买方首次使用供应品时,应视作供应品安装完毕并被接受,以先发生者为准。
“Acceptance Tests” are the tests listed in "Acceptance Tests and Forms" Appendix ;―验收测试‖是指附件“验收测试和表格”中列出的测试。
Two Supervised Learning Approaches for Name Disambiguation in Author CitationsHui Han Department of Computer Science and Engineering The Pennsylvania StateUniversity University Park,P A,16802 hhan@Lee GilesSchool of InformationSciences and T echnologyThe Pennsylvania StateUniversityUniversity Park,P A,16802giles@Hongyuan ZhaDepartment of ComputerScience and EngineeringThe Pennsylvania StateUniversityUniversity Park,P A,16802zha@Cheng Li Department of Biostatistics Harvard School of PublicHealthBoston,MA,02115cli@ Kostas Tsioutsiouliklis NEC Laboratories America,Inc.4Independence Way,Princeton,NJ08540kt@ABSTRACTDue to name abbreviations,identical names,name misspellings, and pseudonyms in publications or bibliographies(citations),an author may have multiple names and multiple authors may share the same name.Such name ambiguity affects the performance of document retrieval,web search,database integration,and may cause improper attribution to authors.This paper investigates two supervised learning approaches to disambiguate authors in the ci-tations1.One approach uses the naive Bayes probability model,a generative model;the other uses Support Vector Machines(SVMs) [39]and the vector space representation of citations,a discrimi-native model.Both approaches utilize three types of citation at-tributes:co-author names,the title of the paper,and the title of the journal or proceeding.We illustrate these two approaches on two types of data,one collected from the web,mainly publication lists from homepages,the other collected from the DBLP citation databases.Categories and Subject DescriptorsH.3.3[Information Systems]:Information Search and RetrievalGeneral TermsAlgorithms1“Citations”refer to an author’s publication list in the citation for-mat.Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on thefirst page.To copy otherwise,to republish,to post on servers or to redistribute to lists,requires prior specific permission and/or a fee.JCDL’04,June7–11,2004,Tucson,Arizona,USA.Copyright2004ACM1-58113-832-6/04/0006...$5.00.KeywordsNaive Bayes,Name Disambiguation,Support Vector Machine 1.INTRODUCTIONDue to name variation,identical names,name misspellings,and pseudonyms,we observe two types of name ambiguities in research papers or bibliographies(citations).Thefirst type is that an author has multiple name labels.For example,the author“David S.John-son”may appear in multiple publications under different name ab-breviations such as“David Johnson”,“D.Johnson”,or“D.S.John-son”,or a misspelled name such as“Davad Johnson”.The second type is that multiple authors may share the same name label.For example,“D.Johnson”may refer to“David B.Johnson”from Rice University,“David S.Johnson”from AT&T research lab,or“David E.Johnson”from Utah University(assuming the authors still have these affiliations).Name ambiguity can affect the quality of scientific data gather-ing,can decrease the performance of information retrieval and web search,and can cause the incorrect identification of and credit attri-bution to authors.For example,identical names cause the ambigu-ity of the“author page”in the web DBLP(Digital Bibliography& Library Project)2.The author page of“Yu Chen”in the DBLP con-tains citations from three different people with the same name:Yu Chen from University of California,Los Angeles;Yu Chen from Microsoft Beijing;Yu Chen as the senior professor from Renmin University of China.Such name ambiguity causes the incorrect identification of authors.For example,the author page of“Jia Li”in the DBLP refers to the“Jia Li”from the Department of Statistics at the Pennsylvania State University.However,the“Home Page”link in her author page directs to the professor with the identical name in the Department of Mathematical Sciences at the Univer-sity of Alabama in Huntsville.We observe from CiteSeer[18] the incorrect attribution to the authors due to similar ambiguity.“D.Johnson”is the most cited author in Computer Science accord-2rmatik.Uni-Trier.DE/∼ley/db/index.htmling to CiteSeer’s statistics in May2003(/ mostcited.html).However,the citation number that“D.Johnson”obtained in CiteSeer’s statistics is actually the sum of several dif-ferent authors such as“David B.Johnson”,“David S.Johnson”, and even“Joel T.Johnson”.This paper investigates the name disambiguation in the context of citations.We propose the idea of a canonical name,i.e.a name that is the minimal invariant and complete name entity for disam-biguation.Such a name may have more than just the name of the individual as constituents.A possible example of a canonical name would be a name entity that has all the characteristics of a name in-cluding abbreviations and AKA’s.“Authorized name”is a similar concept in library practice.Getty’s ULAN(Union List of Artist’s Names)[1]and the Library of Congress name authorityfile[2] are good demonstrations of the canonical,or authoritative form of names.Name ambiguity is a special case of the general problem of iden-tity uncertainty,where objects are not labeled with unique identi-fiers[30].Much research has been done to address the identity un-certainty problem in differentfields using different methods,such as record linkage[17],duplicate record detection and elimination [8,26,29],merge/purge[22],data association[6],database hard-ening[11],citation matching[28],name matching[7,37,9],and name authority work in library cataloging practice[40,15,19]. Citation matching,name matching and name authority work are the work most similar to ours.Citation matching and name match-ing are similar to our method in citation context and the choice of citation attributes for computation.However,the records identified by related work are actually duplicate records in different syntactic formats,while what we identify are different records authored by the same name entity.Another difference is that many name match-ing algorithms are string-based[7,37,9].Bilenko et al.show in their work that the string-based similarity computation works better than token-based methods,which may be due to many misspellings in their datasets[7].We expect token-based methods to betterfit the name disambiguation task because the problem can be treated at the token level,and misspellings and abbreviations are not the main source of the citation differences for these au-thority is the process through which librarians for the past centuary have intellectually provided disambiguation for personal and cor-porate names in the world’s bibliographic output.However,much name authority work is conducted manually.DiLauro et.al.[40, 15]propose a semi-automatic algorithm using Bayes probabilities to disambiguate composers and artists in the Levy music collec-tion.However,their algorithm largely depends on the Library of Congress name authorityfile.We study two machine learning approaches for name disam-biguation,one based on a generative model and the other based on a discriminative model.A generative model can create other examples of the data,usually provides good insight into the na-ture of the data and facilitates easy incorporation of domain knowl-edge[4].We observe that an author’s citations usually contain the information of the author’s research area and his or her individ-ual patterns of coauthoring.Therefore,we propose a naive Bayes model,a generative statistical model frequently used in word sense disambiguation tasks[16,38],to capture all authors’writing pat-terns.Discriminative models such as Support Vector Machines, are basically classifiers.Other differences are that the naive Bayes model uses only positive training citations to model an author’s writing patterns,while the SVMs learn from both positive and neg-ative training citations the distinction between different authors’citations.Also,the naive Bayes model classifies a citation to an author based on the probabilities,while the SVMs uses a distance measure[36].In addition,a probability model allows us to sys-tematically combine different models[23],and is easily extensibleto more information;the vector space representation of citations inclassification approaches usually needs to tune weights for differentattributes[7,37].Our approaches assume the existence of a citation database(train-ing data)indexed by the canonical name entities.Such a citation database can be constructed in several ways.For example,con-structing the database based on existing databases such as DBLP;collecting publication lists from researchers’home pages(Usuallythese publication lists are in the citation format);or clustering ci-tations according to the name entities,as shown by our previouswork[21].Given a full citation with the query name implicitly omitted,ourname disambiguation is to predict the most likely canonical namefrom the citation database.For example,“[J.Anderson],S.Baruah,K.Jeffay.Parallel Switching in Connection-Oriented Networks.IEEE Real-Time Systems Symposium1999:200-209”is a test cita-tion.“J.Anderson”is the omitted query name.The naive Bayes ap-proach estimates the author-specific probabilities,such as the priorprobability of each author,and his/her probabilities of coauthor-ing with coauthors,using certain keywords in the title of the paper,and publishing papers in certain places,as described in detail inSection2.Given a new citation and its query author name,namedisambiguation is to search the database and choose the canoni-cal name entry with the highest posterior probability of producingthis citation.The SVM approach considers each author as a class,and classifies a new citation to the closest author class.With theSVM approach,we represent each citation in a vector space;eachcoauthor name and keyword in paper/journal title is a feature of the vector.Both approaches use three attributes of the citations associatedwith each canonical name entry in the citation database:coauthornames,paper titles,and journal titles.By“journal titles”,we ac-tually refer to the titles of all the publication sources,such as pro-ceedings and journals.Author names in citations are represented by thefirst name initial and last name,the minimal name informationseen in citations.Citation attributes can be extracted by methodssuch as regular expression matching,rule-based system[10],hid-den Markov models[33,34,35],or Support Vector Machines[20].To minimize the effect of inaccurate citation parsing on the studyof two approaches,we use regular expression matching and man-ual correction to parse the citations in“J Anderson”and“J Smith”datasets,as discussed in Section4.1.The DBLP citation datasetsare already in the XML format with parsed attributes.The rest of the paper is organized as follows:Section2describesthe naive Bayes approach;Section3describes the SVM approach;Section4reports experiments and results;Section5concludes and discusses future work.2.THE NAIVE BAYES MODELWe assume that each author’s citation data is generated by thenaive Bayes model,and use his/her past citations as the trainingdata to estimate the model parameters.Based on the parameter es-timates,we use the Bayes rule to calculate the probability that eachname entry X i(i∈[1,N],where N is the total number of candi-date name entries in the citation database)would have generatedthe input citation.2.1Model OverviewGiven an input test citation C with the omission of the queryauthor,the target function is tofind a name entry X i in the citation database with the maximal posterior probability of producing thecitation C,i.e.,max i P(X i|C)(1) Using the Bayes rule,the problem becomesfindingmax i P(C|X i)P(X i)/P(C)(2) where P(X i)denotes the prior probability of X i authoring papers,and is estimated from the training data as the proportion of the pa-pers of X i among all the citations.The prior is useful to incorporate the knowledge,such that a prolific author can have large P(X i). P(C)denotes the probability of the citation C and is omitted since it does not depend on X i.Then Function2becomesmax i P(C|X i)P(X i)(3) We assume that coauthors,paper titles,and journal titles are in-dependent citation attributes,and different elements in an attribute type are also independent from each other.The different attribute element here refers to the individual coauthor,the individual key-word in the paper title,and the individual keyword in the journal title.By“keyword”,we mean the remaining words afterfiltering out the stop words(such as,“a”,“the”“of”,etc.).Therefore,we decompose P(C|X i)in Function3asP(C|X i)=Yj P(A j|X i)=YjYkP(A jk|X i)(4)where A j denotes the different type of attribute;that is,A1-the coauthor names;A2-the paper title;A3-the journal title.Each attribute is decomposed into independent elements represented by A jk(k∈[0..K(j)]).K(j)is the total number of elements in at-tribute A j.For example,A1=(A11,A12,...,A1k,...,A1K(1)), where A1k indicates the kth coauthor in C.To avoid underflow,we store log probabilities in our implementation,and the target func-tion becomes:max i P(X i|C)=max i[Xj Xklog(P(A jk))+log(P(X i))](5)where j∈[1,3]and k∈[0,K(j)).The above attribute inde-pendence assumption may not hold for real-world data,since there exist cases such as multiple coauthors always appearing together. However,empirical evidence shows that naive Bayes often per-forms well in spite of such violation.Friedman,Domingos and Paz-zani show that the violation of the word independence assumption sometimes may affect slightly the classification accuracy(Fried-man1997;Domingos and Pazzani1996).2.2Model Parameters and EstimationNext we describe the decomposition and estimation of the coau-thor conditional probability P(A1|X i)from the training citations, where A1=(A11,A12,...,A1k,...,A1K(1)).The probability es-timation is the maximum likelihood estimation for parameters of multinomial distributions.The pseudo count1is added in parame-ter estimation to avoid zero probability in the estimation results.P(A1|X i)is decomposed into the following conditional proba-bilities.•P(N|X i)-the probability of X i writing a future paper alone conditioned on the event of X i,estimated as the proportion of the papers that X i authors alone among all the papers of X i.(N stands for“No coauthor”,and“Co”below stands for “Has coauthor”).•P(Co|X i)-the probability of X i writing a future paper with coauthors conditioned on the event of X i.P(Co|X i)=1−P(N|X i).•P(Seen|Co,X i)-the probability of X i writing a future pa-per with previously seen coauthors conditioned on the eventthat X i writes a future paper with coauthors.We regard thecoauthors coauthoring a paper with X i at least twice in thetraining citations as the“seen coauthors”;the other coau-thors coauthoring a paper with X i only once in the trainingcitations is considered as the“unseen coauthors”.There-fore,we estimate P(Seen|Co,X i)as the proportion of thenumber of times that X i coauthors with“seen coauthors”among the total number of times that X i coauthors with anycoauthor.Note that if X i has n coauthors in a training cita-tion C,we count that X i coauthors n times in citation C.•P(Unseen|Co,X i)-the probability of X i writing a futurepaper with“unseen coauthors”conditioned on the event thatX i writes a paper with coauthors.P(Unseen|Co,X i)=1−P(Seen|Co,X i)•P(A1k|Seen,Co,X i)-the probability of X i writing a fu-ture paper with a particular coauthor A1k conditioned on theevent that X i writes a paper with previously seen coauthors.We estimate it as the proportion of the number of times thatX i coauthors with A1k among the total number of times X icoauthors with any coauthor.•P(A1k|Unseen,Co,X i)-the probability of X i writing afuture paper with a particular coauthor A1k conditioned onthe event that X i writes a paper with unseen coauthors.Con-sidering all the names in the training citations as the popula-tion and assuming that X i has equal probability to coauthorwith an unseen author,we estimate P(A1k|Unseen,Co,X i) as1divided by the total number of author(or coauthor)names in the training citations minus the number of coauthors of X i.However,the small citation size may underestimate the pop-ulation of new coauthors that X i will coauthor with in thereal-world.This may in turn underestimates the probabilityof an author coauthoring with previously seen coauthors.Inthis case we can set a larger population size.•P(A1|X i)=P(N|X i)if K(1)=0•P(A1|X i)=P(A11|X i)...P(A1k|X i)...P(A1K|X i)if K(1)>0,whereP(A1k|X i)=P(A1k,N|X i)+P(A1k,Co|X i)=0+P(A1k,Co|X i)=P(A1k,Seen,Co|X i)+P(A1k,Unseen,Co|X i)=P(A1k|Seen,Co,X i)∗P(Seen|Co,X i)∗P(Co|X i)+P(A1k|Unseen,Co,X i)∗P(Unseen|Co,X i)∗P(Co|X i) The above decomposition is motivated by the following hypothe-ses:(1)Different authors X i have different probabilities of writ-ing papers alone,writing papers with previously seen coauthors or previously unseen coauthors.(2)Each author X i has his/her own list of previously seen coauthors,and a unique probability distri-bution on these previously seen coauthors to write papers with.If the above hypotheses hold,we expect these conditional probabil-ities to capture the coauthoring history and pattern of X i,and to help disambiguate the omitted author from the rest of a citation C. Similarly,we can estimate the conditional probability P(A2|X i) that an author writes a paper title,and the conditional probability P(A3|X i)that he publishes in a particular journal.Taking each title word of the paper and journal as an independent element,we estimate the probabilities that X i uses a certain word for a future paper title,and publishes a future paper in a journal with a particu-lar word in the journal title.Here the goal is to use author-specificprobabilities to capture information such as the researchfield,key-words in the research direction,and the preference of title word usage from past citations of X i.2.3Computational ComplexitySuppose a citation database consists of N canonical authors, where each author has an average of M training citations,and each citation has an average of K attribute elements.The com-putational complexity for training(estimating the probabilities)the above model is O(MNK);the computational complexity for the query step using coauthor information alone is O(NK)for each query citation.This complexity indicates the scalability of our al-gorithm to real-world applications.3.SUPPORT VECTOR MACHINESThis approach considers each author as a class,and trains the classifier for each author class.Given a full citation with the omis-sion of the query name,the goal of name disambiguation is to clas-sify this citation to the closest author class.Each citation is repre-sented by a feature vector,with each coauthor name and keyword in the paper/journal title as a feature and its frequency in the citation as the feature weight.We use the X ∞to normalize the weight of features with different ranges of values,which was shown to improve the classification performance[20].We choose Support Vector Machines[39,12]as classifiers be-cause of their good generalization performance and ability in han-dling high dimensional data.All experiments use SV M light[24].3.1Support Vector Machine Classification andFeature SelectionThe SVM is designed for two class classification problem.Let {( x1,y1),...,( x N,y N)}be a two-class training dataset,with x i a training feature vector and their labels y i⊂(-1,+1).The SVM attempts tofind an optimal separating hyperplane to maximally separate two classes of training data.The corresponding decision function is called a classifier.In the case where the training data is linearly separable,computing an SVM for the data corresponds to minimizing w such thaty i( w· x i+w0)−1≥0,∀i(6) The linear decision function isf( x)=sgn{( w· x)+w0}=sgn{nXiα∗i y i( x i· x)+w∗0}(7)If f( x)>0,the data x belongs to class1;otherwise, x belongs to class2.The absolute value of f( x)indicates the distance of x from the other class.In thefinal decision function f( x),the training samples with non zero coefficientsα∗i lie closest to the hyperplane,and are called support vectors.As Equation7shows, f( x)is a weighted sum of all features,plus a constant term as the threshold.n is the number of support vectors.Zhang et.al [43]propose to rank the features according to their contribution in separating the differences between two classes.We formalize such a contribution of a feature by Expression8,where x ij is the weight of the feature j in support vector i.We use such ranking of features to analyze the classification performance by SVMs(Section4.1).nXiα∗i y i x ij(8)We extend SVMs to multi-class classification using the“One class versus all others”approach,i.e.,one class is positive and the remaining classes are negative.4.EXPERIMENTS4.1Datasets and Experiment DesignWe apply both approaches on two types of data.Thefirst type of data is publication lists collected from the web,mostly from researchers’homepages.This type of data contains two datasets, one from15different“J Anderson”s,shown in Table1,the other from11different“J Smith”s3.Both“J Anderson”and“J Smith”are ambiguous names in the database of our EbizSearch system-a CiteSeer like search engine specializing in the E-Business area[32].We query“Google”using name information such as“J An-derson”,or the full name information available in our EbizSearch databases such as“James Anderson”,and the keyword“publica-tions”.We manually check the returned links,recognize each re-searcher under the samefirst name initial and last name,and collect their publication web pages to construct our datasets.The other type of data are downloaded from the DBLP web-site,which contains more than300,000bibliographic XML citation records with parsed citation attributes.We form the three attributes in each citation as a string.We then cluster author names with the samefirst name initial and the same last name;each name is associ-ated with the citations where the name appears.We sort the formed name datasets by the number of citations in each set.9large name datasets with each having more than10name variations are cho-sen for experiments,as shown in Table7.We observe that many names in the DBLP have complete name information.To avoid te-dious manual checking,we choose from each name dataset the full names that have more thanfive citations,and consider each such name to represent a canonical name entity.We prepare the training/testing datasets,preprocess the data,and construct the citation databases,in the same way for all datasets. We preprocess the datasets on author names,paper title words and journal title words as follows.All the author names in the cita-tions are simplified tofirst name initial and last name.For example,“Yong-Jik Kim”is simplified to“Y Kim”.A reason for the simplifi-cation is that thefirst name initial and last name format is popular in bibliographic records.Since more name information usually helps name entity disambiguation,we think that insufficient name infor-mation from simplified name format would be good for evaluating our algorithm.Moreover,the simplified name format may avoid some cases of name misspellings.We stem the words of paper ti-tles and journal titles using Krovetz’s stemmer[25],and remove the stop words such as“a”,“the”,etc.We also replace the conference or journal title abbreviations by their full names for more informa-tion.The full names of the conference or journal titles are obtained from the DBLP websites4.Each name dataset is randomly split,with half of them used for training,and the other half for testing.For example,the“J Ander-son”dataset contains117citations for training and112citations for testing;the“J Smith”dataset contains172training citations and 166testing citations.A citation database is then constructed for each name dataset,based on the parsed and pre-processed train-ing citations.For example,the citation database of“J Anderson”contains15canonical name entries for15different“J Anderson”s, with each name entry associated with available identity informa-tion,such as full name,affiliation,research area,as well as authored citations.With each approach,we conduct10experiments with randomly split dataset for each experiment.In each experiment,we explore 3/users/h/x/hxh190/projects/name pro ject.htm4rmatik.uni-trier.de/∼ley/db/conf/indexa.html and rmatik.uni-trier.de/∼ley/db/journals/index.htmlJ Anderson Full name Affiliation Research area Training size Test size 1James Nicholas Anderson UK Edinburgh Communication interface research442James E.Anderson Boston College Economics773James A.Anderson Brown Univ.Neural network214James B.Anderson Penn.State Univ.Chemistry335James B.Anderson Univ.of Toronto Biologist11106James B.Anderson Univ.of Florida Entomology987James H.Anderson U.of North Carolina at Chapel Hill Computer processors27278James H.Anderson Stanford Univ.Robot229James D.Anderson Univ.of Toronto Dentistry3210James P.Anderson N/A Computer Security2111James M.Anderson N/A Pathology3212James Anderson UK Robot vision and philosophy91013James W.Anderson Univ.of KY Medicine5514Jim Anderson Univ.of Southampton Mathematician101015Jim V.Anderson Virginia Tech Univ.Plant pathology2020 Table1:The citation dataset of15“J Anderson”s.Column2,3&4shows the available“Identification information”of a“J Anderson”,e.g.,the full name of each“J Anderson”,his or her affiliation and research area.“Training/test size”lists the number of citations used for training/testing.For space limitation,we do not list here the web sites where we download the citations.Scheme Coauthor Paper title Journal title Hybrid I Hybrid IIApproach Bayes SVM Bayes SVM Bayes SVM Bayes SVM BayesMean71.3%64.4%77.9%82.9%72.1%74.4%91.3%95.6%93.5%StdDev 2.1% 3.8% 3.3% 1.9% 2.1% 3.0% 1.6% 1.7% 1.8%P Value 1.38E-050.0030.0120.0003Table2:The mean and the standard deviation(StdDev)of the10name disambiguation accuracy trials on the“J Anderson”dataset, with both the naive Bayes approach(Bayes)and the SVM approach(SVM);and the statistical significance(two tail P value)of the performance difference by the two approaches.multiple schemes based on different combinations of the utilized citation attributes.The motivation is to study the contributions of different citation attributes on name disambiguation.Both ap-proaches use three schemes which use alone one citation attribute, and at least one of two“Hybrid”schemes which combine aspects of all three attributes.In the naive Bayes model approach,“Hybrid I”computes the equal joint probability of different attributes.In the SVM approach,“Hybrid I”combines different attributes in the same feature space.The“Hybrid II”scheme is specific to the naive Bayes model and uses the coauthor attribute alone when a coauthor relationship exists between a coauthor in the test citation and a can-didate name entry in the citation database;otherwise,“Hybrid II”uses the equal joint probability of all the three attributes.Flexibility of manipulating attributes is an advantage of using the probability model.The absence of a particular attribute can be handled by omitting the corresponding probabilities.“Hybrid II”is motivated by the experimental observation that with the“J Anderson”dataset, adding title words decreases the number of disambiguated names when using only the co-author information.We observe that the coauthor information is valuable for name disambiguation,and de-sign the“Hybrid II”scheme to preserve the names disambiguated by using coauthor information alone.We evaluate the experiment performance by“accuracy”,and de-fine the“accuracy”as the percentage of the query names correctly predicted.The next section shows experiment results and analysis on the all name datasets.4.2Name Disambiguation on the First Type ofDataTable2shows the mean and the standard deviation(StdDev)of the10name disambiguation accuracy trials on the“J Anderson”name dataset,using both approaches.Table3shows the similar trials on the“J Smith”name dataset.The experiment results on these two name datasets are similar,most likely due to the two name datasets having similar probability distributions,since most citations in both datasets are derived from labeled homepages.We analyze the experiment results in detail as follows:(1)Different attributes have different contributions for name disambiguationConsider the“J Anderson”dataset as an example.Table2shows that using paper title words alone achieves higher average accu-racy(77.9%,82.9%)than using either coauthor(71.3%,64.4%)or journal information alone(72.1%,74.4%)with both approaches. Table4shows in detail one experiment using the naive Bayes ap-proach;all other9experiments show similar results.We observe that authors in this dataset have higher probabilities of reusing title words than collaborating with previously seen coauthors.Table4 shows an example of the probability distribution of each attribute. For example,Row4in Column2&3(with header“Seen”)shows that92.0%((86+17)out of112)test citations reuse the words in paper titles;Row5in Column2&3shows that84.8%((79+16)out of112)test citations reuse words in journal titles;and Row3in Column2&3shows that only57.1%(64out of112)test citations have the previously seen coauthor relationship.The above probability distribution indicates that authors in this dataset tend to use the same words for multiple papers,probably because multiple papers are about the same project.And the au-thors in some research areas such as Biology or Plant pathology tend to have a few places they prefer to submit papers.For exam-ple,J.Anderson15(Jim V.Anderson;J.Anderson15refers to the 15th table entry)publishes37.5%(15out of40)of his papers in the same journal“Plant physiology”.Such consistent information con-tained in the journal title helps name entity disambiguation more than the paper title words,especially when the name entities to be disambiguated have diverse research areas.(2)Bayes model better captures the coauthoring patterns of an author than the SVM approachTable2and Table3show that the naive Bayes model(71.3%, 75.2%average accuracy)outperforms the SVM approach(64.4%, 60.0%average accuracy)when using coauthor information alone in。
Clustering E-Commerce Search EnginesQian Peng, Weiyi Meng, Hai He Department of Computer ScienceSUNY at BinghamtonBinghamton, NY 13902, USA meng@Clement Yu Department of Computer Science University of Illinois at Chicago Chicago, IL 60607, USAyu@ABSTRACTIn this paper, we sketch a method for clustering e-commerce search engines by the type of products/services they sell. This method utilizes the special features of interface pages of such search engines. We also provide an analysis of different types of ESE interface pages.Categories & Subject DescriptorsH.3.3 [Information Storage and Retrieval]: Information Search and Retrieval – Clustering; H.3.5: Online Information Services – Commercial Services, Web-based Services.General TermsAlgorithms, Performance, Design, Experimentation. KeywordsSearch engine categorization, document clustering1. INTRODUCTIONA large number of databases are Web accessible through form-based search interfaces and many of these Web sources are E-commerce Search Engines (ESEs). Providing a unified access to multiple ESEs selling similar products is of great importance in allowing users to search and compare products from multiple sites with ease. To enable a unified access to multiple ESEs, we need to collect ESEs from the Web and cluster them into different groups such that the ESEs in the same group sell the same type of products (i.e., in the same domain) so that a comparison-shopping site can be built on top of the ESEs in each group. In this paper, we provide an overview of a technique to cluster ESEs into different groups according to their product domains.ESE clustering is related to search engine categorization [1, 2]. The methods described in [1, 2] are query-submission based and they consider document search engines, not ESEs. In contrast, our method uses only the Web pages that contain the search forms of the considered ESEs to perform clustering (no queries are submitted to these ESEs). ESE clustering is also related to document/Web page clustering. However, existing approaches for the latter do not utilize the special features (such as search forms) that only ESE search interfaces have.In this paper, we sketch a new method for clustering ESEs and the method utilizes the special features of ESE interface pages. We also provide an analysis of different types of ESE interface pages.2. DIFFERENT TYPES OF INTERFACES While most ESEs are dedicated to one category of products or services, a significant fraction of ESEs cover multiple categories of products. Based on our analysis of 270 ESE sites, we identified the following 6 types of ESE interfaces: 1. Devoted Type.In such an interface, only one ESE searchform exists and it searches only one category of products.About 83% of the ESE sites we studied belong to this type. 2. Divided Type. Such an interface can search multiplecategories of products but it also has separate child ESE search interfaces for different category of products. In this study, the child ESEs of the same site are treated as separate ESE interfaces for clustering purpose.3. Co-existing Type.In this case, multiple ESEs coexist on asingle Web page. For example, the interface page has4 search forms that can search flight, car reservation, travel package and hotels separately. In our approach, search forms on the same interface page are first separated and then treated as separate ESE interfaces during clustering.4. Merged Type. In this case, a single search form can searchmultiple categories of products and the search fields of different types of products co-exist in the search form.Moreover, only one submission button is available. For example, sells books, music and movies, and its search form contains book attributes, music attributes and movie attribute. In our approach, such attributes are separated into different logical interfaces such that each logical interface covers only one category of products.5. Shared Type. In this case, the ESE site has multiple childpages for different types of products but these pages share the same search form (i.e., the same search form is available on all child interface pages) that can search multiple categories of products. Note that each child page contains more information about a specific category of products. In our approach, we treat each such child page as a separate ESE for clustering purpose.6. Multi-page Type.An ESE of this type requires a user tosubmit a sequence of pages to complete a query. This occurs, for example, when a user needs to obtain an insurance quote.As our clustering approach does not submit queries, for an ESE of the multi-page type, only the first page of the ESE is used for clustering.3. ESE INTERFACE FEATURESOur method utilizes the following features on ESE interfaces.1. The number of links/images. Our observation and analysis ofthe 270 ESE sites indicate that this feature is very useful for differentiating ESEs that sell tangible products from those selling intangible products. Tangible products are those that have a physical presence such as books and music CDs while intangible products have no physical presence such as insurance policy. The interfaces of ESEs that sell tangible products usually have more images/links than those that sell intangible products.2. Price values. For online shopping customers, priceinformation is very important for their purchase decisions.Therefore, to attract consumers, bargaining products or top-selling products are frequently advertised with prices on theCopyright is held by the author/owner(s).WWW 2004, May 17–22, 2004, New York, New York, USA.ACM 1-58113-912-8/04/0005.416interface pages of ESEs that sell tangible products. Products of the same category usually have similar prices. As such, the range of price values can be useful to differentiate different types of products. To facilitate meaningful comparison, we convert each actual price to a representative price that will then be used for comparison. Each representative price represents all prices within a certain price range.3. Form terms. A typical ESE search form has an interfaceschema, which contains some labels (descriptive text) and a number of form control elements such as text boxes, selection lists, radio buttons or checkboxes to allow users to specify complex queries rather than just keywords. The labels usually specify the real meaning of its associated elements. These attributes and their values can strongly indicate the contents of the ESE. Therefore, they are very useful for ESE clustering. In our approach, form terms include both labels and values associated with form elements.4. Regular terms. These are the terms that appear in an ESEWeb page but are not price terms or form terms.To simplify notations in the following sections, for a given ESE interface, we use N to represent the total number of images and hyperlinks, P to represent the vector of price terms with weights, FT to represent the vector of weighted form terms and RT to represent the vector of weighted regular terms. The terms in each category (P, FT, or RT) are organized into a term vector with weights computed using the standard tf*idf formula [3]. For a given term, formula is an increasing function of the term’s frequency in the term category on a page and a decreasing function of the term’s document frequency (i.e., the number of ESE interfaces that have the term).4. CLUSTERING ALGORITHMOur clustering algorithm consists of two phases. In the first phase, all ESEs are clustered into two groups, one for selling tangible products and the other for selling intangible products. This is done by comparing the number of images/links (N) in an ESE with a threshold T. If N ≥T, the ESE is placed into the tangible group; otherwise, it is placed into the intangible group. In the second phase, ESEs in each group are further clustered using other features (i.e., P, FT and RT for the tangible group, and FT and RT for the intangible group). There is no fundamental difference between clustering the ESEs in the tangible group and those in the intangible group except that the latter does not have the price vector P. In both cases, the basic idea is to use features to cluster similar ESEs together based on their feature similarity. The similarity between two ESEs is defined to be the weighted sum of the similarities between the price vectors (for the tangible group only), the form term vectors and the regular term vectors of the two ESEs. In our current implementation, the similarity between two vectors is computed using the Cosine similarity function [3]. Below, we describe the clustering algorithm for the second phase in more detail. This phase itself consists of two steps. Preliminary Clustering step: In this step, a simple single-pass clustering algorithm is applied to obtain a preliminary clustering of the ESEs in each group. The basic idea of this algorithm is as follows. First, starting from an arbitrary order of all the input ESEs, take the first ESE from the list and use it to form a cluster. Next, for each of the remaining ESEs, say A, compute its similarity with each existing cluster. Let C be the cluster that has the maximum similarity with A. If sim(A, C) is greater than a threshold, which is to be determined experimentally, then add A to C; otherwise, form a new cluster based on A. Function sim(A, C) is defined to be the average of the similarities between A and all ESEs in C. This single-pass algorithm is efficient as it considers each input ESE once. However, it is order sensitive and t is possible that an ESE is not included in the most suitable cluster due to the order it is considered.Refining step: This step tries to remedy the weakness of the previous step. The idea is to move potentially unfitting ESEs from their current clusters to more suitable ones. This is carried out as follows. First, we compute the average similarity AS of each cluster C, which is the average of the similarities between all ESE pairs in cluster C. Second, identify every ESE A in C whose similarity(A, C) is less than AS. These ESEs are considered to be potentially unfitting. Third, for each ESE A obtained in the second step, we compute its similarities with all current clusters (including the cluster that contains A) and then move it to the cluster with the highest similarity. The above refining process is repeated until there is no improvement (increase) on the sum of the similarities of all clusters.5. EXPERIMENTS270 ESE interface pages from 8 categories in Yahoo directory “Business and Economy → Shopping and services” are obtained, 7 categories contain ESEs selling tangible product while one selling intangible products. After child/logical ESE interfaces are identified, 294 ESEs are obtained. Before evaluation, all ESEs are manually grouped based on what products they sell. Clusters obtained by the manual clustering are deemed correct and are used as the basis to evaluate our clustering algorithm. The following criteria are used to measure the performance of clustering: precision: the ratio of the number of ESEs that are correctly clustered over the number of all ESEs; recall: for a given cluster, recall is the ratio of the number of ESEs that are correctly clustered over the number of ESEs that should have been clustered, and the overall recall for all clusters is the average of the recalls for all clusters weighted by the size of each cluster.As discussed in Section 4, our clustering algorithm has two phases. In Phase 1, ESEs are clustered into two big groups: the tangible group and the intangible group based on the number of images/links on each ESE interface. The recall and prevision of the Phase 1 clustering when T is 30 are all above 95%.For the Phase 2 clustering, the performance is as follows: precision is 94% and recall is 93%. Our experiments indicate that form terms are critically important for accurately clustering ESEs. When form terms are not used, both recall and precision decrease by about 35%.6. ACKNOWLEDGEMENTSThis work is supported in part by the grants IIS-0208574, IIS-0208434 from the National Science Foundation.7. REFERENCES[1] P. Ipeirotis, L. Gravano, M. Sahami. Probe, Count andClassify: Categorizing Hidden-web Databases. ACMSIGMOD Conference, May 2001, Santa Barbara, California.[2] W. Meng, W. Wang, H. Sun, C. Yu. Concept HierarchyBased Text Database Categorization. Journal of Knowledgeand Information Systems, 4(2), 2002, 132-150.[3] G. Salton and M. McGill. Introduction to ModernInformation Retrieval. McGraw-Hill, 1983.417。
Developing Real-World Programming Assignments for CS1Daniel E. Stevenson and Paul J. WagnerDepartment of Computer ScienceUniversity of Wisconsin-Eau ClaireEau Claire, WI 54701{stevende, wagnerpj}@}ABSTRACTInstructors have struggled to generate good programming assignments for the CS1 course. In attempting to deal with this issue ourselves, we have generated two real-world programming assignments that can be solved by most students yet generate challenges for advanced students. We present our overall criteria for a quality programming assignment in CS1, details of the two example assignments, and other issues stemming from the generation and management of these assignments. Categories and Subject DescriptorsK.3.2 [Computers & Education]: Computer & Information Science Education – Computer Science Education.General TermsAlgorithms, DesignKeywordsCS1, Programming Assignments1INTRODUCTIONWe, and many of our fellow instructors, have found it difficult to construct quality programming assignments in CS1. The students are at an awkward point, where they have the basic tools of data types and perhaps control structures, but are not yet able to develop their own classes. They now need to learn to work with built-in objects, and master topics along the way such as strings, external files, and containers such as ArrayLists or arrays.The problem we see is that many example programming assignments from textbooks and historical usage do not satisfy the dual goals of allowing the students to practice concepts and helping them develop skills that scale up for the larger work to follow in CS2 and subsequent courses. Many of these example assignments have one or more of the following characteristics:-they are simplified, relating to one aspect of a larger problem(e.g. sorting an array of words, with no context of how thesorted array will be used)-they focus on only one particular current programming topic(e.g. strings) -they do not relate well to the real world of software development / computer science (e.g. they focus on abstract issues such as sorting an array of strings)-they leave little room for creativity and innovation, which at least one recent article has described as important [1]We have worked to develop assignments for CS1 that attempt to address the above shortcomings, and satisfy our criteria for quality programming assignments in a CS1 course. Section 2 presents our criteria for a quality, real-world programming assignment in a CS1 course. Sections 3 and 4 contain two examples of assignments that we use in CS1 that we have developed based on these criteria. Section 5 discusses advanced string parsing issues which we use to build on the basic assignments. Section 6 contains several additional issues that we have found important in constructing and managing these assignments.2CRITERIA FOR GOODPROGRAMMING ASSIGNMENTS Others have looked at the issue of developing good assignments for CS1 courses; e.g. see [2, 6]. Building on past ideas and updating them for a more advanced and complex world, we have developed the following eight criteria which we consider in developing a programming assignment for our CS1 course. A good programming assignment in CS1 should:-be based on a real-world problem. Toy projects with toy data are less motivating to the students, and convey a false sense of what the world of computer science contains.-allow the students to generate a realistic solution to that problem. They should be able to generate a system which generates useful results, even if it might not be entirely realistic in its interface or content.-allow the students to focus on current topic(s) from class within the context of larger program. The students still need to practice on particular topics (e.g. strings, or external files), but we want them to do that within a realistic wrapper.-be challenging to the students. We find that our students rise to the challenge and move forward faster when they have to work to and beyond their assumed boundaries.-be interesting to the students. A practical, current and interesting issue drives them to work harder on the problem, where an overly-theoretical or simplified project tends to decrease their effort and their learning.-make use of one or more existing application programming interfaces (API). This allows them to gain practice in using specific objects, as well as learning that much of their futurePermission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.ITiCSE’06, June 26-28, 2006, Bologna, Italy.Copyright 2006 ACM 1-59593-259-3/06/0003...$5.00software development work will consist of making use of other APIs rather than developing their own raw code.-have multiple levels of challenge and achievement, thus supporting possible refactoring. The problem should have a reasonably straightforward basic solution that all students are capable of solving. It should also provide additional challenges that cause the student to solve additional problems and possibly have to refactor their original solution if it is found to be lacking.-allow for some creativity and innovation. The opportunity should be given for some creative or innovative addition or alternate solution to the problem.3EXAMPLE – WEB CRAWLERA web crawler (aka robot or spider) is a program that automatically traverses the web’s hypertext structure by retrieving a document, and then recursively retrieves all documents that are referenced. Web crawlers can be used for a number of purposes, including indexing, HTML validation, link validation, “what’s new” monitoring, and mirroring. We have actually released two version of the web crawler assignment. One version uses the web crawler to perform link validation (that is, looking for broken links) and the other acts as a mirroring program that copies an entire web site to a local directory. Both versions are quite similar in their basic structure.3.1ASSIGNMENT SPECIFICATIONIn either version, a basic web crawler needs to be built. For succinctness we will refer to the link validation version in the following description. A web crawler by definition is recursive since we never know how deep our web sites will be. However, given this is a relatively early assignment for CS1, recursion has obviously not been covered yet nor do we intend to cover it until CS2. Thus, we show the students how to build an iterative version of the web crawler that uses a queue structure (FIFO processing) to perform a breadth first search of the web site. In particular, they will need to create two structures. The first needs to keep track of which URLs the web crawler still needs to look at, which we call the “to-do” list. The other needs to keep track of the URLs they have already processed since HTML links can refer back to previous HTML pages and we don’t want to get caught in circular loops. This is called the “processed” list. The to-do list needs to be a queue. In the past we have simply used a Vector and treated it as a queue. However, in Java 1.5 there is now a Queue interface so we now teach them to use the proper data structure for the job. The processed list can simply be a List. Once these structures have been set up, the students simply need to follow the following algorithm:•For each HTML page that the crawler need to look at:o Get the HTML contents of the page from acrossthe networko Parse the HTML contents to find all the links onthe pageo For each found link:Determine if the link is broken, recordingthat information in the broken link reportDetermine if the link is one we need torecursively look at In order to get the contents of the HTML page from across the network the students have to learn how to use the Java networking classes, in particular the URL class. At this point in the semester the students have already learned how to open up a BufferedReader to a file in order to read/write its contents. Using the URL class is very similar:URL aURL = new URL(“http://foo/default.htm”); URLConnection con = aURL.openConnection(); InputStream is = con.getInputStream(); InputStreamReader isr =new InputStreamReader(is); BufferedReader reader = new BufferedReader(isr);They can use the BufferedReader just like they did with file IO, thus driving home the fact that IO in Java is just about streams even if the other end is not directly connected to a local file. This will prove a useful concept to know in CS2 when we talk extensively about sockets and distributed computing. As a starting point, we teach them to simply read in the entire file line by line and concatenate them together to form a String that represents the entire HTML page. There are some variations on this based on XHTML specs and more advanced parsers, but those extensions will be discussed later.The next step is to identify all the links on the page that was just read. Again, there are several more advanced parsing options that one can use (discussed later), but as a simple starting step we teach the students to use a combination of StringTokenizers and substring searching to identify the anchor tags. A StringTokenizer can easily be configured to hand back tokens that match HTML tags in the document. Thus, the job simply becomes searching for which tokens are anchor tags. Once an anchor tag is identified then substring searching can be used on that tag (which is now a string by itself) to parse out the destination of the href attribute.The next stage is to determine if the link is broken. We classify broken links into 3 categories:1.Malformed link2.Well-formed link, but points to a server that doesn’texist3.Well-formed link and points to a valid server, but therequested file doesn’t exist.The first two are fairly easy to handle as they simply involve handling exceptions (MalformedURL and IO respectively) at the different points in connecting. However, the 404 error (#3 in our list) is harder since an actual web page will be returned. This page consists of HTML tags that state that you just tried to access a nonexistent page. Each web server has its own way of generating this page, so you can’t count on uniformity. Thus, the students need to drop down to the HTTP protocol level that is doing the actual transfer of the HTML page and check the response code that is returned. While this is not hard to accomplish it does generate a good discussion of protocols and the layers of abstraction that are placed on top of the low-level protocols.Links that are marked as broken are then recorded in an external file for later viewing by an administrator. Links that are not broken are then put on the to-do list as long as they don’t already exist on the processed list. We also add the restriction that linksto different domains not be followed as this utility is generally only run on a single domain. The processing continues until the “to-do” list becomes empty.3.2ASSIGNMENT DISCUSSIONDoing the parsing using low-level string manipulation has pros and cons. On the good side it really drives home how to manipulate strings since the parsing problem is certainly not trivial. However, it can become messy when given a real web site. For example, links can be placed in more than just anchor tags (JavaScript, framed pages, etc.), not all anchor tags represent links, within an anchor tag there can be lots of variation in the formation of the href attribute (e.g. spacing, quotes), etc.To minimize these issues and keep the focus on basic string parsing, we provide a test web site for which the students to run their code. This site has very well defined HTML and makes the above parsing issues non-existent. The test site is also quite useful as a benchmark for testing their code against since it is a known quantity. In real-world assignments, this is a very important feature as it can help you guide your students through their debugging much easier since you know the fixed answer for the test site.Additionally, it provides another benefit in our opinion. After they have their crawler working on the basic test site, they need to try it on some real sites. The students very quickly realize that things are not as structured and neat in the real world and their crawlers quickly break. This seems to come as a shock to many of them since they failed to realize the variation that exists in the real-world and had not coded their project to be able to handle it. However, just noting that their code doesn’t work in these more complex situations is not enough. We make them rework parts of their code so it can handle some of the more advanced variation in real sites. The choice as to what they handle and how they modify their code is up to them, but they need to make it handle at least two additional problems found in real sites if they want full credit. They need to write up a small document that describes the issue that caused the problem and what they did in their code to handle it.We have found that this sort of structured test problem to real-world transition is a very powerful technique to help them learn about code robustness and error checking – things that are often left untouched in basic CS1 programs. Also the fact that they correct at least two problems of their choice gives them a sense of creativity that they may not experience normally.Overall this assignment was a big success from our point of view in terms of what they learned and the quality of their solutions. But it was also successful in terms of student interest, as they found it quite fun to build a real tool that they could run on actual sites. They really liked telling us when one of the CS department web pages had a broken link!4EXAMPLE – SPAM EVALUATORA spam evaluator is a program that processes a mail folder to determine the probability that each message within the folder is unwanted bulk email, or spam. A spam evaluator would normally be used as part of a larger system (a spam filter) which would actually move messages to another Junk Mail folder. In order to keep the assignment manageable in a CS1 context and avoid issues with possible loss of moved email messages, we have focused on the spam evaluator portion of the overall spam filter.4.1ASSIGNMENT SPECIFICATIONThe basic algorithm for a spam evaluator is straightforward: •Connect to the mail store•Open the mail folder and get the messages•For each message in the foldero Determine the likelihood that this message is spam There are many methods that spam filters use to determine if a particular message is spam; for an example, see [7]. Most use some sort of probabilistic combination of the presence and/or absence of certain spam indicators. Fundamental indicators often include, but are not limited to, the following:-the inclusion of certain keywords (in the subject header and/or the message body), such as the word “Viagra”.-the original source IP address or name matching a “blacklisted” IP address or name, that is normally keptin separate file or other data structure.-the inclusion of other significant features in the message, such as multiple images or a large image.-the presence of spam avoidance techniques, such as inserting a period character between words, using digitsor special characters to substitute for certain letters (e.g.using ‘!’ for the letter ‘I’.While the number and type of indicators and the probabilistic formulas for the likelihood of a message being spam can be quite complex, we have found that a simplified combination of these features can still be quite realistic and accurate in evaluating mail messages for their spam content.The above indicators rely heavily on string matching, and several of the indicators also involve working with external files (e.g. a keyword file, a blacklisted site file, and possibly an output file for the results). This makes this assignment fit naturally at the point when the String class and external files (including classes like BufferedReader and BufferedWriter).To access a mail store and mail folder in Java, the students need to understand and use the Java Mail API [3]. We first gave this assignment using JavaMail 1.2, which uses a relatively straightforward set of mail API calls to connect to a mail store, open a given folder, and to pull out a message from within a folder. A stripped-down version of the basic code is below: Properties props = System.getProperties();Session session = Session.getDefaultInstance(props, null); Store store = session.getStore("imap");store.connect(server, user, pass);folder = store.getDefaultFolder();folder = folder.getFolder(mailbox);folder.open(Folder.READ_ONLY);Message [ ] msgs = folder.getMessages();We have now moved to using JavaMail 1.3, which requires a slightly more sophisticated model of providers. With eitherversion of JavaMail, we have given basic explanations of the API and given examples of how to open a store and folder and display one or more messages, and this is easily understood by most students. At a minimum, they should be able to read the text of each message line by line and concatenate these text lines into one large String that represents the entire message body. This body can then be analyzed for spam content.The next step is analyzing the mail headers. We show them that the various mail headers (e.g. From, Subject, To, Date, Received, Content-Type) are basically name/value pairs, with the name being the type and the value being the actual content. There may be multiple headers with the same name; e.g. the set of Received headers shows the path of the message across the Internet by indicating the host chain that is used in transmission of the message. We again explain how to get a single or a group of headers as an Enumeration, iterate through the names and values, and build a Java String containing the header content.Now that the students can get the header or body content, the next step is basically doing matching between possible content (e.g. keywords, blacklist addresses) and actual content. While most students start by manually manipulating and comparing Strings using substring searching, this also provides us with an opportunity to talk about more sophisticated pattern matching, as we discuss below.4.2ASSIGNMENT DISCUSSIONAt this point, the students should be able to process and analyze well-formed messages, and we provide a test email account (set up by our system staff, and populated by the instructor) with a set of both good and spam email messages. The students can also analyze their own Inbox, adding to the realism of the assignment. However, after the assignment has been released for a certain amount of time and most students have the basic functionality, we provide the final set of mail messages, which contains variations and issues that may cause their spam evaluator program to fail. These issues include:-empty subject (generally creating a null pointer exception in their program, unless they have checked tomake sure the Subject header they pull off of themessage is not null)-message with attachments (which leads to a discussion of message parts)-message with empty body (e.g. just with an attachment) -messages with a different content type than plain text(e.g. text/html)The students are required to solve as many of these problems as possible in order to maximize their grade. Most are solved fairly quickly, demanding some additional information and reasoning beyond the base case.This assignment is also a popular one with the students. Not only do they enjoy learning how real spam filters work, but they also like being able to access their own email through a program they wrote. 5ADVANCED STRING PARSINGIn both of the example assignments a significant amount of String and HTML parsing was needed. The entry level versions of these assignments simply use StringTokenizers and substring searching to accomplish these tasks. These parsing and string manipulation techniques are good basic techniques to know, but the students soon see that they have their limitations. This opens the door for discussing more complex parsing and searching techniques that, while on the surface seem to require more setup and understanding of their associated APIs, will pay off in a big way in the more complex situations. It has been our experience that students often don’t see the need to use “over the top” parsing if it is introduced too early. They need some motivation to want to use it. These assignments also provide a nice backdrop for doing refactorings using these advanced concepts later in the course or in following courses. The following sections list some of the advanced techniques that can be used.5.1SPLIT AND REGULAR EXPRESSIONSThe Java API does declare the StringTokenizer class to be maintained for backward compatibility at this point, and recommends the use of the String class split method and the java.util.regexp package. We still talk about StringTokenizer first to introduce the general concept of Iterators, but now show them the comments regarding StringTokenizer being at least unofficially deprecated and then show them the newer split functionality. The Java 1.5 Scanner class provides another option. The more powerful tool for manipulating String data is the java.util.regexp package. While this is a complex topic, we still have introduced the package by example, in order to provide tools for the advanced students and for future, more complex work. For example, split can be used to pull apart the pieces of a numeric internet address, and a regular expression can be used to see if each piece has a valid format (one to three digits – more work must be done to see if these numbers are valid).String ip = "137.28.109.29";String [] pieces = ip.split("\\."); // escape the period; not wildcard Pattern p = pile("\\d{1,3}"); // 1 to 3 digitsfor (int i = 0; i < pieces.length; i++) {if (p.matcher(pieces[i]).matches()) {System.out.println("valid form - IP number piece"); } }5.2HTML PARSING VIA SWINGEspecially when trying to find links in an HTML document, one would like to have an automatic parser. Java provides one as part of Swing [5]. One simply needs to create a parser and set up the callbacks for the tags of interest. However, while this may be simple in concept, there are a couple of catches to using it early in CS1. First it is difficult to obtain an HTMLEditorKit.Parser class in the first place. One needs to create a new class that overrides a constructor. The same is true for the callbacks with respect to having to override certain methods.While “magic lines” of code can certainly be given to the students to get them started, the idea of extending and overriding is generally handled later in the semester and thus may be to conceptually abstract for them to handle at this stage. However,this does make a nice follow-on assignment or lab later in the semester. They already have done, for example, the link validating web crawler with simple String manipulation and seen some of the issues as the data got complex. Thus, they have the motivation necessary for trying a refactoring with something more complex like this once they have learned how to implement inheritance and override methods.5.3XMLAnother advanced option for HTML parsing is to use one of the XML parsing technologies such as DOM [4]. This, of course, assumes that the HTML you are processing is XHTML. While this generally can’t be something that is assumed on the web, one can make a simple test site that does meet this specification. This will also vastly simplify the reading in of the document as well, basically reducing it to a single line of code. And the XML DOM classes have powerful searching mechanisms build into them so finding the tags you are interested is simplified as well. However, in order to use DOM one needs to understand the tree abstraction of an XML document. Again, while this is not a complex concept, it can be a bit abstract for the start of CS1 in our opinion. We usually save this information for our CS2 class. But at that point, a refactoring of an assignment such as the web crawler using this new technology makes a great assignment.6REAL-WORLD ASSIGNMENT ISSUES We are repeatedly challenged by several concerns in developing real-world assignments, and address three of them here.First, it is more important than usual for the instructor to develop a solution in advance for larger real-world assignments. It is too easy for time-pressed instructors to come up with a good idea and write it up as an assignment before solving it themselves. However, with a larger, real-world assignment, we are more likely to run up against real-world complexities, including subtle features in the APIs being used or interactions of the used components. As an instructor, developing a solution first avoids problems that may frustrate the students and potentially interfere with completion by the given deadline.Second, a related benefit of pre-developing the solution is that it gives us as instructors the opportunity to improve the program’s design through analysis and refactoring. Students at this level can certainly not be expected to design their own solutions, and may make poor decisions if allowed to do so. We try to provide a good design based on our own solution, and basically can then remove sections of our code in areas relevant to the topics which we are trying to have them practice. This also allows us as instructors to gauge the size and complexity of the work assigned. Third, in real-world assignments there is a delicate balance that needs to be struck between using toy data and real-world data. Our approach is to start with simple test data to get the students to a place where they understand the basics of the problem and have generally working code before moving on to the real-world data. This leads to more students getting at least a basic working project, which in turn builds their confidence. However, it still educates them as to how messy the real world is when they see their code break on what is fairly simple real-world data. It also introduces them to the world of refactoring which is a powerful software engineering technique. Finally, it fosters a sense of creativity by allowing them to try and solve whichever complex problems they are most motivated by in order to get their code to work on some advanced form of real-world data.7SUMMARYAt the start of CS1 one needs to focus on string manipulation, arrays and lists, and using built-in objects. It can be difficult to come up with interesting real-world assignments that focus here. We feel that both the web crawler and the spam evaluator cover the criteria we set forth in defining a good CS1 assignment. They are real-world problems that are relevant and interesting to students, as they understand the need for spam filtering and for web crawling since they see them in use every day. The way we have constructed the assignments allows the focus to be on those basic string manipulation and array/List processing topics presented in class, without making it seem as if the assignment is just on these topics. New APIs must be used (e.g. JavaMail, networking) in order to complete the assignment, giving them valuable experience in learning how to use existing code rather than always generating their own code. Having the students create a base solution to a small set of test data followed by moving to real-world data allows most students to complete the basic solution while the real-world parts add challenge and creativity to entertain even the best students. Finally, the resulting programs can actually be used in the real world to find broken links or to filter their email. This gives the students a strong sense of accomplishment as they have produced something real.We encourage others to use variations on these assignments (materials available upon request) as well as develop new assignments based on the principles we have outlined.REFERENCES1.Denning, Peter J., and McGettrick, Andrew, “RecenteringComputer Science”, Communications of the ACM, Vol. 48, No. 11, November 2005, pp. 15-19.2.Feldman, Todd J. and Zelenski, Julie D., “The Quest forExcellence in Designing CS1/CS2 Assignments”,Proceedings of the 27th SIGCSE Technical Sympoisum onComputer Science Education, 1996, pp. 319-323.3.Java Mail API, /products/javamail4.Java XML Tutorial,/xml/tutorial_intro.html5.Parsing HTML with Swing,/articles/article.asp?p=31059 &seqNum=16.Reed, David, “The Use of Ill-Defined Problems ForDeveloping Problem-Solving and Empirical Skills in CS1”,Journal of Computing in Small Colleges, Vol. 18, No. 1,October 2002, pp. 121-133.7.Robinson, Gary, “A Statistical Approach to the SpamProblem”, Linux Journal online,/article/6467。
Relevance and Reinforcement in Interactive BrowsingAnton LeuskiCenter for Intelligent Information RetrievalDepartment of Computer ScienceUniversity of MassachusettsAmherst,MA01003USAleuski@ABSTRACTWe consider the problem of browsing the top ranked por-tion of the documents returned by an information retrieval system.We describe an interactive relevance feedback agent that analyzes the inter-document similarities and can help the user to locate the interesting information quickly.We show how such an agent can be designed and improved by us-ing neural networks and reinforcement learning.We demon-strate that its performance significantly exceeds the perfor-mance of the traditional relevance feedback approach. Categories and Subject DescriptorsH.3.3[Information storage and retrieval]:Information Search and Retrieval—Relevance feedback,Selection process;H.5.m[Information Interfaces and Presentation]:Mis-cellaneous;I.2.6[Artificial Intelligence]:Learning—Con-nectionism and neural netsGeneral TermsExperimentation,performance,algorithms1.INTRODUCTIONHelping the user to locate interesting information among the retrieved material is almost as important as the retrieval it-self.An information retrieval system responds to the user’s query with a large set of documents and leaves the user tofind the relevant information among these documents by herself.Generally,the user is provided with some assis-tance in form of a document organization subsystem.For example,one of the most popular approaches is to order the retrieved document by their likelihood of being relevant to the query–place them in the ranked list[14].Other alterna-tives include clustering and different document visualization techniques[15,16].In our previous work[21]we have considered a browsing in-terface for an information retrieval system that integrates the traditional ranked list with clustering visualization of documents.The visualization presents the documents as spheresfloating in2-or3-dimensional space and positions them in proportion to the inter-document similarity.If two documents are very similar to each other,the correspond-ing spheres will be closely located and the spheres that are positioned far apart indicate a very different document con-tent.Thus the visualization provides additional and very important information about the content of the retrieved set:while the ranked list shows how similar the documents are to the original query,the clustering visualization high-lights how the documents relate to each other.We studied the combination of the ranked list and clustering visualization by running intensive off-line simulations and a user study[20,21].We showed that the inter-document similarities can be accurately visualized for retrieval pur-poses.We observed that users navigating the combination can do significantly better than by following the ranked list alone.However,it was also obvious that the system having access to the exact values of the inter-document similarity can provide a helpful assistance to the users.In this paper we present an extension to that study.We design a document selection procedure that can guide the user through the retrieved set helping her to locate the rel-evant information as quickly as possible.We assume that the user is ready to provide the system with her relevance judgments about the documents as she is examining them. The document selection procedure can be implemented as a“wizard”that comes up right after the documents are re-trieved[22].The wizard estimates the relevance values for each unexamined document and highlights the document’s screen representations–both the sphere in the visualization part and the title in the ranked list–with a color shade of intensity proportional to the relevance estimation.The user can see where the most likely to be relevant documents are located in the ranked list and in the visualization.As the user examines the documents and marks them as relevant or not,the wizard reevaluates its estimations and changes the highlighting accordingly.This interaction is very similar to the traditional relevance feedback approach.However,in contrast to the relevance feedback we concentrate on analy-sis of documents that are already present in the retrieved set. We do not attempt to improve the original query,repeat the search process,and bring in new documents.In addition, our analysis focuses only on the inter-document similarity information obtained after the original retrieval session and ignores all term-level statistics.We assume that any estima-Permission to make digital or hard copies of part or all of this work or personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee.CIKM 2000, McLean, VA USA© ACM 2000 1-58113-320-0/00/11 . . .$5.00tions the wizard assigns based on the similarity information will be easily explainable to the user by our proximity-based visualization system.The wizard keeps both the ranked list and the visualization unchanged during the whole session adjusting only the color highlighting.We believe,that re-ordering the ranked list or making other structural changes to the presentation will have a disorienting effect on the users.The wizard points the user to the most likely rele-vant documents without forcing its choice onto her.In this paper we design such a wizard by modeling the inter-action process between the user and the document set–we consider an agent that is examining the document set,select-ing unknown documents,and adjusting its behavior based on whether the selected document was relevant or not.We show that the relevance feedback methodology can be ef-fectively applied in this setting.We formulate the feedback problem in terms of reinforcement learning and show how the agent effectiveness can be significantly improved after a modest amount of training.We evaluate the agent per-formance by assuming that the user always follows its sug-gestions and run the experiments on the standard TREC collections.In the following sections we summarize the past research in relevance feedback and neural networks in information re-trieval.We define interactive browsing as a reinforcement learning problem and consider three different agent repre-sentations.We describe our experiments and discuss results. We conclude with suggestions for future work.2.RELATED WORKRelevance feedback has been a major research focus of infor-mation retrieval for well over30years and has been shown to dramatically improve retrieval performance[29].The gen-eral model is to examine a portion of the retrieved docu-ments assigning relevance values to each of them.The doc-uments’content are analyzed to“shift”the query closer to-wards the examined relevant documents and away from the examined non-relevant documents.This process is usually done in batches–the documents in the collection are bro-ken into training and testing sub-collections,the queries are modified with the information from the training collection and evaluated on the test collection.Aalbersberg[1]explored the notion of incremental relevance feedback,where the documents are considered one at a time and the query is modified after each document.Allan[2] conducted an intensive study of the incremental approach showing results as good as if the feedback occurred in one pass.The most common model for implementing relevance feed-back was outlined by Rocchio[27].The terms are ranked based on the weighted sum of the terms weights in the old query,the known relevant,and known non-relevant doc-uments.The new query is constructed with several top ranked terms:w i(Q )=α·w i(Q)+β·w i(N)where w i(N)are the average weights of the i th term in the relevant and non-relevant documents,w i(Q)and w i(Q )are the weights of the same term in the old and new queries.Parametersα,β,andγ–called Rocchio coefficients –control the relative impact of each component.Generally, these parameters are selected empirically and the best10-20terms are added to the query[4].Buckley and Salton[9] suggested an approach called Dynamic Feedback Optimiza-tion(DFO)where the coefficients are learned by greedy ex-ploration of the parameter space.They also demonstrated that increasing the number of expansion terms can improve performance.Biron and Kraft[8]give a detailed overview of alternative methods for relevance feedback including the connectionist approaches.Neural networks found successful application in both retrieval and relevance feedback[6,34,19].The net-work is constructed connecting the documents,terms,and query.The connection weights are initialized using the sta-tistical information about the term usage and adjusted as the relevance judgments become available.In our study we consider only the similarity information between the docu-ments and ignore the data about the individual terms. Text classification and categorization is where the neural networks are trained to assign class labels to the documents based on the terms they contain[25,30].Informationfil-tering also benefits from the neural network classification abilities.Here they are used to select relevant material from document streaming through the system.Sch¨u tze et al.[31] compared neural network-based classifiers with other clas-sification techniques.Lewis et al.[23]and later Callan[10] studied different learning methods for the neural networks. Hull et al.[17]examined performance of combined methods where the decisions of several different classifiers including neural network were considered together.The growth of the world wide web resulted in the devel-opment of autonomous agents that use the hyperlink envi-ronment to perform distributed search tasks on behalf of the user.Quite often a neural network forms a core part of such an agent.Menczer and Belew[24]studied the fea-tures of the web structure that can be most useful for the agents.The Wisconsin Adaptive Web Assistant[32]com-piles its search instructions into a neural network allowing for adjusting of the search strategies as the training infor-mation become available.Joachims et al.[18]developed a system that performs look-ahead searches and provide the suggestions to the user on the base of reinforcement learn-ing.Balabanovic[5]studied the agents that make use of preference information collected from multiple users.Choi and Yoo[11]considered how the multiple agents accepting user’s feedback,can interact with each other and learn from their common experience.These systems make extensive use of the hyperlink structure and term level information.They slowly adapt to the user and her requests while performing the search task.We are looking at navigating unstructured text documents,training the agent off-line,and keep it un-changed during the actual search session.3.SEARCH STRATEGYThe problem of navigating the retrieved document set can be naturally expressed as a reinforcement learning problem. Indeed,we have an agent operating in an environment–our document set.The agent interacts with the document set at discrete time steps.At each time step t the environmentstate is defined by the inter-document similarities,what doc-uments were examined,and what relevance judgments were assigned.The agent receives some representation of the en-vironment state(D t)and has to select an action–choose the next document d to examine.At the next time step the agent receives a numerical reward from the environment –whether the examined document is relevant or not.The agent has a specific goal–finding all relevant documents as quickly as possible.The agent implementation is defined at each time step by a mapping between a state representation combined with an action and a numeric value:F(D t,d).The agent computes the mapping for each unexamined document and selects d with the highest value of F(D t,d).The process continues until all documents are examined.We call this interaction process the search strategy and call the agent implementa-tion F(D t,d)the search strategy function.Reinforcement learning methods specify how the search strategy function can be changed as a result of the interaction experience with the goal of maximizing the total reward.In this paper we focus on a particular reinforcement learning algorithm called Temporal Difference(TD)learning[33].A detailed description of the algorithm is the beyond the scope of this paper and can be found elsewhere[33,p.212].Here we provide the update rule that modifies the search strategy function at each learning step and controls the outcome of the learning process:∆ θt=η·(r t+1+ρ·Fθ(D t+1,d t+1)−Fθ(D t,d t))·∇θFθ(D t,d t)where t is the time step when we have to select a document, D t is the environment state at the time step t,r t+1is the reward obtained after the document d t is examined, θis the parameter vector that define the search strategy functionF θ(D t,d t),ηis the learning rate that controls the speed andstability of the learning process,andρis the discount factor that controls how much the reward at time t+1is discounted by comparing to reward at time t.There are three important questions that we have to yet to consider:the goal of the search strategy,the reward func-tion,and the document set state representation:•How do we measure if a search strategy was quick enough in discovering the relevant documents?The outcome of a search strategy is a new document rank-ing as opposed to the original ranked list.Two rank-ings can be compared using traditional information re-trieval measure.In this study we use the average pre-cision.Thus our problem is to define a search strategy that will maximize the expected average precision.•The search strategy strongly depends upon the reward function r.In this study our reward function r t+1=1 if the examined document d t was relevant and0oth-erwise–we reward the search strategy for discovering the relevant documents.However,the total reward will not depend upon when the relevant documents are examined–i.e.,the total reward will be the same if all relevant documents examined at the beginning ofthe search or at the end.The discount factorρis a part of the reward function that decreases the rewardif it was obtained at a later time step.Ifρ<1,the search strategy will prefer the relevant documents to occur towards the beginning of the search.At the same time it can allow for some exploration of the document set if the information collected during the exploration will help tofind the relevant documents in one quick sweep.Ifρ=0,the search strategy will interested in only immediate rewards and ignore any chance for exploration.•In the next three sections we consider the question of selecting good features to represent the documentset state and define three different forms for Fθ(D,d).Our analysis is based on the ideas from the Rocchio feedback approach,where the document ranking is ad-justed based on the information from the old query and examined documents.3.1Simple RocchioThefirst design follows directly from the Rocchio approach. We define the search strategy function F1(D,d)as a single perceptron unit that has four inputs:bias or constant input, document similarity to the query,average similarity between the document and all examined relevant documents,and average similarity between the document and all examined non-relevant documents:F1(D t,d)=θ0+θ1·querysim(d)+θ2·1|N t|∀x∈N tsim(x,d)where querysim(d)is the similarity between the document and the original query,sim(x,d)is the similarity between two documents,R t and N t are the set of all examined rele-vant and non-relevant documents at time step t.3.2Application CoefficientsOur second design results from an heuristic observation that different parts in the Rocchio-based representation should have different weights depending on how many documents were examined.For example,at the beginning of the search, while few relevant documents are known,one should rely more on the similarity to the original query than further into the process when the document relevance information is plentiful.We accommodate this intuition by dividing the search process into three distinct phases and training a sep-arate search strategy function for each phase.Specifically, we define our second search strategy function F2(D,d)as a linear combination of three instances of the search strategy function from the previous section(F1(D t,d)),where the coefficients–they are called application coefficients in ma-chine learning–are smooth functions of the number of the examined documents[7]:F2(D t,d)=iA i(|R t|+|N t|)·F1i(D t,d)where A i(·)is defined as following:A i(x)=exp−(x−µi)2t f+0.5+1.5doclendocf)Table1:Average precision and percent improvement over the baseline(in parentheses)for the topfifty retrieved documents.The search strategies are given a starting point for exploration:the relevance judg-ments for the top ranked relevant document and all non-relevant documents that precede it in the ranked list.Precision is computed for the unexamined portion of the retrieved set.Asterisks indicate statistical significance by two-tailed t-test with p<0.05.Data Set Best RF(+100t)F1(D,d)F2(D,d)F3(D,d)14.6939.8646.55(5.38*%)51.00(15.44*%)TREC-5Desc88.0050.6053.00(4.74%)10.5738.0841.96(1.38%)43.99(6.27*%)Title+Desc64.0038.2643.76(14.37*%)Full90.0054.7061.84(13.06*%)16.1056.0460.52(0.22%)63.29(4.80*%)Title84.0054.7458.36(6.61*%)17.7448.6855.38(10.59*%)57.03(13.90*%)14.7446.0351.65(4.79*%)54.26(10.08*%) Table2:Average precision and percent improvement over the baseline(in parentheses)for the top hundred retrieved documents.The search strategies are given a starting point for exploration:the relevance judg-ments for the top ranked relevant document and all non-relevant documents that precede it in the ranked list.Precision is computed for the unexamined portion of the retrieved set.Asterisks indicate statistical significance by two-tailed t-test with p<0.05.Data Set bestFull90.0045.81(5.32%)50.59(16.31*%) TREC-5Desc88.0048.37(6.64*%)48.92(7.84*%)Title84.0042.03(6.45*%)44.00(11.42*%)Title+Desc70.0037.82(10.42*%)40.80(19.12*%)14.7551.4860.01(16.58*%)10.9857.3064.41(12.41*%)10.7852.4355.20(5.28%)12.4351.0755.79(9.24*%)total average88.2550.11(6.94*%)52.75(12.58*%)overall.We then evaluated that function on the four data sets retrieved from another collection.We trained the search strategies by running them on the training set multiple times.Each search strategy function began with all parameters( θ)initialized to zero.We have experimented with different values for the TD-algorithm pa-rameters.The learning rateη=0.1and discount factor ρ=0.4worked well in our experiments.The parameters for application coefficients(A(·))were defined asµ=1,25,50 andσ=6.The tile coding representation used256tilings, each tile side was equal to one third of the corresponding feature range.The learning process was terminated at the point when the average precision failed to improve for sev-eral iterations.4.4BaselineWe have used the Inquery relevance feedback subsystem as our baseline:starting with the ranked list we follow it until thefirst relevant document is found.At that point all ex-amined documents are analyzed and a new query is created by expanding the old query with several top ranked terms from the examined documents.The rest of the unexamined documents are re-ordered using the modified query and the process continues until all documents are examined.5.RESULTSTables1,2,and3show the average precision numbers ob-tained for our baseline and three different search strate-gies and the percent improvement over the baseline.The first two columns in each table show the average precision numbers for two hypothetical search strategies:one that al-ways examines all non-relevant documents before any rele-vant ones(“worst”)and the other that considers all relevant documents before all non-relevant(“best”).These numbers provide a scale for the performance results.In the earlier study we considered the same baseline rel-evance feedback algorithm each time expanding the query with the top10highest ranked terms[21].In our latest ex-periments we observed that adding more terms to the query increases performance–Table1shows a significant increase in average precision while expanding the query with top100 terms instead of top10.Increasing the number of terms beyond that amount(i.e.,adding500terms)degrades the performance.Thus we use the expansion algorithm with100 terms as our baseline.Table1shows a significant improvement in the average pre-cision for all search strategies over the baseline.The perfor-mance increases as more information is added to the docu-ment set representation.Thefirst search strategy learns a better set of Rocchio coefficients and exhibits a5%improve-ment over the baseline.The second strategy allows these coefficient to change as the browsing process progresses and achieves a9%improvement.The third search strategy that uses tiling to represent its function demonstrates a10%im-provement.Table4:Rocchio coefficients from three different sources.System Relevant(θ2)Inquery[3]40.25-1F1(D,d)2Table3:Average precision and percent improvement over the baseline(in parentheses)for the topfifty retrieved documents.The search strategies start without any relevance information:precision is computed for the whole data set.Asterisks indicate statistical significance by two-tailed t-test with p<0.05.Data Set bestFull94.0046.06(8.18*%)48.19(13.17*%) TREC-5Desc90.0044.65(0.77%)47.27(6.68*%)Title84.0042.01(7.31*%)43.37(10.77*%)Title+Desc68.0036.21(4.00%)40.32(15.80*%)20.9254.6662.44(14.24*%)17.1558.5960.09(2.56%)16.3557.2759.19(3.34%)18.5555.1860.09(8.89*%)total average88.7550.17(3.83*%)52.56(8.77*%)with the training data set.We have experimented with an idea of allowing the agent to learn while running it on the evaluation set.Specifically,at each time step during the evaluation phase the agent was allowed to learn for several iterations using only the examined subset of the data.Af-ter each query the agent parameters were restored to their initial(learned on the training set)values.We observed that this learning did not have any effect on average:we saw an increase in average precision for some queries and a degradation for others.There are several key points that made the reinforcement learning and TD-learning in particular a good choice for our problem.Firstly,the search strategy is learned from experience by interacting with the training examples.That interaction has an intuitive connection to how a user would interact with the system.Secondly,the exhaustive knowledge about the problem space is not required–the TD-learning method allows the search strategy to be learned from samples.Several theoretical re-sults guarantee convergence of the learning process to an op-timal search strategy–the strategy that maximizes the ex-pected total reward.These results are based on the assump-tion that the problem satisfies the Markov property[33]. We did not study if and by how much the problem in our formulation violates the Markov property.However,our ex-perimental results show that convergence does occur and we leave the question about the Markov property for future work.Lastly,unlike supervised learning,the reinforcement learn-ing does not require knowledge about the perfect search strategy.The learning process is directed by means of pro-viding rewards and punishments for each choice made by the search strategy.Such a framework is veryflexible–differ-ent reward schemes will result in the search strategies max-imizing different utility functions.It also requires a careful consideration while selecting a particular reward function. An intuitive choice of using the relevance value as the re-ward number requires from the search strategy to maximize “total discounted relevance”(TDR):T DR(ρ)=∀d t∈Rρtwhile we evaluate the performance using the average preci-sion(P):P=1jR t|+|N t|where R is the subset of all relevant documents in the data set and d t is the document examined at time step t.Notice that|R t|+|N t|=t.Despite that these are two different measures,the resulting search strategies exhibit high values of average precision in our experiments.That can probably be explained by a high correlation between T DR(ρ)and P for low values ofρ.An alternative would be to use P as the reward at the end of the search(r|D|+1)with all intermediate rewards set to zero. Then the search strategy should learn to maximize the aver-age precision.However,we have observed that using such a reward function results in a very slow learning process and a very similar search strategy.Currently we are experiment-ing with other different forms for the reward function and evaluation measure.AcknowledgmentsI am deeply grateful to James Allan for his support and contributions to the project.I also thank Andrew Barto and Michael Kositsky for their advice and comments about the reinforcement learning part of this work.This material is based on work supported in part by the National Science Foundation,Library of Congress and De-partment of Commerce under cooperative agreement num-ber EEC-9209623,and SPAWARSYSCEN-SD grant number N66001-99-1-8912.Any opinions,findings and conclusions or recommendations expressed in this material are the au-thors’and do not necessarily reflect those of the sponsors.8.REFERENCES[1]I.J.Aalbersberg.Incremental relevance feedback.InProceedings of ACM SIGIR,pages11–22,1992.[2]J.Allan.Incremental relevance feedback forinformationfiltering.In Proceedings of ACM SIGIR,pages270–278,1996.[3]J.Allan,J.Callan,B.Croft,L.Ballesteros,J.Broglio,J.Xu,and H.Shu.Inquery at TREC-5.In Fifth Text REtrieval Conference(TREC-5),pages119–132,1997.[4]J.Allan,J.Callan,W.B.Croft,L.Ballesteros,D.Byrd,R.Swan,and J.Xu.Inquery does battlewith TREC-6.In Sixth Text REtrieval Conference(TREC-6),pages169–206,1998.[5]M.Balabanovic.An adaptive web pagerecommendation service.In Proceedings of the FirstInternational Conference on Autonomous Agents,pages378–385,1997.[6]R.K.Belew.Adaptive information retrieval:using aconnectionist representation to retrieve and learnabout documents.In Proceedings of ACM SIGIR,pages11–20,1989.[7]H.Berliner.On the construction of evaluationfunctions for large domains.In Proceedings of theSixth International Joint Conference on ArtificialIntelligence,1979.[8]P.V.Biron and D.H.Kraft.New methods forrelevance feedback:improving information retrievalperformance.In ACM symposium on Appliedcomputing,pages482–487,1995.[9]C.Buckley and G.Salton.Optimization of relevancefeedback weights.In Proceedings of ACM SIGIR,pages351–357,1995.[10]J.P.Callan.Learning whilefiltering documents.InProceedings of ACM SIGIR,pages224–231,1998. [11]Y.S.Choi and S.I.Yoo.Multi-agent learningapproach to www information retrieval using neuralnetwork.In Proceedings of International Conferenceon Intelligent User Interfaces,pages23–30,1999.[12]CMAC./robots/cmac.htm.[13]D.Harman and E.Voorhees,editors.The Fifth TextREtrieval Conference(TREC-5).NIST,1997.[14]D.Harman and E.Voorhees,editors.The Sixth TextREtrieval Conference(TREC-6).NIST,1998.[15]M.A.Hearst and J.O.Pedersen.Reexamining thecluster hypothesis:Scatter/gather on retrieval results.In Proceedings of ACM SIGIR,pages76–84,Aug.1996.[16]M.Hemmje,C.Kunkel,and A.Willet.LyberWorld-a visualization user interface supporting fulltextretrieval.In Proceedings of ACM SIGIR,pages254–259,July1994.[17]D.A.Hull,J.O.Pedersen,and H.Sch¨u tze.Methodcombination for documentfiltering.In Proceedings ofACM SIGIR,pages279–287,1996.[18]T.Joachims,D.Freitag,and T.Mitchell.Webwatcher:A tour guide for the world wide web.In Proceedings ofIJCAI,pages770–778,1997.[19]K.L.Kwok.A network approach to probabilisticinformation retrieval.ACM Transactions onInformation Systems,13(3):324–353,1995.[20]A.Leuski and J.Allan.Evaluating a visual navigationsystem for a digital library.International Journal onDigital Libraries,2000.Forthcoming.[21]A.Leuski and J.Allan.Improving interactive retrievalby combining ranked lists and clustering.InProceedings of RIAO’2000,pages665–681,April2000.[22]A.Leuski and J.Allan.Lighthoise:showing the wayto relevant information.In Proceedings ofInfoVis’2000,October2000.[23]D.D.Lewis,R.E.Schapire,J.P.Callan,andR.Papka.Training algorithms for linear textclassifiers.In Proceedings of ACM SIGIR,pages298–306,1996.[24]F.Menczer and R.K.Belew.Adaptive informationagents in distributed textual environments.InProceedings of the second international conference onAutonomous agents,pages157–164,1998.[25]H.T.Ng,W.B.Goh,and K.L.Low.Featureselection,perception learning,and a usability casestudy for text categorization.In Proceedings of ACMSIGIR,pages67–73,1997.[26]S.E.Robertson,S.Walker,S.Jones,M.M.Hancock-Beaulieu,and M.Gatford.Okapi at TREC-3.In D.Harman and E.Voorhees,editors,Third TextREtrieval Conference(TREC-3).NIST,1995.[27]J.J.Rocchio,Jr.Relevance feedback in informationretrieval.In G.Salton,editor,The SMART RetrievalSystem:Experiments in Automatic DocumentProcessing,pages313–323.Prentice-Hall,Inc.,1971. [28]G.Salton.Automatic Text Processing.Addison-Wesley,1989.[29]G.Salton and C.Buckley.Improving retrievalperformance by relevance feedback.Journal of theAmerican Society for Information Science,41:288–297,1990.[30]R.E.Schapire,Y.Singer,and A.Singhal.Boostingand rocchio applied to textfiltering.In Proceedings of ACM SIGIR,pages215–223,1998.[31]H.Sch¨u tze,D.A.Hull,and J.O.Pedersen.Acomparison of classifiers and document representations for the routing problem.In Proceedings of ACMSIGIR,pages229–237,1995.[32]J.Shavlik,S.Calcari,T.Eliassi-Rad,and J.Solock.An instructable,adaptive interface for discovering and monitoring information on the world-wide web.InProceedings of the1999international conference onIntelligent user interfaces,pages157–160,1999. [33]R.S.Sutton and A.G.Barto.Reinforcement learning:an introduction.The MIT Press,1998.[34]R.Wilkinson and ing the cosinemeasure in a neural network for document retrieval.In Proceedings of ACM SIGIR,pages202–210,1991. [35]J.Xu and W.B.Croft.Querying expansion usinglocal and global document analysis.In Proceedings ofACM SIGIR,pages4–11,1996.。
Towards a Formal Framework for the Search of a Consensus Between Autonomous AgentsLeila AmgoudIRIT-CNRS 118,route de Narbonne T oulouse,Franceamgoud@irit.frSihem BelabbesIRIT-CNRS118,route de NarbonneT oulouse,Francebelabbes@irit.frHenri PradeIRIT-CNRS118,route de NarbonneT oulouse,Franceprade@irit.frABSTRACTThis paper aims at proposing a general formal framework for di-alogue between autonomous agents which are looking for a com-mon agreement about a collective choice.The proposed setting has three main components:the agents,their reasoning capabil-ities,and a protocol.The agents are supposed to maintain beliefs about the environment and the other agents,together with their own goals.The beliefs are more or less certain and the goals may not have equal priority.These agents are supposed to be able to make decisions,to revise their beliefs and to support their points of view by arguments.A general protocol is also proposed.It governs the high-level behaviour of interacting agents.Particularly,it specifies the legal moves in the dialogue.Properties of the framework are studied.This setting is illustrated on an example involving three agents discussing the place and date of their next meeting.Categories and Subject DescriptorsI.2.3[Deduction and Theorem Proving]:Nonmonotonic reason-ing and belief revision;I.2.11[Distributed Artificial Intelligence]:Intelligent agentsGeneral TermsHuman Factors,TheoryKeywordsArgumentation,Negotiation1.INTRODUCTIONRoughly speaking,negotiation is a process aiming atfinding some compromise or consensus between two or several agents about some matters of collective agreement,such as pricing products,al-locating ressources,or choosing candidates.Negotiation models have been proposed for the design of systems able to bargain in an optimal way with other agents for,e.g.,buying or selling products in e-commerce[6].Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on thefirst page.To copy otherwise,to republish,to post on servers or to redistribute to lists,requires prior specific permission and/or a fee.AAMAS’05,July25-29,2005,Utrecht,Netherlands.Copyright2005ACM1-59593-094-9/05/0007...$5.00.Different approaches to automated negotiation have been inves-tigated[11],including game-theoretic approaches(which usually assume complete information and unlimited computation capabili-ties),heuristic-based approaches which try to cope with these limi-tations,and argumentation-based approaches[3,1,10,8,7]which emphasize the importance of exchanging information and expla-nations between negotiating agents in order to mutually influence their behaviors(e.g.an agent may concede a goal having a small priority).Indeed,the twofirst types of settings do not allow for the addition of information or for exchanging opinions about of-fers.Integrating argumentation theory in negotiation provides a good means for supplying additional information and also helps agents to convince each other by adequate arguments during a ne-gotiation dialogue.In the present work,we consider agents having knowledge about the environment graded in certainty levels and preferences expressed under the form of more or less important goals.Their reasoning model will be based on an argumentative decision framework,as the one proposed in[5]in order to help agents making decisions about what to say during the dialogue,and to support their behav-ior by founded reasons,namely“safe arguments”.We will focus on negotiation dialogues where autonomous agents try tofind a joint compromise about a collective choice that will satisfy at least all their most important goals,according to their most certain pieces of knowledge.The aim of this paper is to propose a general and formal frame-work for handling such negotiation dialogues.A protocol specify-ing rules of interaction between agents is proposed.As the agents negotiate about a set of offers in order to choose the best one from their common point of view,it is assumed that the protocol is run, at most,as many times as there are offers.Indeed,each run of the protocol consists of the discussion of an offer by the agents.If that offer is accepted by all the agents,then the negotiation ends suc-cessfully.Otherwise,if at least one agent rejects it strongly and doesn’t revise its beliefs in the light of new information,the current offer is(at least temporarily)eliminated and a new one is discussed. We take an example to illustrate our proposed framework.It con-sists of three human agents trying to set a date and a place for orga-nizing their next meeting.Thus the offers allow for multiple com-ponents(date and place).For simplicity reasons,we consider them as combined offers so that if an agent has a reason to refuse an ele-ment of a given offer,it refuses the whole offer.One of the agents starts the dialogue by proposing an offer which can be accepted or rejected.The negotiation goes on until a consensus is found,or stops if it is impossible to satisfy all the most important goals of the agents at the same time.The remainder of this paper is organized as follows:in section2wedefine the mental states of the agents representing their beliefs and goals.In section3we present the argumentative decision frame-work capturing their reasoning capabilities.Section4describes a protocol for multi-agent negotiation dialogues.Section5illustrates the argued-decision based approach on an example dealing with the choice of a place and a date to organize a meeting.Section6con-ludes the paper and outlines some possible future work.2.MENTAL STATES AND THEIR DYNAM-ICSAs said before,it is supposed that the mental states of each agent are represented by bases modeling beliefs and goals graded in termsof certainty and of importance respectively.Following[4,12],each agent is equipped with(2n)bases,where n is the number of agents taking part to the negotiation.Let L be a propositional language and W ff(L)the set of well-formed formulas built from L.Each agent a i has the following bases:K i={(k i p,ρi p),p=1,s k}where k i p∈W ff(L),is a knowledge base gathering the information the agent has about the envi-ronment.The beliefs can be less or more certain.They areassociated with certainty levelsρi p.G i={(g i q,λi q),q=1,s g}where g i q∈W ff(L),is a base ofgoals to pursue.These can have different priority degrees,represented byλi q.GO i j={(go i r,j,γi r,j),r=1,s go(j)},where j=i,go i r,j∈W ff(L), are(n−1)bases containing what the agent a i believes thegoals of the other agents a j are.Each of these goals is sup-posed to have a priority levelγi r,j.KO i j={(ko i t,j,δi t,j),t=1,s ko(j)}where j=i,ko i t,j∈W ff(L), are(n−1)bases containing what the agent a i believes theknowledge of the other agents a j are.Each of these beliefshas a certainty levelδi t,j.This latter base is useful only if the agents intend to simulate the reasoning of the other agents.In negotiation dialogues where agents are trying tofind a common agreement,it is more important for each agent to consider the beliefs that it has on the other agents’goals rather than those on their knowledge.Indeed,a common agreement can be more easily reached if the agents check that their offers may be consistent with what they believe are the goals of the others.Soin what follows,we will omit the use of the bases KO i j.The different certainty levels and priority degrees are assumed to belong to a unique linearly ordered scale T with maximal element denoted by1(corresponding to total certainty and full priority)anda minimal element denoted by0corresponding to the complete ab-sence of certainty or priority.m will denote the order-reversing map of the scale.In particular,m(0)=1and m(1)=0.We shall denote by K∗and G∗the corresponding sets of classical propositions when weights are ignored.3.ARGUED DECISIONSRecently,Amgoud and Prade[5]have proposed a formal frame-work for making decisions under uncertainty on the bases of argu-ments that can be built in favor or against a possible choice.Such an approach has two obvious merits.First,decisions can be more eas-ily explained.Moreover,argumentation-based decision is maybe closer to the way humans make decisions than approaches requiring explicit utility functions and uncertainty distributions.Decisions for an agent are computed from stratified knowledge and prefer-ence bases in the sense of Section2.This approach distinguishes between a pessimistic attitude,which focuses on the existence of strong arguments that support a decision,and an optimistic one, which concentrates on the absence of strong arguments against a considered choice.This approach can be related to the estimation of qualitative pessimistic and optimistic expected utility measures. Indeed,such measures can be obtained from a qualitative plausi-bility distribution and a qualitative preference profile that can be associated with a stratified knowledge base and with a stratified set of goals[5].In this paper,we only use the syntactic counterpart of these se-mantical computations in terms of distribution and profile(which has been proved to be equivalent for selecting best decisions),un-der its argumentative form.This syntactic approach is now recalled and illustrated on an example.The idea is that a decision is justified and supported if it leads to the satisfaction of at least the most important goals of the agent,taking into account the most certain part of knowledge.Let D be the set of all possible decisions,where a decision d is a literal.D EFINITION1(A RGUMENT PRO).An argument in favor of a decision d is a triple A=<S,C,d>such that:-d∈D-S⊆K∗and C⊆G∗-S∪{d}is consistent-S∪{d} C-S is minimal and C is maximal(for set inclusion)among thesets satisfying the above conditions.S=Support(A)is the support of the argument,C=Conse-quences(A)its consequences(the goals which are reached by the decision d)and d=Conclusion(A)is the conclusion of the ar-gument.The set A P gathers all the arguments which can be con-structed from<K,G,D>.Due to the stratification of the bases K i and G i,arguments in favor of a decision are more or less strong for i.D EFINITION2(S TRENGTH OF AN A RGUMENT PRO).Let A =<S,C,d>be an argument in A P.The strength of A is a pair<Level P(A),W eight P(A)>such that:-The certainty level of the argument is Level P(A)=min{ρi|k i∈S and(k i,ρi)∈K}.If S=∅then Level P(A)=1.-The degree of satisfaction of the argument is W eight P(A)=m(β)withβ=max{λj|(g j,λi)∈G and g j/∈C}.Ifβ=1then W eight P(A)=0and if C=G∗then W eight P(A)=1.Then,strengths of arguments make it possible to compare pairs of arguments as follows:D EFINITION 3.Let A and B be two arguments in A P.A is preferred to B,denoted A P B,iff min(Level P(A),W eight P(A))≥min(Level P(B),W eight P(B)).Thus arguments are constructed in favor of decisions and those ar-guments can be compared.Then decisions can also be compared on the basis of the relevant arguments.D EFINITION 4.Let d,d ∈D.d is preferred to d ,denotedd P d ,iff∃A∈A P,Conclusion(A)=d such that∀B∈A P, Conclusion(B)=d ,then A P B.This decision process is pessimistic in nature since it is based on the idea of making sure that the important goals are reached.An optimistic attitude can be also captured.It focuses on the idea thata decision is all the better as there is no strong argument against it.D EFINITION5(A RGUMENT CON).An argument against a decision d is a triple A=<S,C,d>such that:-d∈D-S⊆K∗and C⊆G∗-S∪{d}is consistent-∀g i∈C,S∪{d} ¬g i-S is minimal and C is maximal(for set inclusion)among thesets satisfying the above conditions.S=Support(A)is the support of the argument,C=Conse-quences(A)its consequences(the goals which are not satisfied by the decision d),and d=Conclusion(A)its conclusion.The setA O gathers all the arguments which can be constructed from<K, G,D>.Note that the consequences considered here are the negative ones. Again,arguments are more or less strong or weak.D EFINITION6(W EAKNESS OF AN A RGUMENT CON).Let A =<S,C,d>be an argument of A O.The weakness of A is a pair<Level O(A),W eight O(A)>such that:-The level of the argument is Level O(A)=m(ϕ)such thatϕ=min{ρi|k i∈S and(k i,ρi)∈K}.If S=∅thenLevel O(A)=0.-The degree of the argument is W eight O(A)=m(β)suchthatβ=max{λj such that g j∈C and(g j,λi)∈G}.Once we have defined the arguments and their weaknesses,pairs of arguments can be compared.Clearly,decisions for which all the ar-guments against it are weak will be preferred,i.e.we are interestedin the least weak arguments against a considered decision.This leads to the two following definitions:D EFINITION7.Let A and B be two arguments in A O.A is preferred to B,denoted A O B,iff max(Level O(A),W eight O(A))≥max(Level O(B),W eight O(B)).As in the pessimistic case,decisions are compared on the basis of the relevant arguments.D EFINITION8.Let d,d ∈D.d is preferred to d ,denotedd O d ,iff∃A∈A O with Conclusion(A)=d such that∀B∈A O with Conclusion(B)=d ,then A is preferred to B.Let us illustrate this approach using the two points of view(pes-simistic and optimistic)on an example about deciding or not to ar-gue in a multiple agent dialogue for an agent which is not satisfied with the current offer.E XAMPLE 1.The knowledge base is K={(a→suu,1),(¬a→¬suu,1),(a→¬aco,1),(fco∧¬a→aco,1),(sb,1),(¬fco→¬aco,1),(sb→fco,λ)}(0<λ<1)with the intended meaning: suu:saying something unpleasant,fco:other agents in favor of current offer,aco:obliged to accept the current offer,a:argue,sb:current offer seems beneficial for the other agents.The base of goals is G={(¬aco,1),(¬suu,σ)}with(0<σ<1). The agent does not like to say something unpleasant,but it is more important not to be obliged to accept the current offer.The set of decisions is D={a,¬a},i.e.,arguing or not.There is one argument in favor of the decision‘a’:<{a→¬aco}, {¬aco},a>.There is also a unique argument in favor of the deci-sion‘¬a :<{¬a→¬suu},{¬suu},¬a>.The level of the argument<{a→¬aco},{¬aco},a>is1whereasits weight is m(σ).Concerning the argument<{¬a→¬suu}, {¬suu},¬a>,its level is1and its weight is m(1)=0.The argument<{a→¬aco},{¬aco},a>is preferred to the argument<{¬a→¬suu},{¬suu},¬a>.From a pessimistic point of view,decision a is preferred to the decision¬a since<{a→¬aco},{¬aco},a>is preferred to<{¬a→¬suu},{¬suu},¬a>.Let us examine the optimistic point of view.There is one argu-ment against the decision‘a’:<{a→suu},{¬suu},a>.Thereis also a unique argument against the decision¬a:<{sb,sb→fco,fco∧¬a→aco},{¬aco},¬a>.The level of the argument<{a→suu},{¬suu},a>is0whereasits degree is m(σ).Concerning the argument<{sb,sb→fco,fco∧¬a→aco},{¬aco},¬a>,its level is m(λ),and its degree is0. Then the comparison of the two arguments amounts to comparem(σ)with m(λ).Thefinal recommended decision with the optimistic approach de-pends on this comparison.This argumentation system will be used to take decisions about the offers to propose in a negotiation dialogue.The following defini-tion is the same as Definition1where the decision d is about offers.D EFINITION9(A RGUMENT FOR AN OFFER).An argument in favor of an offer x is a triple A=<S,C,x>such that:-x∈X-S⊆K∗and C⊆G∗-S(x)is consistent-S(x) C(x)-S is minimal and C is maximal(for set inclusion)among thesets satisfying the above conditions.X is the set of offers,S=Support(A),C=Consequences(A) (the goals which are satisfied by the offer x)and x=Conclusion(A). S(x)(resp.C(x))denotes the belief state(resp.the preference state)when an offer x takes place.E XAMPLE 2.The example is about an agent wanting to pro-pose an offer corresponding to its desired place for holidays.The set of available offers is X={T unisia,Italy}.Its knowledge base is:K={(Sunny(T unisia),1),(¬Cheap(Italy),β),(Sunny(x)→Cheap(x),1)}.Its preferences base is:G={(Cheap(x),1)}.The decision to take by the agent is whether to offer Tunisia or Italy.Following the last definition,it has an argument in favor of Tunisia:A=<{Sunny(T unisia),Sunny(x)→cheap(x)}, cheap(T unisia),tunisia>.It has no argument in favor of Italy(it violates its goal which is very important).So this agent will offer Tunisia.4.THE NEGOTIATION PROTOCOL4.1Formal settingIn this section,we propose a formal protocol handling negotia-tion dialogues between many agents(n≥2).Agents having to discuss several offers,the protocol is supposed to be run as many times as there are non-discussed offers,and such that a common agreement is still not found.The agents take turns to start new runs of the protocol and only one offer is discussed at each run.A negotiation interaction protocol is a tuple Objective,Agents, Object,Acts,Replies,Wff-Moves,Dialogue,Result such that:Objective is the aim of the dialogue which is tofind an acceptable offer.Agents is the set of agents taking part to the dialogue,Ag= {a0,...,a n−1}.Object is the subject of the dialogue.It is a multi-issue one,de-noted by the tuple O1,...,O m ,m≥1.Each O i is avariable taking its values in a set T i.Let X be the set of all possible offers,its elements are x=x1,...,x m with x i∈T i.Acts is the set of possible negotiation speech acts:Acts={Offer, Challenge,Argue,Accept,Refuse,Withdraw,Say nothing}. Replies:Acts−→Power(Acts),is a mapping that associates to each speech act its possible replies.-Replies(Offer)={Accept,Refuse,Challenge}-Replies(Challenge)={Argue}-Replies(Argue)={Accept,Challenge,Argue}-Replies(Accept)={Accept,Challenge,Argue,Withdraw}-Replies(Refuse)={Accept,Challenge,Argue,Withdraw}-Replies(W ithdraw)=∅Well-founded moves={M0,...,M p}is a set of tuples M k= S k,H k,Move k ,such that:-S k∈Agents,the agent which plays the move is givenby the function Speaker(M k)=S k.-H k⊆Agents\{S k},the set of agents to which themove is addressed is given by the function Hearer(M k)=H k.-Move k=Act k(c k)is the uttered move where Act k isa speech act applied to a content c k.Dialogue is afinite non-empty sequence of well-founded moves D={M0,...,M p}such that:-M0= S0,H0,offer(x) :each dialogue starts withan offer x∈X-Move k=offer(x),∀k=0and∀x∈X:only oneoffer is proposed during the dialogue at thefirst move-Speaker(M k)=a k modulo n:the agents take turnsduring the dialogue.-Speaker(M k)/∈Hearer(M k).This condition for-bids an agent to address a move to itself.-Hearer(M0)=a j,∀j=i:the agent a i which uttersthefirst move addresses it to all the agents.-For each pair of tuples M k,M h,k=h,if S k=S hthen Move k=Move h.This condition forbids anagent to repeat a move that it has already played.These conditions guarantee that the dialogue D is non circu-lar.Result:D−→{success,failure},is a mapping which returns the result of the dialogue.-Result(D)=success if the preferences of the agentsare satisfied by the current offer.-Result(D)=failure if the most important prefer-ences of at least one agent are violated by the currentoffer.This protocol is based on dialogue games.Each agent is equipped with a commitment store(CS)[9]containing the set of facts it is committed to during the dialogue.Using the idea introduced in[2]of decomposing the agents’com-mitments store(CS)into many components,we suppose that each agent’s CS has the structureCS= S,A,Cwith:CS.S contains the offers proposed by the agent and those it has accepted(CS.S⊆X),CS.A is the set of arguments presented by the agent(CS.A⊆Arg(L)),where Arg(L))is the set of all arguments we can construct from L,CS.C is the set of challenges made by the agent.At thefirst run of the protocol,all the CS are empty.This is not the case when the protocol is run again.Indeed,agents must keep their previous commitments to avoid to repeat what they have already uttered during previous runs of the protocol.4.2Conditions on the negotiation actsIn what follows,we specify for each act its pre-conditions and post-conditions(effects).For the agents’commitments(CS),we only specify the changes to effect.We suppose that agent a i ad-dresses a move to the(n−1)other agents.Offer(x)where x∈X.It’s the basic move in negotiation.The idea is that an agent chooses an offer x for which there are the strongest supporting arguments(w.r.t.G i).Since the agent is coop-erative(it tries to satisfy its own goals taking into account the goals of the other agents),this offer x is the also the one for which there exists no strong argument against it(using GO i j instead of G i).Pre-conditions:Among the elements of X,choose x which is pre-ferred to any x ∈X such that x =x,in the sense of defini-tion4,provided that there is no strong argument against the offer x(i.e.with a weakness degree equal to0)where G i is changed into GO i j,∀j=i in definition8.Post-conditions:CS.S t(a i)=CS.S t−1(a i)∪{x}.Challenge(x)where x∈X.This move incites the agent which receives it to give an argument in favor of the offer x.An agent asks for an argument when this offer is not acceptable for it and it knows that there are still non-rejected offers.Pre-conditions:∃x ∈X such that x is preferred to x w.r.t.defi-nition4.Post-conditions:CS.C t(a i)=CS.C t−1(a i)∪{x}:the agent a i which played the move Challenge(x)keeps it in its CS. Challenge(y)where y∈W ff(L).This move incites the agent which receives it to give an argument in favor of the proposition y.Pre-conditions:There is no condition.Post-conditions:CS.C t(a i)=CS.C t−1(a i)∪{y}:the agent a i which played the move Challenge(y)keeps it in its CS. Argue(S)with S={(k p,αp),p=1,s}⊆K i is a set of formulas representing the support of an argument given by agent a i.In[5], it is shown how to compute and evaluate acceptable arguments.Pre-conditions:S is acceptable.Post-conditions:CS.A t(a i)=CS.A t−1(a i)∪S.If S is accept-able(according to the definition given in[5]),the agents a j revise their base K j into a new base(K j)∗(S).Withdraw An agent can withdraw from the negotiation if it hasn’t any acceptable offer to propose.Pre-conditions:∀x∈X,there is an argument with maximal strength against x,or(X=∅).Post-conditions:(Result(D)=failure)and∀i,CS t(a i)=∅.As soon as an agent withdraws,the negotiation ends and all the commitment stores are emptied.We suppose the dialogue ends this way because we aim to find a compromise between the n agents taking part to the negotiation.Accept(x)where x∈X.This move is played when the offer x is acceptable for the agent.Pre-conditions:the offer x is the most preferred decision in X in the sense of definition4.Post-conditions:CS.S t(a i)=CS.S t−1(a i)∪{x}.If x∈CS.S(a i),∀i,then Result(D)=success,i.e if all the agents accept the offer x,the negotiation ends with x as compromise.Accept(S)S⊂W ff(L).Pre-conditions:S is acceptable for a i.Post-conditions:CS.A t(a i)=CS.A t−1(a i)∪S.Refuse(x)where x∈X.An agent refuses an offer if it is not acceptable for it.Pre-conditions:there exists an argument in the sense of definition 5against x.Post-conditions:If∀a j, (S,x),i.e.if there not exist any accept-able argument for x then X=X\{x}.A rejected offer isremoved from the set X.Result(D)=failure.Say nothing This move allows an agent to miss its turn if it has already accepted the current offer,or it has no argument to present. This move has no effect on the dialogue.4.3Properties of the negotiation protocolP ROPERTY1(T ERMINATION).Any negotiation between n agents managed by our protocol ends,either with Result(D)=suc-cess or Result(D)=failure.P ROPERTY2(O PTIMAL OUTCOME).If the agents do not mis-represent the preferences of the other agents(GO i j),then the com-promise found is an offer x which is preferred to any other offerx ∈X in the sense of definition4,for all the agents.5.EXAMPLE OF DELIBERATIVE CHOICE We illustrate our negotiation protocol through an example of di-alogue between three agents:Mary,John and Peter,partners ona common project aiming at setting a town and a date for their next meeting.The negotiation object O is in this case the cou-ple(T own,Date)denoted t,d ,where t is for the town and d the date.Suppose that the set of offers is X={(V,E),(L,S),(V,J)},i.e. the meeting will take part either in Valencia(denoted V),at one of the dates respectively denoted E and J;or in London(denoted L)at the date denoted S.In what follows,we use the following scale T={a,b,c,d}with the condition a>b>c>d.We recall that m is the order revers-ing map on the scale T such that m(a)=d and m(b)=c. Suppose Mary has the following beliefs:K0={(disposable(V,E),1),(disposable(t,d)→meet(t,d),1),(free(V,E),1),(¬free(L,S),1),(disposable(t,J),1)}.The goals of Mary are to meet her partners in any town and at any date,provided that accommodations are free.This can be written:G0={(meet,1),(free,b)}.Where”meet”is a short for(meet(V,E)∨meet(L,S)∨meet(V,J)).”free”is defined the same way.We use this type of abbreviationin what follows.Suppose John’s beliefs are:K1={(hot(V,d),a),(¬hot(L,S),1), (disposable(L,S),1),(disposable(t,d)→meet(t,d),1),(meet(V,J)→work saturday,1)}.His goals are to meet his partners in any town and at any date,and that this town must be not hot at this date.We write:G1={(meet,1),(¬hot,c)}.Finally we suppose Peter’s beliefs are:K2={(¬meet(V,E),1),(∀d=E,meet(V,d),1), (disposable(t,d)→meet(t,d),1),(disposable(V,J),b), (manager,1),(manager→work saturday,1)}.His goals are to meet his partners and to don’t work on Saturday. We write:G2={(meet,1),(¬work saturday,d)}.For simplicity,we suppose that Mary,John and Peter ignore the preferences of each other.This means that GO i j=∅,∀i,j.In what follows,we illustrate the dialogue between the agents and give the moves played by each agent.First run of the protocol:Mary starts the dialogue by proposing an offer.Mary:The next meeting should be in Valencia during the confer-ence ECAI.Offer(V,E).Pre-condition:(V,E)is the most preferred decision for Mary.Post-condition:CS.S(Mary)={(V,E)}.John:Why?Challenge(V,E).Pre-condition:For John,there exists another decision which is preferred to(V,E).Post-condition:CS.C(John)={(V,E)}.Peter:What are the advantages?Challenge(V,E).Pre-condition:For Peter,this decision violates his most im-portant goal.Post-condition:CS.C(P eter)={(V,E)}.Mary:I think we can meet as soon as it will be during ECAI.Argue(meet(V,E)).Pre-condition:The argument is acceptable.Post-condition:CS.A(Mary)={disposable(V,E), disposable(V,E)→meet(V,E)}.John:I refuse Valencia because it is hot.Argue(hot(V,d)).Pre-condition:{hot(V,d)}is an acceptable argument.Post-condition:CS.A(John)={hot(V,d)}.Peter:For my part,I will not be able to meet you.Argue(¬meet(V,E)).Pre-condition:{¬meet(V,E)}is an acceptable argument.Post-condition:CS.A(P eter)={¬meet(V,E)}. Mary:Nevertheless the accommodation will be free.Argue(free(V,E)).Pre-condition:{free(V,E)}is an acceptable argument.Post-condition:CS.A(Mary)=CS.A(Mary)∪{free(V,E)}.John:It still doesn’tfit me.Refuse(V,E).Pre-condition:the offer violates one of his goals.Peter:Neither do I.Refuse(V,E).Pre-condition:the offer violates his most important goal.Post-condition:Result(D)=failure.X=X\{(V,E)}and all the CS are emptied exceptthe components of the arguments.Second run of the protocol:It is started by John.John:What about London in September?Offer(L,S).Pre-condition:(L,S)is the most preferred decision for John.Post-condition:CS.S(John)={(L,S)}.Peter:I refuse.Refuse(L,S).Pre-condition:this offer violates his most important goal. Mary:John,what are your arguments in favor of your offer?Challenge(L,S).Pre-condition:(L,S)is not the preferred decision for Mary.Post-condition:CS.C(Mary)={(L,S)}.John:London is not hot and I will be able to meet you.Argue(¬hot(L,S),meet(L,S)).Pre-condition:The argument is acceptable.Post-condition:CS.A(John)=CS.A(John)∪{¬hot(L,S),disposable(L,S),disposable(L,S)→meet(L,S)}.Peter:But we have already said that the meeting should be in Va-lencia!Argue(∀d=E,meet(V,d)).Pre-condition:{∀d=E,meet(V,d)}is an acceptable ar-gument.Post-condition:CS.A(P eter)=CS.A(P eter)∪{∀d= E,meet(V,d)}.Mary:I can’t pay the accommodation.Argue(¬free(L,S)).Pre-condition:{¬free(L,S)}is acceptable.Post-condition:CS.A(Mary)=CS.A(Mary)∪{¬free(L,S)}.John:I forgot that we have chosen Valencia to organize the meet-ing.Accept(∀d=E,meet(V,d)).Pre-condition:{∀d=E,meet(V,d)}is an acceptable ar-gument.Post-condition:CS.A(John)=CS.A(John)∪{∀d= E,meet(V,d)}.John revises its belief base which becomes:K1={(hot(V,d),0.9),(¬hot(L,S),1),(disposable(L,S),1),(disposable(t,d)→meet(t,d),1),(meet(V,J)→work saturday,1),(∀d=E, meet(V,d),1)}.Peter:I have nothing more to say.Say nothing.Mary:I also forgot this.Accept(∀d=E,meet(V,d)).Pre-condition:{∀d=E,meet(V,d)}is acceptable.Post-condition:CS.A(Mary)=CS.A(Mary)∪{∀d= E,meet(V,d)}.Mary revises its belief base which becomes:K0={(disposable(V,E),1),(disposable(t,d)→meet(t,d),1), (free(V,E),1),(¬free(L,S),1),(disposable(t,J),1),(∀d=E, meet(V,d),1)}.Result(D)=failure,X=X\{(L,S)}={(V,J)}and all the CS are emptied except the components of the arguments.。