A Comparative Study for Domain Ontology Guided Feature Extraction

Bill B. Wang, R. I. (Bob) McKay, Hussein A. Abbass, Michael Barlow
School of Computer Science, University College, ADFA, University of New South Wales
Canberra, ACT 2600
{biaowang, rim, abbass, spike}@.au

Abstract

We introduced a novel method employing a hierarchical domain ontology structure to extract features representing documents in our previous publication (Wang 2002). All raw words in the training documents are mapped to concepts in a concept hierarchy derived from the domain ontology. Based on these concepts, a concept hierarchy is established for the training document space, using is-a relationships defined in the domain ontology. An optimum concept set may be obtained by searching the concept hierarchy with an appropriate heuristic function; this set may then be used as the feature space to represent the training dataset. The proposed method aims to overcome drawbacks shared by existing text classification and feature selection algorithms. In this paper, we report a series of experiments comparing our approach with other comparable feature-selection and feature-extraction methods. The results indicate that our approach has advantages in many respects.

Keywords: text classification, ontology, concept hierarchy, principal component analysis, KNN algorithm, information gain, χ² statistics.

Copyright © 2003, Australian Computer Society, Inc. This paper appeared at the Twenty-Fifth Australian Computer Science Conference (ACSC2003), Adelaide, Australia. Conferences in Research and Practice in Information Technology, Vol. 16. Michael Oudshoorn, Ed. Reproduction for academic, not-for-profit purposes permitted provided this text is included.

1. Introduction

With the large number of paper documents that exist in organisations, it is becoming vital to import these papers into the computer automatically. Optical Character Recognition (OCR) software is quite efficient at transforming typed documents. However, after storing millions of documents in databases, the question arises of how to assign these documents to specialised databases; on a more technical level, the question becomes how to classify them. Automatic text classification (Salton 1989) is the task of assigning natural language texts to one or more pre-defined categories based on their content.

Compared with common data classification tasks, text classification presents unique challenges due to the large and unfixed number of features present in the dataset, the large number of documents, and the multi-modality of categories. Existing classification techniques have limited applicability in these datasets because the large number of features makes most documents indistinguishable in higher-dimensional spaces.

Many researchers have shown that similarity-based classification algorithms, such as KNN and centroid-based classification, are very effective for large document collections (Shankar 2000). A cross-experiment comparison (Yang 1999) of 14 major classification methods, including KNN, decision tree, naive Bayes, linear least squares fit, neural network, SWAP-1, Rocchio, etc., has shown that KNN is one of the top performers, and that it scales well to very large and noisy classification problems. However, these effective classification algorithms still suffer from high dimensionality, which greatly limits their practical performance.
Empirical and mathematical analysis (Beyer 1999, Hinneburg 2000) has shown that finding the nearest neighbors in high-dimensional space is very difficult, because most points in high-dimensional space are almost equidistant from all other points.

In fact, in many document datasets only a relatively small number of the total features may be useful in classifying documents, and using all the features may adversely affect performance. Determining how to reduce the length of document vectors effectively and reasonably is therefore a challenge for classification researchers. Stop word lists (Fox 1992) and word stemming (Frakes 1992) are some of the earliest efforts on this problem. In recent years, many term-weighting and feature-selection algorithms (Lewis 1994, Yang 1997, Shankar 2000, John 1994, Kira 1992) have been developed to reduce the feature space without significant sacrifice of classification accuracy. However, the effectiveness of these algorithms depends heavily on the quality of the training dataset. This is a major drawback for text classification methods, as the creation of high-quality datasets may be very expensive.

The performances of both the text classification algorithms discussed above and of the feature selection algorithms depend on the quality of the training dataset. The KNN classifier is an instance-based classifier, which means a training dataset of high quality is particularly important. An ideal training document set for each particular category will cover all the important terms and their possible distribution in that category. With such a training set, a classifier can find the true distribution model of the target domain. Otherwise, a text that uses only some key words out of a training set may be assigned to the wrong category. In practice, however, establishing such a training set is usually infeasible; a perfect training set can never be expected.

In our previous work (Wang 2002), we introduced a novel method to effectively and reasonably reduce the dimensionality and improve the performance of a text classifier. By searching the concept hierarchy defined by a domain-specific ontology, a more precise distribution model for a pre-defined classification task can be determined. The experiments indicated that, using this approach, the size of the feature sets can be effectively reduced and the accuracy of the classifiers increased. In this paper, we compare our approach with some other dimensionality-reduction methods through a series of comparison experiments. This paper is structured as follows: Section 2 briefly reviews related work, Section 3 introduces the notion of a domain-specific concept ontology and the UMLS knowledge resources, Section 4 describes the process of the system, experimental results and discussion are presented in Section 5, and finally the conclusion is given in Section 6.

2. Related Work

In this section, we briefly review some background research, including unsupervised and supervised dimensionality reduction applied to document datasets, previous attempts to apply semantic knowledge in unsupervised and supervised learning, and our own work.

There are several methods for reducing the dimensionality of high-dimensional data in an unsupervised learning model. Most of these methods reduce the dimensionality by combining multiple variables or features, utilizing the dependencies among the variables detected by statistical tests. Consequently, these techniques can capture synonyms in the document datasets. These methods are also called feature extraction.
Principal Component Analysis (PCA) (Calvo 1998) is a key method. Given an n × m document-term matrix, PCA uses the first k eigenvectors of the m × m covariance matrix as the axes of the lower k-dimensional space. These leading eigenvectors correspond to the linear combinations of the original variables that account for the largest amount of term variability (Jackson 1991). Latent Semantic Indexing (LSI) (Deerwester 1990) is a dimensionality reduction technique extensively used in the information retrieval domain, and is similar in nature to PCA. In LSI, instead of finding the truncated singular value decomposition of the covariance matrix, the method finds the truncated singular value decomposition of the original n × m document-term matrix, and uses the leading singular vectors as the axes of the lower-dimensional space. Experiments have shown that LSI substantially improves retrieval performance on a wide range of datasets (Dumais 1995). However, the reason for LSI's robust performance is not well understood, and is currently an active area of research (Papadimitriou 1998).
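The LSI construction described above can be illustrated in a few lines of linear algebra. The following is a minimal sketch under assumed toy inputs, not the experimental setup of this paper; the function name and data are our own, and NumPy's singular value decomposition stands in for a production implementation.

import numpy as np

def lsi_project(doc_term_matrix, k):
    # Truncated SVD: A ~ U_k S_k V_k^T. The rows of U_k S_k give the
    # coordinates of the documents in the k-dimensional latent space.
    u, s, vt = np.linalg.svd(doc_term_matrix, full_matrices=False)
    return u[:, :k] * s[:k]

# Toy usage: four documents over five terms, forming two term clusters.
a = np.array([[2., 1., 0., 0., 0.],
              [1., 2., 0., 0., 0.],
              [0., 0., 1., 2., 1.],
              [0., 0., 2., 1., 1.]])
print(lsi_project(a, k=2))

Documents whose terms co-occur with the same vocabulary are drawn together along the leading singular directions, which is how such projections capture synonymy without any labelled data.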
In effect, these methods try to find the semantic relationships between features using statistical tests. A key problem is that their performance depends strongly on the sufficiency and quality of the training dataset. Most importantly, what they discover cannot extend to features that do not occur in the training dataset.

In principle, all of the methods developed for unsupervised dimensionality reduction can potentially be used to reduce the dimensionality in a supervised model as well. However, in doing so, they cannot take advantage of the class or category information available in the dataset. Another limitation of these methods on supervised data is that characteristic variables that describe smaller classes tend to be lost as a result of dimensionality reduction; hence, the classification accuracy on smaller classes in the reduced dimensional space can be quite poor. On the other hand, stratified sampling to avoid this problem can result in poor classification accuracy on the larger classes.

Various feature selection methods have been developed for supervised dimensionality reduction (Kira 1992, Karypis 2000, Yang 1997, Moore 1997, Liu 1998b, Kohavi 1997). A number of researchers have recently addressed the issue of feature subset selection in machine learning. As noted by John, Kohavi and Pfleger (John 1994), this work is often divided along two lines: the filter approach and the wrapper approach. In the filter approach, feature selection is performed as a preprocessing step to induction; thus, the bias of the learning algorithm does not interact with the bias inherent in the feature selection algorithm. The main disadvantage of the filter approach is that it totally ignores the effects of the selected feature subset on the performance of the induction algorithm. In the filter approaches, the different features are ranked using a variety of criteria, and then only the highest-ranked features are kept. A variety of techniques have been developed for ranking the features (i.e., words in the collection), including document frequency (the number of documents in which a word occurs), mutual information (Cover 1991, Yang 1997, Joachims 1997), and χ² statistics (Yang 1997). Consequently, even though the criterion used for ranking is a measure of the effectiveness of each feature in the classification task, these criteria may not be optimal for the classification algorithm used. Another limitation of this approach is that these criteria measure the effectiveness of a feature independently of other features, and hence features that are effective in classification only in conjunction with other features will not be selected.

In contrast to the filter approaches, wrapper approaches find a subset of features using a classification algorithm as a black box (Kohavi 1997, Kohavi 1995, Liu 1998b). In these approaches the features are selected based on how well they improve the accuracy of the classifier. The wrapper approaches have been shown to be more effective than the filter approaches in many applications (Kohavi 1995, Wettschereck 1997, Langley 1994). However, the major drawback of these approaches is that their computational requirements are very high. This is particularly true for document datasets, where there are thousands of features.
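As a concrete illustration of the filter approach, the sketch below ranks terms by information gain, one of the criteria named above, over a toy presence matrix and keeps the highest-ranked terms. This is a minimal illustration under assumed inputs, not the RAINBOW implementation used later in the paper; all function names and data are our own.

import numpy as np

def entropy(probs):
    # Shannon entropy in bits, ignoring zero-probability outcomes.
    probs = probs[probs > 0]
    return -np.sum(probs * np.log2(probs))

def information_gain(presence, labels):
    # presence: (docs x terms) 0/1 matrix; labels: per-document category ids.
    # IG(t) = H(C) - P(t) H(C|t) - P(not t) H(C|not t): each term is scored
    # independently of all other terms, which is the filter approach's
    # characteristic limitation discussed above.
    cats = np.unique(labels)
    def class_dist(mask):
        return np.array([np.mean(labels[mask] == c) for c in cats])
    h_c = entropy(np.array([np.mean(labels == c) for c in cats]))
    scores = []
    for t in range(presence.shape[1]):
        on = presence[:, t] == 1
        p_on = on.mean()
        h_on = entropy(class_dist(on)) if p_on > 0 else 0.0
        h_off = entropy(class_dist(~on)) if p_on < 1 else 0.0
        scores.append(h_c - p_on * h_on - (1 - p_on) * h_off)
    return np.array(scores)

# Rank terms and keep only the highest-ranked ones.
presence = np.array([[1, 0, 1],
                     [1, 0, 0],
                     [0, 1, 1],
                     [0, 1, 0]])
labels = np.array([0, 0, 1, 1])
print(np.argsort(information_gain(presence, labels))[::-1])  # term 2 ranks last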
To date, there have been few efforts to apply semantic information in text classification. Koller and Sahami (Koller 1997) proposed an approach that utilizes the semantic information provided by a hierarchical topic structure to decompose the classification task into a set of simpler problems, one at each node in the classification tree. Their experiments indicate that each of these smaller problems can be solved accurately by focusing only on a very small set of features, those relevant to the task at hand. This set of relevant features varies widely throughout the hierarchy, so that, while the overall relevant feature set may be large, each classifier only examines a small subset.

Hotho, Staab and Maedche (Hotho 2001) proposed a semantic approach for document clustering. They apply background knowledge during preprocessing in order to improve clustering results and allow for selection between results. In their experiments, all terms occurring in documents were mapped to concepts. They built various views, basing the selection of text features on a hierarchy of concepts. Their results indicate that this approach compares favorably with baselines such as clustering based on terms weighted by tf/idf measures. The selected concepts may be used to indicate to the user which text features were most relevant for the particular clustering results, and to distinguish different views.

3. Domain-Specific Concept Ontology and UMLS Knowledge Resources

The term ontology has various meanings when it is used in different ways and in different disciplines. Computer scientists, however, use the term to describe formal descriptions of objects in the world, the properties of those objects, and the relationships among them. In artificial intelligence, according to Gruber (Gruber 1993), an ontology is a specification of a conceptualization. It defines the vocabulary of a domain, and constraints on the use of terms in the vocabulary.

In our research, a term is a sequence of alphanumeric characters delimited by white space or punctuation marks. A domain-specific concept ontology specifies the concepts that are used to represent documents. A concept represents a unit of meaningful information in the domain, and may consist of one or more terms. A domain ontology also specifies the categories attached to these concepts, and the relations (ISA in this paper) which exist between concepts and categories (Figure 1). The hierarchical concept structure which we use for a particular training document set is the part of a domain-specific concept ontology based on the terms used in the training set. The process of establishing this structure is introduced in Section 4.

Figure 1. A Sketch of a Domain-Specific Concept Ontology

The Unified Medical Language System (UMLS), a set of knowledge sources developed by the US National Library of Medicine, can be viewed as a complete concept ontology for medical domains. It consists of three sections: a metathesaurus, a semantic network and a specialist lexicon, and contains information about medical terms and their inter-relationships. It is organized by concept, and contains over 800,000 concepts and 1.9 million entries. Various types of relationships between concepts are defined in the system, of which ISA is the primary one. We used this relationship to establish the hierarchical concept structure for a particular training set containing documents in the medical domain.

4. Establishing Concept Representation

There are four major steps to establishing a concept representation for documents:

1. Map raw terms to concepts based on UMLS
2. Establish a concept hierarchy for the training set
3. Search the concept hierarchy to obtain the optimal concept set
4. Establish a new feature model for both training and test documents

4.1 Mapping Raw Terms to Concepts

The most straightforward representation of documents relies on term vectors. The major drawback of this basic approach is the length of the feature vectors, usually more than 10,000 terms. In text categorization, however, completely different terms may represent the same concept. In some cases, terms with different concepts can even be replaced with a higher-level concept without negative effect on the performance of the classifier. For example, ANEMIA and LEUKEMIA can be replaced with the higher-level concept HEMATOLOGIC DISEASE in many text categorization settings without loss of the classifier's accuracy. Obviously, mapping terms to concepts is an effective and reasonable method of reducing the dimensionality of the vector space.

The mapping process relies on the API provided by the UMLS system; we use the mapping function provided by the UMLS query interface. We aim to find the 'longest concept units' (LCUs) in documents. An LCU is an independent concept defined by a string of continuous terms, such that no longer string of continuous terms containing it defines an independent concept. For example, consider the sentence

AIDS is a kind of human immunodeficiency virus.

According to the mapping algorithm defined below, we will get two concepts: 'AIDS' and 'HIV'. 'Human' and 'virus' are not recorded as independent concepts, even though they do occur as independent concepts in the concept ontology, because they are part of the LCU 'HIV'. We take this approach because an LCU is usually more meaningful for identifying the content of a document.

Term to Concept Mapping Algorithm:

Input: a sentence consisting of n terms, Φ = {t_1, t_2, ..., t_n}
a concept set C = ∅
while (Φ ≠ ∅)
    Φ_t ← Φ
    while (Φ_t ≠ ∅)
        c ← mapping(Φ_t)
        if c ≠ Null
        then put c in C, remove Φ_t from Φ, Φ_t ← ∅
        else remove t_i from Φ_t (i = |Φ_t|)
    loop
loop
return C as the concept set for this sentence

Through this mapping process, we obtain concept sets for individual documents; each includes all distinct concepts, and the frequency of each concept is recorded. Combining such data for all training documents, we also obtain a shared concept set for the training set, which includes all distinct concepts from the whole document set.
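A direct rendering of the algorithm above is sketched below; it reproduces the 'AIDS'/'HIV' example. The lookup argument is a stand-in for the UMLS mapping function (an assumption: it takes a phrase and returns a concept name or None), and the pseudocode leaves implicit what happens when no prefix starting at the current term matches any concept; this sketch skips such a term, which we assume is the intended behaviour, since otherwise the outer loop could not advance.

def map_terms_to_concepts(terms, lookup):
    # Greedy longest-match search for 'longest concept units' (LCUs):
    # try the longest prefix of the remaining terms first, consume it
    # when it names a concept, otherwise shorten the prefix.
    concepts = []
    remaining = list(terms)
    while remaining:
        matched = False
        for end in range(len(remaining), 0, -1):   # longest prefix first
            concept = lookup(" ".join(remaining[:end]))
            if concept is not None:
                concepts.append(concept)
                remaining = remaining[end:]        # consume the matched span
                matched = True
                break
        if not matched:
            remaining.pop(0)   # no concept starts here; skip this term
    return concepts

# Toy dictionary standing in for the UMLS query interface.
toy = {"aids": "AIDS", "human immunodeficiency virus": "HIV",
       "human": "HUMAN", "virus": "VIRUS"}
sent = "aids is a kind of human immunodeficiency virus".split()
print(map_terms_to_concepts(sent, toy.get))   # ['AIDS', 'HIV']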
4.2 Establishing the Concept Hierarchy

The UMLS query interface provides a parent query function for retrieving the parents of concepts. The concept hierarchic structure is established by repeatedly querying parents, from the shared concepts up to the root of the semantic network. The completed concept hierarchic structure is a fully-connected graph rooted at 'top_type'.

For instance, suppose that only the five distinct concepts below occur in a document set:

[Mastadenovirus, AIDS, Human Immunodeficiency Syndrome, Alfamovirus, Dengue Virus]

Based on this shared concept set, we will get the concept hierarchic structure shown in Figure 2.

Figure 2. A Sample Concept Hierarchical Structure

From this structure, we can see that several combinations of concepts at different levels can be chosen to represent documents under different taxonomic standards, e.g. [Virus], [DNA Virus, RNA Virus] and [DNA Virus, Retroviridae, Astroviridae]. All original concepts can be mapped to these higher-level concepts. Then, when a new concept (e.g. adenoviridae) occurs in fresh documents, it can easily be mapped to 'DNA Virus' for the classification algorithms. Such a new concept would be ignored by most traditional classifiers, because it never occurs in the training dataset.
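The upward walk itself is a simple closure over the ISA relation. In the sketch below, parents_of stands in for the UMLS parent-query function (an assumption: it returns the ISA parents of a concept, and an empty list for the root 'top_type'), and the ISA fragment is illustrative toy data, not actual UMLS content.

def build_hierarchy(shared_concepts, parents_of):
    # Repeatedly query ISA parents from the shared concepts up to the
    # root, collecting concept -> parents edges of the hierarchy.
    edges = {}
    frontier = list(shared_concepts)
    while frontier:
        concept = frontier.pop()
        if concept in edges:
            continue                    # already expanded
        edges[concept] = list(parents_of(concept))
        frontier.extend(edges[concept])
    return edges

# Illustrative ISA fragment.
toy_isa = {"AIDS": ["Virus Diseases"], "Dengue Virus": ["RNA Virus"],
           "Virus Diseases": ["top_type"], "RNA Virus": ["Virus"],
           "Virus": ["top_type"], "top_type": []}
print(build_hierarchy(["AIDS", "Dengue Virus"], toy_isa.get))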
4.3 Searching the Concept Hierarchy

In this paper, we use a hill-climbing algorithm to search the concept hierarchical structure obtained in the previous step, in order to find the optimum representation (a set of concepts) for a particular document set. Our aim is to represent the training documents with a set of concepts which is as high in the concept hierarchy as possible, without loss of categorization accuracy.

First, we specify that all concept nodes, except the root node, have an out edge to their parent nodes. Then we establish a copy of the hierarchical structure for each document. We assign the frequency of each concept occurring in the document to the edge leading into the parent concept node; thus the frequency for the edge leading out of a parent concept node is the sum of the edge frequencies of all its child nodes.

The vital problem is to define an appropriate heuristic function for the hill-climbing search algorithm. In this model, each document d is considered to be a vector in the concept-space. In its simplest form, each document is represented by the concept-frequency (CF) vector

d = (cf_1, cf_2, ..., cf_n),

where cf_i is the frequency of the ith concept in the document. In order to account for documents of different lengths, each frequency is normalized by dividing by the document length. In the vector-space model, the similarity between document i and document j is commonly measured by the cosine function

s_ij = cos(d_i, d_j) = (d_i · d_j) / (||d_i|| ||d_j||).

For document vectors of unit length, the above formula simplifies to s_ij = cos(d_i, d_j) = d_i · d_j.

Finally, we define the heuristic function f for the hill-climbing algorithm in terms of the following quantities: D_ci is the set of all documents in a particular document set that belong to the same category as document i; D_KNNi is the set of the k nearest neighbors of document i in the training set; n is the dimensionality of the feature space; α is the number of leaf nodes; and β is a constant. The first part of the right-hand side of the heuristic is a reward factor, intended to encourage the use of high-level concepts in the feature set despite a limited loss of categorization accuracy; the remainder measures categorization accuracy through the overlap between D_KNNi and D_ci. We suggest that β be chosen less than 0.05, which means that the effect of the reward on the heuristic value is at most 5%. The value of k depends on the size of the document collection. We do not suggest that a large value of k be used, because a large k may result in unbalanced performance across categories. In the future, a further study of the sensitivity of classifier performance to the value of k will be conducted.

We define our bottom-up hill-climbing search algorithm as follows.

Initial status: current_concept_set Φ_ccs = {all leaf concepts in the concept hierarchy}
heuristic f_opt = f(Φ_ccs)
temporal_concept_set Φ_tcs = ∅
while (there is an unmarked concept c in Φ_ccs)
    Φ_tcs ← Φ_ccs
    take an unmarked concept c ∈ Φ_tcs
    find the parent concept node c_p of c in the concept hierarchy
    use the parent c_p to replace c in Φ_tcs
    remove all child concepts of c_p from Φ_tcs
    f = f(Φ_tcs)
    if f > f_opt
    then Φ_ccs ← Φ_tcs, f_opt ← f
    else mark c in Φ_ccs
loop
return: Φ_ccs as the optimal concept set
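The search loop itself is short, as the sketch below shows. The heuristic is passed in as a callable, since its exact published formula is not reproduced here; the commented example form, a reward factor (1 + β(α − n)/α) multiplying a k-NN accuracy estimate, is our assumption, chosen only to be consistent with the description above of a reward bounded by β. The names parent_of and children_of are placeholders for hierarchy lookups, not UMLS API calls.

def hill_climb(leaf_concepts, parent_of, children_of, score):
    # Bottom-up hill climbing over concept sets, following the
    # pseudocode above: try replacing a concept by its parent
    # (absorbing all of the parent's children); keep the move if the
    # heuristic improves, otherwise mark the concept and never retry it.
    current = set(leaf_concepts)
    best = score(current)
    marked = set()
    while True:
        candidates = [c for c in current if c not in marked]
        if not candidates:
            return current                  # no unmarked concept left
        c = candidates[0]
        parent = parent_of(c)
        if parent is None:                  # c is the root: cannot generalize
            marked.add(c)
            continue
        trial = (current - set(children_of(parent))) | {parent}
        f = score(trial)
        if f > best:
            current, best = trial, f
        else:
            marked.add(c)

# Assumed heuristic form (our assumption, not the paper's equation):
#   f(S) = (1 + beta * (alpha - len(S)) / alpha) * knn_accuracy(S)
# where knn_accuracy(S) is the average, over training documents, of the
# fraction of each document's k nearest neighbours sharing its category.
toy_parent = {"AIDS": "Virus", "Dengue": "Virus", "Virus": None}
toy_children = {"Virus": ["AIDS", "Dengue"]}
print(hill_climb(["AIDS", "Dengue"], toy_parent.get,
                 lambda p: toy_children.get(p, []),
                 lambda s: 1.0 / len(s)))   # -> {'Virus'}: the merge improves the toy score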
We used the title plus the abstract as the text for our experiments.

Journal Name | Category Name | Covered Years
Addiction (Abingdon, England) | Addiction | 1999, 2000
AIDS Care | AIDS | 1998, 1999, 2000, 2001
American Heart Journal | Heart | 2000, 2001
Cancer Research | Cancer | 1999, 2000
The British Journal of Ophthalmology | Ophthalmology | 2000, 2001
Burns: Journal of the International Society for Burn Injuries | Burns | 2000, 2001
Bone | Bone | 2000, 2001
Epilepsy Research | Epilepsy | 1999, 2000, 2001
Diabetes | Diabetes | 1999, 2000
Clinical and Experimental Dermatology | Dermatology | 1999, 2000

Table 1: Details of the document collection

Training Size | Distinct Terms | Distinct Concepts | Optimum Concepts
1,000 | 16,163 | 4,645 | 1,634
750 | 14,046 | 4,034 | 1,540
500 | 11,454 | 3,182 | 1,327
300 | 8,948 | 2,436 | 942
200 | 7,634 | 1,894 | 695

Table 2: Statistical information concerning the training set

Category Name | Term-based | Original Concepts | Optimum Concepts
Addiction | 100% | 96% | 94%
AIDS | 92% | 92% | 90%
Heart | 96% | 92% | 92%
Cancer | 96% | 90% | 94%
Ophthalmology | 72% | 80% | 84%
Burns | 58% | 66% | 70%
Bone | 66% | 68% | 74%
Epilepsy | 74% | 72% | 78%
Diabetes | 80% | 82% | 88%
Dermatology | 30% | 52% | 58%
Overall | 76.4% | 79.0% | 82.2%
STD (Overall) | 0.2162 | 0.1424 | 0.1198

Table 3: Comparison of accuracy on the default training set

5.2 Accuracy Measure

To evaluate the trained classifier on the test documents for each class, we defined an accuracy measure that is consistent with the one used by the RAINBOW system:

accuracy = correctly assigned documents / total candidate documents

This measures the performance of the classifiers on each particular category. "Correctly assigned documents" means all documents which are correctly assigned to the particular category; "total candidate documents" means all test documents which should be assigned to the particular category. The overall performance can be measured in the same way. Since every document in our data set has exactly one category label, the recall measure is not considered in our experiments.

5.3 Document Pre-processing

By pre-processing the training documents with the two methods separately, we derived the statistical information about our training set shown in Table 2. The number of distinct concepts was obtained using the mapping process, and the number of optimum concepts was obtained using the search algorithm introduced above. For the heuristic function, β = 0.05 and k = 5 were used.

As Table 2 shows, even the original concept set gives a significant reduction in the dimensionality of the feature space compared with the term set.

5.4 Performance Comparison Between the Two Approaches

In this section, we used the default training set to compare the effect of the different feature sets on the performance of the KNN classifier. Category ranking in KNN is based on the categories assigned to the k training documents nearest to the test document. The categories of these neighbors are weighted by the similarity of each neighbor to the test document, measured as the cosine between the two document vectors. The reasons for choosing KNN in our experiments are the same as those of Yang and Pedersen (1997): it is one of the top-performing classifiers, and it is a context-sensitive classifier that enables better observation of feature selection.

Table 3 shows the performance of the classifier on the 10 categories. A desirable classifier should have balanced performance across the pre-defined categories in the training set. Therefore, we computed the standard deviation (STD) of the per-category accuracies. It is possible that different values of k might be needed to achieve optimal performance for the different methods.
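As a quick illustration of this measure, here is a small Python sketch with invented labels; we assume the STD row of Table 3 is the population standard deviation of the per-category accuracies, which statistics.pstdev computes.

from statistics import pstdev

def per_category_accuracy(true_labels, predicted):
    # accuracy(c) = correctly assigned documents / total candidate documents
    acc = {}
    for c in set(true_labels):
        candidates = [i for i, t in enumerate(true_labels) if t == c]
        correct = sum(1 for i in candidates if predicted[i] == c)
        acc[c] = correct / len(candidates)
    return acc

true = ["AIDS", "AIDS", "Burns", "Burns", "Bone", "Bone"]   # hypothetical test set
pred = ["AIDS", "Burns", "Burns", "Burns", "Bone", "AIDS"]
acc = per_category_accuracy(true, pred)
overall = sum(t == p for t, p in zip(true, pred)) / len(true)  # same measure over all documents
print(acc, overall, pstdev(acc.values()))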
For each method, therefore, we tried three values of k (5, 10, 15), and the best results are reported in the table.

A number of interesting observations can be made from the results in Table 3. First, compared with the term-based classifier, the overall performance achieved by the original concept model and the optimum concept model increased from 76.4% to 79.0% and 82.2%, a relative increase of 3.4% and 7.6% respectively. Second, our method smooths the performance of the classifier across the different categories: compared with the term-based classifier, the STD values of the two concept-based classifiers show relative decreases of 34.1% and 44.6% respectively.

5.5 Effect of Feature Size on Performance

In this section, we apply feature selection methods to documents in the pre-processing of a term-based KNN classifier. Through this experiment, we can study the effect of statistics-based feature selection methods on the performance of a term-based KNN classifier using the default training set. The RAINBOW system provides a feature selection function using the information gain method. A comparative study of feature selection methods (Yang 1997), covering document frequency thresholding (DF) (Yang 1997), information gain (IG) (Mitchell 1996), mutual information (MI) (Yang 1997, Wiener 1994), the χ2 statistic (CHI) (Yang 1997), and term strength (TS) (Yang 1995, Wilbur 1992), shows that IG, DF and CHIMAX have similar effects on classifier performance, and that all three are better than the other two methods. The experimental results may therefore, in a sense, indicate how statistics-based feature selection methods affect the performance of a KNN classifier on our dataset.

The influence of the information gain method and the CHIMAX method is evaluated using the overall accuracy of the classifier and the STD of the accuracies of the individual categories. Figure 3 displays the four curves for the term-based KNN classifier on the default training set.

[Figure 3: Overall accuracy and STD of the term-based KNN classifier at different feature sizes obtained by feature selection]

An observation emerges from the categorization results in Figure 3: although the overall accuracy of the classifier does not decline significantly until 90% of the distinct terms are eliminated, the STD starts to increase clearly once half the terms are removed, for both selection methods. This means that we should expect an imbalance in categorization accuracy when the information gain method is employed to reduce the dimensionality of the feature space for a KNN classifier on our dataset.

5.6 Effect of Training Set Size on Performance

A comparative experiment measuring performance against the size of the training set was conducted using the training sets of different sizes listed in Table 2. An optimum concept set was discovered for each training set size. The feature selection algorithm was not used in this experiment because it does not improve performance in terms of accuracy; moreover, it is difficult to define a selection threshold for training sets of different sizes. The experimental results are shown in Figure 4.

When the size of the training set increased from 200 to 1,000, the accuracy of the concept-based classifier increased from 72.2% to 82.2%, a relative increase of 13.9%, and the accuracy of the term-based classifier increased from 66.8% to 76.4%, a relative increase of 14.4%. In addition, another interesting observation can be made from Figure 4. We divide this process into two stages.
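The category ranking described above can be sketched in a few lines of Python; this is our own minimal rendering, assuming unit-length document vectors so that the cosine similarity is a plain dot product, with placeholder training data.

def knn_rank(test_vec, training, k):
    # training: list of (unit_vector, category_label) pairs. Each of the k
    # nearest neighbors votes for its category, weighted by its cosine
    # similarity to the test document.
    def cos(a, b):
        return sum(x * y for x, y in zip(a, b))
    neighbors = sorted(training, key=lambda vc: cos(test_vec, vc[0]), reverse=True)[:k]
    scores = {}
    for vec, cat in neighbors:
        scores[cat] = scores.get(cat, 0.0) + cos(test_vec, vec)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

training = [((1.0, 0.0), "Burns"), ((0.0, 1.0), "Bone"), ((0.6, 0.8), "Burns")]
print(knn_rank((0.8, 0.6), training, k=2))   # ranks 'Burns' first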
In the first stage, when the size of the training set increased from 200 to 500, the accuracy of the concept-based classifier increased from 72.2% to 79.8%, a relative increase of 10.5%, and the accuracy of the term-based classifier increased from 66.8% to 72.0%, a relative increase of 7.8%. In other words, in this stage the accuracy gradient of the concept-based classifier is 34.6% larger than that of the term-based classifier. In contrast, in the second stage, when the size of the training set increased from 500 to 1,000, the accuracy gradient of the term-based classifier is 103.3% larger than that of the concept-based classifier. This seems to indicate that the accuracy of the concept-based classifier approaches its ceiling with a smaller training set than the term-based classifier requires.
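The two-stage comparison is simple relative-gain arithmetic; the brief check below reproduces the figures quoted above (up to the rounding of the intermediate percentages).

# stage one: training set grows from 200 to 500 documents
concept_1 = (79.8 - 72.2) / 72.2    # ≈ 0.105 (10.5% relative increase)
term_1    = (72.0 - 66.8) / 66.8    # ≈ 0.078 (7.8%)
print(concept_1 / term_1 - 1)       # ≈ 0.35; 34.6% when computed from the rounded 10.5% and 7.8%

# stage two: training set grows from 500 to 1,000 documents
concept_2 = (82.2 - 79.8) / 79.8    # ≈ 0.030 (3.0%)
term_2    = (76.4 - 72.0) / 72.0    # ≈ 0.061 (6.1%)
print(term_2 / concept_2 - 1)       # ≈ 1.03, i.e. about 103% larger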