Commonly Used English Corpora
English corpora, or language corpora, are collections of text samples used for linguistic research and analysis. They are valuable resources for studying language patterns, trends, and usage in context. This article surveys some commonly used English corpora, together with two tools that are often listed alongside them, and describes their applications.

1. British National Corpus (BNC)
The British National Corpus is one of the most widely used corpora for studying contemporary British English. It contains a diverse range of texts, including transcribed spoken conversations, written documents, and academic papers. Researchers use the BNC to examine language usage across genres and domains such as science, politics, and fiction. Because its texts come from a limited period, it serves mainly as a snapshot of British English at that time and as a baseline against which later change can be measured.

2. Corpus of Contemporary American English (COCA)
The Corpus of Contemporary American English is a large collection of English texts from several genres, including spoken, written, and academic material. It lets researchers investigate many aspects of American English, including vocabulary, syntax, and discourse patterns. COCA is frequently used in linguistic research, language teaching, and corpus-based language analysis.

3. Google Books Ngram Viewer
The Google Books Ngram Viewer is a tool for analyzing the frequency of words or phrases in the collection of books digitized by Google. It plots the usage of specific terms over time, offering insights into the historical development and popularity of particular expressions. It is useful for investigating language change and cultural shifts through the lens of published literature.

4. CLAWS part-of-speech tagger
CLAWS (Constituent Likelihood Automatic Word-tagging System) is a part-of-speech tagger for English developed at Lancaster University. It is a tagging tool rather than a corpus or a general corpus toolkit: it assigns a part-of-speech tag to each word in a text, and those tags can then be used in many kinds of linguistic study. CLAWS was used to tag the BNC.

5. International Corpus of English (ICE)
The International Corpus of English is a collection of English corpora from different countries and regions. It aims to capture variation within English across cultures and contexts. ICE gives researchers data for studying dialectal differences, language contact phenomena, and sociolinguistic aspects of English.

6. Oxford English Corpus (OEC)
The Oxford English Corpus is a corpus of contemporary English texts that serves as a reference for the analysis of language usage and trends. It includes a wide range of written and spoken material from sources such as books, newspapers, and online platforms. The OEC is frequently used for linguistic research, lexicography, and language teaching.

7. Corpus Query Language (CQL)
Corpus Query Language is a specialized query language, not a corpus, used to search for and retrieve specific linguistic patterns within corpora. It lets researchers formulate complex queries, for example over word forms and part-of-speech tags, and retrieve the matching data for analysis. CQL is widely used in corpus linguistics.

In conclusion, English corpora play a vital role in linguistic research and analysis. The resources described here (the British National Corpus, the Corpus of Contemporary American English, the Google Books Ngram Viewer, the CLAWS tagger, the International Corpus of English, the Oxford English Corpus, and Corpus Query Language) support the investigation of language usage, trends, and patterns in many contexts. They aid the understanding of language change, societal influences, and cultural shifts, which makes them useful to language researchers, educators, and language enthusiasts.
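To make the idea of querying a corpus for linguistic patterns more concrete, here is a minimal sketch of matching a sequence of attribute constraints over part-of-speech-tagged tokens. It is a simplification written for illustration, not an implementation of any real CQL engine: the tagged data, the `match_sequence` function, and the dict-based pattern format are all hypothetical, and the tags are Penn Treebank-style labels assigned by hand.

```python
import re

# Hypothetical toy data: (word, part-of-speech tag) pairs, tagged by hand.
TAGGED = [
    ("the", "DT"), ("old", "JJ"), ("corpus", "NN"),
    ("contains", "VBZ"), ("spoken", "JJ"), ("texts", "NNS"),
]


def match_sequence(tokens, pattern):
    """Return every run of tokens that matches `pattern`.

    `pattern` is a list of dicts, one per token position, mapping an
    attribute name ("word" or "tag") to a regular expression that must
    match the whole attribute value.
    """
    hits = []
    for start in range(len(tokens) - len(pattern) + 1):
        window = tokens[start:start + len(pattern)]
        ok = all(
            re.fullmatch(regex, {"word": word, "tag": tag}[attr])
            for (word, tag), constraints in zip(window, pattern)
            for attr, regex in constraints.items()
        )
        if ok:
            hits.append(" ".join(word for word, _ in window))
    return hits


# Roughly what a query such as [tag="JJ"] [tag="NN.*"] asks for:
# an adjective immediately followed by a noun.
adjective_noun = [{"tag": "JJ"}, {"tag": "NN.*"}]
print(match_sequence(TAGGED, adjective_noun))
```

Real query engines add repetition, optional positions, and indexing so that searches scale to hundreds of millions of words; the sketch only shows the core idea of one constraint set per token position.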
English corpora (reference), 2012-03-02 22:29:26

■ BNC = The British National Corpus: / (backup) /bnc/
■ ANC = The American National Corpus: /
■ COCA = Corpus of Contemporary American English: /
■ COHA = Corpus of Historical American English: /coha/
■ BOE = Bank of English (Collins English corpus): /wordbanks/
■ NMC = New Model Corpus: /
■ ARC = ACL Anthology Reference Corpus: /auth/preloaded_corpus/aclarc/ske/first_form
■ BAWE = British Academic Written English Corpus: /auth/preloaded_corpus/bawe2/ske/first_form/AcaDepts/ll/app_ling/internal/bawe/sketch_engine_bawe.htm download
■ BASE = British Academic Spoken English Corpus: /fac/soc/celte/research/base/
■ SCTS = Scottish Corpus Of Texts and Speech: /
■ CMSW = Corpus Of Modern Scottish Writing: /cmsw/
Slang: / (American, English, and Urban slang) /slang/ (UK) //cybereng/slang///

Large English corpus resources that are free to use
Collected links to commonly used corpus resources (语料天涯):
/time/
http://www.lextutor.ca/concordancers/concord_e.html
http://202.204.128.82/sweccl/Corpus//netprints/Corporalink/Corporalink.htm

1. BNC-World Simple Search ☆☆☆
/lookup.html
No more than 50 hits are displayed, with a fixed amount of context.

2. Brown, LOB, BNC sampler ☆☆☆
Here are a few links for searching corpora online, including monolingual corpora such as Brown, LOB, and the BNC sampler, and also some parallel English-Chinese corpora.
English: /concordance/WWWConcappE.htm
English: http://www.lextutor.ca/concordancers/concord_e.html
Parallel: /concordance/paralleltexts/

3. Collins Cobuild Corpus Concordance Sampler ☆☆☆☆☆
/Corpus/CorpusSearch.aspx
The Collins WordbanksOnline English corpus is composed of 56 million words of contemporary written and spoken text.

4. New BNC interface - VIEW ☆☆☆☆☆
/

5. Samples (about 2 million words) from the British National Corpus, both written and spoken ☆☆☆
Also the Brown Corpus and many others, native and learner. Go to http://www.lextutor.ca/concordancers/concord_e.html

6. MICASE ☆☆☆☆
/m/micase/
There are currently 152 transcripts (totaling 1,848,364 words) available at the site.

7. CLEC online concordancing ☆☆☆☆
/corpus/EngSearchEngine.aspx
CLEC contains more than one million words from five groups of learners (secondary-school students, College English Band 4 and Band 6 students, and lower-year and upper-year English majors), with learner errors annotated.
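The online concordancers listed above return keyword-in-context (KWIC) lines with a capped number of hits and a fixed amount of context. The following is a minimal sketch of that behaviour, assuming a pre-tokenized text; the function name `kwic` and its parameters are illustrative and do not correspond to any of the listed services.

```python
def kwic(tokens, keyword, width=3, max_hits=50):
    """Return keyword-in-context lines.

    Each line shows up to `width` tokens on either side of a hit, and at
    most `max_hits` lines are returned (mirroring a capped online search).
    Matching is case-insensitive and on whole tokens only.
    """
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{token}] {right}".strip())
            if len(lines) >= max_hits:
                break
    return lines


# Made-up sample text, split on whitespace for simplicity.
text = "The corpus is large . A corpus of spoken English is smaller ."
for line in kwic(text.split(), "corpus", width=2):
    print(line)
```

Whitespace splitting is used only to keep the example short; a real concordancer would use a proper tokenizer and usually aligns the keyword in a fixed column.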
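Corpora Commonly Used in Linguistics
Many corpora are in common use in linguistics. Some of the best known are:
1. Brown Corpus: compiled at Brown University in the United States in the 1960s, it is one of the earliest and best-known English corpora.
2. Penn Treebank: developed at the University of Pennsylvania and used mainly for syntactic parsing and linguistic research.
3. CoNLL corpora: the datasets used in the shared tasks of the Conference on Computational Natural Language Learning (CoNLL), covering a range of languages.
4. Europarl: proceedings of the European Parliament in multiple translated language versions, used for machine translation and cross-lingual research.
5. Google corpora: large-scale collections of web text gathered by Google, used in natural language processing, text mining, and related fields.
6. Corpus of Contemporary American English (COCA): a corpus of contemporary American English covering many different text types.
7. British National Corpus (BNC): a corpus of British English drawn from publications, broadcasts, conversations, and other sources; an important resource for British English.
These corpora provide large amounts of text data for studying linguistic phenomena such as vocabulary use, grammatical structure, and semantics. They play an important role in linguistic research and in the development of natural language processing.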
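English Sentence Corpora
An English sentence corpus is a text dataset containing large numbers of English sentences, used for natural language processing and machine learning.
Some commonly used ones are:
1. Brown Corpus: about one million words, covering a variety of genres and topics.
2. Penn Treebank: a large set of English sentences with part-of-speech tags and syntactic annotation.
3. CoNLL 2003: a dataset for named entity recognition that also carries part-of-speech tags.
4. OntoNotes: a multilingual corpus with text and annotation in English, Chinese, and Arabic.
5. BERT pre-training corpus: the text used to pre-train the BERT model, namely BooksCorpus and English Wikipedia.
6. OpenWebText Corpus: an open-source web text corpus containing large amounts of English, including informal internet language.
7. Common Crawl: a public web-crawl dataset containing large amounts of English web page content.
8. News Crawl: a crawled news corpus containing large numbers of English news articles.
9. WikiText-103: a corpus of roughly 103 million tokens drawn from English Wikipedia articles.
10. BookCorpus: a corpus built from a large number of English-language books.
Choose among these according to the task at hand. They can be used for research and applications in natural language processing, machine learning, text mining, and related fields.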
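I. Simple prepositions
1. Place: at, in, on, to, above, over, below, under, beside, behind, between
2. Time: in, on, at, after, from, since, for, behind
3. Movement: across, through, past, to, towards, onto, into, up, down
4. Ongoing state: at, under, on
5. Other: on, about, by, with, in

II. Complex prepositions
1. Two-word prepositions: complex prepositions made up of two words.
according to (in accordance with); irrespective of (regardless of); ahead of (before); owing to (because of); but for (were it not for); together with (along with); prior to (before); as for (regarding); save for (except for); what with (because of)
2. Three-word prepositions: complex prepositions made up of three words.
in line with (consistent with); in place of (instead of); for lack of (for want of); in return for (as repayment for); by way of (via; as); on account of (because of); by force of (by means of); with respect to (concerning)
3. Four-word prepositions: complex prepositions made up of four words.
for the purpose of (in order to); at the mercy of (under the control of); for the sake of (for the benefit of); in the care of (looked after by); in the teeth of (in defiance of; against); on the eve of (just before); on the ground of (on the basis of); on the part of (as regards); to the exclusion of (so as to shut out); with an eye to (with a view to); under the auspices of (with the support of); under the guise of (under the pretence of)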
Complete Spoken English Corpus (30 single-choice questions)
1. You should take the medicine after you read the _______. A. lines B. words C. instructions (correct answer) D. suggestions
2. ________ my father ________ my mother is able to drive a car. So they are going to buy one. A. Neither; nor B. Both; and C. Either; or D. Not only; but also (correct answer)
3. Shanghai is known ________ "the Oriental Pearl", so many foreigners come to visit Shanghai every year. A. for B. as (correct answer) C. with D. about
4. I'm sorry I cannot see you immediately. But if you wait, I'll see you _____. A. for a moment B. in a moment (correct answer) C. for the moment D. at the moment
5. The sun disappeared behind the clouds. (Choose the meaning of "disappeared".) A. 出现 (appear) B. 悬挂 (hang) C. 盛开 (bloom) D. 消失 (disappear) (correct answer)
6. How _______ it rained yesterday! We had to cancel our football match. A. heavily (correct answer) B. light C. lightly D. heavy
7. Growing vegetables _______ constantly watering. A. needed B. are needed C. were needed D. needs (correct answer)
8. Will Mary's mother ______ this afternoon? A. goes to see a film B. go to the film C. see a film (correct answer) D. goes to the film
9. He has bought an unusual car. (Choose the meaning of "unusual".) A. 平常的 (ordinary) B. 异常的 (unusual) (correct answer) C. 漂亮的 (beautiful) D. 废弃的 (abandoned)
10. We need a _______ when we travel around a new place. A. guide (correct answer) B. tourist C. painter D. teacher
11. She passed me in the street, but took no _______ of me. A. attention B. watch C. care D. notice (correct answer)
12. I think _______ is nothing wrong with my car. A. that B. here C. there (correct answer) D. where
13. If you don't feel well, you'd better ask a ______ for help. A. policeman B. driver C. pilot D. doctor (correct answer)
14. I have worked all day. I'm so tired that I need _____. A. a night rest B. rest of night C. a night's rest (correct answer) D. a rest of night
15. ______ is convenient to travel between Pudong and Puxi now. A. It (correct answer) B. This C. That D. What
16. I _______ play the game well. A. must B. can (correct answer) C. would D. will
17. A: What can I do to help at the old people's home? B: You ______ read stories to the old people. A. could (correct answer) B. must C. should D. would
18. I _______ seeing you soon. A. look after B. look for C. look at D. look forward to (correct answer)
19. A: Let's make a cake ________ our mother ________ Mother's Day. B: Good idea. A. with; for B. for; on (correct answer) C. to; on D. for; in
20. We _______ swim every day in summer when we were young. A. use to B. are used to C. were used to D. used to (correct answer)
21. ______ the morning of September 8th, many visitors arrived at the train station for a tour. A. From B. To C. In D. On (correct answer)
22. Hurry up! The train ________ in two minutes. A. will go (correct answer) B. go C. goes D. went
23. You have failed two tests. You'd better start working harder, ____ you won't pass the course. A. and B. so C. but D. or (correct answer)
24. Will you see to _______ that the flowers are well protected during the rainy season? A. it (correct answer) B. me C. one D. yourself
25. A: ______ is the concert ticket? B: It's only 160 yuan. A. How many B. How much (correct answer) C. How often D. How long
26. When Max rushed to the classroom, his classmates _____ exercises attentively. A. did B. have done C. were doing (correct answer) D. do
27. Mrs. Green has given us some _______ on how to study English well. A. practice B. news C. messages D. suggestions (correct answer)
28. Many people prefer the bowls made of steel to the _____ made of plastic. A. it B. ones (correct answer) C. one D. them
29. A: I can't always get good grades. What should I do? B: The more ______ you are under, the worse grades you may get. So take it easy! A. waste B. interest C. stress (correct answer) D. fairness
30. Sorry, I can't accept your invitation. (Choose the meaning of "invitation".) A. 礼物 (gift) B. 观点 (opinion) C. 邀请 (invitation) (correct answer) D. 好意 (kindness)
Chinese-English Bilingual Corpus
1. 这个城市的夜景很美,特别是那些高楼大厦的灯光,让整个城市感觉都变得更加璀璨。
The night view of this city is beautiful, especially the lights of the tall buildings, which make the whole city feel even more dazzling.
2. 这个博物馆收藏了许多有价值的文物和艺术品,是了解本地历史和文化的最佳场所。
This museum has collected many valuable relics and artworks, and is the best place to learn about local history and culture.
3. 我们一家人喜欢去海边度假,享受阳光、沙滩和大海的美丽。
Our family likes to go to the beach for vacation, and enjoy the beauty of the sun, sand, and sea.
4. 我们在学校的食堂里可以选择各种各样的美食,包括中餐、西餐和快餐。
We can choose a variety of cuisine in the school cafeteria, including Chinese, Western, and fast food.
5. 这个地区的气候十分宜人,四季分明,春暖花开、夏日清凉、秋高气爽、冬日雪景,都让人感受到大自然的美好。
The climate of this region is very pleasant, with distinct seasons, spring blooms, cool summers, refreshing autumns, and snowy winters, all making people feel the beauty of nature.
6. 现在越来越多的人喜欢做运动来保持身体健康,例如跑步、游泳、瑜伽等等。
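Sentence pairs like the ones above are typically stored as aligned records so that a search on either language returns both sides. A minimal sketch follows, assuming the pairs are held in memory as tuples; `PAIRS` reuses two of the pairs above, while the `search_pairs` function is hypothetical and only does plain substring matching.

```python
# Sentence-aligned pairs (Chinese source, English translation),
# copied from two of the examples above.
PAIRS = [
    ("我们一家人喜欢去海边度假,享受阳光、沙滩和大海的美丽。",
     "Our family likes to go to the beach for vacation, and enjoy the "
     "beauty of the sun, sand, and sea."),
    ("我们在学校的食堂里可以选择各种各样的美食,包括中餐、西餐和快餐。",
     "We can choose a variety of cuisine in the school cafeteria, "
     "including Chinese, Western, and fast food."),
]


def search_pairs(pairs, term):
    """Return the aligned pairs whose Chinese or English side contains `term`."""
    needle = term.lower()
    return [
        (zh, en) for zh, en in pairs
        if needle in zh.lower() or needle in en.lower()
    ]


# A query in either language brings back both sides of the pair.
for zh, en in search_pairs(PAIRS, "食堂"):
    print(zh)
    print(en)
```

Substring matching is the simplest possible choice; a fuller parallel concordancer would segment the Chinese side into words and index both languages.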
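Corpora Commonly Used in Linguistics
The following corpora and databases are commonly used in linguistics:
- Brown Corpus: an English corpus of about 1,000,000 words, sampled from American English texts published in 1961 and covering a wide range of genres and subject matter.
- COCA (Corpus of Contemporary American English): a corpus of contemporary American English containing well over one hundred million words sampled from 1990 onwards; it has grown with successive releases.
- BNC (British National Corpus): a corpus of British English containing about one hundred million words sampled from the 1980s to 1993.
- CHILDES (Child Language Data Exchange System): a database of infant and child language data, used to study child language development.
- Penn Treebank: an English corpus annotated with part-of-speech and syntactic information, used in natural language processing research.
- EuroParl: a multilingual corpus of European Parliament proceedings, used for cross-language comparison and machine translation research.
- COrE (Corpus of English): a diverse English-based corpus with language samples from different countries and regions, used to study language varieties and language contact.
- WALS (World Atlas of Language Structures): a database of structural features of languages from around the world, used for cross-linguistic comparison and research in linguistic theory.
These resources can be accessed through online platforms or obtained from specific research institutions.
Using corpora helps linguists carry out language research, language analysis, and theory building.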
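Because these corpora differ so much in size (about one million words for Brown, about one hundred million for the BNC), raw counts from different corpora are not directly comparable. A common remedy is to normalise to occurrences per million words. The sketch below assumes the round corpus sizes quoted above; the raw counts are invented purely for illustration and are not real corpus results.

```python
def per_million(raw_count, corpus_size):
    """Normalise a raw frequency to occurrences per million words."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    return raw_count * 1_000_000 / corpus_size


# Hypothetical raw counts for one word in two corpora (illustration only).
brown = per_million(raw_count=50, corpus_size=1_000_000)
bnc = per_million(raw_count=2_500, corpus_size=100_000_000)

# The raw count is 50 times higher in the larger corpus, yet the word is
# relatively less frequent there once corpus size is taken into account.
print(brown, bnc)
```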
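English learner corpora (written and spoken)
1. Chinese Learner English Corpus, CLEC (1 million): Guangdong University of Foreign Studies and Shanghai Jiao Tong University
2. College English learners' spoken corpus, COLSEC (50,000): Shanghai Jiao Tong University
3. HKUST Learner Corpus: Hong Kong University of Science and Technology
4. Corpus of Chinese English majors, CEME (1.48 million): Nanjing University
5. Spoken English Corpus of Chinese Learners, SECCL (1 million): Nanjing University
6. Chinese component of the international corpus of spoken English by foreign-language learners, LINDSEI-China (100,000): South China Normal University
7. Master's Writing Corpus, MWC (120,000): Huazhong University of Science and Technology

Parallel corpora
9. Chinese-English parallel corpus, PCCE: Beijing Foreign Studies University
10. Nanjing University and Guoguan (南大-国关) parallel corpus: Nanjing University
11. English-Chinese literary works corpus: Foreign Language Teaching and Research Press
12. Chinese-English aligned corpus of Feng Youlan's "A History of Chinese Philosophy"
13. English-Chinese aligned corpus of Joseph Needham's "Science and Civilisation in China"
14. Bilingual corpus for computer science: Institute of Applied Linguistics, State Language Commission
15. Bilingual corpus of Plato's "Republic"
16. English-Chinese bilingual corpus (150,000 pairs): Institute of Software, Chinese Academy of Sciences
17. English-Chinese bilingual corpora: LDC Hong Kong news aligned corpus (36,294 segments) and Hong Kong laws aligned corpus (310,000 sentence pairs): Institute of Automation, Chinese Academy of Sciences
18. English-Chinese bilingual corpus (1 million), online English-Chinese passage dictionary, and online English-Chinese collocation dictionary (10 million): Northeastern University
19. English-Chinese bilingual corpus (400,000 to 500,000 sentence pairs): Harbin Institute of Technology
20. Bilingual corpus (more than 50,000 pairs): Institute of Computational Linguistics, Peking University

Contrastive corpora
21. LIVAC (Linguistic Variety in Chinese Communities): City Polytechnic of Hong Kong
22. Balanced corpus (Sinica Corpus) and treebank (Sinica Treebank): Taiwan

Special-purpose English corpora
23. China English corpus: Henan Normal University
24. Corpus of Military Texts: PLA Foreign Languages Institute
25. New Horizon College English textbook corpus: Shanghai Jiao Tong University

Chinese corpora
26. Corpus of modern Chinese literary works (1979, 5.27 million characters): Wuhan University
27. Modern Chinese corpus (1983, 20 million characters): Beijing University of Aeronautics and Astronautics
28. Secondary-school Chinese textbook corpus (1983, 1.068 million characters): Beijing Normal University
29. Modern Chinese word-frequency corpus (1983, 1.82 million characters): Beijing Language Institute
30. National large-scale balanced Chinese corpus (20 million characters): State Language Commission
31. People's Daily corpus (27 million characters): Institute of Computational Linguistics, Peking University
32. Large-scale Chinese corpus (500 million characters, 10 sub-corpora): Beijing Language and Culture University
33. Modern Chinese corpus (100 million characters): Tsinghua University
34. Chinese news corpus (1988, 2.5 million characters): Shanxi University
35. Standard corpus (2000, 700,000 characters)
36. Raw corpus (30 million characters); annotated corpus of "作家文摘" (Writers' Digest) (1 million characters): Shanghai Normal University
37. Corpus of modern natural spoken Chinese: Institute of Linguistics, Chinese Academy of Social Sciences
38. Travel-enquiry spoken dialogue corpus and hotel-booking spoken dialogue corpus: Institute of Automation, Chinese Academy of Sciences
39. Three corpora of the Center for Chinese Linguistics, Peking University:
Modern Chinese corpus: /yuliao.asp?item=1
Classical Chinese corpus: /yuliao.asp?item=2
Chinese-English bilingual corpus: /yuliao.asp?item=3
/printthread.php?t=2742

Access rights for Chinese corpora
The State Language Commission corpus (http://219.238.40.213:8080/CpsQrySv.srf) is a general-purpose balanced corpus, but it is not entirely free to use. The Beijing Language and Culture University Chinese corpus (http://202.112.195.8) consists of relatively old material and is likewise not entirely free to use. The Center for Chinese Linguistics corpus at Peking University (modern Chinese sub-corpus) (/YuLiao_Contents.Asp) is the largest, at over 100 million characters, but its sampling is very unbalanced, with most of the material being literary works. The Sinica Corpus of Academia Sinica in Taiwan is also a balanced Chinese corpus that can be used free of charge.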
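Commonly used English corpora include the following:
British National Corpus (BNC): one of the most representative corpora of contemporary English, containing 100 million words of spoken and written English in electronic form.
Corpus of Contemporary American English (COCA): described as the largest freely available English corpus, it contains 520 million words of text drawn from five genres: spoken language, fiction, popular magazines, newspapers, and academic articles.
Michigan Corpus of Academic Spoken English (MICASE): focuses on academic speech and contains transcripts of a large number of academic discussions and lectures.
Michigan Corpus of Upper-Level Student Papers (MICUSP): consists mainly of papers by upper-level students, which makes it useful for studying academic writing style and conventions.
Linggle (from the natural language processing group at National Tsing Hua University, Taiwan): combines large-scale data analysis to provide rich example data and usage statistics.
Each of these has its own strengths, so choose the one that fits the needs of the research at hand.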