美国当代英语语料库(COCA)使用介绍共34页
- 格式:ppt
- 大小:7.19 MB
- 文档页数:34
美国当代英语语料库(COCA)在词汇教学中的应用价值张仁霞【摘要】本研究介绍了美国当代英语语料库(COCA)在英语词汇教学中的利用价值:充实单词语义,建立图式;学习单词搭配,归纳语义偏好;培养学生语体意识,学会恰当使用单词;发现单词的同义词近义词;真实语料和语境中习得词汇,培养观察归纳思维能力。
COCA对于学生进行英语词汇网络自主学习是很有价值的语料库资源和工具。
【期刊名称】齐齐哈尔大学学报(哲学社会科学版)【年(卷),期】2015(000)004【总页数】4【关键词】语料库;COCA;词汇教学□学科教学研究近年来,计算机和网络技术的迅猛发展为英语教学创造了新的条件,大大提高了英语教学的效率。
教学中引入网络语料库手段,将极大丰富英语教学的手段。
COCA—美国当代英语语料库 (Corpus of Contemporary American English) 是美国最新当代英语语料库,是当今世界上最大的英语平衡语料库。
关于其系统介绍,可以参考《美国当代英语语料库(COCA)——英语教学与研究的良好平台》[1]专业语料库需要购买昂贵的软件或者注册费用,繁忙的教学使得教师们无暇自建语料库,所以提到语料库,很多英语教师望而却步,加上多数具有“技术恐惧症”,认为语料库望尘莫及。
英语教师和学习者要观察当今美语使用变化的情况,COCA 提供了在线免费使用的良好平台。
它是由杨伯翰大学 Mark Davies 教授开发的高达 4.5 亿词汇库容的美国最新当代英语语料库,是当今世界上最大的英语平衡语料库。
其界面主要是为语言学家和语言学习者了解单词、短语以及句子结构的频率及进行相关信息比较而设计。
它具备了一个好语料库的三项最基本条件:规模、速度以及词性标注。
[2] 它收集的数据涵盖了最近22 年(1990 年到2012 年)美国的口语、小说、流行杂志、报纸和学术期刊五大类型的语料,并且每种类型基本呈均匀平衡分布。
值得一提的是,COCA 具有其它语料库不可企及的突出优势,它是一种动态的语料库资源,没有最后的版本,处于不断更新与发展中,每年约2000 万词汇,而且今后每年至少更新两次。
coca等级词汇【原创实用版】目录1.引言:介绍 COCA 词汇等级2.COCA 词汇等级的定义与划分3.COCA 词汇等级的应用领域4.COCA 词汇等级对于英语学习的重要性5.结论:总结 COCA 词汇等级的价值和意义正文1.引言COCA(Corpus of Contemporary American English)是美国当代英语的一个大规模语料库,它包含了众多英语词汇和短语。
在 COCA 中,词汇被分为五个等级,分别为高频词汇、中频词汇、低频词汇、罕见词汇和极罕见词汇。
这些等级对于英语学习者来说具有重要的参考价值。
2.COCA 词汇等级的定义与划分(1)高频词汇:在 COCA 语料库中出现频率最高的词汇,如“the”、“is”、“and”等。
这些词汇是英语基础中的基础,掌握这些词汇有助于提高阅读和写作效率。
(2)中频词汇:在 COCA 语料库中出现频率较高的词汇,如“education”、“technology”等。
这些词汇扩大了英语学习者的词汇量,有助于提高阅读理解的能力。
(3)低频词汇:在 COCA 语料库中出现频率适中的词汇,如“empanada”、“antics”等。
这些词汇在日常交流中不常用,但在特定场景下会出现,掌握这些词汇有助于提高英语表达的准确性。
(4)罕见词汇:在 COCA 语料库中出现频率较低的词汇,如“plethora”、“ephemeral”等。
这些词汇在日常交流中很少出现,但在文学作品或专业领域中会有所涉及,掌握这些词汇有助于提高英语阅读和写作的深度。
(5)极罕见词汇:在 COCA 语料库中出现频率极低的词汇,如“supercalifragilisticexpialidocious”等。
这些词汇在英语学习中几乎不会用到,但对于语言研究和词汇爱好者来说具有一定的价值。
3.COCA 词汇等级的应用领域COCA 词汇等级在英语教学、研究、翻译等领域都有广泛的应用。
英语学习者可以根据这些等级有针对性地进行学习和记忆,提高自己的英语水平。
coca等级词汇摘要:一、引言1.介绍COCA 等级词汇的背景和作用2.阐述COCA 等级词汇对于学习者的重要性二、COCA 等级词汇的概述1.COCA 的定义和来源2.COCA 等级词汇的分类和特点三、COCA 等级词汇的应用1.在英语学习中的作用2.如何有效地利用COCA 等级词汇提高英语水平四、COCA 等级词汇与其他词汇体系的比较1.GSL (General Service List)2.BNC (British National Corpus)五、结论1.总结COCA 等级词汇的重要性2.鼓励学习者积极利用COCA 等级词汇提高英语能力正文:一、引言COCA(The Corpus of Contemporary American English)等级词汇是英语学习者提高英语能力的重要工具。
COCA 等级词汇不仅可以帮助学习者掌握英语中最常用的词汇,还能让学习者了解词汇的难度和重要性,从而更好地进行英语学习。
二、COCA 等级词汇的概述COCA 等级词汇是基于COCA 语料库(The Corpus of Contemporary American English)进行的研究成果。
COCA 语料库包含了大量美国英语的文本,包括书籍、报纸、杂志、网络文章等,共约5.2 亿词。
通过对这些语料库的分析,研究人员将词汇按照其在英语中的使用频率和重要性进行分类,形成了COCA 等级词汇。
COCA 等级词汇共分为十个等级,从最常用的Level 1 词汇到较为生僻的Level 10 词汇。
每个等级的词汇都有其特定的使用场景和重要性。
例如,Level 1 词汇是英语中最常用的词汇,学习者需要熟练掌握这些词汇;而Level 10 词汇虽然在日常生活中使用频率较低,但对于学习特定领域(如科技、医学等)的专业知识具有重要意义。
三、COCA 等级词汇的应用COCA 等级词汇在英语学习中具有广泛的应用。
学习者可以通过掌握不同等级的词汇,提高自己的英语水平。
在美国当代英语语料库(COCA)如何查词.doc 在美国当代英语语料库(COCA)如何查词摘要:美国当代英语语料库(Corpus of Contemporary American English,COCA)由美国Brigham Young University 的Mark Davies教授开发,目前单词容量在4.5亿,是美国当前最新的当代英语语料库,也是当今世界上最大的英语平衡语料库。
该语料库的语料来自1990-2012年,每年更新,检索功能强大,是最佳的英语学习助手。
本文以sorry为例介绍了如何在美国当代英语语料库中查询单词及对单词sorry的检查与研究结果。
关键词:美国当代英语语料库,平衡语料库,sorryAbstract: The Corpus of Contemporary American English (COCA) is the largest freely-available corpus of English,and the only large and balanced corpus of American English.The corpus was created by Mark avies of Brigham Young University,and it is used by tens of thousands of sers every month (linguists,teachers,translators,and other searchers).COCA is also related to other large corpora that we have created.The corpus contains more than 450 million words of text and isqually divided among spoken,fiction,popular magazines,newspapers,and academic texts.It includes 20 million words each year from 1990-2012.Key words: the Corpus of Contemporary American English,parallel corpus,sorry中图分类号:H319.3文献标识码:A文章编号:1006-026X(2013)12-0000-02一、引论美国当代英语语料库(Corpus of Contemporary American English,COCA)由美国Brigham Young University 的Mark Davies教授开发,目前单词容量在4.5亿以上,是美国当前最新的当代英语语料库,也是当今世界上最大的英语平衡语料库,且与其他所建语料库相连。
Application Value of American COCA in Vocabulary
Teaching
作者: 张仁霞
作者机构: 广东技术师范学院大学英语部,广东广州501665
出版物刊名: 齐齐哈尔大学学报:哲学社会科学版
页码: 175-178页
年卷期: 2015年 第4期
主题词: 语料库;COCA;词汇教学
摘要:本研究介绍了美国当代英语语料库(COCA)在英语词汇教学中的利用价值:充实单词语义,建立图式;学习单词搭配,归纳语义偏好;培养学生语体意识,学会恰当使用单词;发现单词的同义词近义词;真实语料和语境中习得词汇,培养观察归纳思维能力。
COCA对于学生进行英语词汇网络自主学习是很有价值的语料库资源和工具。
1. Who created these corpora?The corpora were created by Mark Davies, Professor of Linguistics at Brigham Young University in Provo, Utah, USA. In most cases (though see #2 below) this involved designing the corpora, collecting the texts, editing and annotating them, creating the corpus architecture, and designing and programming the web interfaces. Even though I use the terms "we" and "us" on this and other pages, most activities related to the development of most of these corpora were actually carried out by just one person.2. Who else contributed?3. Could you use additional funding or support?As noted above, we have received support from the US National Endowm ent for the Humanities and Brigham Young University for the developm ent of several corpora. However, we are always in need of ongoing support for new hardware and software, to add new features, and especially to create new corpora. Because we do not charge for the use of the corpora (which are used by 80,000+ researchers, teachers, and language learners each month) and since the creation and maintenance of these corpora is essentially a "one person enterprise", any additional support would be very welcom e. There might be graduate programs in linguistics, or ESL or linguistics publishers, who might want to make a contribution, and we would then "spotlight" them on the front page of the corpora. Also, if you have contacts at a funding source like the Mellon Foundation or the MacArthur grants, please let them know about us (and no, we're not kidding).4. What's the history of these corpora?The first large online corpus was the Corpus del Español in 2002, followed by the BYU-BNC in 2004, the Corpus do Português in 2006, TIME Corpus in 2007, the Corpus of Contemporary American English (COCA) in 2008, and the Corpus of Historical American English (COHA) in 2010. (More details...)5. What is the advantage of these corpora over other ones that are available?For some languages and time periods, these are really the only corpora available. For example, in spite of earlier corpora like the American National Corpus and the Bank of English, our Corpus of Contemporary American English is the only large, balanced corpus of contemporary American English. In spite of the Brown family of corpora and the ARCHER corpus, the Corpus of Historical American English is the only large and balanced corpus of historical American English. And the Corpus del Español and the Corpus do Português are the only large, annotated corpora of these two languages. Beyond the "textual" corpora, however, the corpus architecture and interface that we have developed allows for speed, size, annotation, and a range of queries that we believe is unmatched with other architectures, and which makes it useful for corpora such as the British National Corpus, which does have other interfaces. Also, they're free -- a nice feature.6. What software is used to index, search, and retrieve data from these corpora?We have created our own corpus architecture, using Microsoft SQL Server as the backbone of the relational database approach. Our proprietary architecture allows for size, speed, and very good scalability that we believe are not available with any other architecture. Even complex queries of the more than 425 million word COCA corpus or the 400 million word COHA corpus typically only take one or two seconds. In addition, be cause of the relational database design, we can keep adding on more annotation "modules" with little or no performance hit. Finally, the relational database design allows for a range of queries that we believe is unmatched by any other architecture for large corpora.7. How many people use the corpora?As measured by Google Analytics, as of March 2011 the corpora are used by more than 80,000 unique people each month. (In other words, if the same person uses three different corpora a total of ten times that month, it counts as just one of the 80,000 unique users). The most widely-used corpus is the Corpus of Contemporary American English -- with more than 40,000 unique users each month. And people don't just come in, look for one word, and move on -- average time at the site each visit is between 10-15 minutes.8. What do they use the corpora for?For lots of things. Linguists use the corpora to analyze variation and change in the different languages. Some are materials developers, who use the data to create teaching materials. A high number of users are language teachers and learners, who use the corpus data to model native speaker performance and intuition. Translators use the corpora to get precise data on the target languages. Some businesses purchase data from the corpora to use in natural language processing projects. And lots of people are just curious about language, and (believe it or not) just use the corpora for fun, to see what's going on with the languages currently. If you are a registered user, you can look at the profiles of other users (by country or by interest) after you log in.9. Are there any published materials that are based on these corpora?As of mid-2011, researchers have submitted entries for more than 260 books, articles and conference presentations that are based on the corpora, and this is probably only a sm all fraction of all of the publications that have actually been done. In addition, we ourselves have published three frequency dictionaries that are based on data from the corpora -- Spanish (2005), Portuguese (2007), and American English (2010).10. How can I collaborate with other users?You can search users' profiles to find researchers from your country, or to find researchers who have similar interests. In the near future, we may start a Google Group for those who want more interaction.11. What about copyright?Our corpora contain hundreds of millions of words of copyrighted material. The only way that their use is legal (under US Fair Use Law) is because of the limited "Keyword in Context" (KWIC) displays. It's kind of like the "snippet defense" used by Google. They retrieve and index billions of words of copyright material, but they only allow end users to access"snippets" (片段,少许)of this data from their servers. Click here for an extended discussion of US Fair Use Law and how it applies to our COCA texts.12. Can I get access to the full text of these corpora?Unfortunately, no, for reasons of copyright discussed above. We would love to allow end users to have access to full-text, but we simply cannot. Even when "no one else will ever use it" and even when "it's only one article or one page" of text, we can't. We have to be 100% compliant with US Fair Use Law, and that means no full text for anyone under any circumstances -- ever. Sorry about that.13. I want more data than what's available via the standard interface. What can I do?Users can purchase derived data -- such as frequency lists, collocates lists, n-grams lists (e.g. all two or three word strings of words), or even blocks of sentences from the corpus. Basically anything, as long as it does not involve full-text access (e.g. paragraphs or pages of text), which would violate copyright restrictions. Click here for much more detailed information on this data, as well as downloadable samples.14. Can my class have additional access to a corpus on a given day?Yes. Sometimes your school will be blocked after an hour or so of heavy use from a classroom full of students. (This is a security mechanism, to prevent "bots" from running thousands of queries in a short time.) To avoid this, sign up ahead of time for "group access".15. Can you create a corpus for us, based on our own materials?Well, I probably could, but I'm not overly inclined to at this point. Creating and maintaining corpora is extremely time intensive, even when you give me the data "all ready" to import into the database. The one exception, I guess, would be if you get a large grant to create and maintain the corpus. Feel free to contact me with questions.16. How do I cite the corpora in my published articles?Please use the following information when you cite the corpus in academic publications or conference papers. And please remember to add an entry to the publication database (it takes only 30-40 seconds!). Thanks.In the first reference to the corpus in your paper, please use the full name. For example, for COCA: "the Corpus of Contemporary American English" with the appropriate citation to the references section of the paper, e.g. (Davies 2008-). After that reference, feel free touse something shorter, like "COCA" (for example: "...and as seen in COCA, there are..."). Also, please do not refer to the corpus in the body of your paper as "Mark Davies' COCA corpus", "a corpus created by Mark Davies", etc. The bibliographic entry itself is enough to indicate who created the corpus.。