An Overview of Corpus Research Methods

Descriptive research
– single text – text vs. text – people vs. text
2012 Workshop on Corpora and Foreign Language Research
Research questions
1. How many different word forms are used in the text? How many running words are used? What is their distribution?
Any corpus-based research is necessarily driven by corpus data.
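Goals: through corpus analysis and research:
– verify hypotheses and intuitions – make new discoveries – form new hypotheses – construct new theories – verify existing findings – solve puzzles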
• If the text is very large, standardize the TTR
• The types and their cumulative frequency percentages
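The two observations above can be sketched in a few lines of Python. This is a minimal illustration, not the workshop's own procedure: it assumes the standardized TTR is the mean TTR over consecutive fixed-size chunks of tokens (the chunk size is a free parameter here), and the function names `standardized_ttr` and `cumulative_percentages` are hypothetical.

```python
from collections import Counter

def standardized_ttr(tokens, chunk_size=1000):
    """Mean TTR over consecutive full chunks of `chunk_size` tokens.

    Assumption: leftover tokens that do not fill a whole chunk are ignored.
    Returns None when the text is shorter than one chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    ratios = []
    for start in range(0, len(tokens) - chunk_size + 1, chunk_size):
        chunk = tokens[start:start + chunk_size]
        ratios.append(len(set(chunk)) / chunk_size)
    if not ratios:
        return None
    return sum(ratios) / len(ratios)

def cumulative_percentages(tokens):
    """List (type, frequency, cumulative % of tokens), most frequent first."""
    counts = Counter(tokens)
    total = len(tokens)
    running = 0
    rows = []
    for word, freq in counts.most_common():
        running += freq
        rows.append((word, freq, 100.0 * running / total))
    return rows

# Toy input; a real text would be tokenized first.
tokens = "a b a c a b d e a b".split()
sttr = standardized_ttr(tokens, chunk_size=5)
rows = cumulative_percentages(tokens)
```

Averaging over equal-sized chunks keeps the ratio comparable across texts of different lengths, since a raw TTR tends to fall as a text grows.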
– To answer RQ 2, compare the wordlist against a set of graded wordlists, and observe:
• Something or some phenomenon:
– unexpected – incongruent – in need of a solution – puzzling
Reading to be better informed
• What has been done as contribution • What has been left undone • What has been done wrong
[Diagram: the "Einstein Gulf", an unbridgeable gap between the world of reality and the world of text]
Study, inquire, reflect, discern, act (学问思辨行)
Formulating research questions
• Naming: What is…?
• Classificatory: How are they interrelated (patterned)?
• Explanatory: To what extent do they co-occur?
• Predictive: What will happen if…?
• Never ask a question to which you already know the answer; never ask a "how to" question.
Finding a method
• Population • Sample • Sampling
• Never count someone else's money.
2. To what extent can the level of difficulty of the text be computed on the basis of the graded wordlists?
3. How many different word classes are used? What is the number of each word class?
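For question 3, one possible sketch is shown below. It assumes the text has already been part-of-speech tagged and stored in a `word_TAG` format; both that format and the tag labels in the sample are illustrative assumptions, and `word_class_counts` is a hypothetical helper name.

```python
from collections import Counter

def word_class_counts(tagged_text, sep="_"):
    """Count how often each word-class tag occurs in `word<sep>TAG` text."""
    tags = []
    for item in tagged_text.split():
        # Split on the last separator so words containing `sep` still work.
        word, found, tag = item.rpartition(sep)
        if found and word and tag:
            tags.append(tag)
    return Counter(tags)

# Hypothetical tagged sample; real tagsets differ from tagger to tagger.
tagged = "The_DET cat_NOUN sat_VERB on_ADP the_DET mat_NOUN ._PUNCT"
counts = word_class_counts(tagged)
n_word_classes = len(counts)
```

`n_word_classes` answers the first half of the question and `counts` the second.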
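Innovation: data; method; technique; interpretation / theory / perspective
[Table: check marks (√, √√) rating each dimension; the cell layout was lost in extraction]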
The corpus-based approach is a verification procedure. The corpus-driven approach is a discovery procedure.
Rationale: Any perception is but inferencing.
• How many types on Level 1, 2, and 3 lists are used in the text? And what is their percentage?
• What about their tokens?
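A comparison against graded wordlists could be sketched as follows. The three tiny level lists are invented purely for illustration (real graded lists would be loaded from files), and `graded_coverage` is a hypothetical helper. The sketch assumes each type is credited to the first level that contains it and that anything else is reported as off-list.

```python
from collections import Counter

def graded_coverage(tokens, graded_lists):
    """Per level: number and percentage of types and tokens found on that list."""
    if not tokens:
        raise ValueError("tokens must not be empty")
    counts = Counter(tokens)
    n_types, n_tokens = len(counts), len(tokens)
    report = {}
    seen = set()

    def summarize(hits):
        hit_tokens = sum(counts[w] for w in hits)
        return {
            "types": len(hits),
            "type_pct": 100.0 * len(hits) / n_types,
            "tokens": hit_tokens,
            "token_pct": 100.0 * hit_tokens / n_tokens,
        }

    for level, words in graded_lists.items():
        # A type already credited to an earlier level is not counted again.
        hits = {w for w in counts if w in words and w not in seen}
        seen |= hits
        report[level] = summarize(hits)
    report["off-list"] = summarize(set(counts) - seen)
    return report

# Invented mini-lists standing in for Level 1, 2, and 3 wordlists.
graded_lists = {
    "Level 1": {"the", "cat", "sat", "on"},
    "Level 2": {"mat", "dog"},
    "Level 3": {"log"},
}
tokens = "the cat sat on the mat the dog sat on the log yesterday".split()
report = graded_coverage(tokens, graded_lists)
```

The share of tokens falling on the higher levels or off-list gives one rough indication of the text's lexical difficulty.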
[Diagram: P (Population), S (Sample), R (Result), I (Interpretation); links labeled sampling validity, reliability, validity, and generalizability]
• IF P → S, S → R, R → I • THEN I → P
Method
– To answer RQ 1, generate a wordlist of the given text and observe:
• The number of types
• The number of tokens
• the type/token ratio (TTR)
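The wordlist step for RQ 1 can be sketched in Python as follows. The tokenizer is a deliberate simplification (lowercased alphabetic strings with an optional internal apostrophe), and `tokenize` and `wordlist_stats` are hypothetical names. Dedicated concordancing tools apply their own tokenization rules, so their counts may differ.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and extract word tokens (a simplifying assumption)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def wordlist_stats(text):
    """Return the wordlist plus the number of types, tokens, and the TTR."""
    tokens = tokenize(text)
    wordlist = Counter(tokens)      # word form -> frequency
    n_tokens = len(tokens)          # running words
    n_types = len(wordlist)         # different word forms
    ttr = n_types / n_tokens if n_tokens else 0.0
    return wordlist, n_types, n_tokens, ttr

sample = "The cat sat on the mat. The dog sat on the log."
wordlist, n_types, n_tokens, ttr = wordlist_stats(sample)
for word, freq in wordlist.most_common(3):
    print(word, freq)
```

Here `wordlist.most_common()` gives the frequency-ranked wordlist, and `n_types`, `n_tokens`, and `ttr` correspond to the three observations listed above.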
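Basic steps:
1. Choose a topic
2. Formulate the research questions
3. Define the population and the sample
4. Select the tools
5. Process the data
6. Describe the results: classify and summarize features (description)
7. Explain the results: observe, describe, explain (explanation)
8. Interpret the results: meaning, value, application (interpretation)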
Identifying a problem
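Topic Selection, Design, and Methods
Putting it all together
Li Wenzhong (李文中), National Research Centre for Foreign Language Education (中国外语教育研究中心)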
2012
Corpus linguistics is not something a human can learn; regular expressions are not something a woman can learn.
Corpus-driven research is basically corpus-based.