Overview of Corpus Research Methods (语料库研究方法概述), 2012 Workshop on Corpora and Foreign Language Research (语料库与外语研究研修班)

Descriptive research
– single text
– text vs. text
– people vs. text

Any corpus-based research is necessarily driven by corpus data.

The corpus-based approach is a verification procedure.
The corpus-driven approach is a discovery procedure.

Rationale: Any perception is but inferencing.

Goal: through corpus analysis and research, to
– verify hypotheses and intuitions
– make new discoveries
– form new hypotheses
– build new theories
– verify existing findings
– solve hard problems

Innovation: data, method, technique, interpretation/theory/perspective
[Table: check marks (√, √√) showing which of the four are new]

[Diagram: the world of reality and the world of text, separated by the unbridgeable "Einstein Gulf"]

[Diagram: the text (文本) related to the six sense objects and faculties, form/eye (色/眼), sound/ear (声/耳), smell/nose (香/鼻), taste/tongue (味/舌), touch/body (触/身), mental objects/mind (法/意), and to learning, inquiring, thinking, discerning, and acting (学问思辨行)]

• Something, or some phenomenon, that is:
– unexpected
– incongruent
– in need of a solution
– puzzling

Reading to be better informed
• What has been done as a contribution
• What has been left undone
• What has been done wrong

Formulating research questions
• Naming: What is…?
• Classificatory: How are they interrelated (patterned)?
• Explanatory: To what extent do they co-occur?
• Predictive: What will happen if…?
• Never ask a question to which you already know the answer; never ask a "how to" question.

Finding a method
• Population
• Sample
• Sampling
• Never count someone else's money.

Research questions
1. How many different word forms are used in the text? How many running words are used? What is their distribution?
2. To what extent can the level of difficulty of the text be computed on the basis of the graded wordlists?
3. How many different word classes are used? What is the number of each word class?

Method
– For RQ 1, also observe:
• the standardized TTR, if the text is very large
• the types with their frequency and cumulative percentage
– To answer RQ 2, compare the wordlist against a batch of graded wordlists, and observe:
• How many types on the Level 1, 2, and 3 lists are used in the text, and what is their percentage?
• What about their tokens?
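The RQ 2 comparison against graded wordlists can be illustrated with a minimal Python sketch. It is not the procedure of any particular tool: the function names `tokenize` and `graded_coverage`, the toy word lists, and the rule that a word found on several lists counts only for the easiest one are all assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens (a simplification)."""
    return re.findall(r"[a-z]+", text.lower())


def graded_coverage(tokens, graded_lists):
    """Count the types and tokens of a text found on each graded list.

    graded_lists maps a level name to a set of lowercase words, easiest
    level first. A word on several lists is credited to the first one only
    (an assumption of this sketch).
    """
    freq = Counter(tokens)
    total_types, total_tokens = len(freq), len(tokens)
    remaining = set(freq)  # types not yet assigned to any level
    report = {}

    def row(words):
        n_types = len(words)
        n_tokens = sum(freq[w] for w in words)
        return {
            "types": n_types,
            "type_pct": 100 * n_types / total_types if total_types else 0.0,
            "tokens": n_tokens,
            "token_pct": 100 * n_tokens / total_tokens if total_tokens else 0.0,
        }

    for level, listed in graded_lists.items():
        hits = remaining & set(listed)
        report[level] = row(hits)
        remaining -= hits
    report["off-list"] = row(remaining)  # types found on no list
    return report


if __name__ == "__main__":
    # Hypothetical toy lists; real graded wordlists would be loaded from files.
    lists = {
        "Level 1": {"the", "a", "cat", "on", "sat"},
        "Level 2": {"mat", "end"},
        "Level 3": {"feline"},
    }
    text = "The cat sat on the mat and purred"
    for level, stats in graded_coverage(tokenize(text), lists).items():
        print(level, stats)
```

The share of types and tokens that falls on the higher levels or off-list is one rough indicator of difficulty; how to weight the levels is left open here, as it is in the research question.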
Topic Selection, Design and Methods (选题、设计与方法)
Put it all together
李文中 (Li Wenzhong)
中国外语教育研究中心 (National Research Centre for Foreign Language Education)
2012

Basic steps:
1. Decide on a topic
2. Pose the questions
3. Define the population and the sample
4. Choose the tools
5. Process the data
6. Describe the results: classify and summarize features (description)
7. Explain the results: observe, describe, explain (explanation)
8. Interpret the results: significance, value, application (interpretation)

Identifying a problem

[Diagram: P (population), S (sample), R (result), and I (interpretation), linked by sampling validity, reliability, validity, and generalizability]
• IF
• P → S
• S → R
• R → I
• THEN
• I → P

Corpora are not for humans to learn, and regular expressions are not for women to learn.

Corpus-driven is basically corpus-based.

Method
– To answer RQ 1, generate a wordlist of the given text and observe:
• the number of types
• the number of tokens
• the type/token ratio (TTR)
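For RQ 1, the wordlist figures named in the Method slide (types, tokens, TTR) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the output of any specific wordlist tool: the tokenizer is a simple regular expression, the function names are hypothetical, and the standardized TTR is taken to be the mean TTR over consecutive chunks of a fixed size, which is one common way of standardizing.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return word tokens (a simplifying assumption)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def wordlist_stats(tokens):
    """Return the frequency list, number of types, number of tokens, and TTR."""
    freq = Counter(tokens)
    n_tokens = len(tokens)
    n_types = len(freq)
    ttr = n_types / n_tokens if n_tokens else 0.0
    return freq, n_types, n_tokens, ttr


def standardized_ttr(tokens, chunk_size=1000):
    """Mean TTR over consecutive full chunks of chunk_size tokens.

    A trailing partial chunk is ignored; returns None if the text is
    shorter than one chunk.
    """
    ratios = []
    for start in range(0, len(tokens) - chunk_size + 1, chunk_size):
        chunk = tokens[start:start + chunk_size]
        ratios.append(len(set(chunk)) / chunk_size)
    return sum(ratios) / len(ratios) if ratios else None


def cumulative_percentages(freq):
    """List (type, frequency, cumulative % of tokens), most frequent first."""
    total = sum(freq.values())
    running = 0
    rows = []
    for word, count in freq.most_common():
        running += count
        rows.append((word, count, 100 * running / total))
    return rows


if __name__ == "__main__":
    tokens = tokenize("The cat sat on the mat. The end.")
    freq, n_types, n_tokens, ttr = wordlist_stats(tokens)
    print(n_types, n_tokens, ttr)
    print(standardized_ttr(tokens, chunk_size=4))
    for row in cumulative_percentages(freq):
        print(row)
```

Raw TTR falls as a text gets longer, because frequent words keep recurring; averaging over equal-sized chunks makes texts of different lengths comparable, which is why standardizing matters for very large texts.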