Language Testing 5
Quiz for Chapter 2
1. What are the 4 approaches in language testing?
2. What is the key idea of the psychometric-structuralist approach?
3. What levels and modes does this approach analyze language proficiency into?
4. Translate: item difficulty, discrimination, criterion proficiency, subtest, sampling, construction, stem/lead, options, distracters, forced-choice items, constructed-response items, discrete-point items, method effect
5. Although sampling is an indispensable step in test development, there will be no sampling error. (F)
6. Test construction is the process of converting the test problems chosen into test items or tasks. (T)
7. The most frequently and extensively used test items in a psychometric-structuralist test are multiple-choice items. (T)
8. Directions should be brief, simple and unambiguous. (T)
9. The multiple-choice format remains the most valid measure of language proficiency. (F)
10. This approach is out of date today. (F)
11. Psychometric-structuralist tests are direct tests. (F)
12. The results (scores) are hard to interpret. (T)

Quiz for Chapter 3
1. The integrative approach proposed to use holistic procedures to test language proficiency. (T)
2. John W. Oller was the leading figure in the campaign against the psychometric-structuralist approach. (T)
3. The main ideas of Oller are that language proficiency is indivisible and that this indivisibility can be measured directly by pragmatic tests. (T)
4. All test items in a psychometric test are context-free. (F)
5. According to Oller, the first feature of the psychometric approach is the involvement of context. (F)
6. Pragmatic expectancy grammar is the ability responsible for context-dependent language processing. (T)
7. All dictation tests are pragmatic tests. (F)
8. The integrative tests are direct tests and their scores have clear meanings. (F)
9. Cloze tests are designed based on Gestalt psychology. (T)
10. The linguistic basis for cloze tests is language redundancy. (T)
11. For a standard cloze, there are two ways to delete words from a text: error-counting and right-answer-counting. (F)
12. The test taker's performance on a cloze is scored either by the exact-word method or by the contextually-acceptable-word method. (T)
13. Short-term memory is the container of wording, while long-term memory is the container of meaning. (T)
14. For dictation tests, the pauses should be at natural break points. (T)
15. Partial dictation differs from standard dictation in that part of it is presented in printed form. (T)
16. The 1980s witnessed the end of the UCH. (T)
17. Dictation tests lack domain analysis and sampling, which reduces their validity. (T)

Quiz for Chapter 4
1. In the early 20th century, there emerged communicative tests. (F)
2. Both psychometric and integrative tests are ability tests. (T)
3. Needs analysis was first put forward by Brendan Carroll. (F)
4. The characteristics of communicative performance testing are more readily seen in tests of writing and speaking. (T)
5. According to Carroll, test takers' performance can be scored with a 9-band scale. This indicates that communicative performance tests are norm-referenced tests. (F)
6. Timothy McNamara is the major developer of the OET, a performance-based test of English as a second language for health professionals. (T)
7. Performance tests are often used for screening and selection. (T)
8. Communicative performance tests enable test users to make a connection between test performance and future language behavior. (T)
9. Communicative performance tests are low in validity because they are scored subjectively. (F)
10. Multiple scoring and the Rasch measurement technique are often introduced to increase the reliability of communicative performance tests. (T)
11. According to Carroll, listening, speaking, reading, and writing are not equal to language skills; instead, they are language performance. (T)
12. In Carroll's model, listening, speaking, reading and writing are presented in isolation, but in real communication they occur in various combinations. (T)

Quiz for Chapter 5
1. Communicative language ability was proposed by Bachman and is a new development of communicative competence. (T)
2. Communicative competence was first proposed by Hymes in the 1960s and 1970s, including "knowledge that" and "knowledge how". (F)
3. Canale and Swain's model divides knowledge into four components: grammatical, sociolinguistic, discourse and strategic competence. (T)
4. Bachman's communicative language ability involves language competence, strategic competence and psychophysiological mechanisms, which are subdivided into different components. (T)
5. Bachman's test methods are not specific techniques but a framework of five facets which can be used for different purposes. (T)
6. The relationship between input and response in language tests can be either of the two: reciprocal and nonreciprocal. (F)
7. Reciprocal tests must be adaptive tests. (T)
8. Reciprocal language use is characterized by interaction and feedback. (T)
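The cloze items in the Chapter 3 quiz above (fixed-ratio deletion, exact-word scoring) can be made concrete with a small sketch. This is a toy illustration, not any operational test's procedure; the function names and the deletion parameters are invented, and only the exact-word scoring method is shown:

```python
import re

def make_cloze(text, n=7, start=2):
    """Build a fixed-ratio cloze: delete every n-th word, leaving the
    first `start` words intact as a lead-in. Returns the gapped text
    and the list of deleted words (the exact-word answer key)."""
    words = text.split()
    answers, gapped = [], []
    for i, w in enumerate(words):
        if i >= start and (i - start) % n == 0:
            answers.append(w)
            gapped.append("____")
        else:
            gapped.append(w)
    return " ".join(gapped), answers

def score_exact(responses, answers):
    """Exact-word method: credit only the word actually deleted,
    ignoring case and surrounding punctuation."""
    norm = lambda s: re.sub(r"\W+", "", s.lower())
    return sum(norm(r) == norm(a) for r, a in zip(responses, answers))

passage = ("Cloze tests rest on the redundancy of natural language: "
           "a proficient reader can restore words that have been "
           "deleted from a text at a fixed ratio.")
gapped, key = make_cloze(passage, n=7)
print(gapped)
print("Answer key:", key)
print("Score:", score_exact(key, key), "/", len(key))
```

The contextually-acceptable-word method mentioned in quiz item 12 would instead credit any response a rater judges to fit the context, which cannot be automated this simply.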
Introduction to the IELTS (International English Language Testing System) Examination

The full name of the IELTS examination is the International English Language Testing System (in Chinese, 雅思). It was jointly designed by the University of Cambridge Local Examinations Syndicate, IDP Education Australia and The British Council, with The British Council responsible for administering the test around the world. The Cultural and Education Section of the British Embassy in China has a dedicated examinations department in charge of IELTS.

The test is the language examination that applicants from non-English-speaking countries must pass in order to study or pursue further training at higher-education institutions in Commonwealth countries. Most Commonwealth countries now also use it to certify the English proficiency of applicants for skilled migration.

IELTS comes in two versions, Academic and General Training; migration applicants are required to take the General Training version. The whole examination consists of four parts: Listening (30 minutes), Reading (60 minutes), Writing (60 minutes) and Speaking (15 minutes). The two versions use the same Listening and Speaking papers but different Reading and Writing papers. Results are valid for two years, and candidates must wait three months between consecutive sittings. The whole examination is administered in person by examiners from the British Council's Cultural and Education Section.

There are currently ten fixed test centres in China: Beijing, Shenyang, Xi'an, Shanghai, Nanjing, Guangzhou, Fuzhou, Shenzhen, Chengdu and Hong Kong. Beijing and Shanghai hold the test once a month; the other centres hold it every two to three months. With the rapid growth of China's international exchanges, the British Council is actively developing and promoting the test, and more centres and test dates will soon be available to meet candidates' needs.

IELTS differs in certain respects from TOEFL and from China's domestic CET-4 and CET-6 examinations. Its Listening and Speaking sections use British pronunciation, and its written sections do not adopt the standard multiple-choice format but rely mainly on filling in words and short phrases, with varied task types. This gives a better measure of candidates' actual English ability and reduces the guessing factor in answering.
Types of Language Test (语言测试的分类) (2011-06-03)

Language tests may be distinguished by use or function, standard for measuring, linguistic level and skills, system of scoring, and objective.

1. Tests Classified by Use (按用途划分)
1) Achievement Test (成绩/成就测试): aims to measure how much of a language the learner has learned with reference to a particular course of study or program of instruction.
2) Proficiency Test (水平测试): not related to any course, syllabus, curriculum or a single skill in the language. It is concerned not with general attainment but with the specific language skills required for a future job or study. Examples: CEE (College Entrance Examination, 高考), TOEFL, PETS.
3) Diagnostic Test (诊断测试): aims to find out what language skills or knowledge the learner knows and does not know, in order to diagnose his difficulties or problems and give remedial teaching; a quiz is one example.
4) Aptitude Test (潜能测试): intended to predict a person's future success or measure potential ability. It assesses the learner's aptitude or gift for learning a language and seeks to predict his probable strengths and weaknesses in a second language. Example: GRE (Graduate Record Examination).
5) Placement Test (编班测试/分级考试).
It should be especially noted that achievement, diagnostic, placement and proficiency tests are not absolutely exclusive but sometimes interwoven.

2. Tests Classified by Stage (按学习阶段划分)
1) Classroom Test (随堂测试)
2) Mid-term Test (期中测试)
3) End-of-term Test (期末测试)

3. Tests Classified by Nature (按性质划分)
1) Discrete-point Test (分离式测试): a test which measures knowledge of individual language items, such as tenses, articles, or modal verbs in a grammar test. It is constructed on the assumption that language consists of different parts, such as grammar, vocabulary and syntax, and different skills, such as speaking, listening, reading and writing.
2) Integrative Test (综合性测试): a test that requires the learner to use several language skills at the same time. It is believed that communicative competence is global and requires an integration of linguistic knowledge and skills for its pragmatic use in the real world that cannot be broken down into discrete points.

4. Tests Classified by System of Scoring (按评分方式划分)
1) Subjective Test (主观性测试): scored according to the personal judgement of the scorer, such as an essay examination.
2) Objective Test (客观性测试): marked without the use of the examiner's personal judgement.

5. Tests Classified by Standard for Measuring (按衡量标准划分)
1) Criterion-referenced Test (标准参照性测试): measures a student's performance according to a particular criterion which has been agreed upon. Examples: TEM4 and TEM8.
2) Norm-referenced Test (常模参照性测试): measures how the performance of a student or a group of students compares with that of other students or groups whose scores are used as a norm.

6. Tests Classified by Way of Testing (按测试方式划分)
1) Direct Test (直接测试): language skills or abilities are measured directly in the test, e.g. pronunciation and intonation, writing, translation.
2) Indirect Test (间接测试): language ability is evaluated indirectly, by testing related skills.
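The criterion-referenced versus norm-referenced contrast just described is easy to show in a few lines. This is a toy sketch, not how TEM, IELTS or any real test is scored; the cut-off and the raw scores are invented:

```python
from statistics import mean, pstdev

scores = [52, 61, 68, 74, 74, 80, 85, 91]  # invented raw scores

# Criterion-referenced: each score is compared with a fixed,
# agreed cut-off; how the other candidates did is irrelevant.
CUTOFF = 70
criterion_result = ["pass" if s >= CUTOFF else "fail" for s in scores]

# Norm-referenced: each score is interpreted relative to the group,
# e.g. as a z-score or a percentile rank against the norm group.
mu, sigma = mean(scores), pstdev(scores)
z_scores = [round((s - mu) / sigma, 2) for s in scores]

def percentile_rank(s):
    """Percentage of the norm group scoring at or below s."""
    return 100 * sum(x <= s for x in scores) / len(scores)

print(criterion_result)
print(z_scores)
print("rank of 74:", percentile_rank(74))
```

Note that a score of 74 "passes" under the criterion but sits only around the 62nd percentile of this particular group: the two interpretations answer different questions.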
Reflection after Reading Language Testing

1. Introduction
There are four parts in the book: Survey, Readings, References, and Glossary. The Survey is a summary overview of the main features of the area of language study concerned: its scope and principles of enquiry, its basic concerns and key concepts. It draws a map of the subject area in such a way as to stimulate thought and to invite critical participation in the exploration of ideas. The Readings provide the necessary transition. The purpose is to focus on the specifics of what is said, and how it is said, in these source texts. The References section offers a selection of works (books and articles) for further reading. The Glossary is cross-referenced to the Survey, and therefore serves at the same time as an index.

2. Language Testing
2.1 What is a language test?
A language test is a procedure for gathering evidence of general or specific language abilities from performance on tasks designed to provide a basis for predictions about an individual's use of those abilities in real-world contexts.

2.2 The significance of language testing
First, language tests play a powerful role in many people's lives, acting as gateways at important transitional moments in education, in employment, and in moving from one country to another. Since language tests are devices for the institutional control of individuals, it is clearly important that they should be understood, and subjected to scrutiny.
Secondly, you may be working with language tests in your professional life as a teacher or administrator: teaching to a test, administering tests, or relying on information from tests to make decisions on the placement of students on particular courses.
Finally, if you are conducting research in language study you may need measures of the language proficiency of your subjects.
For this you need either to choose an appropriate existing language test or design your own.

2.3 Types of test
(1) Paper-and-pencil language tests take the form of the familiar examination question paper. They are typically used for the assessment either of separate components of language knowledge (grammar, vocabulary, etc.) or of receptive understanding (listening and reading comprehension).
(2) Performance tests are most commonly tests of speaking and writing, in which a more or less extended sample of speech or writing is elicited from the test-taker and judged by one or more trained raters using an agreed rating procedure.
(3) Achievement tests are associated with the process of instruction. They accumulate evidence during, or at the end of, a course of study in order to see whether and where progress has been made in terms of the goals of learning. Achievement tests should support the teaching to which they relate.
(4) Proficiency tests look to the future situation of language use without necessarily any reference to the previous process of teaching. They include performance features in their design, whereby characteristics of the criterion setting are represented.

2.4 The test-criterion relationship
The very practical activity of testing is inevitably underpinned by a theoretical understanding of the relationship between the criterion and test performance. Tests are based on theories of the nature of language use in the target setting, and the way in which this is understood will be reflected in the test design. Theories of language and language in use have of course developed within a variety of theoretical orientations.

3. Communication and the design of language tests
3.1 Discrete point tests
The term test construct refers to those aspects of knowledge or skill possessed by the candidate which are being measured.
The test construct involves being clear about what knowledge of language consists of, and how that knowledge is deployed in actual performance (language use). The practice of testing separate, individual points of knowledge, known as discrete point testing, was reinforced by theory and practice within psychometrics, the emerging science of the measurement of cognitive abilities. While there was also a realization among some writers that the integrative nature of performance needed to be reflected somewhere in a test battery, the usual way of handling this integration was at the level of skills testing, so that the four language macro-skills of listening, reading, writing, and speaking were in various degrees tested (again, in strict isolation from one another) as a supplement to discrete point tests.

3.2 Communicative language tests
From the early 1970s, a new theory of language and language use began to exert a significant influence on language teaching and potentially on language testing. Hymes saw that knowing a language was more than knowing its rules of grammar.
Communicative language tests ultimately came to have two features:
(1) They were performance tests, requiring assessment to be carried out when the learner or candidate was engaged in an extended act of communication, either receptive or productive, or both.
(2) They paid attention to the social roles candidates were likely to assume in real-world settings, and offered a means of specifying the demands of such roles in detail.

3.3 Models of communicative ability
Various aspects of knowledge or competence were specified in the early 1980s by Michael Canale and Merrill Swain in Canada:
(1) Grammatical or formal competence, which covered the kind of knowledge (of systematic features of grammar, lexis, and phonology) familiar from the discrete point tradition of testing.
(2) Sociolinguistic competence, or knowledge of rules of language use in terms of what is appropriate to different types of interlocutors, in different settings, and on different topics.
(3) Strategic competence, or the ability to compensate in performance for incomplete or imperfect linguistic resources in a second language.
(4) Discourse competence, or the ability to deal with extended use of language in context.

4. The testing cycle
In this chapter we will outline the stages and typical procedures in this cyclical process. New situations arise, usually associated with social or political changes, which generate the need for a new test or assessment procedure. These include the growth of international education, increased labour flows between countries as the result of treaties, the educational impact of immigration or refugee programmes, school curriculum reform, or reform of vocational education and training for adults in the light of technological change. Before they begin thinking in detail about the design of a test, test developers will need to get the lay of the land, that is, to establish the constraints under which they are working, and under which the test will be administered.
Following this initial ground-clearing, we move on to the detailed design of the test. This will involve procedures to establish test content (what the test contains) and test method (the way in which the test will appear to the test-taker, the format in which responses will be required, and how these responses will be scored).

5. The rating process
Making judgements about people is a common feature of everyday life. The judgement will in most cases have direct consequences for the person judged, and so issues of fairness arise, which most public procedures try to take account of in some way. This section will discuss rating procedures used in language assessment. The terms ratings and raters will be used to refer to the judgements and those who make them.

5.1 Establishing a rating procedure
As communicative language teaching has increasingly focused on communicative performance in context, so rating the impact of that communication has become the focus of language assessment. Where assessments meet institutional requirements, for example for certification, as with any bureaucratic procedure there are set methods for yielding the judgement in question. These methods typically have three main aspects. First, there is agreement about the conditions (including the length of time) under which the person's performance or behaviour is elicited and attended to by the rater. Second, certain features of the performance are agreed to be critical; the criteria for judging these will be determined and agreed. Usually this will involve considering various components of competence: fluency, accuracy, organization, sociocultural appropriateness, and so on. Finally, raters who have been trained to an agreed understanding of the criteria characterize a performance by allocating a grade or rating.

5.2 The problem with raters
Introducing the rater into the assessment process is both necessary and problematic. It is problematic because ratings are necessarily subjective.
The rating given to a candidate is a reflection not only of the quality of the performance, but of the qualities as a rater of the person who has judged it. Even if the rater is trained carefully to interpret the criteria in accordance with the intentions of the test designers, and concentrates while doing the rating, the rating process cannot be made fully objective. The allocation of individuals to categories is not a deterministic process, driven by the objective, recognizable characteristics of performances, external to the rater. Rather, rating always contains a significant degree of chance, associated with the rater and other factors.

5.3 Establishing a framework for making judgements
In establishing a rating procedure, we need to consider the criteria by which performances at a given level will be recognized, and then to decide how many different levels of performance we wish to distinguish. It is useful to view achievement as a continuum. We can illustrate the distinction between the hurdle and ladder perspectives by reference to two very different kinds of performance. The function of the assessment at a given level is not to make distinctions between candidates, other than a binary distinction between those who meet the requirements of the level and those who do not.

6. Validity: testing the test
The purpose of validation in language testing is to ensure the defensibility and fairness of interpretations based on test performance. Test validation similarly involves thinking about the logic of the test, particularly its design and its intentions, and also involves looking at empirical evidence (the hard facts) emerging from data from test trials or operational administrations. Test validation looks at the procedures as a whole, for all the candidates affected by them. The research carried out to validate test procedures can accompany test development, and is often done by the test developers themselves; that is, it can begin before the test becomes operational.
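Rater subjectivity of the kind just described is usually investigated empirically, often during test trials. A minimal sketch, with invented band ratings on a 1-6 scale, of two common checks: exact and adjacent agreement between two raters, and the correlation of their ratings:

```python
from statistics import mean

# Invented ratings (bands 1-6) given by two trained raters
# to the same ten performances.
rater_a = [4, 5, 3, 6, 2, 4, 5, 3, 4, 6]
rater_b = [4, 4, 3, 5, 2, 5, 5, 3, 4, 6]

def agreement(a, b, tolerance=0):
    """Proportion of performances rated within `tolerance` bands
    of each other (tolerance=0 gives exact agreement)."""
    return sum(abs(x - y) <= tolerance for x, y in zip(a, b)) / len(a)

def pearson(a, b):
    """Pearson correlation of the two raters' scores."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = sum((x - ma) ** 2 for x in a) ** 0.5
    sb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (sa * sb)

print("exact agreement:", agreement(rater_a, rater_b))
print("adjacent agreement:", agreement(rater_a, rater_b, tolerance=1))
print("correlation:", round(pearson(rater_a, rater_b), 3))
```

High correlation with imperfect exact agreement is the typical pattern: raters rank performances similarly but still disagree on individual bands, which is why multiple rating and statistical adjustment are used.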
Validation ideally continues through the life of the test, as new questions about its validity arise, usually in the context of language testing research.

7. Measurement
Measurement investigates the quality of the process of assessment by looking at scores. Two main steps are involved:
(1) Quantification, that is, the assigning of numbers or scores to various outcomes of assessment. The set of scores available for analysis when data are gathered from a number of test-takers is known as a data matrix.
(2) Checking for various kinds of mathematical and statistical patterning within the matrix in order to investigate the extent to which necessary properties (for example, consistency of performance by candidates, or by judges) are present in the assessment.
The aim of these procedures is to achieve quality control, that is, to improve the meaningfulness and fairness of the conclusions reached about individual candidates (the validity of the test). Measurement procedures have no rationale other than to underpin validity.

8. The social character of language tests
8.1 The institutional character of assessment
When an assessment is made, it is not done by someone acting in a private capacity, motivated by personal curiosity about the other individual, but in an institutional role, serving institutional purposes. These will typically involve the fulfilment of policy objectives in education and other areas of social policy. This social practice raises questions of social responsibility.
(1) Assessment and social policy
Language tests have a long history of use as instruments of social and cultural exclusion. The test here is a test of authenticity of identity, rather than of proficiency; a single instance is enough to betray the identity which the test aims to detect. More conventional proficiency tests have also been used for purposes of exclusion.
Language tests can form part of a politically and morally objectionable policy.
(2) Assessment and education policy
Assessment serves policy functions in educational contexts, too. Most industrialized countries have, in recent years, responded to the need for the upgrading of the workforce in the face of rapid technological change by developing more flexible policies for the recognition and certification of specific work-related skills, each of which may be termed a competency. In international education, tests are used to control access to educational opportunities. Typically, international students need to meet a standard on a test of language for academic purposes before they are admitted to the university of their choice.

8.2 The social responsibility of the language tester
Recently, serious attention has been given to these issues for the first time, an overdue development, one might say, given the essentially institutional character of testing. On the one hand, the advent of a new test might appear to promote fairness. On the other hand, the introduction of such an instrument raises worrying possibilities.

8.3 Ethical language testing
(1) Accountability
Ethical testing practice is seen as involving making tests accountable to test-takers. Test developers are typically more preoccupied with satisfying the demands of those commissioning the test, and with their own work of creating a workable test. Test-takers are seldom represented on the test development committees which supervise the work of test development and represent the interests of stakeholders. Test developers should be required to demonstrate that the test content and format are relevant to candidates, and that the testing practice is accountable to their needs and interests. An aspect of accountability is the question of determining the norms of language behaviour which will act as a reference point in the assessment.
(2) Washback
The effect of tests on teaching and learning is known as test washback.
Ethical language testing practice, it is felt, should work to ensure positive washback from tests. Authorities responsible for assessment sometimes use assessment reform to drive curriculum reform, believing that the assessment can be designed to have positive washback on the curriculum.
(3) Test impact
Tests can also have effects beyond the classroom. The wider effect of tests on the community as a whole, including the school, is referred to as test impact. Test impact is likely to be complex and unpredictable.
(4) Codes of professional ethics for language testers
In contrast to those advocating the direct social responsibility of the tester, a more traditional approach involves limiting the social responsibility of language testers to questions of the professional ethics of their practice. In this view, the approach to the ethics of language testing practice resembles that of other professions, such as medicine or law. Professional bodies of language testers should formulate codes of practice which will guide language testers in their work. The emphasis is on good professional practice: that is, language testers should in general take responsibility for the development of quality language tests.

8.4 Critical language testing
Critical language testing is best understood as an intellectual project to expose the role of tests in the exercise of power. From the perspective of critical language testing, the emphasis in ethical language testing on the individual responsibility of the language tester is misguided, because it presupposes that this responsibility would operate within the established institution of testing, and so essentially accept the status quo and concede its legitimacy. Critical language testing at its most radical is not reformist, since reform is a matter of modification, not total replacement.

9. New directions, and dilemmas
We live in a time of contradictions. The speed and impressiveness of technological advance suggest an era of great certainty and confidence.
Aspects of these contradictory trends also define important points of change in language testing. Language testing is a field in crisis, one which is masked by the impressive appearance of technological advance.

9.1 Computers and language testing
Rapid developments in computer technology have had a major impact on test delivery. Already, many important national and international language tests, including TOEFL, are moving to computer-based testing (CBT). The proponents of computer-based testing can point to a number of advantages. Computer-adaptive tests, for example, tailor the sequence of items to the candidate's performance as the test proceeds. The use of computers for the delivery of test materials raises questions of validity, as we might expect. Questions about the importance of different kinds of presentation format are raised or exacerbated by the use of computers. The ability of computers to carry out various kinds of automatic processing of spoken or written texts is also having an impact on testing.

9.2 Technology and the testing of speaking
Tape recorders can be used in the administration of speaking tests. Candidates are presented with a prompt on tape and asked to respond as if they were talking to a person, the response being recorded on tape. This performance is then scored from the tape. Such a test is called a semi-direct test of speaking, as compared with a direct test format such as a live face-to-face interview.

9.3 Dilemmas: whose performance?
The speed of technological advances affecting language testing sometimes gives an impression of a field confidently moving ahead, notwithstanding the issues of validity raised above. But concomitantly the change in perspective from the individual to the social nature of test performance has provoked something of an intellectual crisis in the field.
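The computer-adaptive testing mentioned in 9.1 can be illustrated with a toy item-selection loop: after each response, the candidate's provisional ability estimate moves up or down and the next item is chosen to match it. The item bank, step size and stopping rule here are all invented for illustration; operational CATs use IRT-based estimation rather than this simple up/down rule:

```python
def adaptive_test(item_bank, answers_correctly, start=0.0, step=0.5, n_items=6):
    """Toy CAT loop: pick the unused item whose difficulty is closest
    to the current ability estimate; move the estimate up after a
    correct answer and down after an incorrect one."""
    theta = start
    used = set()
    for _ in range(n_items):
        item = min((i for i in range(len(item_bank)) if i not in used),
                   key=lambda i: abs(item_bank[i] - theta))
        used.add(item)
        if answers_correctly(item_bank[item]):
            theta += step
        else:
            theta -= step
    return theta

# Invented item bank: difficulties on a logit-like scale.
bank = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]

# Simulated candidate who answers correctly whenever the item's
# difficulty is at or below a "true" ability of 0.8.
estimate = adaptive_test(bank, lambda difficulty: difficulty <= 0.8)
print("final ability estimate:", estimate)
```

The estimate quickly oscillates around the candidate's true level, which is why adaptive tests can reach a stable measurement with fewer items than a fixed-form test.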
Language testing remains a complex and perplexing activity. While insights from evolving theories of communication may be disconcerting, it is necessary to grasp them fully, and the challenge they pose, if our assessments are to have any chance of having the meaning we intend them to have. Language testing is an uncertain and approximate business at the best of times, even if to the outsider this may be camouflaged by its impressive, even daunting, technical (and technological) trappings, not to mention the authority of the institutions whose goals tests serve. Every test is vulnerable to good questions: about language and language use, about measurement, about test procedures, and about the uses to which the information in tests is to be put. In particular, a language test is only as good as the theory of language on which it is based, and it is within this area of theoretical inquiry into the essential nature of language and communication that we need to develop our ability to ask the next question.
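The psychometric vocabulary that recurs throughout these notes (item difficulty, discrimination, data matrix) reduces to simple arithmetic in classical item analysis. A minimal sketch on an invented score matrix: difficulty is the proportion answering an item correctly, and discrimination is estimated here by the upper-lower index (proportion correct among the top scorers minus the bottom scorers):

```python
# Invented data matrix: rows are test-takers, columns are items;
# 1 = correct, 0 = incorrect.
matrix = [
    [1, 1, 1, 0, 1],
    [1, 1, 0, 0, 1],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 1, 0],
    [1, 0, 0, 0, 0],
]

def item_difficulty(matrix, j):
    """Facility value p: proportion of test-takers answering item j
    correctly (note that a HIGH p means an EASY item)."""
    return sum(row[j] for row in matrix) / len(matrix)

def item_discrimination(matrix, j, k=2):
    """Upper-lower discrimination index D = p_upper - p_lower, using
    the k highest-scoring and k lowest-scoring test-takers."""
    ranked = sorted(matrix, key=sum, reverse=True)
    upper, lower = ranked[:k], ranked[-k:]
    return (sum(r[j] for r in upper) - sum(r[j] for r in lower)) / k

for j in range(5):
    print(f"item {j}: p = {item_difficulty(matrix, j):.2f}, "
          f"D = {item_discrimination(matrix, j):+.2f}")
```

In this invented data, item 3 comes out with a negative D: weaker candidates did better on it than stronger ones, exactly the kind of flaw item analysis is meant to flag before a test becomes operational.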