Exploiting Similarities among Languages for Machine Translation - google word2vec
- 格式:pdf
- 大小:280.49 KB
- 文档页数:10
Questionnaires in language teaching research作者:Ping Guo来源:《速读·下旬》2019年第03期Questionnaires are often used to examine people’s attitudes,beliefs and behaviors in language learning and teaching.The data we get from questionnaire research can be especially insightful and satisfying when patterns emerge from a large number of respondents,when apparent differences or similarities are found among groups,or when relationships are ascertained among variables.As researchers,we feel empowered when we make recommendations for learning and teaching if the pattern we find is not only salient and strong,but also attested on a large scale.However,by nature,survey research is exploratory and shallow.It often does not go beyond pattern finding or relationship mapping.The patterns themselves are long on description and short on explanation.And of course,with every pattern there are exceptions.Most consumers of our research only care about our conclusions and pedagogical implications.As researchers,however,it is our responsibility to make sure that we have selected the right tool,and that the tool we have used to reach our conclusions is indeed trustworthy.The questionnaire as a research instrument has been a concern for applied linguistics for some time.As early as 1990,Reid (1990) traced the processes in developing her learning styles measure and boldly revealed the ‘dirty laundry of ESL [English as a second language] surveyresearch’.Luppescu and Day (1990) attempted to develop a questionnaire to canvass the attitudes and beliefs of Japanese learners and teachers towards the learning of English,but found it difficult to validate the student questionnaire.In fact,instead of arriving at any conclusions about student beliefs,they presented a ‘more important conclusion’,that ‘questionnaire data should not blindly be accepted or considered meaningful unless they have been properly validated.Compared to other fields such as the social sciences,education,marketing,and medicine that extensively make use of questionnaires,applied linguists have paid only cursory attention to the questionnaire tool itself.Despite an occasional paper on questionnaire validation (e.g.Brown,1997) or issues involved in using the Likert scale (e.g.Busch,1993; Gu,Wen,& Wu,1995),questionnaire design and validation remained a topic rarely touched upon until the end of the 20th century.Even up until this day,Dörnyei and Taguchi (2010; first edition published in 2003).Only a few papers (e.g.Petrić & Czárl,2003) showcase how a questionnaire can be validated.With these precious efforts,novice researchers today have much clearer guidance in the process of questionnaire design and validation.However,applied linguists have barely explored the major issues involved in the analysis of questionnaire data.One example would be the analysis of Likert scale data.It is now time for the field to pay more attention not just to what a questionnaire study reveals,but also to how the questionnaire is designed,validated,and analyzed.Questionnaire research is often seen as ‘quick and dirty’.While the administration of a questionnaire does seem quick,the development,validation,and analysis of the questionnaire are far from a quick and easy process.Another common practice that makes it feel easy is theadoption/adaptation of existing questionnaires.Again,this is not as straightforward as it seems.In commenting on Park’s (2014) recent validation of the FLCAS,Horwitz (2016) emphasized the point that the factor structure of the foreign language anxiety scale may vary from group to group,and that: future empirical efforts to understand the components of FLA [foreign language anxiety] should clearly delineate the specific learner population and learning context being examined to determine if,and if so,how the construct of FLA differs across learning populations and situations.References[1]Brown,J.D.Designing surveys for language programs.In D.T.Griffee & D.Nunan (Eds.),Classroom teachers and classroom research[J].Tokyo: JALT,1997: 109-121.[2] Brown,J.D.Likert items and scales of measurement[J]? SHIKEN: JALT Testing & Evaluation SIG Newsletter,2011,15:10-14.[3]Gu,Y.,Wen,Q.,& Wu,D.How often is often: Reference ambiguities of the Likert-scale in language learning strategy research[J].Occasional Papers in English Language Teaching,ERIC Document Reproduction Service No.ED 391 358.1995,5:19-35.作者简介郭萍(1981.07—),女,汉族,籍贯:四川隆昌县,讲师,学历:本科;单位:成都艺术职业学院;研究方向:外国语语言与教学。
Transactions of the Philological Society Volume103:2(2005)121–146VERTICAL AND HORIZONTAL TRANSMISSION INLANGUAGE EVOLUTION1By W ILLIAM S-Y.W ANG a AND J AMES W.M INETT ba Chinese University of Hong Kong and Academia Sinica,Taiwan,b Chinese University of Hong KongA BSTRACTIt has been observed that borrowing within a group of genetically related languages often causes the lexical similar-ities among them to be skewed.Consequently,it has been proposed that borrowing can sometimes be inferred from such skewing.However,heterogeneity in the rate of lexical replacement,as well as borrowing from other languages,can also give rise to skewed lexical similarities.It is important, therefore,to determine to what degree skewing is a statisti-cally significant indicator of borrowing.Here,we describe a statistical hypothesis test for detecting language contact based on skewing of linguistic characters of arbitrary type.Signifi-cant probabilities of correct detection of contact are main-tained for various contact scenarios,with low false alarm probability.Our experiments show that the test is fairly robust to substantial heterogeneity in the retention rate,both across characters and across lineages,suggesting that the method can provide an objective criterion against which claims of signi-ficant skewing due to contact can be tested,pointing the way for more detailed analysis.1This work has been supported in part by grants1224⁄02H and1127⁄04H awarded by the Research Grants Council of Hong Kong to the Chinese University of Hong Kong.We would like to thank the anonymous reviewers of an earlier draft of this paper for their constructive criticism.We are also grateful to Drs.Jinyun Ke and Feng Wang,and Miss Joyce Cheung for their helpful comments.Additional supplementary material can be found at /transactions.aspÓThe Philological Society2005.Published by Blackwell Publishing,9600Garsington Road,Oxford OX42DQ and350Main Street,Malden,MA02148,USA.1.I NTRODUCTIONThe tree diagram allows the hierarchy of language splits that are hypothesized to have taken place in a language stock to be displayed clearly and simply.However,the underlying assumption that languages split discretely into two (or more)lineages,each evolving independently thereafter by vertical transmission only,is far from realistic.As Schmidt recognized when he proposed the Wellentheorie ,languages often do not split nearly so cleanly as supposed in the Stammbaumtheorie .Furthermore,innovations that arise in one language may come to be acquired by other nearby languages with which it comes into contact.When such horizontally transmitted innovations are incorrectly interpreted as vertically transmitted innovations,and are used to infer a language tree,the topology of the tree provides a warped representation of the genetic relationships.In order to prepare tree diagrams that accurately reflect only the genetic relationships among a set of languages,the two main mechanisms by which innovations come to be shared must be distinguished,and the horizontal transmission filtered out.Alternatively,if a hybrid picture of the evolution of a set of languages is sought,comprising both modes of transmission,the tree diagram must be discarded and replaced by a network diagram on which are marked both the lines of descent by vertical transmission and the contact events involving horizontal transmis-sion.One method of distinguishing vertical and horizontal transmis-sion,which can be traced back to Hu bschmann’s (1875)work on Armenian,is to stratify the correspondences between two lineages and to recognize that only the oldest stratum can possibly reflect the vertically transmitted signal.This idea has been taken up by Wen (1940),and more recently by Sagart &Xu (2001),to resolve contact among Sino-Tibetan languages,and forms part of the Distillation Procedure for reconstruction recently proposed by Wang (2004).Other,more quantitative approaches to detecting horizontal transmission have also been proposed.One such class of methods is based on cladistics.For example,Warnow and Ringe,and their colleagues,have promoted the use of cladistic methods in linguisticTRANSACTIONS OF THE PHILOLOGICAL SOCIETY 103,2005122WANG&MINETT–VERTICAL AND HORIZONTAL TRANSMISSION123 classification,publishing a number of papers(e.g.,Warnow et al. 1995;Ringe et al.2002)in which they apply their own implemen-tation of the maximum compatibility method to refine the classi-fication of Indo-European.In their approach,determination of the optimal position of each sub-group is undertaken by seeking the topologies—there might be more than one—that are compatible with the greatest number of characters.The remaining,incompat-ible characters are viewed as having been subject to non-genetic processes,such as borrowing,and are not used to determine the optimal topologies.Application of the computational technique answer set programming to the automatic assessment of contact-induced innovations that are optimal within the Warnow-Ringe paradigm is currently under investigation(Erdem et al.2003; Brooks et al.2005),and shows promising results.Minett&Wang(2003)have described another cladistic method for detecting borrowed characters.In this method,the optimal trees are sought according to the maximum parsimony criterion.Each innovation is assumed to arise independently only once,hence instances of some innovation after itsfirst occurrence are considered to be due to contact.The method has been applied to detect lexical borrowing among representative dialects of each of the seven main sub-groups of Chinese.More tests must be performed to check that these cladistic methods are indeed detecting contact-induced change and not leading toward plausible, but spurious,inferences.Methods based on lexicostatistics have also been attempted.A number of Africanists,including Hinnebusch(1999)and Heine (1971)before him,have noted that the lexical similarities between sub-families of closely genetically related languages tend to be approximately equal when only genetic effects have influenced the languages,but tend to be different when borrowing or other non-genetic effects have influenced them.This observation has led Hinnebusch(1996)to propose a lexicostatistical concept for identifying languages that have come into contact by looking for skewing in the lexical similarities among them—s kewing is just the difference that is observed between the similarities of one language with respect to two other languages;a more formal definition of skewing is given in Section2.Hinnebusch cites a comment by Heine (1974:17),regarding possible contact among three Nilotic languages,which we repeat here to illustrate the concept (Hinnebusch,1999:177):The Nilotic languages Samburu and Nandi share 9.9percent lexical resemblances on the basis of the 200-word list.The percentage between Masai and Nandi,on the other hand,amounts to 15.7.These two languages have been in close contact over the last few centuries.It seems reasonable to assume that the difference of 5.8percent between Samb-uru ⁄Nandi and Masai ⁄Nandi is a result of the process of borrowing which took place between Masai and Nandi.The claim that the 5.8%difference,or skewing ,between the lexical similarities observed for Samburu ⁄Nandi and Masai ⁄Nandi is due to borrowing between Masai and Nandi is certainly intuitively appealing.However,notwithstanding the fact that Masai and Nandi are known to have come into contact recently,an alternative explanation for the skewing noted by Heine might simply be that Masai has been more conservative than Samburu since splitting from it,thereby causing the lexical similarity between Masai ⁄Nandi to exceed that between Samburu ⁄Nandi.Another possible explan-ation is that Samburu might have come into contact with some other language,causing some lexical items that had hitherto been cognate with lexical items in Nandi to be replaced,so reducing the lexical similarity for Samburu ⁄Nandi.Chen (2000)has proposed a lexicostatistical method for identi-fying whether a pair of languages have come into contact.The method,called rank analysis ,divides the lexical items into groups,or ranks ,that are known (or assumed)to have different average rates of replacement,and works with the lexical similarities between the languages in each of those ranks.In its simplest implementa-tion,universal rank analysis ,the Swadesh basic words are grouped into two ranks:Rank 1,consisting of the Swadesh 100word-list (Swadesh,1955),and Rank 2,the remaining words of the Swadesh 200word-list (Swadesh,1951).The lexical similarity between the pair of languages is then calculated for each rank.Based on the assumption that the Rank 1words are more conservative and less susceptible to borrowing than the Rank 2words,the two languagesTRANSACTIONS OF THE PHILOLOGICAL SOCIETY 103,2005124WANG&MINETT–VERTICAL AND HORIZONTAL TRANSMISSION125 are inferred to be genetically related only if the Rank1similarity exceeds the Rank2similarity;otherwise,the shared resemblances are inferred to have come about through ed in conjunction with stratification,whereby the rank analysis is applied only to the oldest detected stratum,the method may prove to be a powerful tool for detecting horizontal transmission.However,it is not yet clear how accurate this method is.Several other lexicostatistical methods for detecting horizontal transmission have also been developed:For example,both Sankoff(1972)and Embleton(1981;1986)have extended the traditional implementation of lexicostatistics to account for both heterogene-ous retention rates and borrowing,modelling the borrowing between languages in terms of their geographic neighbourhood. However,these methods do not in themselves allow the detection of horizontal transmission.Rather,they use estimates of the borrow-ing rates between neighbouring languages to improve the classifi-cation produced by the lexicostatistical analyses.In his study of the settlement of Taiwan,Wang(1989)postulated that patterns in the lexicostatistical error matrix(the absolute difference between the input lexical distances and the distances reconstructed from the optimal lexicostatistical tree)are indicative of horizontal transmission.While this approach has produced suggestive results,its efficacy is yet to be verified.In addition to developing a cladistic method for detecting contact,Minett&Wang(2003)also proposed a lexicostatistical approach to detecting contact,hypothesizing that branches of negative length in the trees built by distance-based tree-building algorithms,such as Neighbor-Joining(Saitou and Nei,1987),are indicative of contact.This hypothesis,however,turned out to be false.Yet another approach,suggested by Cavalli-Sforza et al.(1994) for detecting admixture among human populations,is to use bootstrapping in conjunction with an arbitrary tree-building algorithm.Bootstrapping works by generating multiple trees,with one or more randomly selected languages removed from the analysis each time.The stability of the topologies so produced are then examined—clusters of languages that are grouped together for many bootstrap samples are considered to be representative ofvalid genetic relationships.But when a language is unstable in the sampling,shifting from one sub-grouping to another,it is consid-ered likely that that language has come into contact with other languages.The multiple allegiances of such a language may perhaps be identified by examining for which bootstrap samples it shifts sub-group.This method has been applied to Indo-European by Ogura and Wang (1996)with some success,correctly detecting the heavy lexical borrowing by English from both French and the Scandina-vian languages.It is also important to mention the split decomposition method for phylogenetic analysis (Bandelt &Dress,1992).The majority of classification algorithms that have been applied to historical linguistics constrain the topologies that are produced to be trees.However,as we have mentioned,horizontal transmission cannot be shown on a language tree,and actually warps the tree away from representing genetic relationships accurately.Vertically transmitted characters and horizontally transmitted characters,if not distin-guished,tend to produce contradictory sub-groupings on the tree —in other words,they tend to be incompatible.The split decompo-sition method,however,does not constrain the topology to be a tree,but transforms the characters into a set of splits to construct a so-called splits graph .Only when the characters are compatible —suggesting that there has been no horizontal transmission —does the split decomposition method construct a tree;otherwise,a tree-like network is constructed.As more characters become subject to horizontal transmission,so the departure from a tree topology becomes more pronounced.Our aim in this paper is to place Hinnebusch’s idea for using skewing to detect language contact on firm ground by deriving a statistical hypothesis test that can detect contact under idealized conditions at prescribed levels of significance,and to investigate its level of performance under several less-idealized conditions.The paper is laid out as follows.Section 2summarizes the skewing concept and our implementation of Hinnebusch’s method for detecting contact among an arbitrary number of genetically related languages.In Section 3,we show results for a number of contact scenarios that illustrate the robustness of the test.Some concluding remarks are given in Section 4.TRANSACTIONS OF THE PHILOLOGICAL SOCIETY 103,2005126WANG&MINETT–VERTICAL AND HORIZONTAL TRANSMISSION127 2.T HE SKEWING METHOD FOR DETECTING LANGUAGE CONTACTThe skewing method for inferring language contact outlined by Hinnebusch is a similarity-based lexicostatistical method.In a standard lexicostatistical approach,the lexical similarity of two languages is calculated by counting the proportion of some set of pre-selected meanings for which the corresponding glosses appear to be reflexes of the same nguages having a greater lexical similarity are considered to be more closely genetically related than languages having a lesser lexical similarity.There are a number of theoretical problems with the lexicosta-tistical method:Sometimes,no word can found to express a particular meaning.At other times,multiple words are found to correspond to a certain meaning—which word should the linguist use to encode the character?However,looking at these problems from the viewpoint of statistics,the lexical similarity calculated for any two languages is simply an estimate of their similarity based on noisy data.As long as there are not too many such noisy characters and the linguist adopts a consistent approach to handling them,we believe that the lexical similarity can still be a useful tool for estimating how closely related are two languages.Also,Blust(2000) has reminded us that use of lexicostatistics can lead to incorrect classifications when the retention rate across lineages is heteroge-neous.While it is true that lexicostatistics can perform poorly,even for only slight heterogeneity in the retention rate,it remains an important question as to when and how often lexicostatistics performs poorly.We emphasise,however,that in the skewing method described here,no attempt is made to actually classify languages using lexicostatistics.2.1.SkewingIn order to explain more clearly what we mean by skewing and how skewing might be used to detect language contact,wefind it convenient to make use of certain concepts used in cladistics.In cladistics,a character can be defined,rather loosely,as some feature of the taxa being classified,here languages,that allows them to be categorized on the basis of the different character states thatselected characters manifest.Suppose that character state data is available for two sets of languages that are known to be members of two distinct,genetically related sub-groups of some language family,but for which the genetic relationships within each of the two sub-groups are unknown.Our aim is to formalize Hinnebusch’s method into a statistical hypothesis test that can be used to infer whether there has been contact between the languages in two such sub-groups.We begin by defining the skewing between two sibling languages,A and B,with respect to a third language,C,as the similarity of A and C minus the similarity of B and C.The similarity measure can be simply the usual lexical similarity that is adopted in lexicosta-tistical studies (e.g.,as in Dyen et al.1992)or some other measure of similarity based on characters of arbitrary type.If contact has occurred between,say,recipient language A and donor language C,A will tend to have a higher similarity with C than does B,resulting in positive skewing between A and B with respect to C.This tendency for contact to induce positive skewing forms the basis of the skewing method.As Hinnebusch (1999:184)observes,‘‘lan-guages which group together lexicostatistically will tend to have a numerical symmetry with other noncontiguous languages in the comparison set if in fact the grouped languages form a genetic group.’’In other words,we would expect languages within one sub-group of languages to exhibit little skewing with respect to languages of another,related sub-group of languages.When,nonetheless,skewing is observed,language contact is one possible cause.We also define the aggregate skewing of a language,A,with respect to another language,C,as the average skewing between A and each of its siblings with respect to C.The potential use of skewing as an indicator of language contact is best explained by means of an example.Consider the lexical similarities shown in Table 1among eight Bantu dialects:four Mijikenda dialects (Chonyi,Giriyama,Duruma and Digo)and four Comorian dialects (Ngazija,Mwali,Nzuani and Maore),all members of the Sabaki sub-group of Bantu (data from Hinnebusch,1999).Calculation of the aggregate skewing for the Mijikenda dialect Digo with respect to each of the Comorian dialects is summarizedTRANSACTIONS OF THE PHILOLOGICAL SOCIETY 103,2005128in Table 2.So,for example,the aggregate skewing of Digo with respect to Ngazija is )3per cent.Proceeding in this way,we can calculate the aggregate skewing for each of the Mijikenda dialects with respect to each of the Comorian dialects and vice versa ,the results for which are shown in Table 3.Notice that the aggregate skewing is not symmetric.For example,the skewing of Digo with respect to Ngazija,)3per cent,does not equal that of Ngazija with respect to Digo,)1⁄3per cent.This is because the former value is obtained by summing the skewing for Digo and Ngazija over all the Mijikenda dialects while the latter value is obtained by summing over all the Comorian dialects.These values indicate that Digo is lexically less similar to Ngazija than are the other Mijikenda dialects,but that Ngazija is about as similar to Table 1.Lexical similarities among two sub-groups of Nilotic languages,after Hinnebusch (1999).LexicalMijikenda:Comorian:Similarity (%)Chonyi Giriyama Duruma Digo Ngazija Mwali Nzuani Maore Chonyi10081786859605959Giriyama100776660585960Duruma1007060585859Digo10056545659Ngazija100817780Mwali1008384Nzuani10083Maore 100Table 2.An example of the calculation of aggregate skewing.Aggregate skewing is calculated for the Mijikenda dialect Digo with respect to four Comorian dialects.Comorian:(%)Ngazija Mwali Nzuani Maore Chonyi56–59¼)354–60¼)656–59¼)359–59¼0Giriyama56–60¼)454–58¼)456–59¼)359–60¼)1Duruma56–60¼)454–58¼)456–58¼)259–59¼0d S Comorian Digo )3)4)2)1⁄3WANG &MINETT –VERTICAL AND HORIZONTAL TRANSMISSION 129Digo as are the other Comorian dialects.Digo is also negatively skewed with respect to both Mwali and Nzuani.One possible cause is that Digo has borrowed from a separate sub-group.Indeed,Hinnebusch suggests that this is so,arguing that Digo has come into heavy contact with Swahili.If this is indeed the case,negative skewing would seem to indicate contact with some language outside the group.However,an alternative explanation —one based only on the skewing data —is simply that Digo is less conservative than the other Mijikenda dialects,causing it to exhibit fewer similarities with the Comorian dialects than its Mijikenda siblings.This would also account for the relatively low similarity of Digo with its siblings (shown in Table 1).Examining Table 3we see large magnitude positive skewing for Maore with respect to Digo,+3per cent,and for Chonyi with respect to Mwali,+3per cent.If then contact between two languages tends to induce positive skewing,these values imply possible contact between Maore and Digo,and between Chonyi and Mwali.But what is the probable direction of transmission?Consider the case of Chonyi and Mwali.If the direction of transmission were from Mwali to Chonyi,we would expect some of Table 3(a).Aggregate skewing percentages of the Mijikenda dialects with respect to the Comorian dialects.d S Comorian MijikendaNgazija Mwali Nzuani Maore Chonyi +1⁄3+3+1)1⁄3Giriyama +1+2⁄3+1+1Duruma+1+2⁄30)1⁄3Digo )3)4)2)1⁄3Table 3(b).Aggregate skewing of the Comorian dialects with respect to the Mijikenda dialects.d S MijikendaComorianChonyi Giriyama Duruma Digo Ngazija)1⁄3+1+1)1⁄3Mwali+1)1)1)3Nzuani)1⁄3)1)1)1⁄3Maore )1⁄3+1+1⁄3+3TRANSACTIONS OF THE PHILOLOGICAL SOCIETY 103,2005130Table4.Parameters values used to generate pseudo-random character state data in the absence of contact.Number of languages in sub-group K1:5 Number of languages in sub-group K2:5 Time depth of family:2 Time depth of sub-groups:1 Retention rate:90% Number of characters:100 the character states acquired from Mwali by borrowing to also be present in the other Comorian dialects.Therefore,in addition to positive skewing of Chonyi with respect to Mwali,we would also expect to observe at least some positive skewing of Chonyi with respect to the other Comorian dialects.On the other hand,if the direction of transmission were from Chonyi to Mwali,we would not expect the lexical similarity between Chonyi and Mwali’s Comorian siblings to be affected.Consequently,we would not expect any significant positive skewing of Chonyi with respect to the other Comorian dialects.Examining Table3,it is apparent that horizontal transmission from Mwali into Chonyi is the more probable scenario.But of course,the skewing might be caused simply by heterogeneity of the retention rates.2.2.Distribution of Aggregate Skewing—no contactWe proceed by deriving the distribution of aggregate skewing when there is no contact between the two sub-groups.Pseudo-random character state data are generated for ten languages that have split into two sub-groups,each sub-group comprisingfive languages.2 The parameter values chosen for this experiment are summarized in Table4;there is no contact.Note that the retention rate is set to be homogeneous—both across lineages and across characters—at 90%.We estimate the distribution of aggregate skewing by Monte-Carlo simulation,observing the relative frequency of different 2The algorithm used to generate the pseudo-random character state data is described in Appendix A in the supplementary material,available at http:// /More/philsoc/Transactions/html.values of aggregate skewing in multiple runs of the algorithm.Figure 1shows the distribution of aggregate skewing for all pairs of languages observed over 1000runs with the parameters specified in Table 4—both the frequency (the vertical bars)and the cumulative frequency (the curve)of aggregate skewing are shown in the figure.Notice that the aggregate skewing is approximately Gaussian distributed with zero mean.2.3.Distribution of Aggregate Skewing —contactWe now examine how the distribution of skewing changes when there is contact between the two sub-groups by injecting borrowing of various degrees between a single donor language and a single recipient language,one language in each sub-group.For the first such set of runs,the parameter values of the algorithm are set to those values specified in Table 4;the contact rate is set to 10%.Figures 2&3summarize the results of this experiment:Figure 2shows the distribution of aggregate skewing for the pairs of languages that have not come into contact;Figure 3shows the distribution of aggregate skewing of the recipient languagewithrespect to the donor language.In both cases,the distribution of aggregate skewing is roughly Gaussian.For the pairs of languages that have not come into contact,the mean level of aggregate skewing is)0.1%(Figure2),only slightly lower than the zero-mean skewing observed under the no-contact scenario(cf.Figure1). However,due to the contact between the donor and recipient languages,we expect these two languages to exhibit positive aggregate skewing.In fact,the observed mean level of aggregate skewing of the recipient language with respect to the donor language is+3.3%(Figure3).Wefind then that language contact tends to induce positive aggregate skewing between the donor and recipient,but does not greatly affect the aggregate skewing for languages that have not come into contact.2.4.Hypothesis test for detecting language contactThe abovefindings point to a method for identifying languages that have come into contact—positive aggregate skewing tends to indicate language contact.But what amount of aggregate skewing is a significant indicator of contact?As Figure1indicates,when there is no contact,less than5%of language pairs have aggregate skewing of+3.7%or greater;but slightly more than5%of language pairs have aggregate skewing of+3.6%or greater. Hence,for a5%probability of false alarm(the significance), language contact can be inferred whenever the aggregate skewing exceeds the threshold value,h,of+3.7%.Looking back at Figure2,which shows the distribution of aggregate skewing among language pairs which have not come into contact when contact has occurred between some other language pair,we observe that6.0% of language pairs exhibit significant skewing that exceeds the threshold,only slightly greater than the specified level of signifi-cance of5%.This is further evidence(for this set of parameters at least)that contact between a single pair of languages does not induce significant aggregate skewing between other language pairs. From Figure3,we observe significant aggregate skewing of the recipient language with respect to the donor language with frequency42.2%.Thus we can correctly infer contact between a single pair of languages with probability roughly42%as long as weare prepared to accept 6%chance of incorrectly inferring contact between a pair of languages between which no contact has occurred.3.R ESULTS AND DISCUSSIONWe now examine the performance of the method described in Section2for inferring contact in a variety of contact situations. One thousand pseudo-random data sets are generated for each experiment detailed below.In each case,the observed distribution of aggregate skewing is determined so that the probabilities of incorrectly inferring contact,the false alarm rate,and of correctly inferring contact,the detection rate,can be estimated. Experiment1:The examples given in Section2were implemented based on the assumption that the sub-group time depth,as well as the time depth of the entire family,were assumed to be known, allowing an appropriate threshold value to be calculated.Here we consider the effect of setting the threshold based on an incorrect evaluation of the sub-group time depth.The sub-group time depth is set successively to0.25,0.50,…,1.75,while the contact rate is set to10%.However,for each value of the actual sub-group time depth,the performance is assessed for a threshold optimised for each value of the assumed,or nominal,sub-group time depth.Thus, in most cases,the chosen threshold is not optimised for the actual value of the sub-group time depth.Figure4shows the resultant performance for10%contact:(a) the detection rate,and(b)the false alarm rate.Examining the detection rate curves,we observe that for each value of the nominal sub-group time depth the detection rate is roughly constant for all actual values of the sub-group time-depth(although there is a slight dependence on the actual time depth for the more extreme values of the nominal time depth).This means that a reasonably accurate assessment of the detection rate can be obtained based on just the nominal sub-group time depth—the greater its value,the greater the detection rate.Examining now the false alarm rate curves in Figure4(b),we see that the false alarm rate certainly does depend on the values of both the actual sub-group time depth and the nominal sub-group time。
words that differ in only one soundThey differ in meaning, they differ only in one sound segment, the different sounds occur in the same environmentExample: beat, bit They form a minimal pairSo /ea/ and /i/ are different sounds in EnglishThey are different phonemes1.the Sapir-Whorf Hypothesislinguistic determinism (语言决定论) -Language determines thought.and linguistic relativity (语言相对论)-There is no limit to the structural diversity of languages.2.BehaviorismBehaviorism in linguistics holds the view that Children learn language through a chain of stimulus-response-reinforcement (刺激—反应—强化), and adults’ use of language is also a process of stimulus-response.3.discovery proceduresA grammar is discovered through the performing of certain operations on a corpus of data4.Universal GrammarUG consists of a set of innate grammatical principles.Each principle is associated with a number of parameters.5.Systemic GrammarIt aims to explain the internal relations in language as a system network, or meaning potential.6.Ideational MetafunctionThe Ideational Function (Experiential and Logical) is to convey new information, to communicate a content that is unknown to the hearer. It is a meaning potential.It mainly consists of “transitivity” and “voice”. This function no t only specifies the available options in meaning but also determines the nature of their structural realisations. For example, “John built a new house” can be analysed as a configuration of the functions (功能配置): Actor: JohnProcess: Material: Creation: builtGoal: Affected: a new house7.Interpersonal MetafunctionThe INTERPERSONAL FUNCTION embodies all uses of language to express social and personal relations. This includes the various ways the speaker enters a speech situation and performs a speech act.8.basic speech rolesThe most fundamental types of speech role are just two: (i) giving, and (ii) demanding.Cutting across this basic distinction between giving and demanding is another distinction that relates to the nature of the commodity being exchanged. This may be either (a) goods-&-services or (b) information.9.finite verbal operatorsFiniteness is thus expressed by means of a verbal operator which is either temporal or modal.10.Textual MetafunctionThe textual metafunction enables the realization of the relation between language and context, making the language user produce a text which matches the situation.It refers to the fact that language has mechanisms to make any stretch of spoken or written discourse into coherent and unified texts and make a living passage different from a random list of sentences.It is realized by thematic structure, information structure and cohesion.11.theme and rhemeThe Theme is the element which serves as the point of departure of the message. The remainder of the message, the part in which the Theme is developed, is called the Rheme.As a message structure, a clause consists of a Theme accompanied by a Rheme. The Theme is the first constituent of the clause. All the rest of the clause is simply labelled the Rheme12.experientialismExperientialism assumes that the external reality is constrained by our uniquely human experience.The parts of this external reality to which we have access are largely constrained by the ecological niche we have adapted to and the nature of our embodiment. In other words, language does not directly reflect the world. Rather, it reflects our unique human construal of the world: our ‘world view’ as it appears to us through the lens of our embodiment.This view of reality has been termed experientialism or experiential realism by cognitive linguists George Lakoffand Mark Johnson. Experiential realism acknowledges that there is an external reality that is reflected by concepts and by language. However, this reality is mediated by our uniquely human experience which constrains the nature of this reality ‘for us’.13.image schemataAn image schema is a recurring structure within our cognitive processes which establishes patterns of understanding and reasoning. Image schemas are formed from our bodily interactions, from linguistic experience, and from historical context.14.prototype theoryPrototype theory is a mode of graded categorization in cognitive science, where some members of a category are more central than others. For example, when asked to give an example of the concept furniture, chair is more frequently cited than, say, stool. Prototype theory has also been applied in linguistics, as part of the mapping from phonological structure to semantics.二、Directions: Please answer the following questions.1.Why is Saussure called “one of the founders of structural linguisticsand “father of modern linguistics”He helped to set the study of human behavior on a new footing (basis).He helped to promote semiology.He clarified the formal strategies of Modernist thoughts.He attached importance to the study of the intimate relation between language and human mind.2.W hat are the similarities and differences between Saussure’s langue andparole and Chomsky’s competence a nd performanceThe similarities (1) language and competence mainly concerns the user’s underlying knowledge; parole and performance concerns the actual phenomena (2) language and competence are abstract; parole and performance are concrete.The differences (1) according to Saussure, language is a mere systematic inventory of items; according to Chomsky, competence should refer to the underlying competence as a system of generative processes (2)According to Saussure, language mainly base on sociology, in separating language from parole, we separate social from individual; according to Chomsky, competence was restricted to a knowledge of grammar.3.What is the conflict between descriptive adequacy and explanatoryadequacy A nd what is Chomsky’s solution to thi s conflicta theory of grammar: descriptively adequate should adequately describethe grammatical dada of a language.should not just focus on a fragment of a language.a theory of grammar: explanatorily adequateshould explain the general form of language.should choose among alternative descriptively-adequate grammars.should essentially be about how a child acquires a grammar.A theory of grammar should be both descriptively and explanatorilyadequate.But there is a conflict:To achieve DA, the grammar must be very detailed.To achieve EA, the grammar must be very simple. (think why)because the child can learn a language very easily on very little language exposure.Chomsky’s solution:construct a simple UGlet individual grammars be derivable from UG4.What are Chomsky’s contribution s to the linguistic revolutionChomsky’s contribution to the linguistic revolution is that he showed the world a totally new way of looking at language and at human nature, particularly the human mind. Chomsky challenged behaviorism and empiricism because he believes that language is innate.Rationalism (vs. empiricism in philosophy)Empiricist evidence is often unreliable.Innateness (vs. behaviorism in psychology)Children can acquire a complicated language on the basis of very limited exposure to speech.This indicates that UG is innate faculty.5.How to compare and contrast Generative Linguistics andSystemic-Functional Linguistics from perspectives of epistemology, theoretical basis, research tasks and methodology6.How many process types are there in the transitivity system Pleaseillustrate each type by a proper example.Six. Material Processes, Mental Processes, Relational Processes, Behavioural Processes, Verbal Processes, Existential Processes The typical types of outer experience are actions, goings-on and events: actions happen, people act on other people or things, or make things happen. This type of process is called Material Processes.The inner experience is that in our consciousness or imagination. You may react on it, think about it, or perceive it. This type of process is called Mental Processes.Then there is a third type of process: we learn to generalize, to relate one fragment of experience to another. It does this by classifying or identifying. This kind of process is called Relational Processes.These three processes are called major processes. Related to them are three minor processes: each one lies at the boundary between two processes of the three. Not so clearly set apart, they share some features of each, and finally acquire the character of their own.On the borderline between material and mental are the Behavioural Processes: those that represent outer manifestations of inner workings, the acting out of processes of consciousness and physiological states.On the borderline of mental and relational is the category of Verbal Processes: symbolic relationships constructed in human consciousness and enacted in the form of language.Then on the borderline between the relational process and the material process are Existential Processes, by which phenomena of all kinds are recognized to be or to exist.7.What is a multiple Theme to be contrasted with a simple Theme What isa marked Theme to be contrasted with an unmarked Theme Please illustratethem with proper examples.Conjunctions in ThemeConjunctive and modal Adjuncts in ThemeTextual, interpersonal and experiential elements in ThemeInterrogatives as multiple Themes8.What are the similarities and differences between conceptual metaphorand conceptual metonymyMetaphor and metonymy are viewed as phenomena fundamental to the structure of the conceptual system rather than superficial linguistic‘devices’.Conceptual metaphor (概念隐喻) maps structure from one conceptual domain onto another, while metonomy highlights an entity by referring to another entity within the same domain.隐喻就是把一个领域的概念投射到另一个领域,或者说从一个认知域(来源域)投射到另一个认知域(目标域)。
一、单项选择题(每小题1分,共20分)在下列每小题的四个备选答案中选出一个正确的答1. Which of the following words is entirely arbitrary?__________A. treeB. crashC. typewriterD. bang2. ________ made the distinction between competence and performance.A. SaussureB. ChomskyC. BloomfieldD. Sapir3. Conventionally a ______ is put in slashes.A. allophoneB. phoneC. phonemeD. morpheme4. The word “hospitalize” is an example of __________.A. compoundB. derivationC. inflectionD. blending5. Constituent sentences is the term used in ___________.A. structural linguisticsB. functional analysisC. TG GrammarD. traditional grammar6. Cold and hot is a pair of ___________ antonyms.A. gradableB. complementaryC. reversalD. converse7. According to Searle, those illocutionary acts whose point is to commit the speaker to some futurecourse of action are called________.A. commissivesB. directivesC. expressiveD. declaratives8. Speech variety may be used instead of _______.A. vernacular language, dialect, pidgin, creoleB. standard languageC. both A and BD. none of the above9.______ deals with how language is acquired, understood and produced.A. SociolinguisticsB. PsycholinguisticsC. PragmaticsD. Morphology10. Discovering procedures are practiced by ________.A. descriptive grammarB. TC GrammarC. traditional grammarD. functional grammar11. The function of the sentence “Water boils at 100 degrees centigrade” is _________.A. interrogativeB. directiveC. informativeD. performative12. _________ refers to the abstract linguistic system shared by all the members of a speech community.A. ParoleB. LangueC. SpeechD. Writing13. The opening between the vocal cords is sometimes referred as _________.A. glottisB. vocal cavityC. pharynxD. uvula14. ________ refers to the study of the internal structure of words, and the rules by which words areformed.A. MorphologyB. SyntaxC. SemanticsD. Phonology15. “When did you stop taking this medicine?” is an example of _________in sense relationships.A. entailmentB. presuppositionC. assumptionD. implicature16. Idioms are ________.A. sentencesB. naming unitsC. phrasesD. communication units17. An illocutionary act is identical with________.A. sentence meaningB. the speaker’s intentionC. language understandingD. the speaker's competence18. In sociolinguistics, ______ refers to a group of institutionalized social situations typically constrainedby a common set of behavioral rules.A. domainB. situationC. societyD. community19. ______ refers to the gradual and subconscious development of ability in the first language by using itnaturally communicative situations.A. LearningB. CompetenceC. PerformanceD. Acquisition20. In which of the following stage did Chomsky add the semantic component to his TG Grammar forthe first time? __________A. The Classic TheoryB. The Standard TheoryC. The Extended Standard TheoryD. The Minimalist Program1. In Chinese when someone breaks a bowl or a plate the host or the people present arelikely to say sui sui ping an (every year be safe and happy)as a means of controlling theforces which the belivers feel might affect their lives. Which function does it perform?__________A. interrogativeB. EmotiveC. PerformativeD. Recreational2. Which of the following properties of language enables language users to overcome the barriers caused by time and place, due to this feature of language, speakers of a language are free to talk about anything in any situation? ___________A. InterchangeableB. DualityC. DisplacementD. Arbitrariness.3. Which of the following is not the major branch of linguistics? ___________A. PhonologyB. PragmaticsC. SyntaxD. Speech4._______ deals with language application to other fields, particularly education.A. Linguistic geographyB. SociolinguisticsC. Applied linguisticsD. Comparative linguistics5. A phoneme is a group of similar sounds called_________.A. minimal pairsB. allomorphsC. phonesD. allophones6. Which one is different from the others according to manners of articulation? _________A. [z]B. [w]C. [h]D. [v]7.________ doesn’t belong to the most productive means of word-formation.A. AffixationB. CompoundingC. ConversionD. Blending8. Nouns, verbs, and adjectives can be classified as __________.A. lexical wordsB. grammatical wordsC. function wordsD. form words9. ________ refers to the relations holding between elements replaceable with each other at particular place in structure, or between one element present and the others absent.A. Syntagmatic relationB. Paradigmatic relationC. Co-occurrence relationD. Hierarchical relation10. According to Standard Theory of Chomsky, ________ contain all the informationnecessary for the semantic interpretation of sentences.A. deep structureB. surface structuresC. transformational rulesD. PS-rules11. ________describes whether a proposition is true or false.A. TruthB. Truth valueC. Truth conditionD. Falsehood12. “John hit Peter” and “Peter was hit by John” are the same ________.A. propositionB. sentenceC. utteranceD. truth13. ________ is a branch of linguistics which is the study of meaning in the context of use.A. MorphologyB. SyntaxC. PragmaticsD. Semantics14. ________is the study of how speaker of a language use sentences to affect successfulcommunication.A. SemanticsB. PragmaticsC. SociolinguisticsD. Psycholinguistics15.______is defined as any regionally or socially definable human group identified byshared linguistic system.A. A speech community A. A raceC. A societyD. A country16.______variation of language is the most discernible and definable in speech variation.A. RegionalB. SocialC. StylisticD. Idiolectal17. In first language acquisition children usually _________ grammatical rules from thelinguistic information they hear.A. useB. acceptC. generalizeD. reconstruct18. By the time children are going beyond the ______ stage, they begin to incorporate someof the inflectional morphemes.A. telegraphicB. multiwordC. two-wordD. one-word19. According to Halliday, the three metafunctions of language are ________.A. ideational, interpersonal and textualB. ideational, informative and textualC. metalinguistic, interpersonal and textualD. ideational, interpersonal and referential20. The person who is often described as “'father of modern linguistics” is _______.A. FirthB. SaussureC. HallidayD. Chomsky1. Study the following dialogue. What function does it play according to the functions of language? ___________- A nice day, isn’t it?- Right! I really enjoy the sunlight.A. EmotiveB. PhaticC. PeformativeD.Interpersonal2. Unlike animal communication systems, human language is __________.A. stimulus freeB. stimulus boundC. under immediate stimulus controlD. stimulated by some occurrence of communal interest3. Which branch of linguistics studies the similarities and differences among language?___________A. Diachronic linguisticsB. Synchronic linguisticsC. Prescriptive linguisticsD. Comparative linguistics4. __________ has been widely accepted as the forefather of modern linguistics.A. ChomskyB. SaussureC. BloomfieldD. John Lyons5. Which vowel is different from the others according to the tongue position of vowels?___________A. [i]B. [u]C. [e]D.[a]6. Liquids are classified in the light of __________.A. manner of articulationB. place of articulationC. place of tongueD. none of the above7. Morphemes that represent tense, number, gender and case are called _____ morphemes.A. inflectionalB. freeC. boundD. derivational8. There are _______ morphemes in the word denationalization.A. threeB. fourC. fiveD. six9. In English, theme and rhyme are often expressed by ________ and ________.A. subject, objectB. subject, predicateC. predicate, objectD. object, predicate10. The semantic triangle holds that the meaning of a word ________.A. is interpreted through the mediation of conceptB. is related to the thing it refers toC. is the idea associated with that word in the minds of speakersD. is the image it is represented in the mind.11. “John killed Bill but Bill didn’t die” is a (n) ___________.A. entailmentB. presuppositionC. anomalyD. contradiction12. ________ found that natural language had its own logic and concludes cooperativeprinciple.A. John AustinB. John FirthC. Paul GriceD. William Jones13. _______ proposed that speech acts can fall into five general categories.A. AustinB. SearleC. SapirD. Chomsky14. ______ is not a typical example of official bilingualism.A. CanadaB. FinlandC. BelgiumD. Germany15. The most recognizable difference between American English and British English are in_____ and vocabulary.A. diglossiaB. bilingualismC. pidginizationD. blending16. ______ transfer is a process that is more commonly known as interference.A. AcquisitionB. PositiveC. NegativeD. Interrogative17. In general, the two-word stage begins roughly in the _____ half of the child’s secondyear.A. earlyB. lateC. firstD. second18. The most important contribution of the Prague School to linguistics is that it seeslanguage in terms of ________.A. functionB. meaningC. signsD. system19. The principal representative of American descriptive linguistics is________.A. BoasB. SapirC. BloomfieldD. Harris20. At the _______ stage negation is simply expressed by single words with negativemeaning.A. prelinguisticsB. multiwordC. two-wordD. one-word1. Which of the following is the most important function of language? ___________A. Interpersonal functionB. Performative functionC. Informative functionD. Recreational function2. In different languages, different terms are used to express the animal "狗", this shows thenature of ______ of human language.A. arbitrarinessB. cultural transmissionC. displacementD. discreteness3. The study of language as a whole is often called ____________.A. general linguisticsB. sociolinguisticsC. psycholinguisticsD. applied linguistics4. The study of language meaning is called __________.A. syntaxB. semanticsC. morphology D pragmatics5. In English, there is one glottal fricative. It is _______.A. [I]B. [h]C. [k]D. [f]6. The phonetic symbol for “voiced bilabial glide” is _________.A. [v]B. [d]C. [f]D. [w]7. In English -ise and -tion are called ________.A. prefixesB. suffixesC. infixesD. free morphemes8. Morphology is generally divided into two fields: the study of word-formation and ______.A. affixationB. etymologyC. inflectionD. root9. The sense relationship between “John plays the violin”and “John plays a musicalinstrument” is ________.A. hyponymyB. antonymyC. entailmentD. presupposition10. Conceptual meaning is ________.A. denotativeB. connotativeC. associativeD. affective11. Promising, undertaking, vowing are the most typical of the_______.A. declarationsB. directivesC. sociolinguisticsD. Chomsky12. The violation of one or more of the conversational _______ (of the CP) can, when thelistener fully understands the speaker, create conversational implicatures, and humor sometimes.A. standardsB. principlesC. levelsD. maxim13. _______variety refers to speech variation according to the particular area where aspeaker comes from.A. RegionalB. SocialC. StylisticD. Register variety14. In a speech community people have something in common ______ -a language or aparticular variety of language and rules for using it.A. sociallyB. linguisticallyC. culturallyD. pragmatically15. The optimum age for SLA is _______.A. childhoodB. early teensC. teensD. adulthood16. In general, ________ language acquisition refers to children's development of theirlanguage of the community in which a child has been brought up.A. firstB. secondC. thirdD. fourth17. Children follow a similar ________ schedule of predictable stages along the route oflanguage development across cultures.A. learningB. studyingC. acquisitionD. acquiring18. The theory of _______ considers that all sentences are generated from a semanticstructure.A. Case GrammarB. Stratificational GrammarC. Relational GrammarD. Generative Semantics19. _______ grammar is the most widespread and the best understood method of discussingIndo-European language.A. TraditionalB. StructuralC. FunctionalD. Generative20. Hjelmslev is a Danish linguist and the central figure of the ______.A. Prague SchoolB. Copenhagen SchoolC. London SchoolD. Generative Semantics21. The relation between form and means in human language is natural.22. Descriptive linguistics studies one specific language.23. Phonetics is the science that deals with the sound system.24. Phonology is the study of speech sounds of all human languages.25. All consonants are produced with vocal-cord vibration.26. Inflectional morphology is one of the two sub-branches of morphology.27. The structure of words is not governed by rules.28. If a word has sense, it must have reference.29. “He didn't stop smoking” presupposes that he had been smoking.30. A locutionary act is the act of expressing the speaker’s intention.31. A text is best regarded as a semantic unit, a unit not of form but of meaning.32. Although the age at which children will pass through a given stage can vary significantfrom child to child, the particular sequence of stages seems to be the same for all children acquiring a given language.33. It’s normally assumed that, by the age of five, with an operating vocabulary of more2,000 words, children have completed the greater part of the language acquisition process.34. “Tom hit Mary and Mary hit Tom”is an exocentric construction while “men andwomen” is an endocentric construction.35. Following Saussure’s distinction between langue and parole, Trubetzkoy argued thatphonetics belonged to langue whereas phonology belonged to parole.36. The subject-predicate distinction is the same as the theme and functional linguistics.37. Langue refers to the abstract linguistic system shared by all the members of a speechcommunity.38. Consonant sounds can be either voiced or voiceless, while all vowel sounds arevoiceless.39. The standard language is a superposed, socially prestigious dialect of language.40. An illocutionary act is identical with the speaker’s intention.21. When language is used to get information from other, it serves an informative function.22. All the English words are not symbolic.23. All sounds produced by human speech organs are linguistics symbols.24. There are 72 symbols for consonants and 25 for vowels in English.25. The sound [z] is an oral voiced post-alveolar fricative.26. A morpheme is the basic unit in the study of morphology.27. Derivational affixes are added to an existing form to create a word.28. The grammatical meaning of a sentence refers to its grammaticality.29. There is only one argument in the sentence “Kids like apples”.30. While conversation participants nearly always observe the CP, they do not alwaysobserve these maxims strictly.31. Inviting, suggesting, warning, ordering are instance of commissives.32. Cohesion and coherence is identical with each other in essence.33. It has been recognized that in ideal acquisition situation, many adults can reachnative-like proficiency in all aspects of a second language.34. All roots are free morphemes while not all free morphemes are roots.35. In the Classical theory, Chomsky’s aim is to make linguistics a science. This theory ischaracterized by three features: emphasis on prescription of language, introduction of transformational rules, and grammatical description regardless of language formation. 36. Generative grammar is a system of rules that in some explicit and well-defined wayassigns structural descriptions to sentences.37. All words may be said to contain a root morpheme.38. Phrase structure rules allow us to better understand how words and phrases formsentences, and so on.39. Promising, undertaking, vowing are the most typical of the psycholinguistics.40. Halliday’s Systemic Grammar contains a functional component, and the theory behindhis Function Grammar is systemic.21. Most animal communication systems lack the primary level of articulation.22. Langue is more abstract than parole and therefore is not directly observable.23. General linguistics deals with the whole human language.24. Auditory phonetics investigates how a sound is perceived by the listener.25. In English, there are two nasal consonants. There are [m] and [n].26. Phonetically, the stress of a compound always falls on the first element, while thesecond element receives secondary stress.27. The meaning of the word we often used is the primary meaning.28. Meaning is central to the study of communication.29. Of the three speech acts, linguists are most interested in the illocutionary act becausethis kind of speech is identical with the speaker’s intention.30. As the process of communication is essentially a process of conveying meaning in acertain context, pragmatics can also be considered as a kind of meaning study.31. If a text has no cohesive words, we say the text is not coherent.32. The optimum age for SLA always accords with the maxim of “the younger the better”.33. In general, language acquisition refers to children’s development of their first language,that is, the native language of the community in which a child has been brought up.34. The London School is also known as systemic linguistics and functional linguistics.35. Coarticulation refers to the phenomenon of sounds continually show the influence oftheir neighbors.36. Band morphemes are independent units of meaning and can be used freely all bythemselves.37. In the history of American linguistics, the period between 1933 and 1950 is also knownas the Bloomfieldian Age.38. Paul Grice found that artificial language had its own logic and conclude cooperativeprinciple.39. Cultural transmission refers to the fact that language is cultural transmitted. It is passedon from one generation to the next through teaching and learning, rather than by instinct.40. Linguistic potential is similar to Saussure’s langue and Chomsky’s performance.21. Language change is universal, ongoing and arbitrary.22. Competence is more concrete than performance.23. Descriptive linguistics attempts to establish a theory which accounts for the rules oflanguage in general.24. The space between the vocal cords is called glottis.25. Stops can be divided into two types: plosives and nasals.26. All roots are free and all affixes are bound.27. The sentence “Tom, smoke!” and“Tom smokes” have the same semantic predication.28. The sentence that contains the same words is the same in meaning.29. A sentence is a grammatical unit and an utterance is a pragmatic notion.30. “John has been to Asia” entails “John has been to Japan”.31. Coherence is a logical, orderly and aesthetical relationship between parts, in speech,writing, or argument.32. Language acquisition is in accordance with language learning on the assumption thatthere are different processes.33. SLA is primarily the study of how learners acquire or learn an additional language afterthey acquired their first language.34. According to Firth, a system is a set of mutually exclusive options that come into play atsome point in a linguistic structure.35. American structuralism is a branch of diachronic linguistics that emerged independentlyin the United States at the beginning of the twentieth century.36. Phonological knowledge is a native speaker’s intuition about the sounds and soundpatterns of his language.37. Phonetics has three sub-branches: acoustic phonetics, auditory phonetics andarticulatory phonetics.38. The paradigmatical relation shows us the inner layering of sentences.39. An ethnic dialect is spoken mainly by a less privileged population that has experiencedsome sort of social isolation, such as racial discrimination.40. Searle proposed that speech act can fall into six general categories.41. _______ is the actual realization of one's linguistic knowledge in utterances.42. Combining two parts of two already existing words is called ________ inword-formation.43. Lexicon, in most cases, is synonymous with _______.44. A ________ is a structurally independent unit that usually comprises a number of wordsto form a complete statement, question or command.45. _______ studies the sentence structure of language.46. In semantic analysis, ________ is the abstraction of the meaning of a sentence.47. A speech _______is a group of people who share the same language or a particularvariety of language.48. In learning a second language, a learner will subconsciously use his L1 knowledge. Thisprocess is called language _______.49. The development of a first or native language is called first language________.50. ________ is a branch of linguistics which is the study of meaning in the context of use.41. In any language words can be used in new ways to mean new things and can becombined into innumerable sentences based on limited rules. This feature is usually termed _________ or creative.42. The description of a language as it changes through time is a ___________ study.43. The qualities of vowels depend upon the position of the _________ and the lips.44. Consonants differ from vowels in that the latter are produced without___________.45. ________________ is a reverse process of derivation, and therefore is a process ofshortening.46. For ______________________ antonyms, it is a matter of either one or the other.47. A ___________ language is originally a pidgin that has become established as a nativelanguage in some speech community.48. A linguistic _____________ refers to a word or expression that is prohibited by the“polite” society from general use.49. For the vast majority of children, language development occurs spontaneously andrequires little conscious _____________ on the part of adults.50. Systemic-Functional Grammar is a(n) _________________ oriented functionallinguistics approach.41. One general principle of linguistic analysis is the primacy of _______ over writing.42. ___________ is the branch of linguistics which studies the form of words.43. A word formed by derivation is called a ____________, and a word formed bycompounding is called a ___________.44. ____________ is a science that is concerned with how words are combined to formphrases and how phrases are combined by rules to form sentences.45. The ________________ relation is a kind of relation between linguistic forms in asentence and linguistic forms outside the sentence.46. The various meanings of a _____________ word are related to some degree.47. The pre-school years are a ____________ period for first language acquisition.48. Whorf proposed that all higher levels of thinking are dependent on ____________.49. _______________ deals with how language is acquired, understood and produced.50. Structuralism is based on the assumption that grammatical categories should be definednot in terms of meaning but in terms of ________________.41. Language is a system of arbitrary _________ symbols used for human communication.42. Langue or competence is _________ and not directly observable, while parole orperformance is concrete and directly observable.43. The vocal tract can be divided into two parts: the oral cavity and the __________.44. The combination of two or sometimes more than two words to create new words incalled _____________.45. The words of English are classified into native words and __________ words.46. Language itself is not sexist, but its use may reflect the ______________ attitudeconnoted in the language that is sexist.47. _____________ refers to the gradual and subconscious development of ability in thefirst language by using it naturally communicative situations.48. In first language acquisition children usually __________ grammatical rules from thelinguistic information they hear.49. The starting point of Chomsky's TG Grammar is his ___________ hypothesis.50. A ____________ analysis of an utterance will reveal what the speaker intends to dowith it.51. discreteness52. competence53. triphthongs54. bound morpheme55.syntax51. design features52. performance53. minimal pair54. morpheme55. polysemy51. arbitrarinessngue53.vowel54. affixs55. reference51. language52. phonemes, phones53. backformation54.lexical semantics55.speech community56. How does a linguist construct a rule?57. How can we decide a minimal pair or a minimal set?58. Explain the interrelations between semantic and structural classifications of morphemes.59. List the differences between surface structure and deep structure of a sentence.60. How does competence differ from performance?56. Explain the differences between langue and parole.57. Use examples to illustrate the difference between a compulsory constituent and an optional constituent.58. Define the two terms: phonemes and allophones.59. What are the three types of distribution?60. How many types of linguistic knowledge does a native speaker possess? What are they?56.What are the five sub-branches of linguistics?57. What are the suprasegmental features are?58. What is the difference between cohesion and coherence?59. What is ethnic dialect?60. What is learner language and target language?56. What is the difference between synchronic linguistics and diachronic linguistics?57. What are the functions of language?58.Explain the relationship between speech and writing.59. Analyze the word “disestablishment” by IC analysis:60.What does morphology study?61. What are the differences between inflectional and derivational affixes in terms of both function and position?61. List the differences between surface structure and deep structure of a sentence.61. Define the three types of distribution respectively.61. Describe with examples various types of morpheme used in English.。
一.概论Chapter 1. Introducing SLA1.Second language acquisition (SLA)2.Second language (L2)(也可能是第三四五外语) also commonly called a target language (TL)Refers to: any language that is the aim or goal of learning.3.Basic questions:1). What exactly does the L2 learner come to know2). How does the learner acquire this knowledge3). Why are some learners more successful than othersDifferent answers from different fields4.3 main perspectives:linguistic; psychological; social.Only one (x) Combine (√)Chapter 2. Foundations of SLAⅠ. The world of second languages1.Multi-; bi-; mono- lingualism1)Multilingualism: the ability to use 2 or more languages.(bilingualism: 2 languages; multilingualism: >2)2)Monolingualism: the ability to use only one language.3)Multilingual competence (Vivian Cook, Multicompetence)Refers to: the compound state of a mind with 2 or more grammars.4)Monolingual competence (Vivian Cook, Monocompetence)Refers to: knowledge of only one language.2.People with multicompetence (a unique combination) ≠ 2monolingualsWorld demographic shows:3.Acquisition4.The number of L1 and L2 speakers of different languages can onlybe estimated.1)Linguistic information is often not officially collected.2)Answers to questions seeking linguistic information may notbe reliable.3)A lack of agreement on definition of terms and on criteria foridentification.Ⅱ. The nature of language learning1.L1 acquisition1). L1 acquisition was completed before you came to school andthe development normally takes place without any conscious effort.2). Complex grammatical patterns continue to develop through theschool years.Time Children will< 6 months (infant)Produce all of the vowel sounds and most of the consonant sounds of any language in the world. Learn to discriminate the among the sounds that make a different in the meaning of words (the phonemes)< < 3 years old Master an awareness of basic discourse patterns < 3 years old Master most of the distinctive sounds of L1< 5 or 6 years old Control most of the basic L1 grammatical patterns2. The role of natural ability1) Refers to: Humans are born with an innate capacity to learnlanguage.2) Reasons:Children began to learn L1 at the same age and in much thesame way.…master the basic phonological and grammatical operations in L1 at 5/ 6.…can understand and create novel utterances; and are not limited to repeating what they have heard; the utterances they produce are often systematically different from those of the adults around them.There is a cut-off age for L1 acquisition.L1 acquisition is not simply a facet of general intelligence.3)The natural ability, in terms of innate capacity, is that partof language structure is genetically “given”to every human child.3. The role of social experience1) A necessary condition for acquisition: appropriate socialexperience (including L1 input and interaction) is2) Intentional L1 teaching to children is not necessary and mayhave little effect.3) Sources of L1 input and interaction vary for cultural andsocial factors.4) Children get adequate L1 input and interaction→sources haslittle effect on the rate and sequence of phonological and grammatical development.The regional and social varieties (sources) of the input→pronunciationⅢ. L1 vs. L2 learning1.L1 and L2 development:2.Understanding the statesⅣ. The logical problem of language learning1.Noam Chomsky:1)innate linguistic knowledge must underlie languageacquisition2)Universal Grammar2.The theory of Universal Grammar:Reasons:1)Children’s knowledge of language > what could be learned fromthe input.2)Constraints and principles cannot be learned.3)Universal patterns of development cannot be explained bylanguage-specific input.Children often say things that adults do not.Children use language in accordance with general universalrules of language though they have not developed thecognitive ability to understand these rules. Not learnedfrom deduction or imitation.Patterns of children’s language development are notdirectly determined by the input they receive.Ⅴ. Frame works for SLA≤1950s1960s1970s1980s1990s。
Language Teaching Methodology AbstractMethodology in language teaching has been characterized in a variety of ways. A more or less classical formulation suggests that methodology is that which links theory and practice. Language teaching has not profited much from more general views of educational design. The curriculum perspective comes from general education and views successful instruction as an interweaving of Knowledge, Instructional, Learner, and Administrative considerations.I. IntroductionLanguage teaching came into its own as a profession in the last century. Central to this phenomenon was the emergence of the concept of "methods" of language teaching. The method concept in language teaching the notion of a systematic set of teaching practices based on a particular theory of language and language learning is a powerful one, and the quest for better methods was a preoccupation of teachers and applied linguists throughout the 20th century.Howatt's overview documents the history of changes of practice in language teaching throughout history, bringing the chronology up through the Direct Method in the 20th century. One of the most lasting legacies of the Direct Method has been the notion of "method" itself.II. Language Teaching MethodologyMethodology in language teaching has been characterized in a variety of ways. A more or less classical formulation suggests that methodology is that which links theory and practice. Theory statements would include theories of what language is and how language is learned or, more specifically, theories of second language acquisition. Such theories are linked to various design features of language instruction. These design features might include stated objectives, syllabus specifications, types of activities, roles of teachers, learners, materials, and so forth. Design features in turn are linked to actual teaching and learning practices as observed in the environments where language teaching and learning take place. This whole complex of elements defines language teaching methodology.2.1 Schools of Language Teaching MethodologyWithin methodology a distinction is often made between methods and approaches, in which methods are held to be fixed teaching systems with prescribed techniques and practices, whereas approaches represent language teaching philosophies that can be interpreted and applied in a variety of different ways in the classroom. This distinction is probably most usefully seen as defining a continuum of entities ranging from highly prescribed methods to loosely described approaches.The period from the 1950s to the 1980s has often been referred to as "The Age of Methods," during which a number of quite detailed prescriptions for language teaching were proposed. Situational Language Teaching evolved in the United Kingdom while a parallel method, Audio-Lingualism, emerged in the United States. In the middle-methods period, a variety of methods were proclaimed as successors to the then prevailing Situational Language Teaching and Audio-Lingual methods. These alternatives were promoted under such titles as Silent Way, Suggestopedia, Community Language Learning, and Total Physical Response. In the 1980s, these methods in turn came to be overshadowed by more interactive views of language teaching, which collectively came to be known as Communicative Language Teaching (CLT). Communicative Language Teaching advocates subscribed to a broad set of principles such as these:•Learners learn a language through using it to communicate.•Authentic and meaningful communication should be the goal of classroom activities.•Fluency is an important dimension of communication.•Communication involves the integration of different language skills.•Learning is a process of creative construction and involves trial and error.2.2 CLTHowever, CLT advocates avoided prescribing the set of practices through which these principles could best be realized, thus putting CLT clearly on the approach rather than the method end of the spectrum.Communicative Language Teaching has spawned a number of off-shoots that share the same basic set of principles, but which spell out philosophical details or envision instructional practices in somewhat diverse ways. These CLT spin-offapproaches include The Natural Approach, Cooperative Language Learning, Content-Based Teaching, and Task-Based Teaching.It is difficult to describe these various methods briefly and yet fairly, and such a task is well beyond the scope of this paper. However, several up-to-date texts are available that do detail differences and similarities among the many different approaches and methods that have been proposed. (See, e.g., Larsen-Freeman, 2000, and Richards & Rodgers, 2001). Perhaps it is possible to get a sense of the range of method proposals by looking at a synoptic view of the roles defined for teachers and learners within various methods. Such a synoptic (perhaps scanty) view can be seen in the following chart.As suggested, some schools of methodology see the teacher as ideal language model and commander of classroom activity (e.g., Audio-Lingual Method, Natural Approach, Suggestopedia, Total Physical Response) whereas others see the teacher as background facilitator and classroom colleague to the learners (e.g., Communicative Language Teaching, Cooperative Language Learning).2.3 Other Global IssuesThere are other global issues to which spokespersons for the various methods and approaches respond in alternative ways. For example, should second language learning by adults be modeled on first language learning by children? One set of schools (e.g., Total Physical Response, Natural Approach) notes that first language acquisition is the only universally successful model of language learning we have, and thus that second language pedagogy must necessarily model itself on first language acquisition. An opposed view (e.g., Silent Way, Suggestopedia) observes that adults have different brains, interests, timing constraints, and learning environments than do children, and that adult classroom learning therefore has to be fashioned in a way quite dissimilar to the way in which nature fashions how first languages are learned by children.Another key distinction turns on the role of perception versus production in early stages of language learning. One school of thought proposes that learners should begin to communicate, to use a new language actively, on first contact (e.g., Audio-Lingual Method, Silent Way, Community Language Learning), while the other school of thought states that an initial and prolonged period of reception (listening, reading) should precede any attempts at production (e.g., Natural Approach).III. The teaching methodology in the futureThe future is always uncertain, and this is no less true in anticipating methodological directions in second language teaching than in any other field. Some current predictions assume the carrying on and refinement of current trends; others appear a bit more science-fiction-like in their vision. Outlined below are 10 scenarios that are likely to shape the teaching of second languages in the next decades of the new millenium. Thesemethodological candidates are given identifying labels in a somewhat tongue-in-cheek style, perhaps a bit reminiscent of yesteryear's method labels.3.1 Teacher/Learner CollaboratesMatchmaking techniques will be developed which will link learners and teachers with similar styles and approaches to language learning. Looking at the Teacher and Learner roles sketched in Figure 2, one can anticipate development of a system in which the preferential ways in which teachers teach and learners learn can be matched in instructional settings, perhaps via on-line computer networks or other technological resources. Renewed interest in some type of "Focus on Form" has provided a major impetus for recent second language acquisition (SLA) research."Focus on Form" proposals, variously labeled as consciousness-raising, noticing, attending, and enhancing input, are founded on the assumption that students will learn only what they are aware of. Whole Language proponents have claimed that one way to increase learner awareness of how language works is through a course of study that incorporates broader engagement with language, including literary study, process writing, authentic content, and learner collaboration.3.2 Curriculum DevelopmentalismLanguage teaching has not profited much from more general views of educational design. The curriculum perspective comes from general education and views successful instruction as an interweaving of Knowledge, Instructional, Learner, and Administrative considerations. From this perspective, methodology is viewed as only one of several instructional considerations that are necessarily thought out and realized in conjunction with all other curricular considerations. Content-based instruction assumes that language learning is a by-product of focus on meaning--on acquiring some specific topical content--and that content topics to support languagelearning should be chosen to best match learner needs and interests and to promote optimal development of second language competence. A critical question for language educators is "what content" and "how much content" best supports language learning. The natural content for language educators is literature and language itself, and we are beginning to see a resurgence of interest in literature and in the topic of "language: the basic human technology" as sources of content in language teaching.3.3 Total Functional ResponseCommunicative Language Teaching was founded (and floundered) on earlier notional/functional proposals for the description of languages. Now new leads in discourse and genre analysis, schema theory, pragmatics, and systemic/functional grammar are rekindling an interest in functionally based approaches to language teaching. One pedagogical proposal has led to a widespread reconsideration of the first and second language program in Australian schools where instruction turns on five basic text genres identified as Report, Procedure, Explanation, Exposition, and Recount. Refinement of functional models will lead to increased attention to genre and text types in both first and second language instruction.The notion here is adapted from the Multiple Intelligences view of human talents proposed by Howard Gardner (1983). This model is one of a variety of learning style models that have been proposed in general education with follow-up inquiry by language educators. The chart below shows Gardner's proposed eight native intelligences and indicates classroom language-rich task types that play to each of these particular intelligences. The challenge here is to identify these intelligences in individuallearners and then to determine appropriate and realistic instructional tasks in response.IV. Conclusion"Learning to Learn" is the key theme in an instructional focus on language learning strategies. Such strategies include, at the most basic level, memory tricks, and at higher levels, cognitive and metacognitive strategies for learning, thinking, planning, and self-monitoring. Research findings suggest that strategies can indeed be taught to language learners, that learners will apply these strategies in language learning tasks, and that such application does produce significant gains in language learning. Simple and yet highly effective strategies, such as those that help learnersremember and access new second language vocabulary items, will attract considerable instructional interest in Strategopedia.We know that the linguistic part of human communication represents only a small fraction of total meaning. At least one applied linguist has gone so far as to claim that, "We communicate so much information non-verbally in conversations that often the verbal aspect of the conversation is negligible." Despite these cautions, language teaching has chosen to restrict its attention to the linguistic component of human communication, even when the approach is labeled Communicative. The methodological proposal is to provide instructional focus on the non-linguistic aspects of communication, including rhythm, speed, pitch, intonation, tone, and hesitation phenomena in speech and gesture, facial expression, posture, and distance in non-verbal messaging.In a word, the curriculum perspective comes from general education and views successful instruction as an interweaving of Knowledge, Instructional, Learner, and Administrative considerations.V. Bibliography[1]Gardner, H. (1983). Frames of mind. New York: Basic Books.[2]Howatt, A. (1984). A history of English language teaching. Oxford: OxfordUniversity Press.[3]Larsen-Freeman, D. (2000). Techniques and principles in language teaching.Oxford: Oxford University Press.[4]Pawley, A., & Syder, F. (1983). Two puzzles for linguistic theory: Native-likeselection and native-like fluency. In J. Richards & R. Schmidt (Eds.), Language and communication. London: Longman.[5] Christison, M. (1998). Applying multiple intelligences theory in preserviceand inservice TEFL education programs. English Teaching Forum, 36 (2), 2-13.。
英语谐音梗小作文全文共3篇示例,供读者参考篇1Title: Funny English Homophonic JokesIntroduction:English homophonic jokes are a popular form of wordplay that humorously exploits the similarity in sound between words. These jokes rely on puns and phonetic similarities to create humor, making them a favorite among language enthusiasts and those with a good sense of humor. In this article, we will explore some funny English homophonic jokes that are sure to make you smile.I. What do you call a bear with no teeth?A gummy bear!Explanation: This joke plays on the similarity in sound between "gummy bear" and "gummy bear." The humor comes from the unexpected twist in the punchline, as the listener is initially led to believe that the joke is about a bear without teeth, only to discover that it's actually a play on words.II. Why did the math book look sad?Because it had too many problems.Explanation: In this joke, the homophonic nature of "problems" and "problems" creates the humor. The listener is led to think that the math book is sad due to its difficulty, only to discover that the punchline is a clever play on words.III. What did one wall say to the other wall?I'll meet you at the corner!Explanation: This joke relies on the homophonic relationship between "corner" and "corner." The humor stems from the personification of the walls and the quick-witted response given by one of them.IV. Why did the scarecrow win an award?Because he was outstanding in his field!Explanation: The pun in this joke lies in the similarity in sound between "outstanding" and "outstanding." The unexpected twist in the punchline creates humor and showcases the clever wordplay involved.V. What do you get when you cross a snowman and a vampire?Frostbite!Explanation: This joke plays on the homophonic relationship between "frostbite" and "Frostbite." The humor comes from the unexpected combination of a snowman and a vampire, as well as the clever play on words in the punchline.Conclusion:English homophonic jokes are a fun and creative form of wordplay that never fails to bring a smile to people's faces. By leveraging puns, phonetic similarities, and clever wordplay, these jokes entertain and engage their audience in a unique and humorous way. Next time you're in need of a good laugh, try sharing some of these funny English homophonic jokes with your friends and family.篇2A Funny Journey with English HomophonesHave you ever heard of the term homophones in English? Homophones are words that sound the same but have different meanings and spellings. They can lead to some hilarious misunderstandings and confusion if not used correctly. In this article, we will explore some funny examples of homophones and how they can add a touch of comedy to our language.One classic example of homophones is the pair of words "knight" and "night." While "knight" refers to a medieval warrior who served his lord, "night" refers to the time when the sun goes down and darkness falls. Imagine the confusion when someone says, "I saw a knight in shining armor last night!" Did they mean they saw a warrior or simply that they saw something in the darkness?Another amusing pair of homophones is "hear" and "here." "Hear" means to perceive sounds with the ears, while "here" is used to indicate a location. Imagine if someone said, "I can hear you here!" It could cause some confusion as to whether they heard you or if they are physically present.One more example of homophones causing confusion is the pair "cell" and "sell." "Cell" refers to a small room or compartment, while "sell" means to exchange goods for money. Imagine someone saying, "I need to cell my old phone." Are they trying to sell their phone or put it in a small room?Homophones can also add a touch of humor to our language when used in puns and jokes. For example, "Why did the scarecrow win an award? Because he was outstanding in his field!" In this joke, the pun lies in the homophones "outstanding" and "out standing," creating a clever play on words.Overall, homophones can be a fun and sometimes confusing aspect of the English language. Whether they lead to misunderstandings, funny jokes, or creative wordplay, homophones add a unique charm to our language. So next time you come across a homophone, take a moment to appreciate the humor and creativity they bring to our conversations.篇3Title: The Art of English Homophonic JokesHomophonic jokes, also known as puns, are a popular form of humor that relies on the sound of words to create humor. In the world of English, puns are a common and beloved form of comedy that can be found in everything from literature to everyday conversation. From simple wordplay to clever linguistic manipulation, puns are a diverse and versatile form of humor that can bring laughter to people of all ages and backgrounds.One of the most famous examples of English puns can be found in the works of William Shakespeare. In plays such as "Romeo and Juliet" and "Much Ado About Nothing," Shakespeare used puns to add humor and depth to his characters. For example, in "Romeo and Juliet," the character Mercutio makes a series of puns on the word "grave" when he ismortally wounded, using the word to mean both a burial site and a serious situation. This clever wordplay adds a layer of humor and irony to an otherwise tragic scene.In addition to literature, puns are also often used in advertising and marketing to create catchy slogans and taglines. Companies such as KitKat ("Have a break, have a KitKat") and Nike ("Just do it") have used puns to create memorable and effective advertising campaigns that resonate with consumers. By using clever wordplay and humor, these companies are able to connect with their audience on a more personal and emotional level.In everyday conversation, puns are a popular form of humor that can be used to lighten the mood and create a sense of camaraderie among friends and family. Whether it's a simple play on words or a more elaborate linguistic joke, puns are a fun and creative way to connect with others and bring laughter into our lives.Overall, the art of English homophonic jokes is a unique and versatile form of humor that has been enjoyed by people for centuries. From Shakespearean plays to modern advertising campaigns, puns continue to be a beloved form of comedy thatcan bring joy and laughter to people of all ages. So next time you hear a pun, don't be afraid to laugh – after all, it's all in good fun.。
企业如何竞争?企业如何赚取高于正常的回报吗?什么是需要长期保持卓越的性能呢?一个日益强大的经营策略这些基本问题的答案在于动态能力的概念。
这些的技能,程序,例程,组织结构和学科,使公司建立,聘请和协调相关的无形资产,以满足客户的需求,并不能轻易被竞争对手复制。
具有较强的动态能力是企业强烈的进取精神。
他们不仅适应商业生态系统,他们也塑造他们通过创新,协作,学习和参与。
大卫·蒂斯是动态能力的角度来看的先驱。
它植根于25年,他的研究,教学和咨询。
他的思想已经在企业战略,管理和经济学的影响力,创新,技术管理和竞争政策有关。
通过他的顾问和咨询工作,他也带来了这些想法,承担业务和政策,使周围的世界。
本书的核心思想动态能力是最清晰和最简洁的语句。
蒂斯解释其成因,应用,以及如何他们提供了一个替代的方法很多传统的战略思想,立足于简单和过时的产业组织和竞争优势的基础的理解。
通俗易懂撰写并发表了,这将是一个非常宝贵的工具,为所有那些谁想要理解这一重要的战略思想的贡献,他们的MBA学生,学者,管理人员,或顾问和刺激。
Strategic Management Journal, V ol. 18:7, 509–533 (1997)The dynamic capabilities framework analyzes the sources and methods of wealth creation and capture by private enterprise firms operating in environments of rapid technological change. The competitive advantage of firms is seen as resting on distinctive processes (ways of coordinating and combining), shaped by the firm’s (specific) asset positions (such as the firm’s portfolio of difficult-to-trade knowledge assets and complementary assets), and the evolution path(s) it has adopted or inherited. The importance of path dependencies is amplified where conditions of increasing returns exist. Whether and how a firm’s competitive advantage is eroded depends on the stability of market demand, and the ease of replicability (expanding internally) and imitatability (replication by competitors). If correct, the framework suggests that private wealth creation in regimes of rapid technological change depends in large measure on honing internal technological, organizational, and managerial processes inside the firm. In short, identifying new opportunities and organizing effectively and efficiently to embrace them are generally more fundamental to private wealth creation than is strategizing, if by strategizing one means engaging in business conduct that keeps competitors off balance, raises rival’s costs, and excludes new entrants. (C) 1997 by John Wiley & Sons, Ltd.战略管理杂志。
Exploiting Similarities among Languages for Machine TranslationTomas Mikolov Google Inc.Mountain Viewtmikolov@ Quoc V .Le Google Inc.Mountain View qvl@ Ilya Sutskever Google Inc.Mountain View ilyasu@AbstractDictionaries and phrase tables are the basis of modern statistical machine translation sys-tems.This paper develops a method that can automate the process of generating and ex-tending dictionaries and phrase tables.Our method can translate missing word and phrase entries by learning language structures based on large monolingual data and mapping be-tween languages from small bilingual data.It uses distributed representation of words and learns a linear mapping between vector spaces of languages.Despite its simplicity,our method is surprisingly effective:we can achieve almost 90%precision@5for transla-tion of words between English and Spanish.This method makes little assumption about the languages,so it can be used to extend and re-fine dictionaries and translation tables for any language pairs.1IntroductionStatistical machine translation systems have been developed for years and became very successful in practice.These systems rely on dictionaries and phrase tables which require much efforts to generate and their performance is still far behind the perfor-mance of human expert translators.In this paper,we propose a technique for machine translation that can automate the process of generating dictionaries and phrase tables.Our method is based on distributed representations and it has the potential to be com-plementary to the mainstream techniques that rely mainly on the raw counts.Our study found that it is possible to infer missing dictionary entries using distributed representations of words and phrases.We achieve this by learning a linear projection between vector spaces that rep-resent each language.The method consists of two simple steps.First,we build monolingual models of languages using large amounts of text.Next,we use a small bilingual dictionary to learn a linear projec-tion between the languages.At the test time,we can translate any word that has been seen in the mono-lingual corpora by projecting its vector representa-tion from the source language space to the target language space.Once we obtain the vector in the target language space,we output the most similar word vector as the translation.The representations of languages are learned using the distributed Skip-gram or Continuous Bag-of-Words (CBOW)models recently proposed by (Mikolov et al.,2013a).These models learn word representations using a simple neural network archi-tecture that aims to predict the neighbors of a word.Because of its simplicity,the Skip-gram and CBOW models can be trained on a large amount of text data:our parallelized implementation can learn a model from billions of words in hours.1Figure 1gives simple visualization to illustrate why and how our method works.In Figure 1,we visualize the vectors for numbers and animals in En-glish and Spanish,and it can be easily seen that these concepts have similar geometric arrangements.The reason is that as all common languages share con-cepts that are grounded in the real world (such as1The code for training these models is available at https:///p/word2vec/a r X i v :1309.4168v 1 [c s .C L ] 17 S e p 2013Figure1:Distributed word vector representations of numbers and animals in English(left)and Spanish(right).Thefive vectors in each language were projected down to two dimensions using PCA,and then manually rotated to accentuate their similarity.It can be seen that these concepts have similar geometric arrangements in both spaces,suggesting that it is possible to learn an accurate linear mapping from one space to another.This is the key idea behind our method of translation.that cat is an animal smaller than a dog),there is often a strong similarity between the vector spaces. The similarity of geometric arrangments in vector spaces is the key reason why our method works well. Our proposed approach is complementary to the existing methods that use similarity of word mor-phology between related languages or exact context matches to infer the possible translations(Koehn and Knight,2002;Haghighi et al.,2008;Koehn and Knight,2000).Although we found that mor-phological features(e.g.,edit distance between word spellings)can improve performance for related lan-guages(such as English to Spanish),our method is useful for translation between languages that are substantially different(such as English to Czech or English to Chinese).Another advantage of our method is that it pro-vides a translation score for every word pair,which can be used in multiple ways.For example,we can augment the existing phrase tables with more candi-date translations,orfilter out errors from the trans-lation tables and known dictionaries.2The Skip-gram and ContinuousBag-of-Words ModelsDistributed representations for words were proposed in(Rumelhart et al.,1986)and have become ex-tremely successful.The main advantage is that the representations of similar words are close in the vec-tor space,which makes generalization to novel pat-terns easier and model estimation more robust.Suc-cessful follow-up work includes applications to sta-tistical language modeling(Elman,1990;Bengio et al.,2003;Mikolov,2012),and to various other NLP tasks such as word representation learning,named entity recognition,disambiguation,parsing,and tag-Figure2:Graphical representation of the CBOW model and Skip-gram model.In the CBOW model,the distributed representations of context(or surrounding words)are combined to predict the word in the middle.In the Skip-gram model,the distributed representation of the input word is used to predict the context.ging(Collobert and Weston,2008;Turian et al., 2010;Socher et al.,2011;Socher et al.,2013;Col-lobert et al.,2011;Huang et al.,2012;Mikolov et al.,2013a).It was recently shown that the distributed repre-sentations of words capture surprisingly many lin-guistic regularities,and that there are many types of similarities among words that can be expressed as linear translations(Mikolov et al.,2013c).For ex-ample,vector operations“king”-“man”+“woman”results in a vector that is close to“queen”.Two particular models for learning word repre-sentations that can be efficiently trained on large amounts of text data are Skip-gram and CBOW models introduced in(Mikolov et al.,2013a).In the CBOW model,the training objective of the CBOW model is to combine the representations of surround-ing words to predict the word in the middle.The model architectures of these two methods are shown in Figure2.Similarly,in the Skip-gram model,the training objective is to learn word vector representa-tions that are good at predicting its context in the same sentence(Mikolov et al.,2013a).It is un-like traditional neural network based language mod-els(Bengio et al.,2003;Mnih and Hinton,2008; Mikolov et al.,2010),where the objective is to pre-dict the next word given the context of several pre-ceding words.Due to their low computational com-plexity,the Skip-gram and CBOW models can be trained on a large corpus in a short time(billions of words in hours).In practice,Skip-gram gives bet-ter word representations when the monolingual data is small.CBOW however is faster and more suitable for larger datasets(Mikolov et al.,2013a).They also tend to learn very similar representations for lan-guages.Due to their similarity in terms of model architecture,the rest of the section will focus on de-scribing the Skip-gram model.More formally,given a sequence of training words w1,w2,w3,...,w T,the objective of the Skip-gram model is to maximize the average log probabil-ity1TTt=1kj=−klog p(w t+j|w t)(1) where k is the size of the training window(whichcan be a function of the center word w t).The in-ner summation goes from−k to k to compute the log probability of correctly predicting the word w t+j given the word in the middle w t.The outer summa-tion goes over all words in the training corpus.In the Skip-gram model,every word w is associ-ated with two learnable parameter vectors,u w and v w.They are the“input”and“output”vectors of the w respectively.The probability of correctly predict-ing the word w i given the word w j is defined asp(w i|w j)=expu wiv wjVl=1expu l v wj(2)where V is the number of words in the vocabulary. This formulation is expensive because the cost of computing∇log p(w i|w j)is proportional to the number of words in the vocabulary V(which can be easily in order of millions).An efficient alter-native to the full softmax is the hierarchical soft-max(Morin and Bengio,2005),which greatly re-duces the complexity of computing log p(w i|w j) (about logarithmically with respect to the vocabu-lary size).The Skip-gram and CBOW models are typically trained using stochastic gradient descent.The gradi-ent is computed using backpropagation rule(Rumel-hart et al.,1986).When trained on a large dataset,these models capture substantial amount of semantic information. As mentioned before,closely related words have similar vector representations,e.g.,school and uni-versity,lake and river.This is because school and university appear in similar contexts,so that during training the vector representations of these words are pushed to be close to each other.More interestingly,the vectors capture relation-ships between concepts via linear operations.For example,vector(France)-vector(Paris)is similar to vector(Italy)-vector(Rome).3Linear Relationships BetweenLanguagesAs we visualized the word vectors using PCA,we noticed that the vector representations of similar words in different languages were related by a linear transformation.For instance,Figure1shows that the word vectors for English numbers one tofive and the corresponding Spanish words uno to cinco have similar geometric arrangements.The relation-ship between vector spaces that represent these two languages can thus possibly be captured by linear mapping(namely,a rotation and scaling).Thus,if we know the translation of one and four from English to Spanish,we can learn the transfor-mation matrix that can help us to translate even the other numbers to Spanish.4Translation MatrixSuppose we are given a set of word pairs and their associated vector representations{x i,z i}n i=1,where x i∈R d1is the distributed representation of word i in the source language,and z i∈R d2is the vector representation of its translation.It is our goal tofind a transformation matrix W such that W x i approximates z i.In practice,W can be learned by the following optimization problemminWni=1W x i−z i 2(3)which we solve with stochastic gradient descent.At the prediction time,for any given new word and its continuous vector representation x,we can map it to the other language space by computing z= W x.Then wefind the word whose representation is closest to z in the target language space,using cosine similarity as the distance metric.Despite its simplicity,this linear transformation method worked well in our experiments,better than nearest neighbor and as well as neural network clas-sifiers.The following experiments will demonstrate its effectiveness.5Experiments on WMT11DatasetsIn this section,we describe the results of our trans-lation method on the publicly available WMT11 datasets.We also describe two baseline techniques: one based on the edit distance between words,and the other based on similarity of word co-occurrences that uses word counts.The next section presents re-sults on a larger dataset,with size up to25billion words.In the above section,we described two methods, Skip-gram and CBOW,which have similar archi-tectures and perform similarly.In terms of speed,CBOW is usually faster and for that reason,we used it in the following experiments.25.1Setup DescriptionThe datasets in our experiments are WMT11text data from website.3Using these corpora,we built monolingual data sets for En-glish,Spanish and Czech languages.We performed these steps:•Tokenization of text using scripts from •Duplicate sentences were removed •Numeric values were rewritten as a single token •Special characters were removed(such as!?,:¡) Additionally,we formed short phrases of words us-ing a technique described in(Mikolov et al.,2013b). The idea is that words that co-occur more frequently than expected by their unigram probability are likely an atomic unit.This allows us to represent short phrases such as“ice cream”with single tokens, without blowing up the vocabulary size as it would happen if we would consider all bigrams as phrases. Importantly,as we want to test if our work can provide non-obvious translations of words,we dis-carded named entities by removing the words con-taining uppercase letters from our monolingual data. The named entities can either be kept unchanged,or translated using simpler techniques,for example us-ing the edit distance(Koehn and Knight,2002).The statistics for the obtained corpora are reported in Ta-ble1.To obtain dictionaries between languages,we used the most frequent words from the monolingual source datasets,and translated these words using on-line Google Translate(GT).As mentioned pre-viously,we also used short phrases as the dictionary entries.As not all words that GT produces are in our vocabularies that we built from the monolingual WMT11data,we report the vocabulary coverage in each experiment.For the calculation of translation 2It should be noted that the following experiments deal mainly with frequent words.The Skip-gram,although slower to train than CBOW,is preferable architecture if one is inter-ested in high quality represenations for the infrequent words.3/wmt11/training-monolingual.tgz Table1:The sizes of the monolingual training datasets from WMT11.The vocabularies consist of the words that occurred at leastfive times in the corpus.Language Training tokens Vocabulary size English575M127KSpanish84M107KCzech155M505K precision,we discarded word pairs that cannot be translated due to missing vocabulary entries.To measure the accuracy,we use the most fre-quent5K words from the source language and their translations given GT as the training data for learn-ing the Translation Matrix.The subsequent1K words in the source language and their translations are used as a test set.Because our approach is very good at generating many translation candidates,we report the top5accuracy in addition to the top1ac-curacy.It should be noted that the top1accuracy is highly underestimated,as synonym translations are counted as mistakes-we count only exact match asa successful translation.5.2Baseline TechniquesWe used two simple baselines for the further ex-periments,similar to those previously described in(Haghighi et al.,2008).Thefirst is using simi-larity of the morphological structure of words,and is based on the edit distance between words in dif-ferent languages.The second baseline uses similarity of word co-occurrences,and is thus more similar to our neural network based approach.We follow these steps:•Form count-based word vectors with dimen-sionality equal to the size of the dictionary•Count occurrence of in-dictionary words withina short window(up to10words)for each testword in the source language,and each word in the target language•Using the dictionary,map the word count vec-tors from the source language to the target lan-guage•For each test word,search for the most similar vector in the target languageTable2:Accuracy of the word translation methods using the WMT11datasets.The Edit Distance uses morphological structure of words tofind the translation.The Word Co-occurrence technique based on counts uses similarity of contexts in which words appear,which is related to our proposed technique that uses continuous representations of words and a Translation Matrix between two languages.Translation Edit Distance Word Co-occurrence Translation Matrix ED+TM Coverage P@1P@5P@1P@5P@1P@5P@1P@5En→Sp13%24%19%30%33%51%43%60%92.9% Sp→En18%27%20%30%35%52%44%62%92.9% En→Cz5%9%9%17%27%47%29%50%90.5% Cz→En7%11%11%20%23%42%25%45%90.5%Additionally,the word count vectors are normalized in the following procedure.First we remove the bias that is introduced by the different size of the training sets in both languages,by dividing the counts by the ratio of the data set sizes.For example,if we have ten times more data for the source language than for the target language,we will divide the counts in the source language by ten.Next,we apply the log func-tion to the counts and normalize each word count vector to have a unit length(L2norm).The weakness of this technique is the computa-tional complexity during the translation-the size of the word count vectors increases linearly with the size of the dictionary,which makes the translation expensive.Moreover,this approach ignores all the words that are not in the known dictionary when forming the count vectors.5.3Results with WMT11DataIn Table2,we report the performance of several approaches for translating single words and short phrases.Because the Edit Distance and our Trans-lation Matrix approach are fundamentally different, we can improve performance by using a weighted combination of similarity scores given by both tech-niques.As can be seen in Table2,the Edit Distance worked well for languages with related spellings (English and Spanish),and was less useful for more distant language pairs,such as English and Czech. To train the distributed Skip-gram model,we used the hyper-parameters recommended in(Mikolov et al.,2013a):the window size is10and the dimen-sionality of the word vectors is in the hundreds. We observed that the word vectors trained on the source language should be several times(around 2x–4x)larger than the word vectors trained on the target language.For example,the best perfor-mance on English to Spanish translation was ob-tained with800-dimensional English word vectors and200-dimensional Spanish vectors.However, for the opposite direction,the best accuracy was achieved with800-dimensional Spanish vectors and 300-dimensional English vectors.6Large Scale ExperimentsIn this section,we scale up our techniques to larger datasets to show how performance improves with more training data.For these experiments,we used large English and Spanish corpora that have sev-eral billion words(Google News datasets).We per-formed the same data cleaning and pre-processing as for the WMT11experiments.Figure3shows how the performance improves as the amount of mono-lingual data increases.Again,we used the most fre-quent5K words from the source language for con-structing the dictionary using Google Translate,and the next1K words for test.Our approach can also be successfully used for translating infrequent words:in Figure4,we show the translation accuracy on the less frequent words. Although the translation accuracy decreases as the test set becomes harder,for the words ranked15K–19K in the vocabulary,Precision@5is still reason-ably high,around60%.It is surprising that we can sometimes translate even words that are quite un-related to those in the known dictionary.We will present examples that demonstrate the translation quality in the Section7.We also performed the same experiment where we translate the words at ranks15K–19K using theFigure3:The Precision at1and5as the size of the mono-lingual training sets increase(EN→ES).models trained on the small WMT11datasets.The performance loss in this case was greater–the Presi-cion@5was only25%.This means that the models have to be trained on large monolingual datasets in order to accurately translate infrequent words.6.1Using Distances as Confidence Measure Sometimes it is useful to have higher accuracy at the expense of coverage.Here we show that the distance between the computed vector and the closest word vector can be used as a confidence measure.If we apply the Translation Matrix to a word vector in En-glish and obtain a vector in the Spanish word space that is not close to vector of any Spanish word,we can assume that the translation is likely to be inac-curate.More formally,we define the confidence score as max i∈V cos(W x,z i),and if this value is smaller than a threshold,the translation is skipped.In Ta-ble3we show how this approach is related to the Table3:Accuracy of our method using various confi-dence thresholds(EN→ES,large corpora).Threshold Coverage P@1P@50.092.5%53%75%0.578.4%59%82%0.654.0%71%90%0.717.0%78%91%Figure4:Accuracies of translation as the word frequency decreases.Here,we measure the accuracy of the transla-tion on disjoint sets of2000words sorted by frequency, starting from rank5K and continuing to19K.In all cases, the linear transformation was trained on the5K most fre-quent words and their translations.EN→ES. translation accuracy.For example,we can translate approximately half of the words from the test set(on EN→ES with frequency ranks5K–6K)with a very high Precision@5metric(around90%).By adding the edit distance to our scores,we can further im-prove the accuracy,especially for Precision@1as is shown in Table4.These observations can be crucial in the future work,as we can see that high-quality translations are possible for some subset of the vocabulary.The idea can be applied also in the opposite way:instead of searching for very good translations of missing entries in the dictionaries,we can detect a subset of the existing dictionary that is likely to be ambiguous or incorrect.Table4:Accuracy of the combination of our method with the edit distance for various confidence thresholds. The confidence scores differ from the previous table since they include the edit distance(EN→ES,large corpora).Threshold Coverage P@1P@50.092.5%58%77%0.477.6%66%84%0.555.0%75%91%0.625.3%85%93%7ExamplesThis section provides various examples of our trans-lation method.7.1Spanish to English Example Translations To better understand the behavior of our translation system,we show a number of example translations from Spanish to English in Table5.Interestingly, many of the mistakes are somewhat meaningful and are semantically related to the correct translation. These examples were produced by the translation matrix alone,without using the edit distance simi-larity.Table5:Examples of translations of out-of-dictionary words from Spanish to English.The three most likely translations are shown.The examples were chosen at ran-dom from words at ranks5K–6K.The word representa-tions were trained on the large corpora.Spanish word Computed English DictionaryTranslations Entry emociones emotions emotionsemotionfeelingsprotegida wetland protectedundevelopableprotectedimperio dictatorship empireimperialismtyrannydeterminante crucial determinantkeyimportantpreparada prepared preparedreadypreparemillas kilometers mileskilometresmileshablamos talking talktalkedtalkdestacaron highlighted highlightedemphasizedemphasised Table6:Examples of translations from English to Span-ish with high confidence.The models were trained on the large corpora.English word Computed Spanish DictionaryTranslation Entrypets mascotas mascotas mines minas minas unacceptable inaceptable inaceptable prayers oraciones rezo shortstop shortstop campocorto interaction interacci´o n interacci´o n ultra ultra muybeneficial beneficioso beneficioso beds camas camas connectivity conectividad conectividad transform transformar transformar motivation motivaci´o n motivaci´o n 7.2High Confidence TranslationsIn Table6,we show the translations from English to Spanish with high confidence(score above0.5).We used both edit distance and translation matrix.As can be seen,the quality is very high,around75% for Precision@1as reported in Table4.7.3Detection of Dictionary ErrorsA potential use of our system is the correction of dic-tionary errors.To demonstrate this use case,we have trained the translation matrix using20K dictionary entries for translation between English and Czech. Next,we computed the distance between the trans-lation given our system and the existing dictionary entry.Thus,we evaluate the translation confidences on words from the training set.In Table7,we list examples where the distance between the original dictionary entry and the out-put of the system was large.We chose the exam-ples manually,so this demonstration is highly rmally,the entries in the existing dic-tionaries were about the same or more accurate than our system in about85%of the cases,and in the re-maining15%our system provided better translation. Possible future extension of our approach would be to train the translation matrix on all dictionary en-tries except the one for which we calculate the score.Table7:Examples of translations where the dictionary entry and the output from our system strongly disagree. These examples were chosen manually to demonstrate that it might be possible to automaticallyfind incorrect or ambiguous dictionary entries.The vectors were trained on the large corpora.English word Computed Czech DictionaryTranslation Entry saidˇr ekl uveden´y(said)(listed)will m˚uˇz e v˚u le(can)(testament) did udˇe lal ano(did)(yes)hit zas´a hl hit(hit)-must mus´ımoˇs t(must)(cider) current st´a vaj´ıc´ıproud(current)(stream) shot vystˇr elil shot(shot)-minutes minut z´a pis(minutes)(enrollment) latest nejnovˇe jˇs´ıposledn´ı(newest)(last) blacksˇc ernoˇs iˇc ern´a(black people)(black color) hub centrum hub(center)-minus minus bez(minus)(without) retiring odejde uzavˇr en´y(leave)(closed) grown pˇe stuje dospˇe l´y(grow)(adult) agents agenti prostˇr edky(agents)(resources) 7.4Translation between distant languagepairs:English and VietnameseThe previous experiments involved languages that have a good one-to-one correspondence between words.To show that our technique is not limited by this assumption,we performed experiments on Vietnamese,where the concept of a word is differ-ent than in English.For training the monolingual Skip-gram model of Vietnamese,we used large amount of Google News data with billions of words.We performed the same data cleaning steps as for the previous languages,and additionally automatically extracted a large number of phrases using the technique de-scribed in(Mikolov et al.,2013b).This way,we ob-tained about1.3B training Vietnamese phrases that are related to English words and short phrases.In Table8,we summarize the achieved results.Table8:The accuracy of our translation method between English and Vietnamese.The edit distance technique did not provide significant improvements.Although the ac-curacy seems low for the EN→VN direction,this is in part due to the large number of synonyms in the VN model.Threshold Coverage P@1P@5En→Vn87.8%10%30%Vn→En87.8%24%40%8ConclusionIn this paper,we demonstrated the potential of dis-tributed representations for machine -ing large amounts of monolingual data and a small starting dictionary,we can successfully learn mean-ingful translations for individual words and short phrases.We demonstrated that this approach works well even for pairs of languages that are not closely related,such as English and Czech,and even English and Vietnamese.In particular,our work can be used to enrich and improve existing dictionaries and phrase tables, which would in turn lead to improvement of the current state-of-the-art machine translation systems. Application to low resource domains is another very interesting topic for future research.Clearly,there is still much to be explored.References[Bengio et al.2003]Yoshua Bengio,Rejean Ducharme, Pascal Vincent,and Christian Jauvin.2003.A neural probabilistic language model.In Journal of Machine Learning Research,pages1137–1155.。