
The Dynamics and Semantics of Collaborative Tagging

Harry Halpin

University of Edinburgh

2 Buccleuch Place

Edinburgh, Scotland

H.Halpin@https://www.doczj.com/doc/6e5708284.html,

Valentin Robu

Dutch Center for Mathematics

and Computer Science

Kruislaan 413

Amsterdam, Netherlands

robu@cwi.nl

Hana Shepherd

Princeton University

Wallace Hall

Princeton, NJ USA

hshepher@https://www.doczj.com/doc/6e5708284.html,

ABSTRACT

The debate within the Web community over the optimal means by which to organize information often pits formalized classifications against distributed collaborative tagging systems. A number of questions remain unanswered, however, regarding the nature of collaborative tagging systems, including the dynamics of such systems and whether coherent classification schemes can emerge from undirected tagging by users. Currently millions of users are using collaborative tagging without centrally organizing principles, and many suspect this exhibits features considered to be indicative of a complex system. If this is the case, it remains to be seen whether collaborative tagging by users over time leads to emergent classification schemes that could be formalized into an ontology usable by the Semantic Web.

This paper uses data from “popular” tagged sites on the social bookmarking site del.icio.us to examine the dynamics of such collaborative tagging systems. In particular, we try to determine whether the distribution of tag frequencies stabilizes, which would indicate a degree of cohesion or consensus among users about the optimal tags to describe particular sites. We use tag co-occurrence networks for a sample domain of tags to analyze the meaning of particular tags given their relationship to other tags and to automatically create an ontology. We also produce a generative model of collaborative tagging in order to model and understand some of the basic dynamics behind the process.

1. INTRODUCTION

1.1 Folksonomies and Ontologies

The issue of how metadata on web resources should be generated for the greatest efficiency and efficacy continues to be a central debate. A small but increasingly influential set of websites, including the social bookmarking site del.icio.us, Flickr, Furl, Rojo, Connotea, Technorati, and Amazon, allow users to “tag” objects with keywords to facilitate retrieval both for the user and for other users. Their categories are based on the set of tags that are used to characterize some resource, and these categories are commonly referred to as “folksonomies.” This approach to organizing online information is usually contrasted with formal ontologies that are imposed by experts, not by users [16].

There are both benefits and drawbacks to the tagging approach. Tagging is considered a categorization process, in contrast to a pre-optimized classification process as exemplified by expert-created Semantic Web ontologies. Jacob defines the distinction between these two processes in the following way: “Categorization divides the world of experience into groups or categories whose members share some perceptible similarity within a given context. That this context may vary and with it the composition of the category is the very basis for both the flexibility and the power of cognitive categorization” while “classification as process involves the orderly and systematic assignment of each entity to one and only one class within a system of mutually exclusive and non-overlapping classes; it mandates consistent application of these principles within the framework of a prescribed ordering of reality” [9]. Tagging systems allow much greater malleability and adaptability in organizing information than do formal classification systems. Shirky explains the pitfalls of imposed classification: “If you’ve got a large, ill-defined corpus, if you’ve got naive users, if your cataloguers aren’t expert, if there’s no one to say authoritatively what’s going on, then ontology is going to be a bad strategy” [16]. Proponents of tagging systems argue that “Groups of users do not have to agree on a hierarchy of tags or detailed taxonomy, they only need to agree, in a general sense, on the ‘meaning’ of a tag enough to label similar material with terms for there to be cooperation and shared value” [11]. Tagging can also make retrieving and sharing data more efficient than classifying: “Free typing loose associations is just a lot easier than making a decision about the degree of match to a pre-defined category (especially hierarchical ones). It’s like 90% of the value of a proper taxonomy but 10 times simpler” [3]. However, a number of problems stem from organizing information through tagging systems, including ambiguity in the meaning of tags, the use of synonyms, which creates informational redundancy, and the possibility of idiosyncratic naming conventions where individuals string together many words or label items according to their personal utility, such as tagging a bookmarked site with “toread.” These drawbacks are serious in that they can jeopardize the coherence of the informational content of the tagging system and render tagging systems less useful for groups of users.

Given the debate over the utility of collaborative tagging systems compared to other methods of organizing information, it is increasingly important to understand whether a coherent and socially navigable way of organizing metadata can emerge from distributed tagging systems and, if so, how this might occur and whether particular features of sites using tagging facilitate or inhibit the emergence of coherence. This paper empirically examines elements of two subsidiary issues of this larger project. In Section 4 we examine the dynamics of tag frequency in “popular” del.icio.us tags in order to detect whether the tag frequencies converge to a stable distribution and thus a categorization scheme. There is hope among the proponents of collaborative tagging systems that a stable sort of distribution might arise from these systems. Note that by “stable” we do not mean that users stop tagging the resource, but that the tagging eventually settles on a group of tags that describe the resource well, and that new users mostly reinforce already present tags in the same frequency as they are given in the stable distribution. Online tagging systems have a variety of features that are often associated with complex systems, such as a large number of users, a lack of central coordination, and non-linear dynamics, and these sorts of systems are known to produce a type of distribution known as a “power law.” In Section 5 we examine how the information content of particular tags in relation to one another might be used to extract a classification scheme (ontology) from a categorization scheme (folksonomy). We present in detail some empirical work on the first topic, and then more hypothetical work on the second.

1.2 Dynamics of Tagging

What are the underlying dynamics that could cause tagging to reach some point of stability such that the distribution of tags converges? Researchers have observed, some casually, some more rigorously, that the distribution of tags applied to particular URLs in tagging systems follows a power law distribution, where a relatively small number of tags are used with great frequency and a great number of tags are used infrequently [11]. Work by Golder and Huberman using del.icio.us data has noted a number of patterns in tagging dynamics. The majority of sites reach their peak popularity, the highest frequency of tagging in a given time period, within 10 days of being saved on del.icio.us (67% in the data set of Golder and Huberman), though some sites are “rediscovered” by users (about 17% in their data set), suggesting stability in most sites but some degree of “burstiness” in the dynamics [7]. Most importantly for this paper, Golder and Huberman find that the proportions of tag frequencies within a given site stabilize over time; they find this usually occurs after a site has been bookmarked around 100 times [7].

To make inferences about the existence of some sort of structure in the distribution of tag frequencies, we need to understand the information inherent in the tags, based on calculating the frequencies with which particular tags co-occur with other tags. Again, a number of critical questions remain regarding the informational value of the tags used. By “informational value” we mean whatever information is conveyed by the natural language term used in the tag and how this makes the tag useful or not. Since the “meaning” of tags is elusive, one way to model their informational value is to look at their co-occurrence with other tags, and to try to answer questions about how these co-occurrence models reflect the informational value of particular tags: Does the structure of tag networks based on co-occurrence make intuitive sense, doing justice to the common-sense ideas we have about the relationships between the concepts under scrutiny? Can tagging provide users with any new insight into the meaning of resources just by analyzing the structure of networks based on co-occurrence? Shen and Wu analyze the structure of a tagging network for del.icio.us data as we do in Section 5, although unlike in our examples their graph is unweighted [15]. They examine the degree distribution (the distribution of the number of other nodes each node is connected to) and the clustering coefficient (based on a ratio of the total number of edges in a subgraph to the number of all possible edges) of this network and find that the network is scale free and has the features Watts and Strogatz found to characterize small world networks: small average path length and relatively high clustering coefficient [19]. A large amount of work exploring the structural properties of natural language networks finds similar results [5].

The dynamics of tagging systems are closely coupled to the informational value of tags. Golder and Huberman cite two important features of such collaborative tagging systems that might give rise to this type of stability: imitation of others and shared knowledge [7]. One of the specific features of del.icio.us is the inclusion of “most common tags” for a given site when a user saves that site, facilitating the use of the tags others have used with the greatest frequency. They explain the stability of the less common tags, which are not displayed for users when they save a site, by a shared background and set of assumptions among users. Given that the stability of tag frequencies presumably relies on both the interaction between users (imitation) and the shared cultural knowledge of users, the stability and patterns of tag frequencies might lend insight into the degree to which there is consensus within a community about how to characterize some site, or into whether there are different groups of users with different sets of assumptions who are tagging the same site. Or, as Golder and Huberman suggest, changes in the stability of such patterns might suggest that groups of users are migrating away from a particular consensus on how to characterize a site and its content, or negotiating the changing meaning of that site. To the extent this consensus is stable, it is ripe for development into a classification system and formalization into an ontology.

1.3 Ontologies and Observed Patterns

Merholz uses the metaphor of “desire lines” for tagging systems; these “are the foot-worn paths that sometimes appear in a landscape over time” such that “a smart landscape designer will let wanderers create paths through use, and then pave the emerging walkways, ensuring optimal utility” [12]. This metaphor points towards a way of developing ontologies for the Semantic Web that maintains the advantages of both taxonomic classification and collaborative tagging, bridging the two sides of the debate about organizing metadata. After users have explored the space of possibilities and discovered some optimum categorization, an ontology could be formalized for classification purposes. Avoiding pre-optimization, a user-optimized ontology would take advantage of the often unexpected ways users categorize data, yet provide the classificatory power of a smaller set of terms that can then be mapped to a Semantic Web ontology capable of expressing structured data facets, complex relationships, and scaling across the Web, which current collaborative tagging systems are incapable of doing. It is possible that, in order to share data effectively, users as a group converge, naturally and without external influence restricting their vocabulary, on tagging each URI with a fairly small set of semantically distinct tags.

Is it possible that such a classification structure can be detected? While some have claimed that it is not possible since the “responsiveness and flexibility” of user categorization “effectively prohibit the establishment of meaningful relationships” because they are “fleeting and ephemeral,” there are a number of other cases where complex structure emerges from simple behavior [9]. What are the types of local rules users might be employing which generate these observed aggregate patterns, and can they be described mathematically? The paradigmatic case of local rules generating structure is natural language itself, where “one of the key questions to understand [is] how a communication system can arise ... how distributed agents without a central authority and without prior specification can nevertheless arrive at a sufficiently shared language conventions to make communication possible” [18].

2. THE TRIPARTITE STRUCTURE OF TAGGING

To begin, we need a conceptual model to describe generic collaborative tagging systems which is capable of being formalized, so that we can make predictions about collaborative tagging systems based both on empirical data and on generative features of the model. A well-accepted tripartite model has already been theorized [10, 13], although we hope to clarify it below.

There are three main entities that compose any tagging system:
• The users of the system (people who actually do the tagging)
• The tags themselves
• The resources being tagged (in this case, the websites)

Each of these can be seen as forming separate spaces consisting of sets of vertices, which are linked together by edges (see Fig. 1). The first space, the user space, consists of the set of all users of the tagging system, where each vertex is a user. The second space is the tag space, the set of all tags, where a tag corresponds to a term (“music”) or neologism (“toread”) in natural language. The third space is the resource space, the set of all resources, where each resource is normally denoted by a unique URI.¹ A tagging instance can be seen as the two edges that link a user to a tag and then that tag to a given website or resource. Note that a tagging instance can associate a date with its tuple of a user, one or more tags, and a resource.
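The tripartite model can be made concrete with a small data structure. The following is a minimal sketch in Python, not part of the original study: the names `TaggingInstance` and `build_tripartite`, the example users, and the example URIs are all hypothetical and chosen only for illustration. It stores each tagging instance as a (user, tag, resource, date) record and derives the two edge sets that link the three spaces.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class TaggingInstance:
    """One tagging event: a user attaches a tag to a resource."""
    user: str
    tag: str
    resource: str                # normally a URI
    when: Optional[date] = None  # optional date of the tagging instance


def build_tripartite(instances):
    """Return the two edge sets of the tripartite graph:
    user -> tags used, and tag -> resources it was applied to."""
    user_to_tags = defaultdict(set)
    tag_to_resources = defaultdict(set)
    for inst in instances:
        user_to_tags[inst.user].add(inst.tag)
        tag_to_resources[inst.tag].add(inst.resource)
    return user_to_tags, tag_to_resources


# Hypothetical data, for illustration only.
instances = [
    TaggingInstance("u1", "music", "http://example.org/piano"),
    TaggingInstance("u2", "toread", "http://example.org/piano"),
    TaggingInstance("u2", "music", "http://example.org/guitar"),
]
user_to_tags, tag_to_resources = build_tripartite(instances)
print(sorted(tag_to_resources["music"]))
```

Keeping the instance records rather than only the edge sets preserves the link between a particular user and a particular resource, which the two edge sets alone do not retain.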

Figure 1: Tripartite graph structure of a tagging system. An edge linking a user, a tag and a resource (website) represents one tagging instance.

From the above model and Fig. 1, we observe that tags provide the link between the users of the system and the resources or concepts they search for.

This analysis reveals a number of dimensions of tagging that are often under-emphasized. In particular, tagging is a methodology for information retrieval, much like traditional search engines, but with a number of key differences. To simplify drastically, with a traditional search engine a user enters a number of tags and then an automatic algorithm labels the resources with some measure of relevancy to the tags pre-discovery, displaying relevant resources to the user. In contrast, with collaborative tagging a user finds a resource, then adds one or more tags to the resource manually, with a system storing the resource and the tags post-discovery. When faced with a case of retrieval, an automatic algorithm does not have to assign tags to the resource automatically, but can follow the tags used by the user. The difference between this and traditional searching algorithms is two-fold: collaborative tagging relies on human knowledge, as opposed to an algorithm, to directly

$\sum_i R(i)$, where $R(x)$ is the number of times that particular previous tag $x$ has been chosen in the past and $\sum_i R(i)$ is the total sum of all previous tags. This leads to tags that have been heavily reinforced in the past being further reinforced in the future.

We illustrate this with a simple example, as given by Figure 2, where P(tag) is P(o) and assuming for simplification P(a) = 1. Also, we will have a user add only one new tag per time step. At time step 1 in our example, the user has no choice but to add a new tag, “piano,” to the page. At the next stage, the user does not reinforce an existing tag but chooses a new tag, “music,” and so P(piano) = 1/2. At t = 3, the user reinforces the previous “piano” tag and so P(piano) increases to 2/3. At t = 4, a new tag is chosen (“digital”), and so P(piano) falls to 2/4 = 1/2 while P(music) decreases to 1/4. Taken to its conclusion, this process produces a “power-law” distribution.
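The reinforcement step of this example can be checked mechanically. The sketch below is an illustration in Python rather than the authors' implementation; the function name `reinforcement_probabilities` is hypothetical. It assumes that the probability of reinforcing a previous tag x is its share of all past choices, R(x) divided by the sum of R(i) over all previous tags, and replays the four time steps of the example using exact fractions.

```python
from collections import Counter
from fractions import Fraction


def reinforcement_probabilities(history):
    """Map each tag x to R(x) / sum_i R(i) for the tags chosen so far."""
    counts = Counter(history)
    total = sum(counts.values())
    return {tag: Fraction(n, total) for tag, n in counts.items()}


# Tags chosen at t = 1, 2, 3, 4 in the worked example.
choices = ["piano", "music", "piano", "digital"]
for t in range(1, len(choices) + 1):
    probs = reinforcement_probabilities(choices[:t])
    print(t, {tag: str(p) for tag, p in probs.items()})
```

After the fourth step the counts are piano 2, music 1 and digital 1, so the shares are 1/2, 1/4 and 1/4 respectively.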

Preferential attachment models do not explain why a particular new tag is added to a resource; in practice, tags are not added at random because their informational value is taken into account. For example, the oldest tags for a resource are not always the most popular tags. A new tag may be added that uncovers an informational dimension not captured by older tags, and if this new dimension proves both relevant and useful then other users will reinforce the tag that represents the dimension, perhaps at the expense of older tags with less relevant informational dimensions. In this case, the new relevant tag would experience a burst of reinforcement, perhaps surmounting the frequency with which older tags were used and eventually stabilizing towards the top of the tag distribution for a resource. The entire tagging process might be considered an “exploration” versus “exploitation” process, where the exploration of possibly relevant dimensions of a resource is balanced with the exploitation of previously tagged dimensions of a resource. A stabilized distribution theoretically represents a state where the optimal number of dimensions have been tagged.

Figure 2: An example of how shuffling leads to preferential attachment (panels: t=1, tag=piano; t=2, tag=music; t=3, tag=piano; t=4, tag=digital)

While it is impossible for a generic model to assign a priori the exact informational value of a resource, it is possible to at least partially model the informational value of a specific tag. A hypothetical tag applied to every relevant resource would, if used in a search by a user to discover resources, retrieve every document (imagine a tag such as “website,” used at least once by some user on every resource). This type of tag has an informational value (I) of 0, and we assume that the informational value of a tag that retrieves no resources is also 0. Another tag that hypothetically selects only the resource needed would have an informational value (I) of 1. This does not occur so precisely in practice, as users presumably want the optimal tag to return some cognitively appropriate number (k) of resources, such as the number of resources that fit on the screen or that allow users to effectively browse an area, and this may vary per user. However, for the purposes of our model we will assume that k = 1 when quantifying informational value, to simplify our exposition. Notice also that a user may use multiple tags, and these tag combinations may have different informational values that are not additive. In our work with del.icio.us, we can empirically estimate the informational value of a tag by retrieving the number of web pages a del.icio.us search with a tag (or combination of tags) returns and converting it into a probability, as done in Section 5.

In order to explain the tight binding between information retrieval and value, we show an abstract example in Figure 3. In this example the act of “tagging” by a user ($u_x$) can be considered the assignment of a tag ($t_y$) to a given resource ($r_z$). Thus, a given search can be considered a traversal from $u_x$ via a number of tags to a number of resources. The user wishes to minimize the number of tags needed to retrieve the relevant resources, which is unknown to both the system and the user. Following Zipf’s famous “Principle of Least Effort,” users presumably minimize the number of tags used [20]. In our example the user $u_2$ wishes to use a group of tags to discover a relevant resource, which an oracle would tell us is $r_2$. While tags $t_1$ and $t_5$ each retrieve exactly one resource, so that $I(t_1) = I(t_5) = 1$, these tags do not identify $r_2$. $I(t_3) = 0$, since it retrieves all resources in the data set. While $I(t_2)$ and $I(t_4)$ are both greater than $I(t_3)$, the combination of both tags retrieves exactly the resource $r_2$ in our example, so $I(t_3, t_2) = 1$, which is greater than both $I(t_2)$ and $I(t_3)$. Notice that informational value is not additive, since $I(t_1, t_5) = 0$ while both $I(t_1)$ and $I(t_5)$ equal 1.

Figure 3: Tripartite tagging system graph (users, tags, resources) used for search. The dotted edges represent options, while the dark edges represent a particular user engaging in a search for the shaded resource.

If the user is satisfied with the search results and wishes to add a retrieved resource to their personal collection, they will reinforce one of the existing tags of the resource by repeating it, and they might also add a new tag. If the user is not satisfied with the search results, they will likely add a new tag to a retrieved resource. This tag may allow them to use fewer tags in future searches to retrieve the same resource. Thus, if we linearly combine our two models of informational value and preferential attachment, we can generate the probability of a tag x being reinforced or added as a linear interpolation of preferential attachment and informational value, with λ being used to weigh the factors:

$$P(x) = \lambda \cdot P(I(x)) + (1 - \lambda) \cdot P(a) \cdot P(o) \cdot P(R(x))$$
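The weighting performed by λ can be seen in a short numerical sketch. The Python function below is hypothetical and simply evaluates the interpolation P(x) = λ · P(I(x)) + (1 − λ) · P(a) · P(o) · P(R(x)); the input values are invented for illustration and are not estimates from the data, and the four probabilities are treated as given numbers without any claim about how they are obtained.

```python
def tag_probability(lam, p_info, p_a, p_o, p_reinforce):
    """Evaluate P(x) = lam * P(I(x)) + (1 - lam) * P(a) * P(o) * P(R(x))."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * p_info + (1.0 - lam) * p_a * p_o * p_reinforce


# Invented inputs: lam = 0 uses only the preferential-attachment term,
# lam = 1 uses only the informational-value term.
for lam in (0.0, 0.5, 1.0):
    p = tag_probability(lam, p_info=0.8, p_a=1.0, p_o=0.6, p_reinforce=0.5)
    print(lam, round(p, 3))
```

With these inputs the result moves linearly from the pure preferential-attachment value at λ = 0 to the pure informational value at λ = 1.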

4.1 Power Law Distributions: Definition

A power law is a relationship between two scalar quantities x and y of the form:

$$y = c x^{\alpha} \qquad (1)$$

where α and c are constants characterizing the given power law. Without loss of generality, Eq. 1 can also be written as:

$$\log y = \alpha \log x + \log c \qquad (2)$$

When written in this form, a fundamental property of power laws becomes apparent: when plotted in log-log space, power laws are straight lines. Therefore, the easiest way to check whether a distribution follows a power law is to apply a logarithmic transformation, and then use linear interpolation of the data points to determine the parameters α and c.
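The log-log procedure can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the fitting code used in the study: the linear interpolation is read here as an ordinary least-squares line through the points (log x, log y), the function name `fit_power_law` is hypothetical, and the frequencies are synthetic values generated from a known power law so that the recovered parameters can be checked.

```python
import math


def fit_power_law(xs, ys):
    """Fit y = c * x**alpha by least squares on (log x, log y).

    Returns (alpha, c): the slope of the fitted line and exp(intercept).
    """
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((u - mean_x) ** 2 for u in lx)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
    alpha = sxy / sxx
    log_c = mean_y - alpha * mean_x
    return alpha, math.exp(log_c)


# Synthetic frequencies for tag positions 1..25 (not real tagging data).
ranks = list(range(1, 26))
freqs = [500.0 * p ** -1.2 for p in ranks]
alpha, c = fit_power_law(ranks, freqs)
print(round(alpha, 3), round(c, 1))
```

On noisy real data the fitted slope depends on which positions are included, so the range of x used for the fit should be reported alongside α and c.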

In our tagging domain, the intuitive explanation of the above parameters is that c represents the number of times the most chosen tag for that website is used, while α gives the power law decay parameter for the frequency of tags at subsequent positions. Thus, the number of times the tag in position p is used (for p = 1 to 25) should be approximated by a function of the form (where −α > 0):

$$\mathrm{Frequency}(p) = \frac{\mathrm{Frequency}(p=1)}{p^{-\alpha}}$$

age user employs per website. However, this observation does not affect our basic result that tag distributions follow power laws. We note, however, that the above analysis refers to heavily tagged sites (tagged more than 1000 times), and considers the 25 most used tags for each site. We have also looked at a set of less popular sites, for which the power law interpolation produces somewhat less clear results, although some of these can be expected to become more heavily tagged and eventually evolve clear power law distributions. Furthermore, for each of the sites, below the 25 highest-ranking tags there are many unique tags that are used more rarely (some only by a few people). This forms the “long tail” of the distribution, which does not usually follow the same power law decay pattern as the head.

5. CONSTRUCTING INTER-TAG CORRELATION GRAPHS

While we have shown that power laws evolve on popular sites, is there any way to model the informational value that partially drives the process? We look at one of the simplest information structures that can be derived through collaborative tagging: inter-tag correlation graphs. First, we discuss the methodology used for obtaining such graphs. Next, we illustrate our approach through an example, with tags from a limited domain. Finally, we discuss the importance of tag-tag graphs and how they could be used to shed light on the underlying dynamics of the tagging process.

5.1 Methodology

The act of tagging resources by different users induces, at the tag level, a simple distance measure between any pair of tags. In our case, we define the distance between two tags $T_i$, $T_j$ through a cosine distance measure:

$$\mathrm{Dist}(T_i, T_j) = \frac{N(T_i, T_j)}{\sqrt{N(T_i) \cdot N(T_j)}} \qquad (4)$$

where we denote by $N(T_i)$ and $N(T_j)$ the number of times each of the tags was used individually to tag all pages, and by $N(T_i, T_j)$ the number of times the two tags are used to tag the same page (summed over all pages). The distance measure captures a degree of co-occurrence (which we interpret as a similarity metric) between the concepts represented by the two tags. The choice of distance measure can play a big role in the actual structure retrieved, and we note that more sophisticated distance measures have been proposed both in item-item collaborative filtering (see [14]) and in the text mining literature. For this paper, cosine distance seemed to work well enough.
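A small sketch may help fix the definition. The Python code below is illustrative only and makes two assumptions that should be stated plainly: it uses the usual cosine normalisation, dividing N(T_i, T_j) by the square root of the product N(T_i) · N(T_j), and it simplifies the counts so that each page contributes at most once per tag, rather than once per tagging instance. The function name `cosine_measure` and the toy pages are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def cosine_measure(page_tags):
    """Return {(tag_a, tag_b): N(a, b) / sqrt(N(a) * N(b))} for co-occurring tags.

    page_tags maps a page identifier to the tags applied to that page.
    Pairs are keyed in alphabetical order; pairs that never co-occur are omitted.
    """
    n_single = Counter()
    n_pair = Counter()
    for tags in page_tags.values():
        unique = sorted(set(tags))
        n_single.update(unique)
        n_pair.update(combinations(unique, 2))
    return {
        (a, b): n_ab / math.sqrt(n_single[a] * n_single[b])
        for (a, b), n_ab in n_pair.items()
    }


# Toy pages, invented for illustration.
page_tags = {
    "page1": ["complexity", "chaos", "networks"],
    "page2": ["complexity", "chaos"],
    "page3": ["art"],
    "page4": ["art", "complexity"],
}
dist = cosine_measure(page_tags)
for pair, value in sorted(dist.items()):
    print(pair, round(value, 3))
```

In this toy data “chaos” always appears together with “complexity,” so that pair receives the highest value, while “art” shares only one of its two pages with “complexity” and scores lower.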

Next, from these similarities we can construct a tag-tag correlation graph or network, where the nodes represent the tags themselves (weighted by their absolute frequencies), while the edges are weighted with the cosine distance measure. We build a visualization of this weighted tag-tag correlation graph by using a “spring-embedder” type of algorithm; in our case we preferred the well-known Kamada-Kawai algorithm [1]. An analysis of the structural properties of such tag graphs may provide important insights into how people tag and how semantic structure emerges in distributed folksonomies (we return to this issue in Section 5.3, where we discuss the relation between this approach and the structures derived in the literature on language evolution).

While it would be difficult if not impossible for independent researchers to collect enough data to construct and analyze the entire space of tags used in del.icio.us, we did collect enough data to provide an illustration of the approach for a restricted sub-domain.

5.2 Constructing tag-tag correlation networks

In order to exemplify our approach, we collected the data and constructed visualizations for a restricted class of 15 tags, all related to the tag “complexity.” Our goal, in this example, was to examine which sciences the user community of del.icio.us sees as most related to “complexity” science (a problem which has traditionally elicited some discussion).² The visualizations were made with Pajek [1]. The purpose of the visualization was to study whether the proposed method retrieves connections between a central tag “complexity” and related disciplines. We considered two cases:

• Only the dependencies between the tag “complexity” and all other tags in the subset are taken into account when building the graph (Fig. 6).

• 30 other edges (i.e. 45 edges in total for 15 tags) are also considered (Fig. 7). These are taken to be the ones with the highest expected correlations, though in future work we will consider more sophisticated methods for determining the cut-off, based on examining the deviation from the mean.
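The two cases differ only in which edges are kept, which can be sketched as a simple filter. The Python function below is a hypothetical illustration, not the procedure used to produce the figures: `select_edges`, its parameters, and the example weights are invented, and the “highest expected correlations” are approximated here by simply taking the largest edge weights.

```python
def select_edges(weights, central, extra=30):
    """Keep every edge touching `central`, plus the `extra` heaviest other edges.

    weights maps an edge (tag_a, tag_b) to its weight; extra=0 keeps only
    the edges that involve the central tag.
    """
    kept = {edge: w for edge, w in weights.items() if central in edge}
    others = sorted(
        ((edge, w) for edge, w in weights.items() if central not in edge),
        key=lambda item: item[1],
        reverse=True,
    )
    kept.update(others[:extra])
    return kept


# Invented edge weights, for illustration only.
weights = {
    ("chaos", "complexity"): 0.80,
    ("art", "complexity"): 0.10,
    ("alife", "chaos"): 0.50,
    ("art", "chaos"): 0.05,
    ("alife", "art"): 0.02,
}
star = select_edges(weights, "complexity", extra=0)
dense = select_edges(weights, "complexity", extra=1)
print(sorted(star))
print(sorted(dense))
```

A fixed count such as 30 is the simplest cut-off; a threshold based on deviation from the mean weight would replace the `others[:extra]` slice with a comparison against that threshold.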

In both figures, the size of the nodes is proportional to the absolute frequencies of each tag, while the distances are, roughly speaking, inversely related to the distance measure (as returned by the “spring-embedder” algorithm).³ We tested two energy measures for the “springs” attached to the edges in the visualization: Kamada-Kawai and Fruchterman-Reingold [1]. For lack of space, only the visualization returned by Kamada-Kawai is presented here, since we feel it is more faithful to the proportions present in the data.

The results from the visualization algorithm do match well what one would intuitively expect to see in this domain. Some nodes are much larger than others, which again shows that taggers prefer to use general, heavily used tags (e.g. the tag “art” was used 25 times more than “chaos”). Tags such as “chaos”, “alife”, “evolution” or “networks”, which correspond to topics generally seen as close to complexity science (some of them were actually developed in the context of complex systems), come close to it. At the other end, the tag “art” is a large node distant from “complexity.” This is not so much due to the absence of sites discussing the mathematics and complexity aspects of art. In fact, there are quite a few such sites, but they represent only a small proportion of the total sites tagged with “art”, leading to a low value of the distance measure and hence a large distance in the graph. There are, however, some problems in the structure retrieved: the tag “ecology” would be expected to appear much closer to “complexity,” since much research on complexity in biological systems has focused on applications in ecology.

We should mention that in similar work, Mika [13] concluded (for a different domain than the one in this paper) that filtering based on users produces more useful results than filtering based on items. Although we cannot precisely assess whether the same measures were used for building the graph, for this particular domain our approach produced reasonably useful results. Before reaching any definite conclusions, however, further work is needed that examines the results with different similarity criteria, with different methods of applying spring-embedder algorithms, and based on larger and more precise data sets.

Figure 6: Visualization of a tag correlation network, considering only the correlations corresponding to one central node “complexity”

Figure 7: Visualization of a tag correlation network, considering all relevant correlations

Overall, we expect that when applying fully automated retrieval methods on larger data sets (for example, not choosing the subset of tags considered “by hand”), the same problems we identified for the tag distributions on individual sites would appear. This means that some very common, general-purpose tags could have a very large weight and centrality, although for a particular domain or application they do not carry much information.

5.3 Tag Graphs and Human Language Networks

In the previous section, we have shown that tag networks can be easily constructed and visualized and that they could prove useful in simple information retrieval. However, exploring the properties of these tag graphs (e.g. node centrality, degree distribution, etc.) and their evolution can provide us with much deeper insights into how folksonomies develop from the aggregate behavior of individual users. They could additionally provide insight into how more complex semantic structures evolve.

A starting point in our further modelling is the work that seeks to explain the emergence of structure and syntax in human language. In recent high-profile work, Ferrer i Cancho and Sole [17, 4] study the evolution of several human languages by constructing their graphical protostructure. They do this by taking large corpora of (natural language) texts and constructing inter-correlation graphs between all pairs of words in the language, based on the distance at which they appear from each other in these texts.

Next, they analyze the resulting graphical structure for each of the considered languages. Following the seminal work of Zipf, they show that the retrieved networks, far from having the structure predicted by random graph theory for such large networks [2], have in fact a “small world” structure.⁴ Furthermore, this protostructure is remarkably similar across different languages.

Graphs which exhibit a small world network effect have a distribution of node degrees that follows Zipf’s law.⁵ Sole et al. [17] argue that, far from being a mere coincidence, this is an essential underlying property of human languages and, furthermore, that syntax and structure in human languages emerge “for free” from these simpler structures. In [5], they simulate a version of Zipf’s classic generative model of human language: speakers prefer to use ambiguous, general words which have minimum entropy (and minimize their effort in choosing the word), while hearers prefer words with high entropy, and thus high information content.

Comparing this setting with the tripartite model of tagging systems considered here (presented above and in Fig. 1), we observe some important similarities to models of language evolution. The resources (websites) could correspond to the objects in the real world that need to be described by the language, the users to the speakers of the language, and the tags to the tokens of the language (i.e. the words). Tags also likely have a Zipf law distribution of node degrees, and while the massive data harvesting needed to show this is difficult, even our provisional results do point in this direction. In such a case, generative models proposed by Sole et al. [5] may be useful to explain the online behavior of taggers as regards informational value. Thus, folksonomy structure could also be seen as emerging at the intersection between the efforts of taggers, who try to minimize their effort, and thus prefer to choose more common

tioned without “piano,” so we can safely say that the two terms “digital” and “piano” can be considered in some contexts just “digital piano.” However, “piano” is a more abstract class than “digital piano” since “piano” often appears in the context of other tagging instances without the tag “digital.”

Since we do not have access to the entire del.icio.us tagging database, we have to focus on creating ontologies from a given resource, looking at what knowledge can be extracted from a webpage about a digital piano rather than from the set of all pages about digital pianos tagged by users. The network is a directed and weighted graph whose weights are given by how many times two given tags (t_x, t_y) co-occur in a tagging instance (i.e. when a user tags the site), divided by the total number of occurrences of the tags, so C(y,x) = t_x t_y

Figure 9: Tag-tag correlation graph for the individual tags of a resource about digital pianos
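The weighting of the directed tag graph can be sketched in Python as follows. The text does not fully specify the normalization, so this sketch assumes that the weight from tag x to tag y is the number of tagging instances containing both tags divided by the number of instances containing x. The function name and the toy tagging instances are likewise hypothetical.

```python
from collections import Counter
from itertools import permutations


def tag_correlations(tagging_instances):
    """Return {(x, y): weight} for a directed, weighted tag graph.

    ASSUMPTION: weight(x -> y) = (# instances containing both x and y)
                                 / (# instances containing x).
    """
    tag_counts = Counter()
    pair_counts = Counter()
    for instance in tagging_instances:
        tags = set(instance)  # ignore duplicate tags within one instance
        tag_counts.update(tags)
        # Ordered pairs, so both (x, y) and (y, x) are counted.
        pair_counts.update(permutations(sorted(tags), 2))
    return {(x, y): n / tag_counts[x] for (x, y), n in pair_counts.items()}


# Toy tagging instances for one resource, purely illustrative.
instances = [
    {"digital", "piano"},
    {"digital", "piano", "music"},
    {"piano", "music"},
    {"piano"},
]
weights = tag_correlations(instances)
```

Under this assumed normalization the graph is asymmetric: in the toy data “digital” never occurs without “piano”, so the weight from “digital” to “piano” is 1.0, while the weight in the other direction is only 0.5, mirroring the observation that “piano” is the more abstract class.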

graphs that lend insight into the categorization process and into existing intuitions about how concepts are related. This provides preliminary evidence that some type of latent classification scheme and taxonomic structure may lie behind tagging.

Finally, we have shown a simple methodology for extracting Semantic Web ontologies (in particular RDF and RDF Schema) that can be used on tagged resources whose tagging distribution has stabilized into a power law. Again, we need more empirical data to validate these ontologies and produce them en masse; we are currently gathering this data and will likely refine our heuristics in time. From these results, it seems quite plausible that folksonomies and ontologies, which are merely new incarnations of categorization and classification respectively, are not mortal enemies but fundamentally compatible, as tagging-based categorization can evolve into stable classification schemes that can be formalized as ontologies. Further work will contribute more rigorous analyses to these observations.

8. ACKNOWLEDGEMENTS

This work was performed during the authors’ visit at the Santa Fe Institute, Santa Fe, NM, USA. The authors wish to thank the SFI for its support in the initial stages of this research.

9. REFERENCES

[1] V. Batagelj and A. Mrvar. Pajek - A program for large network analysis. Connections, 21:47–57, 1998.

[2] Bela Bollobas. Random Graphs. Academic Press, London, England, 1985.

[3] Stuart Butterfield. Folksonomy, 2004. http://www.sylloge.com/personal/2004/08/folksonomy-social-classification-great.html.

[4] R. Ferrer Cancho and R. V. Sole. The small world of human language. Proc. Roy. Soc. London, B 268:2261–2266, 2001.

[5] R. Ferrer Cancho and R. V. Sole. Least effort and the origins of scaling in human language. Proc. Natl. Acad. Sci. USA, 100:788–791, 2003.

[6] P. Diaconis, M. McGrath, and J. Pitman. Riffle shuffles, cycles and descents. Combinatorica, 15:11–29, 1995.

[7] Scott Golder and Bernardo Huberman. The structure of collaborative tagging systems, 2006. HP Labs Technical Report. http://www.hpl.hp.com/research/idl/papers/tags/.

[8] Pat Hayes. RDF Semantics, W3C Recommendation, 2004. http://www.w3.org/TR/2004/REC-rdf-mt-20040210/.

[9] E. Jacob. Classification and categorization: A difference that makes a difference. Library Trends, 52(3):515–540, 2004.

[10] Cameron Marlow, Mor Naaman, Danah Boyd, and Marc Davis. Position paper, tagging, taxonomy, flickr, article, toread. In Collaborative Web Tagging Workshop at WWW’06, Edinburgh, UK, 2006.

[11] Adam Mathes. Folksonomies: Cooperative classification and communication through shared metadata, 2004. http://www.adammathes.com/academic/computer-mediated-communication/folksonomies.html.

[12] Peter Merholz. Metadata for the masses, 2004. http://www.adaptivepath.com/publications/essays/archives/000361.php.

[13] Peter Mika. Ontologies are us: A unified model of social networks and semantics. In Proc. of the 4th Int. Semantic Web Conference (ISWC’05). Springer LNCS vol. 3729, 2005.

[14] V. Robu and H. La Poutré. Retrieving utility graphs used in multi-item negotiation through collaborative filtering. In Proc. of RRS’06, Hakodate, Japan, 2006.

[15] Kaikai Shen and Lide Wu. Folksonomy as a complex network, 2005. http://arxiv.org/abs/cs.IR/0509072.

[16] Clay Shirky. Ontology is over-rated, 2005. http://www.shirky.com/writings/ontology-overrated.html.

[17] R. V. Sole. Syntax for free? Nature, 434:289, 2005.

[18] Luc Steels. The evolution of communication systems by adaptive agents. In Adaptive agents and multi-agent systems, pages 125–140. Springer LNAI, 2004.

[19] Duncan Watts and Steve Strogatz. Collective dynamics of ’small-world’ networks. Nature, 393(6684):440–442, 1998.

[20] G. K. Zipf. Human Behaviour and the Principle of Least Effort. Addison-Wesley, Cambridge, Massachusetts, 1949.
