Artificial Languages: A Study of Formal Grammars

By Matthew Estes

November 17th, 2002

English 5511

Introduction to Descriptive Linguistics

Artificial Languages: A Study of Formal Grammars

From the time we start school, and for the rest of our adult lives, we must face the notion of grammar in our language. Whether our grammars are the prescriptive rules of bygone times, or the modern descriptive approach of linguists, they are inescapable. But what is a grammar?

Since ancient times, grammars have been written. The modern approach to grammars began with the work of Noam Chomsky in the 1950's.1 Since then, more than a dozen approaches to the subject have arisen. With the invention of the digital computer came the birth of programming languages. Programming languages represent a linguistic phenomenon that is capable of revealing insights into the creation of grammar formalisms, the writing of grammars, and the limits of grammars, as well as their reality as yet another linguistic phenomenon. Further, the desire to use computers to automatically process and understand language has put added emphasis on formal methods due to their suitability for computational approaches.

The book, Language Files2, defines descriptive grammar as follows:

Objective description of speakers' knowledge of a language (competence) based on their use of the language (performance).

1"Three Models for the Description of Language", Noam Chomsky, IRE Transactions on Information Theory, vol. 2, no. 3, pp. 113-124, 1956

2Language Files: Materials for an Introduction to Linguistics, edited by Thomas W. Stewart Jr, Nathan Vaillette, 8th edition, Ohio State University Press, 2001, pages 494 & 500.

They also define the word syntax:

The study of the way in which sentences are constructed from smaller units called constituents; how sentences are related to each other.

In this paper, I will discuss the notion of a formal grammar. My usage of the term grammar will cover both descriptive grammar and syntax.

So, grammars describe a language. While a linguist traditionally separates layers of phonology, morphology, and syntax, each of these layers can be thought of as a language with a set of rules governing the construction of elements. So while grammars are normally thought to cover sentences, they can be adapted to cover the whole range of language production facilities.

One important concept in the discussion of grammars is that a grammar describes a "set". The set could be all sentences written by Shakespeare, or it could be all sentences uttered by native speakers of American English.

What is a Formal Grammar?

When a scientist uses the term "formal", it is almost always in reference to the idea of "Formal Systems". Whole books have been written about what a formal system is.3 Formal systems are perhaps the most important development in twentieth-century mathematics.

A formal system is actually very simple to describe. A set of "axiomatic" statements describes the basic facts of the system, and a set of operations (usually from logic) describes ways to manipulate the basic facts and produce other, derived facts. The system must be consistent, which means that it is

3A good, entertaining layman's introduction is contained in the book Gödel, Escher, Bach by Douglas Hofstadter

impossible, using the basic facts and the rules of the system, to derive a "fact" that contradicts the basic facts or anything else derivable from them.
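To make this concrete, here is a minimal Python sketch of a formal system, modeled on the MIU system from the Hofstadter book cited above: one axiom, four rewriting rules, and a loop that mechanically derives new facts. The encoding and names are my own illustration.

    AXIOM = "MI"

    def successors(s):
        """Apply every MIU rule to s once, collecting the derived strings."""
        out = set()
        if s.endswith("I"):
            out.add(s + "U")                      # Rule 1: xI -> xIU
        out.add(s + s[1:])                        # Rule 2: Mx -> Mxx
        for i in range(len(s) - 2):
            if s[i:i + 3] == "III":
                out.add(s[:i] + "U" + s[i + 3:])  # Rule 3: III -> U
        for i in range(len(s) - 1):
            if s[i:i + 2] == "UU":
                out.add(s[:i] + s[i + 2:])        # Rule 4: UU -> (dropped)
        return out

    theorems = {AXIOM}                            # the axiom is the first basic fact
    for _ in range(3):                            # three rounds of derivation
        theorems |= {t for s in theorems for t in successors(s)}
    print(sorted(theorems, key=len))              # every "theorem" derived so far

Every string printed is a derived fact of the system, produced purely by mechanical application of the rules to the axiom.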

From this simplicity, one of the most powerful results of mathematics was derived by the early twentieth-century mathematician Kurt Gödel. He proved that there are true statements that cannot be proven within a formal system, which limits the power of what can be done with formal systems. A linguist would enjoy the self-referential statement at the heart of the problem (Gödel's actual sentence asserts, roughly, "This statement cannot be proven"):

This statement is false.

In fact, the ability of systems to describe themselves (self-reference) contains the flaw Gödel exploited. Either a system is powerful enough to describe itself (in which case it is "incomplete"), or it is so weak that nothing useful can be done with it.

Another important facet of formal systems is their ability to represent and operate on other formal systems. Mathematically this can be very tedious, and its value may not be obvious, but one of the results of computability theory is the basic realization that computers are the physical embodiment of a very powerful formal system. As such, computers can implement any other formal system and "simulate" the results of operations in those formal systems. Their limitations also translate into other formal systems, so it becomes possible to apply results in the Theory of Computation to other formalisms, particularly concerning the ability to "implement" them in a real machine. Computability theory, along with Gödel's work, describes the fundamental properties of all formal systems. In fact, another interpretation of what a formal system actually is is an "Abstract Machine". So now we can see that a Formal Grammar (or Grammar Formalism) makes the description and use of grammars "systematic" in a scientific way.

One of the most fruitful uses of a good formalism for grammar is the automatic processing of a language by a computer.4 The simplest type of processor is called a "Recognizer". The purpose of a Recognizer is to take a grammar and an utterance as input, and to report whether the utterance is in fact part of the language described by the grammar. Recognizers are not actually that useful on their own, and when created, they are usually part of another type of language processor called a Parser.

Most of the body of computer science covering formal grammar is concerned with this one aspect. A Parser takes a grammar and an utterance and reports back the sequence of steps taken to reconstruct the utterance from the basic rules of the grammar. Another way of putting this is that a parser recovers the structure inside the utterance in such a way that this structure may now be manipulated easily by other computer processes.
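The distinction can be sketched in a few lines of Python. The grammar below, a trivial one for the language aⁿbⁿ (rules S → aSb and S → ab), and all function names are illustrative assumptions; the recognizer answers only yes or no, while the parser also reports the rules used to reconstruct the utterance.

    def recognize(s):
        """True iff s is in { a^n b^n : n >= 1 }."""
        if s == "ab":
            return True                    # rule S -> ab
        if len(s) >= 4 and s[0] == "a" and s[-1] == "b":
            return recognize(s[1:-1])      # rule S -> aSb
        return False

    def parse(s, steps=None):
        """Like recognize, but reports the sequence of rules used."""
        steps = [] if steps is None else steps
        if s == "ab":
            return steps + ["S -> ab"]
        if len(s) >= 4 and s[0] == "a" and s[-1] == "b":
            return parse(s[1:-1], steps + ["S -> aSb"])
        return None                        # the utterance is not in the language

    print(recognize("aaabbb"))             # True
    print(parse("aabb"))                   # ['S -> aSb', 'S -> ab']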

There are several properties that need to be examined when looking at a grammar formalism. A good formalism is easily used by a human being to create and read grammars, in order to understand the structure and workings of a language. The grammar must also be efficiently usable by a computer to automatically analyze language.

Semantics cannot be ignored when evaluating grammar formalisms either. The issue is complicated by the fact that there are two major schools of thought on the topic.5 One school of thought holds that grammars are concerned with the "syntax" of a language and not the meaning. In order to discern the "meaning" of an utterance, further processing must be done on the results of the parsing.

The other school of thought believes that the result of the parsing is the meaning of the utterance. In this case, the meaning may be determined by associating an action with each rule in the grammar and performing this action on the pieces associated with the rule by the Parser. The implications of these distinctions will be made clear later.

Semantics is important, however, in dealing with an "ambiguous" parsing. An ambiguous parse is one in which at least two different orders of rules from the grammar may be used to produce the utterance. Sometimes this difference may not matter, but at other times it may matter a great deal. Consider the word "unlockable". The parsing could be considered in the following two ways (where parentheses denote order of operations):

un + (lock + able)

(un + lock) + able

4The book Parsing Techniques: A Practical Guide, Dick Grune, Ceriel Jacobs, 1990, is an excellent overview of and introduction to automatic language processing techniques. Available online: http://www.cs.vu.nl/~dick/PTAPG.html

5This view of semantics is more in terms of a computer science approach. The basic conflict is the desire to associate meaning with the results of the parsing, and how much information the "grammar" should contain and process about the language. I arrived at the belief that these two approaches are worth distinguishing after reading the parsing literature and conversations with several working computer scientists about parsing.

In the first case, the word means that the item in question cannot be locked. In the second case, it means that the lock in question may be unlocked. This type of ambiguity occurs at all levels of language, and some grammar formalisms attempt to deal with ambiguity head on while others do not.
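A short Python sketch can recover both readings mechanically. The morpheme grammar below is an assumption made for illustration (un- may attach either to a verb or to an adjective); an exhaustive parser then finds exactly the two bracketings above.

    RULES = [
        ("V",   ("lock",)),       # lock is a verb
        ("V",   ("un", "V")),     # un- + verb -> verb (unlock)
        ("Adj", ("V", "able")),   # verb + -able -> adjective (lockable)
        ("Adj", ("un", "Adj")),   # un- + adjective -> adjective
    ]

    def parses(symbol, words):
        """Yield every parse tree deriving the word list from symbol."""
        if len(words) == 1 and words[0] == symbol:
            yield symbol                       # a bare morpheme matches itself
        for lhs, rhs in RULES:
            if lhs != symbol:
                continue
            if len(rhs) == 1:                  # unary rule, e.g. V -> lock
                if tuple(words) == rhs:
                    yield (lhs, rhs[0])
            else:                              # binary rule: try every split
                left, right = rhs
                for i in range(1, len(words)):
                    for l in parses(left, words[:i]):
                        for r in parses(right, words[i:]):
                            yield (lhs, l, r)

    for tree in parses("Adj", ["un", "lock", "able"]):
        print(tree)
    # ('Adj', ('V', 'un', ('V', 'lock')), 'able')   -- (un + lock) + able
    # ('Adj', 'un', ('Adj', ('V', 'lock'), 'able')) -- un + (lock + able)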

Other desirable properties of a good grammar formalism are usability and efficiency. A usable formalism is one that is easily used by a person to describe languages. An efficient formalism is one that is easily used by a computer to analyze languages.

Chomsky Hierarchy

Noam Chomsky revolutionized linguistics with his introduction of phrase structure grammars in 1959.6 Previous approaches to grammars had all been based on recording specific utterances. Chomsky's approach identified the importance of a grammar being "generative" and capable of producing a potentially infinite number of sentences.

Chomsky's basic idea was that of a "rewriting system".7 His grammars consisted of a set of rules. Each rule had terminals and nonterminals. Terminals are just letters, words, or other symbols in the alphabet of the language being produced. A nonterminal is like a variable. When the system is used to produce language, the "sentence" under production is repeatedly rewritten, replacing sequences of terminals and nonterminals that match the left-hand side of a rule with the right-hand side of that rule. A special nonterminal called the "Start Symbol"

6“On certain formal properties of grammars”, Noam Chomsky, Information and Control, vol. 2, p. 137-167, 1959

7A more extensive, but readable treatment of the Chomsky Hierarchy is available in Parsing Techniques: A Practical Guide, pages 28-40

is used to begin the production. An example makes this much clearer. It is standard practice to use single letters in place of words.

Language8: { aⁿbⁿcⁿ : n ≥ 1 }

Grammar:

S → abc | aSQ
bQc → bbcc
cQ → Qc

Notes: The nonterminal Q is used to "count" the number of b's and c's required to match the number of a's generated. The last rule moves the Q's from the end of the sentence inward, and the second rule uses a context condition to make sure a Q is turned into the right number of b's and c's only when it is in the right location (on the boundary between the b's and the c's).

Derivation of "aaabbbccc"

String        Rule applied
aSQ           S → aSQ
aaSQQ         S → aSQ
aaabcQQ       S → abc
aaabQcQ       cQ → Qc
aaabbccQ      bQc → bbcc
aaabbcQc      cQ → Qc
aaabbQcc      cQ → Qc
aaabbbccc     bQc → bbcc
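The derivation can be replayed mechanically. The following Python sketch encodes the three rules above and applies them in the order listed in the table; the rule table and the fixed script of steps are my own encoding of this one example, not a general parser, which would have to search for which rule to apply.

    RULES = {
        "S": ["abc", "aSQ"],   # S -> abc | aSQ
        "bQc": ["bbcc"],       # bQc -> bbcc
        "cQ": ["Qc"],          # cQ -> Qc
    }

    def rewrite(sentence, lhs, rhs):
        """Apply one rule at the leftmost place its left side occurs."""
        assert rhs in RULES[lhs] and lhs in sentence
        return sentence.replace(lhs, rhs, 1)

    steps = [
        ("S", "aSQ"), ("S", "aSQ"), ("S", "abc"),
        ("cQ", "Qc"), ("bQc", "bbcc"),
        ("cQ", "Qc"), ("cQ", "Qc"), ("bQc", "bbcc"),
    ]

    sentence = "S"
    for lhs, rhs in steps:
        sentence = rewrite(sentence, lhs, rhs)
        print(f"{sentence:12}  {lhs} -> {rhs}")
    # The last line printed is: aaabbbccc     bQc -> bbcc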

Chomsky also identified several restrictions on types of rules. Each of these restrictions makes these grammars easier to write and use, but they also reduce the ability of the grammar to express different kinds of languages and features of different languages.

Chomsky's general phrase structure grammars are called Type-0, to denote the primacy of this class with respect to the rest of his language classes. There are no restrictions on the kinds of rules allowed in a Type-0 grammar. Unfortunately, it has been proven that there is no general way to process and use Type-0 grammars on a computer. Type-0 grammars are also "Turing Complete", a term from computability theory which basically says that anything that can be done with a computer can be done with the item in question (in this case, Type-0 grammars).

The first restriction generates what are called Type-1 or Context-Sensitive languages. In a Type-1 grammar, all rules are required to be of a form such that the right-hand side of the rule is not shorter than the left-hand side, and the left-hand side contains at least one nonterminal. While this may seem to be an arbitrary distinction, it is not. It is possible to automatically process Type-1 languages with a machine. There is no known way to efficiently process all Type-1 languages, but they can all be processed given enough time on a computer. Both Type-1 and Type-0 grammars are considered difficult for people to write.

The most important restriction is that of a Type-2 language, also called a Context-Free Grammar (abbreviated CFG). CFGs only allow rules with a single nonterminal on the left. Note that this also makes Type-2 languages a subset of Type-1 languages. CFGs are considered easy to read and write by people. The reason is their dictionary-like character. A rule is literally a definition, which leads to rules like:

Sentence → Subject Verb Object

This simplicity has also led to several very efficient means of processing a CFG and to its use inside parsers.

CFGs are also considered important by other grammar formalisms. Many language features that are desirable to encode in the grammar are not context-

8Material for this example comes from Parsing Techniques: A Practical Guide, pages 30-32

free (for instance, in programming languages, the requirement that a variable must be declared before it can be used), but Type-1 grammars are both inefficient to use and difficult to write. Many other grammar formalisms therefore build extensions onto CFGs to handle context-sensitive features, and the goal is to preserve the simplicity of a CFG while gaining the expressive power of a Type-1 language (and often the power of a Type-0 language as well).

The last restriction is that the left-hand side must consist of a single nonterminal and the right-hand side can only consist of terminals followed by at most a single nonterminal. These Type-3 languages are "Regular". Regular languages are actually very useful in parsers for a multitude of reasons. The primary reason is that nearly all theoretical and practical questions about the manipulation and properties of regular languages have been answered. Efficient parsers for regular languages are easy to construct, and nearly every computer user has used them: regular expressions are often used for selecting files (including the ubiquitous "*.*" pattern in file dialogs, where the '*' is actually the Kleene star, an operation on regular languages).
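As a quick, hedged illustration: Python's re module implements regular-expression matching, and its fnmatch module translates file-dialog "glob" patterns like *.* into regular expressions behind the scenes. The patterns here are arbitrary examples.

    import fnmatch
    import re

    pattern = re.compile(r"a*b*")                # the regular language a^m b^n
    print(bool(pattern.fullmatch("aaabb")))      # True
    print(bool(pattern.fullmatch("aba")))        # False: an 'a' after the b's

    print(fnmatch.fnmatch("paper.txt", "*.*"))   # True: '*' is the Kleene star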

Chomsky Hierarchy9

Type        Chomsky hierarchy                   Non-monotonic hierarchy            Production effects
Type 0      Unrestricted Phrase Structure       Unrestricted Phrase Structure      Global
            Grammars (monotonic grammars        Grammars
            with ε-rules)
Type 1      Context-sensitive Grammars          Context-sensitive grammars         Global
            (monotonic grammars without         with non-monotonic rules
            ε-rules)
Type 2      Context-free (ε-free) grammars      Context-free grammars              Local
Type 3      Regular (ε-free) grammars           Regular grammars;                  None
                                                regular expressions
"Type 4"    Finite Choice                       Finite Choice                      None

Wijngaarden Grammars

One of the oldest extensions is that of Aad van Wijngaarden, a Dutch computer scientist.10 This extension is called W-grammars, two-level grammars, or Wijngaarden grammars. Wijngaarden realized that a grammar is itself a language, and could therefore be described by a grammar.11 This insight goes far beyond Wijngaarden grammars, though. We will come back to that realization later.

Context-free grammars are probably universally agreed to be the easiest in the Chomsky hierarchy to use and understand. Wijngaarden's idea was to use a context-free grammar to describe another context-free grammar. The grammar-generating grammar is called the "Metagrammar". If the metagrammar generated only a finite number of context-free rules, the result would still only be a

9Chart reproduced from Parsing Techniques: A Practical Guide, page 53.

10Aad van Wijngaarden, "Orthogonal design and description of a formal language", Technical Report MR 76, Mathematisch Centrum, Amsterdam, 1965

11For a readable introduction to W-grammars and their use in describing languages (including several complete examples), see the book Grammars for Programming Languages, Cleaveland, 1977.

context-free language, but it has been shown that when the metagrammar generates an infinite number of rules, a Wijngaarden grammar has all the power of a Chomsky Type-0 language. Unfortunately, all this power is the reason that no practical method of parsing has been found for reasonable restrictions of Wijngaarden grammars.12
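As a loose Python sketch of the two-level idea (the toy metagrammar here is my own illustration): suppose the metagrammar supplies one context-free rule instance S → aⁿbⁿcⁿ for every n ≥ 1. Infinitely many rules exist on paper, and a processor simply instantiates them on demand.

    def metarule(n):
        """The rule instance the metagrammar generates for a given n."""
        return ("S", "a" * n + "b" * n + "c" * n)

    def recognize(s, limit=100):
        """Check s against rule instances generated lazily, up to a bound."""
        return any(metarule(n)[1] == s for n in range(1, limit))

    print(recognize("aaabbbccc"))   # True: matches the instance with n = 3
    print(recognize("aabbbcc"))     # False: no instance generates it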

Wijngaarden grammars have not seen the large amount of theoretical or practical interest many other extensions have, but the ideas in Wijngaarden grammars are very important.13 Several of the researchers who investigated Wijngaarden grammars went on to design the programming language "Prolog", which allows programs to be created from logical statements of what is true about the output of the program. Prolog and features of Prolog are present in many modern Artificial Intelligence projects and theorem-proving/checking tools.

Attribute Grammars

In the 1960's, the computer scientist Donald Knuth created attribute grammars.14 Attribute grammars are an extension of Chomsky's Type-2 Context-Free Grammars. The basic system begins with a context-free grammar for the language. Each rule in the grammar is then associated with a mathematical function. For instance, the CFG rule:

Sum → Number ‘+’ Number

12For a good start on parsing them see “Practical LL(1)-based Parsing of van Wijngaarden Grammars”, A.J. Fisher, Acta Informatica, vol. 21, p. 559-584, 1985.

13For more information on the programming aspects see “Two-level Grammar as a Functional Programming Language”, Edupuganty, B.R. Bryant, Computer Journal, vol. 32, iss.1, p. 36-44, February 1989.

14“Semantics of Context-Free Languages”, Donald E. Knuth, Mathematical Systems Theory 2, no. 2, June 1968, p. 127-145.

Might have this function attached to it:

F(x,y) = x + y

Attribute grammars are typically treated as a two-step process. The first step involves the creation of a normal parse tree using the context-free part of the attribute grammar. The second step consists of labeling the nodes in the tree and evaluating the functions from the grammar.
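A minimal Python sketch of the two steps, under the assumption that step one has already produced the parse tree for "2 + 3" with the Sum rule above (the tree encoding and the table of attached functions are illustrative):

    # Step 1 (parsing) is assumed done; the tree is written out by hand.
    tree = ("Sum", ("Number", 2), "+", ("Number", 3))

    # Each rule name maps to the mathematical function attached to it.
    ACTIONS = {
        "Sum": lambda x, y: x + y,   # F(x, y) = x + y, as in the text
        "Number": lambda n: n,       # a number's attribute is its value
    }

    def evaluate(node):
        """Step 2: label the tree bottom-up by running each rule's function."""
        if not isinstance(node, tuple):
            return node                                  # a bare value
        rule, *children = node
        values = [evaluate(c) for c in children if c != "+"]
        return ACTIONS[rule](*values)

    print(evaluate(tree))   # 5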

Attribute grammars are capable of expressing context-sensitive languages (they are, in fact, as powerful as W-grammars). Unfortunately, the attributes attached to each CFG rule can overshadow the simplicity of the CFG model. In real applications, the focus can shift entirely to manipulating the grammar through the attribute system, and the resulting parses become far more difficult to understand as the CFG rules are neglected.

Despite these problems, attribute grammars remain one of the most popular CFG extensions, and they are often used in the construction of compilers for programming languages.

Adaptable Grammars

Adaptable Grammars, like W-grammars, are a less common grammar formalism, but they are also a more powerful approach. Adaptable grammars change the rules used to parse the language during the parsing itself. The rationale is the idea that each statement in the language can be considered as adding or subtracting rules from the grammar (thus allowing and disallowing other language constructs).

This is a very natural idea. For instance, consider the declaration of a variable in a programming language. The declaration of the variable, in effect, adds to the grammar all rules of the form that permit that variable to be used in mathematical expressions. If this sounds very similar to W-grammars, it's because it is; however, the practical differences between the models are very large.
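A minimal Python sketch of that idea (the toy one-token language and all names are assumptions): parsing a declaration adds a rule to the grammar, so an identifier only becomes a legal expression after it has been declared.

    rules = {"Expr": [["Number"]]}    # initially, only numbers are expressions

    def declare(variable):
        """Seeing a declaration adds the rule  Expr -> variable."""
        rules["Expr"].append([variable])

    def is_expr(token):
        """Recognize a one-token Expr under the current rule set."""
        return any(rhs == [token] or (rhs == ["Number"] and token.isdigit())
                   for rhs in rules["Expr"])

    print(is_expr("x"))   # False: x has not been declared yet
    declare("x")          # the declaration rewrites the grammar itself
    print(is_expr("x"))   # True: the rule Expr -> x now exists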

Adaptable grammars come in two major classes. The older class is Imperative Adaptable Grammars. Imperative grammars consider each step in the parsing and evaluate whether to add or subtract rules. Because the steps occur one after the other, choices made at previous steps determine the outcome of future choices. The major problem in writing Imperative Adaptable Grammars is their overreliance on the way the parser works: the author of the grammar must understand how each step will behave during the process of parsing.

The more recent class of Recursive Adaptable Grammars (RAG)15 takes a more functional approach from a top-down perspective, where each layer of the parse tree represents rule changes (this is an oversimplification). The chief advantage of RAG grammars is that the grammar is no longer tied to the processing order of the parsing algorithm being used.

Statistical Grammars

Statistical Grammars are one of the few grammar "types" whose ancestry goes back before the work of Chomsky. Their roots are in the 1948 paper of

15"Recursive Adaptable Grammars," John Shutt, Master's Thesis, Worcester Polytechnic Institute, 1993. Available online: http://www.cs.wpi.edu/~jshutt/thesis/top.html

Claude Shannon, "A Mathematical Theory of Communication." Shannon is generally credited as the father of Information Theory, and his results established the theoretical basis for several important advances in telecommunications.16 In his paper, Shannon talks about sources of information that generate streams of discrete symbols (words, letters, numbers, etc.). He then analyzes English text to establish conditional probabilities for larger and larger amounts of preceding text, and uses these to randomly generate strings of symbols. When he uses a second-order model for words, he generates the following sentence randomly:

the head and in frontal attack on an english writer that the character of this point is therefore another method for the letters that the time of who ever told the problem for an unexpected.17

While this does not make sense, it is strangely close to being understandable. It is especially interesting when compared to the earlier first-order approximation:

representing and speedily is an good apt or come can different natural here he the a in came the to of to expert gray come to furnishes the line message had be these.18

With the modern availability of large amounts of text stored on computers, and the capability of computers to carry out massive statistical analysis, it is only

16For a treatment of several natural language processing techniques, including statistical approaches, see the book Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, Daniel Jurafsky, James H. Martin, 2000.

17"A Mathematical Theory of Communication", Claude Shannon, Bell System Technical Journal, October 1948. Available online: http://cm.bell-labs.com/cm/ms/what/shannonday/paper.html

18"A Mathematical Theory of Communication", Claude Shannon, Bell System Technical Journal, October 1948. Available online: http://cm.bell-labs.com/cm/ms/what/shannonday/paper.html

natural that the statistical approach has seen success and interest in recent years. Speech recognition technology and phonetic analysis systems particularly favor statistical approaches.
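A rough Python sketch of Shannon's second-order word model (the tiny training text is a stand-in assumption; Shannon estimated his probabilities from large samples of English): record which words follow which, then generate by repeated random choice among observed successors.

    import random
    from collections import defaultdict

    text = ("the cat sat on the mat and the dog sat on the log "
            "and the cat saw the dog").split()

    # follow[w] lists every word observed directly after w, so choosing
    # from it uniformly reproduces the observed conditional frequencies.
    follow = defaultdict(list)
    for current, nxt in zip(text, text[1:]):
        follow[current].append(nxt)

    random.seed(2)
    word, output = "the", ["the"]
    for _ in range(12):
        if word not in follow:          # dead end: restart anywhere
            word = random.choice(text)
        word = random.choice(follow[word])
        output.append(word)
    print(" ".join(output))             # locally plausible, globally nonsense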

On a humorous side note, the grammar checker in Microsoft Word accepts the two sentences from Shannon’s paper as being grammatical. Its only suggestion was to capitalize the first letter.19

Programming Languages

Artificial languages have been designed for a long time, and their creators often intended them to be used by people in place of their native languages, to foster better communication.20 However, none of these languages has ever been taught as a native language, and speakers of these toy languages are often looked upon as being somewhat odd. The lack of native speakers also means these languages do not change the way a natural language does, and, basically, they are not that interesting to study from a linguistics perspective (though the psychology of the speakers of such languages may in the end prove to be very interesting).

19For those who would like to see more examples of the humorous output of statistical approaches to language, one of the more successful "chat" programs using this technique is MegaHAL. I do not wish to trivialize this technique, because it has seen wide success in more serious efforts, and it represents a very different approach from many of the other techniques. MegaHAL can be found at: http://megahal.sourceforge.net/

"Chat" bots are an interesting and humorous concept in their own right. They attempt to converse with people and fool them into thinking they are a real person. The idea was introduced by Alan Turing in the article "Computing Machinery and Intelligence". The first successful chat bot was ELIZA, which used template matching to simulate a Rogerian psychologist. There is now an annual competition for chat bots that attempt to fool human beings.

The contest: http://www.loebner.net/Prizef/loebner-prize.html

Turing's article: http://www.loebner.net/Prizef/TuringArticle.html

Article by Weizenbaum on ELIZA: http://i5.nyu.edu/~mm64/x52.9265/january1966.html

Online version of ELIZA https://www.doczj.com/doc/549311994.html,/if/canon/eliza.htm

20Esperanto is perhaps the classic artificial language, but there are other famous ones as well. A description of the Klingon language from the sci-fi TV show Star Trek has been created (and used by Trekkies…), and J.R.R. Tolkien invented several complete languages for his Lord of the Rings

There is a set of artificial languages that do not have the same properties as languages like Esperanto. Computer Languages, while not having "native" speakers, have developed into a world worthy of linguistic study. The idea of treating a Computer Language as a linguistic phenomenon worthy of study will naturally cause some consternation, but there is evidence that they do in fact possess properties that linguists would find interesting.21

In the early 1960's, two computer scientists, John Backus and Peter Naur, independently arrived at a formalism that was equivalent to Chomsky's Type-2, Context-Free languages.22 They also came up with a notation, called Backus-Naur Form (BNF), that continues to see almost universal usage in the description of context-free grammars. The union of Chomsky's hierarchy and the burgeoning world of computers has given birth to a world of programming languages.
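As a small illustration of the notation, here is a BNF fragment for a toy arithmetic language (the fragment itself is an assumption for illustration, not taken from the ALGOL report). Angle brackets mark nonterminals, ::= plays the role of the arrow, and | separates alternatives, just as in the context-free rules shown earlier.

    <expr> ::= <term> | <expr> "+" <term>
    <term> ::= <number> | "(" <expr> ")"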

Computers, in their native form, simply understand numbers, and at their lowest levels they simply “understand” sequences of rudimentary instructions. But even at this low level, many of the properties of true language are evident. However, these “lower level” languages have shown themselves to be inconvenient. It was not long before “higher level” languages arrived with the

trilogy, which required experts to train the actors in proper pronunciation for the movies being released.

21Larry Wall, a linguist turned programmer, has many thoughts about this subject. I would like to give a single reference in which he talks precisely about the linguistic aspects of computer programming, but Larry Wall is a man of many interests. In lieu of a single reference, I give links to transcripts of several speeches he has given, which all make entertaining reads and touch on the linguistic topics. The organizers of the conference eventually realized that he mostly tells (bad) jokes, and in later years gave him less time.

The First State of the Onion, 1997. http://www.wall.org/~larry/keynote/keynote.html

The Second State of the Onion, 1998. http://www.wall.org/~larry/onion/onion.html

Perl, the first postmodern computer language, 1999. http://www.wall.org/~larry/pm.html

The Third State of the Onion, Perl Conference, 1999. http://www.wall.org/~larry/onion3/talk.html

22“Report on the Algorithmic Language ALGOL 60”, Peter Naur, Communications of the ACM 3, no. 5, May 1960, p. 299-314.

introduction of Fortran, Lisp, Cobol, and Algol. Algol has the distinction of being the first programming language whose specification used a formal grammar (the BNF of Backus and Naur).

The term "Computer Languages" is really a misnomer. These languages are not designed for computers to speak, but for the people who use them. The modern concept of a programming language is that its primary value is how well it helps a programmer express the concepts they want. The language is then either converted to the native "machine language" of the computer or interpreted by another program which carries out the requested actions. So are "Computer Languages" "true" languages? There are nine criteria listed in Language Files for a true language.23

• Mode of communication – The mode of communication for a computer language is written text, usually consisting of all the symbols available to a normal Western European language.

• Semanticity – The statements in a program do have semantic content (in fact, that is their purpose).

• Pragmatic Function – Computer Languages have a pragmatic function. They communicate to other programmers the intent of the writer, and they describe the operations to be carried out by a computer to perform some action.

• Interchangeability – They are interchangeable; large groups of computer programmers work together on larger programs.

23Language Files: Materials for an Introduction to Language and Linguistics, p. 19-21

• Cultural Transmission – Computer languages must be learned, and people have shown evidence of learning them at early ages (the author learned at the age of 8); there are also cultural groups and values associated with programming languages.

• Arbitrariness – Computer languages often have arbitrary connections between the symbols used and their meanings.

• Discreteness – Computer Languages are discrete, and almost always explicitly contain methods of creating larger units from smaller units specified elsewhere.

The last two criteria are for true languages, and Computer Languages possess these properties as well.

• Displacement – Computer Languages have a notion of time, and explicitly include the ability to have variables represent things that are not present directly.

• Productivity – By their very definition, an infinite number of computer programs are possible, and these can represent novel ideas that the designers of the language never envisioned.

Computer Languages have precise, formal meanings defined in their specifications; yet aspects of those specifications are treated like the prescriptive rules of a natural language, and are ignored by the people writing programs that understand these languages.24 Computer Languages often borrow from other

24Programmers often joke about using “Extended Subsets” of a language, which basically means the implementors took what they like, and replaced what they didn’t like with features mutually incomprehensible to other implementations of the language. Hence, an “extended subset” is utterly useless for real work.

Computer Languages, adapting ideas to their own structure. Computer Languages themselves are governed by committees (or individuals) who adapt the formal definition of the language over time to match (and sometimes protest) real-world usage, and to clarify ambiguities. All these traits have led to a world of artificial languages with complex etymologies, dialects, and evolution that is not entirely unlike the world of natural languages.

Even within a programming language, programmers often possess idiolects.25 Programmers may write code according to various conventions that have no intrinsic meaning to the program, but possess meaning to the programmer. Programmers may also program in styles that limit them to a subset of the programming language, which can be hard to read for a programmer who uses a different subset of the language with different techniques.

One interesting case study is the technique of “Literate Programming” as described by Knuth.26 Knuth felt that programs should be written much like a book, and built a tool to take an existing programming language and mix it heavily with documentation. The result is a program which can be printed like a book and read by another programmer to understand the techniques involved and why the program works as it does.

25This can be said of both their native language as well as computer languages. Often the quirks of their language are amusing. Programmers culturally show a strong enjoyment of word play often leading to extreme quirks and variations in their language. The true hacker notes, with bemusement, that only on the internet would a search for the term Apple lead to a computer company instead of the fruit. See The New Hacker’s Dictionary, ed. Eric Raymond, for an anthropological look at the language used by programmers. Available at:

http://www.catb.org/~esr/jargon/

26Literate Programming, Donald E. Knuth, 2001. More information can be found at: http://www-cs-faculty.stanford.edu/~knuth/lp.html
