Automated Knowledge Elicitation and Flowchart Optimization for Problem Diagnosis
Software Engineering Degree English Exam

1. Introduction to Software Engineering
- Define software engineering and describe its importance in modern technology.
- Explain the software development life cycle and its different phases.
- Discuss the role of a software engineer in the software development process.

2. Requirements Engineering
- Describe the process of requirements gathering and analysis.
- Discuss various techniques used for requirements elicitation.
- Explain the importance of requirement validation and verification.

3. Software Design
- Discuss the principles of software design and architecture.
- Explain different design patterns and their application in software development.
- Describe the importance of modularization and abstraction in software design.

4. Software Testing
- Discuss the importance of software testing in the software development life cycle.
- Explain different types of software testing techniques, such as unit testing, integration testing, and system testing.
- Describe the process of test case design and execution.

5. Software Maintenance and Evolution
- Explain the need for software maintenance and the challenges associated with it.
- Discuss different types of software maintenance, such as corrective, adaptive, and perfective maintenance.
- Describe the concept of software evolution and the role of a software engineer in managing software evolution.

6. Project Management in Software Engineering
- Discuss the principles of project management in software engineering.
- Explain different project management methodologies, such as Waterfall, Agile, and Scrum.
- Describe the role of a software engineer in project planning, scheduling, and risk management.

7. Software Quality Assurance
- Explain the concept of software quality assurance and its importance in software development.
- Discuss different quality assurance techniques, such as code reviews, static analysis, and automated testing.
- Describe the role of a software engineer in ensuring software quality and the importance of continuous improvement.

8. Software Ethics and Professionalism
- Discuss ethical issues in software engineering, such as privacy, security, and intellectual property.
- Explain the importance of following professional codes of conduct in the software engineering profession.
- Describe the role of a software engineer in promoting ethical practices and responsible use of technology.

9. Software Engineering in the Industry
- Discuss the current trends and challenges in the software engineering industry.
- Describe the skills and knowledge required for a successful career in software engineering.
- Explain the importance of continuous learning and professional development in the field of software engineering.
"interpretive knowledge"(解释性知识)通常是指一种理解和解释事物、概念或现象的知识形式。
It emphasizes comprehension, analysis, and interpretation of information rather than merely memorizing or acquiring it.
Some possible readings of "interpretive knowledge":
1. Knowledge as understanding: interpretive knowledge stresses a deep understanding of information rather than surface-level familiarity. This may include a thorough grasp of complex concepts, ideas, or events.
2. Textual interpretation: in literature, the social sciences, and other fields, interpretive knowledge may involve interpreting texts or information, building knowledge through close attention to detail, context, and meaning.
3. Cultural interpretation: in the humanities, interpretive knowledge may involve interpreting cultural phenomena, traditions, or historical events, including an understanding of cultural symbols, values, and beliefs.
4. Social-science research: in the social sciences, interpretive knowledge emphasizes in-depth interpretation of human behavior, social institutions, and cultural phenomena; this form of knowledge often involves the analysis and interpretation of qualitative data.
5. Art and performance: in the arts and performing arts, interpretive knowledge may refer to a deep understanding of an artwork or performance, including interpretation of the artist's intent, style, and cultural background.
6. A common Chinese rendering treats it as declarative knowledge, also called "descriptive knowledge": knowledge that a person can consciously retrieve with cues and directly recall and state. It is mainly used to describe the properties, characteristics, and states of things and to distinguish and identify them. Such knowledge is static in nature, and the psychological process by which it is acquired is primarily memory. Acquiring declarative knowledge means that new knowledge enters an existing propositional network and forms connections with prior knowledge.
Overall, interpretive knowledge denotes a deeper level of comprehension and analysis, moving beyond the surface to uncover meanings, contexts, and relationships within the information being studied or interpreted.
Knowledge Engineering:Principles and MethodsRudi Studer1, V. Richard Benjamins2, and Dieter Fensel11Institute AIFB, University of Karlsruhe, 76128 Karlsruhe, Germany{studer, fensel}@aifb.uni-karlsruhe.dehttp://www.aifb.uni-karlsruhe.de2Artificial Intelligence Research Institute (IIIA),Spanish Council for Scientific Research (CSIC), Campus UAB,08193 Bellaterra, Barcelona, Spainrichard@iiia.csic.es, http://www.iiia.csic.es/~richard2Dept. of Social Science Informatics (SWI),richard@swi.psy.uva.nl, http://www.swi.psy.uva.nl/usr/richard/home.htmlAbstractThis paper gives an overview about the development of the field of Knowledge Engineering over the last 15 years. We discuss the paradigm shift from a transfer view to a modeling view and describe two approaches which considerably shaped research in Knowledge Engineering: Role-limiting Methods and Generic Tasks. To illustrate various concepts and methods which evolved in the last years we describe three modeling frameworks: CommonKADS, MIKE, and PROTÉGÉ-II. This description is supplemented by discussing some important methodological developments in more detail: specification languages for knowledge-based systems, problem-solving methods, and ontologies. We conclude with outlining the relationship of Knowledge Engineering to Software Engineering, Information Integration and Knowledge Management.Key WordsKnowledge Engineering, Knowledge Acquisition, Problem-Solving Method, Ontology, Information Integration1IntroductionIn earlier days research in Artificial Intelligence (AI) was focused on the development offormalisms, inference mechanisms and tools to operationalize Knowledge-based Systems (KBS). Typically, the development efforts were restricted to the realization of small KBSs in order to study the feasibility of the different approaches.Though these studies offered rather promising results, the transfer of this technology into commercial use in order to build large KBSs failed in many cases. The situation was directly comparable to a similar situation in the construction of traditional software systems, called …software crisis“ in the late sixties: the means to develop small academic prototypes did not scale up to the design and maintenance of large, long living commercial systems. In the same way as the software crisis resulted in the establishment of the discipline Software Engineering the unsatisfactory situation in constructing KBSs made clear the need for more methodological approaches.So the goal of the new discipline Knowledge Engineering (KE) is similar to that of Software Engineering: turning the process of constructing KBSs from an art into an engineering discipline. This requires the analysis of the building and maintenance process itself and the development of appropriate methods, languages, and tools specialized for developing KBSs. Subsequently, we will first give an overview of some important historical developments in KE: special emphasis will be put on the paradigm shift from the so-called transfer approach to the so-called modeling approach. This paradigm shift is sometimes also considered as the transfer from first generation expert systems to second generation expert systems [43]. Based on this discussion Section 2 will be concluded by describing two prominent developments in the late eighties:Role-limiting Methods [99] and Generic Tasks [36]. In Section 3 we will present some modeling frameworks which have been developed in recent years: CommonKADS [129], MIKE [6], and PROTÈGÈ-II [123]. 
Section 4 gives a short overview of specification languages for KBSs. Problem-solving methods have been a major research topic in KE for the last decade. Basic characteristics of (libraries of) problem-solving methods are described in Section 5. Ontologies, which gained a lot of importance during the last years are discussed in Section 6. The paper concludes with a discussion of current developments in KE and their relationships to other disciplines.In KE much effort has also been put in developing methods and supporting tools for knowledge elicitation (compare [48]). E.g. in the VITAL approach [130] a collection of elicitation tools, like e.g. repertory grids (see [65], [83]), are offered for supporting the elicitation of domain knowledge (compare also [49]). However, a discussion of the various elicitation methods is beyond the scope of this paper.2Historical Roots2.1Basic NotionsIn this section we will first discuss some main principles which characterize the development of KE from the very beginning.Knowledge Engineering as a Transfer Process…This transfer and transformation of problem-solving expertise from a knowledge source to a program is the heart of the expert-system development process.” [81]In the early eighties the development of a KBS has been seen as a transfer process of humanknowledge into an implemented knowledge base. This transfer was based on the assumption that the knowledge which is required by the KBS already exists and just has to be collected and implemented. Most often, the required knowledge was obtained by interviewing experts on how they solve specific tasks [108]. Typically, this knowledge was implemented in some kind of production rules which were executed by an associated rule interpreter. However, a careful analysis of the various rule knowledge bases showed that the rather simple representation formalism of production rules did not support an adequate representation of different types of knowledge [38]: e.g. in the MYCIN knowledge base [44] strategic knowledge about the order in which goals should be achieved (e.g. “consider common causes of a disease first“) is mixed up with domain specific knowledge about for example causes for a specific disease. This mixture of knowledge types, together with the lack of adequate justifications of the different rules, makes the maintenance of such knowledge bases very difficult and time consuming. Therefore, this transfer approach was only feasible for the development of small prototypical systems, but it failed to produce large, reliable and maintainable knowledge bases.Furthermore, it was recognized that the assumption of the transfer approach, that is that knowledge acquisition is the collection of already existing knowledge elements, was wrong due to the important role of tacit knowledge for an expert’s problem-solving capabilities. These deficiencies resulted in a paradigm shift from the transfer approach to the modeling approach.Knowledge Engineering as a Modeling ProcessNowadays there exists an overall consensus that the process of building a KBS may be seen as a modeling activity. Building a KBS means building a computer model with the aim of realizing problem-solving capabilities comparable to a domain expert. It is not intended to create a cognitive adequate model, i.e. to simulate the cognitive processes of an expert in general, but to create a model which offers similar results in problem-solving for problems in the area of concern. 
While the expert may consciously articulate some parts of his or her knowledge, he or she will not be aware of a significant part of this knowledge since it is hidden in his or her skills. This knowledge is not directly accessible, but has to be built up and structured during the knowledge acquisition phase. Therefore this knowledge acquisition process is no longer seen as a transfer of knowledge into an appropriate computer representation, but as a model construction process ([41], [106]).This modeling view of the building process of a KBS has the following consequences:•Like every model, such a model is only an approximation of the reality. In principle, the modeling process is infinite, because it is an incessant activity with the aim of approximating the intended behaviour.•The modeling process is a cyclic process. New observations may lead to a refinement, modification, or completion of the already built-up model. On the other side, the model may guide the further acquisition of knowledge.•The modeling process is dependent on the subjective interpretations of the knowledge engineer. Therefore this process is typically faulty and an evaluation of the model with respect to reality is indispensable for the creation of an adequate model. According to this feedback loop, the model must therefore be revisable in every stage of the modeling process.Problem Solving MethodsIn [39] Clancey reported on the analysis of a set of first generation expert systems developed to solve different tasks. Though they were realized using different representation formalisms (e.g. production rules, frames, LISP), he discovered a common problem solving behaviour.Clancey was able to abstract this common behaviour to a generic inference pattern called Heuristic Classification , which describes the problem-solving behaviour of these systems on an abstract level, the so called Knowledge Level [113]. This knowledge level allows to describe reasoning in terms of goals to be achieved, actions necessary to achieve these goals and knowledge needed to perform these actions. A knowledge-level description of a problem-solving process abstracts from details concerned with the implementation of the reasoning process and results in the notion of a Problem-Solving Method (PSM).A PSM may be characterized as follows (compare [20]):• A PSM specifies which inference actions have to be carried out for solving a given task.• A PSM determines the sequence in which these actions have to be activated.•In addition, so-called knowledge roles determine which role the domain knowledge plays in each inference action. These knowledge roles define a domain independent generic terminology.When considering the PSM Heuristic Classification in some more detail (Figure 1) we can identify the three basic inference actions abstract ,heuristic match , and refine . Furthermore,four knowledge roles are defined:observables ,abstract observables ,solution abstractions ,and solutions . It is important to see that such a description of a PSM is given in a generic way.Thus the reuse of such a PSM in different domains is made possible. When considering a medical domain, an observable like …410 C“ may be abstracted to …high temperature“ by the inference action abstract . This abstracted observable may be matched to a solution abstraction, e.g. …infection“, and finally the solution abstraction may be hierarchically refined to a solution, e.g. 
the disease …influenca“.In the meantime various PSMs have been identified, like e.g.Cover-and-Differentiate for solving diagnostic tasks [99] or Propose-and-Revise [100] for parametric design tasks.PSMs may be exploited in the knowledge engineering process in different ways:Fig. 1 The Problem-Solving Method Heuristic Classificationroleinference action•PSMs contain inference actions which need specific knowledge in order to perform their task. For instance,Heuristic Classification needs a hierarchically structured model of observables and solutions for the inference actions abstract and refine, respectively.So a PSM may be used as a guideline to acquire static domain knowledge.• A PSM allows to describe the main rationale of the reasoning process of a KBS which supports the validation of the KBS, because the expert is able to understand the problem solving process. In addition, this abstract description may be used during the problem-solving process itself for explanation facilities.•Since PSMs may be reused for developing different KBSs, a library of PSMs can be exploited for constructing KBSs from reusable components.The concept of PSMs has strongly stimulated research in KE and thus has influenced many approaches in this area. A more detailed discussion of PSMs is given in Section 5.2.2Specific ApproachesDuring the eighties two main approaches evolved which had significant influence on the development of modeling approaches in KE: Role-Limiting Methods and Generic Tasks. Role-Limiting MethodsRole-Limiting Methods (RLM) ([99], [102]) have been one of the first attempts to support the development of KBSs by exploiting the notion of a reusable problem-solving method. The RLM approach may be characterized as a shell approach. Such a shell comes with an implementation of a specific PSM and thus can only be used to solve a type of tasks for which the PSM is appropriate. The given PSM also defines the generic roles that knowledge can play during the problem-solving process and it completely fixes the knowledge representation for the roles such that the expert only has to instantiate the generic concepts and relationships, which are defined by these roles.Let us consider as an example the PSM Heuristic Classification (see Figure 1). A RLM based on Heuristic Classification offers a role observables to the expert. Using that role the expert (i) has to specify which domain specific concept corresponds to that role, e.g. …patient data”(see Figure 4), and (ii) has to provide domain instances for that concept, e.g. concrete facts about patients. It is important to see that the kind of knowledge, which is used by the RLM, is predefined. Therefore, the acquisition of the required domain specific instances may be supported by (graphical) interfaces which are custom-tailored for the given PSM.In the following we will discuss one RLM in some more detail: SALT ([100], [102]) which is used for solving constructive tasks.Then we will outline a generalization of RLMs to so-called Configurable RLMs.SALT is a RLM for building KBSs which use the PSM Propose-and-Revise. Thus KBSs may be constructed for solving specific types of design tasks, e.g. parametric design tasks. 
The basic inference actions that Propose-and-Revise is composed of, may be characterized as follows:•extend a partial design by proposing a value for a design parameter not yet computed,•determine whether all computed parameters fulfil the relevant constraints, and•apply fixes to remove constraint violations.In essence three generic roles may be identified for Propose-and-Revise ([100]):•…design-extensions” refer to knowledge for proposing a new value for a design parameter,•…constraints” provide knowledge restricting the admissible values for parameters, and •…fixes” make potential remedies available for specific constraint violations.From this characterization of the PSM Propose-and-Revise, one can easily see that the PSM is described in generic, domain-independent terms. Thus the PSM may be used for solving design tasks in different domains by specifying the required domain knowledge for the different predefined generic knowledge roles.E.g. when SALT was used for building the VT-system [101], a KBS for configuring elevators, the domain expert used the form-oriented user interface of SALT for entering domain specific design extensions (see Figure 2). That is, the generic terminology of the knowledge roles, which is defined by object and relation types, is instantiated with VT specific instances.1Name:CAR-JAMB-RETURN2Precondition:DOOR-OPENING = CENTER3Procedure:CALCULATION4Formula:[PLATFORM-WIDTH -OPENING-WIDTH] / 25Justification:CENTER-OPENING DOORS LOOKBEST WHEN CENTERED ONPLATFORM.(the value of the design parameter CAR-JUMB-RETURN iscalculated according to the formula - in case the preconditionis fulfilled; the justification gives a description why thisparameter value is preferred over other values (example takenfrom [100]))Fig. 2 Design Extension Knowledge for VTOn the one hand, the predefined knowledge roles and thus the predefined structure of the knowledge base may be used as a guideline for the knowledge acquisition process: it is clearly specified what kind of knowledge has to be provided by the domain expert. On the other hand, in most real-life situations the problem arises of how to determine whether a specific task may be solved by a given RLM. Such task analysis is still a crucial problem, since up to now there does not exist a well-defined collection of features for characterizing a domain task in a way which would allow a straightforward mapping to appropriate RLMs. Moreover, RLMs have a fixed structure and do not provide a good basis when a particular task can only be solved by a combination of several PSMs.In order to overcome this inflexibility of RLMs, the concept of configurable RLMs has been proposed.Configurable Role-Limiting Methods (CRLMs) as discussed in [121] exploit the idea that a complex PSM may be decomposed into several subtasks where each of these subtasks may be solved by different methods (see Section 5). In [121], various PSMs for solving classification tasks, like Heuristic Classification or Set-covering Classification, have been analysed with respect to common subtasks. This analysis resulted in the identification ofshared subtasks like …data abstraction” or …hypothesis generation and test”. Within the CRLM framework a predefined set of different methods are offered for solving each of these subtasks. Thus a PSM may be configured by selecting a method for each of the identified subtasks. In that way the CRLM approach provides means for configuring the shell for different types of tasks. 
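To make the Propose-and-Revise loop described above more concrete, the following sketch shows one way its three inference actions (propose a value, test the constraints, apply a fix) and three knowledge roles (design extensions, constraints, fixes) could be wired together. It is a minimal illustration, not the SALT shell or the VT knowledge base; all parameter names, constraint thresholds, and fix strategies are invented.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Design = Dict[str, float]

@dataclass
class Constraint:
    name: str
    holds: Callable[[Design], bool]  # True if the (partial) design is admissible

def satisfied(constraint: Constraint, design: Design) -> bool:
    try:
        return constraint.holds(design)
    except KeyError:
        # The constraint refers to a parameter that has not been proposed yet.
        return True

def propose_and_revise(order: List[str],
                       extensions: Dict[str, Callable[[Design], float]],
                       constraints: List[Constraint],
                       fixes: Dict[str, Callable[[Design], None]],
                       max_revisions: int = 20) -> Design:
    design: Design = {}
    for name in order:
        for _ in range(max_revisions):
            design[name] = extensions[name](design)          # propose / extend the partial design
            violated = [c for c in constraints if not satisfied(c, design)]
            if not violated:
                break                                        # all applicable constraints hold
            fixes[violated[0].name](design)                  # apply a fix, then re-propose
        else:
            raise RuntimeError(f"no admissible value found for {name!r}")
    return design

# Invented toy example, loosely inspired by the elevator-configuration flavour of VT:
extensions = {
    "platform_width": lambda d: 2.5,
    "opening_width": lambda d: 1.2,
    "car_jamb_return": lambda d: (d["platform_width"] - d["opening_width"]) / 2,
}
constraints = [Constraint("min_jamb_return", lambda d: d["car_jamb_return"] >= 0.7)]
fixes = {"min_jamb_return": lambda d: d.update(opening_width=d["opening_width"] - 0.1)}

print(propose_and_revise(["platform_width", "opening_width", "car_jamb_return"],
                         extensions, constraints, fixes))
```

In this toy run the first proposal for the jamb return violates the invented minimum-width constraint, the associated fix narrows the opening width, and the value is re-proposed until the constraint holds, which is exactly the propose, verify, revise cycle described above.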
It should be noted that each method offered for solving a specific subtask, has to meet the knowledge role specifications that are predetermined for the CRLM shell, i.e. the CRLM shell comes with a fixed scheme of knowledge types. As a consequence, the introduction of a new method into the shell typically involves the modification and/or extension of the current scheme of knowledge types [121]. Having a fixed scheme of knowledge types and predefined communication paths between the various components is an important restriction distinguishing the CRLM framework from more flexible configuration approaches such as CommonKADS (see Section 3).It should be clear that the introduction of such flexibility into the RLM approach removes one of its disadvantages while still exploiting the advantage of having a fixed scheme of knowledge types, which build the basis for generating effective knowledge-acquisition tools. On the other hand, configuring a CRLM shell increases the burden for the system developer since he has to have the knowledge and the ability to configure the system in the right way. Generic Task and Task StructuresIn the early eighties the analysis and construction of various KBSs for diagnostic and design tasks evolved gradually into the notion of a Generic Task (GT) [36]. GTs like Hierarchical Classification or State Abstraction are building blocks which can be reused for the construction of different KBSs.The basic idea of GTs may be characterized as follows (see [36]):• A GT is associated with a generic description of its input and output.• A GT comes with a fixed scheme of knowledge types specifying the structure of domain knowledge needed to solve a task.• A GT includes a fixed problem-solving strategy specifying the inference steps the strategy is composed of and the sequence in which these steps have to be carried out. The GT approach is based on the strong interaction problem hypothesis which states that the structure and representation of domain knowledge is completely determined by its use [33]. Therefore, a GT comes with both, a fixed problem-solving strategy and a fixed collection of knowledge structures.Since a GT fixes the type of knowledge which is needed to solve the associated task, a GT provides a task specific vocabulary which can be exploited to guide the knowledge acquisition process. Furthermore, by offering an executable shell for a GT, called a task specific architecture, the implementation of a specific KBS could be considered as the instantiation of the predefined knowledge types by domain specific terms (compare [34]). On a rather pragmatic basis several GTs have been identified including Hierarchical Classification,Abductive Assembly and Hypothesis Matching. This initial collection of GTs was considered as a starting point for building up an extended collection covering a wide range of relevant tasks.However, when analyzed in more detail two main disadvantages of the GT approach have been identified (see [37]):•The notion of task is conflated with the notion of the PSM used to solve the task, sinceeach GT included a predetermined problem-solving strategy.•The complexity of the proposed GTs was very different, i.e. it remained open what the appropriate level of granularity for the building blocks should be.Based on this insight into the disadvantages of the notion of a GT, the so-called Task Structure approach was proposed [37]. 
The Task Structure approach makes a clear distinction between a task, which is used to refer to a type of problem, and a method, which is a way to accomplish a task. In that way a task structure may be defined as follows (see Figure 3): a task is associated with a set of alternative methods suitable for solving the task. Each method may be decomposed into several subtasks. The decomposition structure is refined to a level where elementary subtasks are introduced which can directly be solved by using available knowledge.As we will see in the following sections, the basic notion of task and (problem-solving)method, and their embedding into a task-method-decomposition structure are concepts which are nowadays shared among most of the knowledge engineering methodologies.3Modeling FrameworksIn this section we will describe three modeling frameworks which address various aspects of model-based KE approaches: CommonKADS [129] is prominent for having defined the structure of the Expertise Model, MIKE [6] puts emphasis on a formal and executable specification of the Expertise Model as the result of the knowledge acquisition phase, and PROTÉGÉ-II [51] exploits the notion of ontologies.It should be clear that there exist further approaches which are well known in the KE community, like e.g VITAL [130], Commet [136], and EXPECT [72]. However, a discussion of all these approaches is beyond the scope of this paper.Fig. 3 Sample Task Structure for DiagnosisTaskProblem-Solving MethodSubtasksProblem-Solving MethodTask / Subtasks3.1The CommonKADS ApproachA prominent knowledge engineering approach is KADS[128] and its further development to CommonKADS [129]. A basic characteristic of KADS is the construction of a collection of models, where each model captures specific aspects of the KBS to be developed as well as of its environment. In CommonKADS the Organization Model, the Task Model, the Agent Model, the Communication Model, the Expertise Model and the Design Model are distinguished. Whereas the first four models aim at modeling the organizational environment the KBS will operate in, as well as the tasks that are performed in the organization, the expertise and design model describe (non-)functional aspects of the KBS under development. Subsequently, we will briefly discuss each of these models and then provide a detailed description of the Expertise Model:•Within the Organization Model the organizational structure is described together with a specification of the functions which are performed by each organizational unit.Furthermore, the deficiencies of the current business processes, as well as opportunities to improve these processes by introducing KBSs, are identified.•The Task Model provides a hierarchical description of the tasks which are performed in the organizational unit in which the KBS will be installed. This includes a specification of which agents are assigned to the different tasks.•The Agent Model specifies the capabilities of each agent involved in the execution of the tasks at hand. In general, an agent can be a human or some kind of software system, e.g.a KBS.•Within the Communication Model the various interactions between the different agents are specified. Among others, it specifies which type of information is exchanged between the agents and which agent is initiating the interaction.A major contribution of the KADS approach is its proposal for structuring the Expertise Model, which distinguishes three different types of knowledge required to solve a particular task. 
Basically, the three different types correspond to a static view, a functional view and a dynamic view of the KBS to be built (see in Figure 4 respectively “domain layer“, “inference layer“ and “task layer“):•Domain layer : At the domain layer all the domain specific knowledge is modeled which is needed to solve the task at hand. This includes a conceptualization of the domain in a domain ontology (see Section 6), and a declarative theory of the required domain knowledge. One objective for structuring the domain layer is to model it as reusable as possible for solving different tasks.•Inference layer : At the inference layer the reasoning process of the KBS is specified by exploiting the notion of a PSM. The inference layer describes the inference actions the generic PSM is composed of as well as the roles , which are played by the domain knowledge within the PSM. The dependencies between inference actions and roles are specified in what is called an inference structure. Furthermore, the notion of roles provides a domain independent view on the domain knowledge. In Figure 4 (middle part) we see the inference structure for the PSM Heuristic Classification . Among others we can see that …patient data” plays the role of …observables” within the inference structure of Heuristic Classification .•Task layer : The task layer provides a decomposition of tasks into subtasks and inference actions including a goal specification for each task, and a specification of how theseFig. 4 Expertise Model for medical diagnosis (simplified CML notation)goals are achieved. The task layer also provides means for specifying the control over the subtasks and inference actions, which are defined at the inference layer.Two types of languages are offered to describe an Expertise Model: CML (Conceptual Modeling Language) [127], which is a semi-formal language with a graphical notation, and (ML)2 [79], which is a formal specification language based on first order predicate logic, meta-logic and dynamic logic (see Section 4). Whereas CML is oriented towards providing a communication basis between the knowledge engineer and the domain expert, (ML)2 is oriented towards formalizing the Expertise Model.The clear separation of the domain specific knowledge from the generic description of the PSM at the inference and task layer enables in principle two kinds of reuse: on the one hand, a domain layer description may be reused for solving different tasks by different PSMs, on the other hand, a given PSM may be reused in a different domain by defining a new view to another domain layer. This reuse approach is a weakening of the strong interaction problem hypothesis [33] which was addressed in the GT approach (see Section 2). In [129] the notion of a relative interaction hypothesis is defined to indicate that some kind of dependency exists between the structure of the domain knowledge and the type of task which should be solved. To achieve a flexible adaptation of the domain layer to a new task environment, the notion of layered ontologies is proposed:Task and PSM ontologies may be defined as viewpoints on an underlying domain ontology.Within CommonKADS a library of reusable and configurable components, which can be used to build up an Expertise Model, has been defined [29]. A more detailed discussion of PSM libraries is given in Section 5.In essence, the Expertise Model and the Communication Model capture the functional requirements for the target system. 
Based on these requirements the Design Model is developed, which specifies among others the system architecture and the computational mechanisms for realizing the inference actions. KADS aims at achieving a structure-preserving design, i.e. the structure of the Design Model should reflect the structure of the Expertise Model as much as possible [129].All the development activities, which result in a stepwise construction of the different models, are embedded in a cyclic and risk-driven life cycle model similar to Boehm’s spiral model [21].The basic structure of the expertise model has some similarities with the data, functional, and control view of a system as known from software engineering. However, a major difference may be seen between an inference layer and a typical data-flow diagram (compare [155]): Whereas an inference layer is specified in generic terms and provides - via roles and domain views - a flexible connection to the data described at the domain layer, a data-flow diagram is completely specified in domain specific terms. Moreover, the data dictionary does not correspond to the domain layer, since the domain layer may provide a complete model of the domain at hand which is only partially used by the inference layer, whereas the data dictionary is describing exactly those data which are used to specify the data flow within the data flow diagram (see also [54]).3.2The MIKE ApproachThe MIKE approach (Model-based and Incremental Knowledge Engineering) (cf. [6], [7])。
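As a rough illustration of the layered Expertise Model and the Heuristic Classification inference structure described in the preceding sections, the following sketch separates domain-specific knowledge (domain layer) from generic inference actions (inference layer) and their control (task layer). It uses invented medical toy rules and plain code rather than CML or (ML)2 notation.

```python
# Illustrative sketch only; domain terms and rules are invented.

# Domain layer: domain-specific knowledge, viewed through knowledge roles.
abstraction_rules = {"temperature": lambda v: "high temperature" if v >= 38.5 else "normal temperature"}
match_rules = {"high temperature": "infection"}
refinement_hierarchy = {"infection": ["influenza", "bacterial infection"]}

# Inference layer: the generic inference actions of Heuristic Classification,
# connected to the domain layer only through the roles (observables, abstract
# observables, solution abstractions, solutions).
def abstract(observables):                  # observables -> abstract observables
    return {abstraction_rules[k](v) for k, v in observables.items() if k in abstraction_rules}

def heuristic_match(abstract_observables):  # abstract observables -> solution abstractions
    return {match_rules[a] for a in abstract_observables if a in match_rules}

def refine(solution_abstractions):          # solution abstractions -> solutions
    return [s for a in solution_abstractions for s in refinement_hierarchy.get(a, [a])]

# Task layer: control over the inference actions (here a fixed sequence).
def classify(observables):
    return refine(heuristic_match(abstract(observables)))

print(classify({"temperature": 41.0}))      # e.g. ['influenza', 'bacterial infection']
```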
An English essay on letting knowledge flow freely. Title: The Free Flow of Knowledge
Knowledge is the lifeblood of progress and innovation. Its free circulation is essential for the advancement of societies, economies, and the human race as a whole. In this essay, we will explore the significance of allowing knowledge to flow freely and the benefits it brings to individuals and communities.First and foremost, the free flow of knowledge fosters creativity and innovation. When ideas, information, and discoveries are shared without barriers, it creates a fertile ground for collaboration and cross-pollination of thoughts. Innovators from different parts of the world can build upon each other's work, leading to the rapid development of new technologies, solutions, and methodologies. This interconnectedness of knowledge accelerates progress in various fields, from science and technology to arts and humanities.Moreover, unrestricted access to knowledge promotes inclusivity and equality. In today's digital age, the internet serves as a vast repository of information accessible to anyone with an internet connection. This democratization of knowledge empowers individuals regardless of their socio-economic background, geographic location, or educational attainment. It enables people to educate themselves, acquire new skills, and pursue opportunities that were once out of reach. By breaking down barriers to information, we create a more level playing field where everyone has a chance to thrive and contribute to society.Furthermore, the free flow of knowledge fuels economic growth and prosperity. In a globalized world, economies are increasingly interconnected, and knowledge-drivenindustries play a central role in driving competitiveness and innovation. When knowledge can move freely across borders, it enables businesses to tap into global talent pools, access new markets, and adapt to changing circumstances more effectively. This exchange of knowledgestimulates entrepreneurship, creates job opportunities, and enhances productivity, ultimately leading to economic development and improved standards of living.Additionally, the free flow of knowledge facilitates cultural exchange and mutual understanding. As people share their stories, traditions, and perspectives, it fosters empathy, tolerance, and appreciation for diversity. Through literature, art, music, and other forms of cultural expression, we gain insights into different ways of life and broaden our horizons. This cultural exchange promotes dialogue and cooperation among nations, fostering peaceful coexistence and mutual respect in a world often divided by differences.However, despite its numerous benefits, the free flow of knowledge faces challenges and threats. Censorship, intellectual property rights, cybersecurity concerns, and digital divides are among the obstacles that hinder the unrestricted dissemination of information. Governments, corporations, and other entities may seek to control or restrict access to knowledge for various reasons, rangingfrom maintaining power and control to protecting commercial interests. It is imperative for individuals, organizations, and governments to uphold the principles of freedom of expression, open access, and digital literacy to ensurethat knowledge remains a public good accessible to all.In conclusion, the free flow of knowledge is essential for the advancement of humanity in the 21st century and beyond. 
By breaking down barriers to information, we unleash the full potential of human creativity, foster inclusivity and equality, drive economic growth and prosperity, and promote cultural exchange and mutual understanding. It is incumbent upon all of us to safeguard and promote the free exchange of ideas, information, and discoveries, for it is through the sharing of knowledge that we can build a better future for generations to come.
4. Define the short-term goal of artificial intelligence.
5. Define the long-term goal of artificial intelligence.
6. Explain social intelligence, creativity, and general intelligence.
7. List seven major research areas of artificial intelligence.
8. Explain deduction, reasoning, and problem solving.
9. Explain knowledge representation.
10. Learn about automated planning and scheduling.
11. Explain what machine learning is.
12. Discuss natural language processing.
13. Discuss AI in games.
14. Discuss what areas have used artificial intelligence.
15. Answer what characteristics of artificial intelligence have been used in Apple's products.

Terms
artificial intelligence (AI) (人工智能): Artificial intelligence (AI) is the intelligence exhibited by a machine or software. It is also an academic field that studies computers and software capable of intelligent behavior.
expert system (专家系统): In artificial intelligence, an expert system is a computer system that emulates the decision-making ability of a human expert.
social intelligence (社会智能): Social intelligence is the capacity to effectively navigate and negotiate complex social relationships and environments.
combinatorial explosion (组合爆炸): Combinatorial explosion means that the amount of memory or computer time required becomes astronomical when the problem goes beyond a certain size.
knowledge representation (知识表示): Knowledge representation is the field of artificial intelligence that focuses on designing computer representations that capture information about the world that can be used to solve complex problems.
automated planning and scheduling (自动规划和调度): Automated planning and scheduling, often denoted in the relevant literature simply as planning, is a branch of artificial intelligence that concerns the realization of strategies or action sequences, typically for execution by intelligent agents such as autonomous robots and unmanned vehicles.
machine learning (机器学习): Machine learning is the study of computer algorithms
A Guide to Computer Science Journals

Preface: With the rapid development of computer technology, more and more people are turning to computer science journals to obtain the latest research results and technical advances. This guide introduces the major computer science journals worldwide and summarizes each journal's scope, impact factor, and recently published papers, to help readers publish more efficiently and improve the quality of their research output.

I. Top journals in computer science

The top journals in the field are important to every computer scientist. Their articles are of a high standard and quality, and papers published in them typically carry considerable authority and influence. The best-known top journals in computer science include:

1. ACM Transactions on Computer Systems (ACM TOCS)
Scope: the design, analysis, implementation, and evaluation of computer systems, in particular the latest research on operating systems, networks, distributed systems, database management systems, and storage systems.
Impact factor: 3.612. Publication frequency: 4 issues per year.
Recently published papers: Content-Based Data Placement for Efficient Query Processing on Heterogeneous Storage Systems; A Framework for Evaluating Kernel-Level Detectors; etc.

2. IEEE Transactions on Computers (IEEE TC)
Scope: innovative research across computer science, with emphasis on the latest advances in the design, analysis, implementation, and evaluation of computer systems, components, and software.
Impact factor: 4.804. Publication frequency: monthly.
Recently published papers: A Comprehensive View of Datacenter Network Architecture, Design, and Operations; An Efficient GPU Implementation of Imperfect Hash Tables; etc.

3. IEEE Transactions on Software Engineering (IEEE TSE)
Scope: all aspects of software engineering, including the latest research on software development, reliability, maintenance, and testing.
Knowledge is a treasure that can be accumulated and utilized to enrich one's life. Here are some points to consider when writing an essay on "Knowledge is Wealth":

1. Definition of Knowledge: Begin by defining what knowledge is and how it differs from information. Knowledge is a deep understanding of a subject, whereas information is simply data or facts.
2. Importance of Knowledge: Discuss why knowledge is considered wealth. It can lead to better decision-making, problem-solving, and innovation. It empowers individuals to navigate through life with more confidence and capability.
3. Education and Knowledge: Highlight the role of education in acquiring knowledge. Education systems are designed to impart knowledge, which is the foundation for personal and professional growth.
4. Knowledge and Career: Explain how knowledge can lead to better career opportunities. Being knowledgeable in a field can increase one's employability and earning potential.
5. Lifelong Learning: Emphasize the concept of lifelong learning. The pursuit of knowledge should not end with formal education. Continuous learning is essential to stay relevant and competitive in the ever-changing world.
6. Knowledge and Innovation: Discuss how knowledge drives innovation. Knowledgeable individuals are more likely to come up with new ideas and inventions that can lead to economic growth and societal advancement.
7. Knowledge and Social Development: Explain how knowledge contributes to social development. A society with educated individuals is more likely to be progressive, democratic, and prosperous.
8. Knowledge and Personal Growth: Describe how knowledge can lead to personal growth. It can improve one's critical thinking skills, cultural awareness, and overall quality of life.
9. Challenges in Acquiring Knowledge: Acknowledge the challenges that people face in acquiring knowledge, such as limited access to education, financial constraints, and the overwhelming amount of information available.
10. Conclusion: Conclude by summarizing the importance of knowledge as a form of wealth. Encourage the continuous pursuit of knowledge and the recognition of its value in all aspects of life.

Remember to use examples and anecdotes to illustrate your points and make your essay more engaging. Also, ensure that your essay is well-structured, with a clear introduction, body, and conclusion.
Glossary of terms related to Eduson English

1. Eduson English: Eduson English is an online English-learning platform dedicated to providing learners with high-quality, personalized English instruction. Through a range of teaching resources and intelligent technology, it helps learners achieve fluent, confident communication in English.
2. Online learning: Online learning is a mode of study conducted over the internet through online education platforms. Learners can choose what to study according to their own schedule and location, learning through interactive instruction, learning resources, and assessment tools.
3. Personalized learning: Personalized learning is a teaching approach that provides tailored study plans and resources based on each learner's needs and interests. By analyzing a learner's study data, the system can recommend materials and exercises suited to the learner's level and goals.
4. Intelligent technology: Intelligent technology refers to the use of artificial intelligence, machine learning, and related techniques to support and assist teaching. In Eduson English, intelligent technology can automatically adjust the content and difficulty of instruction according to a learner's progress and performance, and provide personalized feedback and suggestions.
5. Learning resources: Learning resources are the teaching materials, documents, videos, exercises, and other content that learners use during study. Eduson English offers rich learning resources, including course materials, vocabulary lists, and grammar explanations, to help learners better master English knowledge and skills.
6. Assessment tools: Assessment tools are used to evaluate a learner's progress and ability through tests, assignments, and exercises. The assessment tools in Eduson English help learners understand how well they are doing and adjust their learning strategies based on the results.
7. Fluent communication skills: Fluent communication skills refer to a learner's ability to use English freely for communication and expression. Through the lessons and exercises in Eduson English, learners improve their listening, speaking, reading, and writing skills until they can communicate with others in English fluently and accurately.

Through the personalized learning, intelligent technology, and rich learning resources offered by the Eduson English platform, learners can effectively raise their English level and develop fluent communication skills.
Expert elicitation of recharge model probabilities for the Death Valley regional flow system

Ming Ye (a,*), Karl F. Pohlmann (b), Jenny B. Chapman (b)
a School of Computational Science and Department of Geologic Sciences, Florida State University, Tallahassee, FL 32306, USA
b Desert Research Institute, Nevada System of Higher Education, 755 East Flamingo Road, Las Vegas, NV 89119, USA
Journal of Hydrology (2008) 354, 102-115; doi:10.1016/j.jhydrol.2008.03.001
Received 12 January 2008; received in revised form 29 February 2008; accepted 3 March 2008

KEYWORDS: Model uncertainty; Prior model probability; Model averaging; Expert elicitation; Recharge estimates; Death Valley regional flow system

Summary: This study uses expert elicitation to evaluate and select five alternative recharge models developed for the Death Valley regional flow system (DVRFS), covering southwestern Nevada and the Death Valley area of California, USA. The five models were developed based on three independent techniques: an empirical approach, an approach based on unsaturated-zone studies and an approach based on saturated-zone studies. It is uncertain which recharge model (or models) should be used as input for groundwater models simulating flow and contaminant transport within the DVRFS. An expert elicitation was used to evaluate and select the recharge models and to determine prior model probabilities used for assessing model uncertainty. The probabilities were aggregated using simple averaging and iterative methods, with the latter method also considering between-expert variability. The most favorable model, on average, is the most complicated model that comprehensively incorporates processes controlling net infiltration and potential recharge. The simplest model, and the most widely used, received the second highest prior probability. The aggregated prior probabilities are close to the neutral choice that treats the five models as equally likely. Thus, there is no support for selecting a single model and discarding others, based on prior information and expert judgment. This reflects the inherent uncertainty in the recharge models. If a set of prior probabilities from a single expert is of more interest, we suggest selecting the set of the minimum Shannon's entropy. The minimum entropy implies the smallest amount of uncertainty and the largest amount of information used to evaluate the models. However, when enough data are available, we prefer to use a cross-validation method to select the best set of prior model probabilities that gives the best predictive performance.

Introduction

Uncertainty analysis of hydrologic models is an essential element for decision-making in water resource management. This paper is focused on conceptual model uncertainty, which arises when multiple conceptualizations of a hydrologic system (or its processes) are all acceptable given available knowledge and data. A model averaging concept has been developed to assess the conceptual model uncertainty by averaging predictions of multiple models using appropriate weights associated with each model. The weights can be calculated using likelihood functions (Beven, 2006 and its references therein) in the chi-square sense, the information criterion of AIC (Akaike, 1974) or AICc (Hurvich and Tsai, 1989) in the Kullback–Leibler sense (Burnham and
Anderson, 2002, 2004; Poeter and Anderson, 2005), or the information criterion of BIC (Schwarz, 1978) or KIC (Kashyap, 1982) in the Bayesian sense (Draper, 1995; Hoeting et al., 1999; Neuman, 2003; Ye et al., 2004, 2005, 2008; Vrugt et al., 2006; Vrugt and Robinson, 2007). This paper addresses conceptual model uncertainty and model averaging in the Bayesian context.

In Bayesian model averaging (BMA) (Draper, 1995; Hoeting et al., 1999) or its maximum likelihood version (MLBMA) (Neuman, 2003), if $\Delta$ is a quantity that one wants to predict, then its posterior distribution given conditioning data $D$ (including measurements of model parameters and observations of state variables) is the average of the distributions $p(\Delta \mid M_k, D)$ under each model $M_k$ weighted by the posterior model probability $p(M_k \mid D)$, i.e.,

$$p(\Delta \mid D) = \sum_{k=1}^{K} p(\Delta \mid M_k, D)\, p(M_k \mid D) \qquad (1)$$

The posterior model probability, $p(M_k \mid D)$, is estimated via Bayes' theorem

$$p(M_k \mid D) = \frac{p(D \mid M_k)\, p(M_k)}{\sum_{l=1}^{K} p(D \mid M_l)\, p(M_l)} \qquad (2)$$

where $p(D \mid M_k)$ is the model likelihood function and can be approximated by $p(D \mid M_k) = \exp(-\mathrm{KIC}_k/2)$ or $p(D \mid M_k) = \exp(-\mathrm{BIC}_k/2)$ (Ye et al., 2004), and $p(M_k)$ is the prior probability of model $M_k$. The prior probabilities of all the alternative models sum to one,

$$\sum_{k=1}^{K} p(M_k) = 1 \qquad (3)$$

implying that all possible models of potential relevance to the problem at hand are under study, and that all models differ from each other sufficiently to be considered mutually exclusive (the joint probability of two or more models being zero). The question of how to assign prior probabilities $p(M_k)$ to models $M_k$ remains largely open. A common practice is to adopt a "reasonable 'neutral' choice" (Hoeting et al., 1999), according to which all models are initially considered to be equally likely, there being insufficient prior reason to prefer one over another. However, the neutral choice of prior model probabilities ignores expert knowledge of the system to be modeled, thereby implying maximum ignorance on the part of the analyst.

Generally speaking, the prior model probability is an analyst's (or a group of analysts') subjective degree of reasonable belief (Jeffreys, 1957) or confidence (Zio and Apostolakis, 1996) in a model. The belief or confidence is ideally based on expert experience. Using expert judgments is prevalent in uncertainty and risk analysis (Cooke, 1991; Ayyub, 2001; Bedford et al., 2006), especially when experimental and statistical evidence is insufficient (Refsgaard et al., 2006). For a complicated hydrologic system, expert judgment or experience is the basis of conceptual model development, and may be more informative than limited observations. This is particularly true for subsurface hydrology, where hydraulic parameters are measured from sparse samples (boreholes) and mathematical models may disagree with geologic rules (Wingle and Poeter, 1993; Lele and Das, 2000). Garthwaite et al. (2005) argue that a better use of expert judgment could add more information than slight improvement of data analysis techniques. Hence, we view integrating expert judgment in BMA (by specifying subjective prior probabilities) to be a strength rather than a weakness. Madigan et al. (1995) and Zio and Apostolakis (1996) demonstrated that using informative prior model probabilities (in contrast to equal ones) on the basis of expert judgment can improve model simulation and uncertainty assessment. Ye et al. (2005) developed a constrained maximum entropy method, which estimates informative prior model probabilities through the maximization of Shannon's entropy (Shannon, 1948) subject to constraints reflecting a single analyst's (or group of analysts') prior
perception about how plausible each alterna-tive model(or a group of models)is relative to others, and selection of the most likely among such maxima corre-sponding to alternative perceptions of various analysts(or groups of analysts).By running cross-validation,Ye et al. (2005)demonstrated that,in comparison to using equal prior model probabilities,using informative probabilities improves model predictive performance.The subjective prior model probabilities can be directly obtained through expert elicitation.The expert elicitation has been applied to many studies,for example,future cli-mate change(Arnell et al.,2005;Miklas et al.,1995),perfor-mance assessment of proposed nuclear waste repositories (Hora and Jensen,2005;McKenna et al.,2003;Draper et al.,1999;Hora and von Winterfeldt,1997;Zio and Apos-tolakis,1996;Morgan and Keith,1995;DeWispelare et al., 1995;Bonano and Apostolakis,1991;Bonano et al.,1990), estimation of parameter distributions(Parent and Bernier, 2003;Geomatrix Consultants,1998;O’Hagan,1998),devel-opment of Bayesian network(Pike,2004;Stiber et al., 1999,2004;Ghabayen et al.,2006),and interpretation of seismic images(Bond et al.,2007).Formal expert elicitation processes have been proposed by Hora and Iman(1989)and Keeney and von Winterfeldt(1991),among others.Although expert elicitation is criticized in various aspects,such as selection of experts and accurate expression of experts’knowledge and belief in probability forms(O’Hagan and Oak-ley,2004),the quality of educing expert judgments can be controlled by a formal procedure of expert elicitation and documentation(Garthwaite et al.,2005).Nevertheless,ex-pert judgments should be used with caution,not to replace ‘‘hard’’science(Apostolakis,1990).When assessing concep-tual model uncertainty,it is essential to adjust the prior probability to obtain the posterior model probability by con-ditioning of on-site measurements and observations.Expert elicitation of recharge model probabilities for the Death Valley regionalflow system103Different from general uses of expert elicitation for model parameterization and development,this paper uses the ex-pert elicitation to estimate prior model probabilities of alter-native models.With few examples of such an application of expert elicitation in model uncertainty assessment (Zio and Apostolakis,1996;Draper et al.,1999;Curtis and Wood,2004),this study is expected to provide theoretical and prac-tical guidelines for future applications of expert elicitation.This paper is focused on development of prior model proba-bilities using expert elicitation;discussion of using on-site data to further evaluate the alternative models is beyond our scope here.The expert elicitation is used in this paper to estimate prior probabilities of five recharge models developed for the Death Valley regional flow system (DVRFS),covering southwestern Nevada and the Death Valley area of eastern California,USA (Fig.1a).Due to existing and potential radionuclide contamination at the US Department of En-ergy’s Nevada Test Site (NTS)and the proposed Yucca Moun-tain high-level nuclear waste repository in the DVRFS,it is critical to predict contaminant transport in the region.Hydrologic and geologic conditions in the DVRFS are compli-cated,rendering multiple conceptualizations of the system based on limited data and information.Because conceptual model uncertainty can be significant,ignoring it (focusingonly on parametric uncertainty)may result in biased predic-tions and underestimation of uncertainty.While expert 
elic-itation was used for evaluating uncertainty of recharge and geological models (Pohlmann et al.,2007),this paper fo-cuses on the recharge models applied throughout the DVRFS.In the past few decades,several recharge models have been independently developed for Nevada by different researchers based on different scientific theories.These in-clude the Maxey–Eakin model (Maxey and Eakin,1949),the discrete-state compartment model (Kirk and Campana,1990;Carroll et al.,2007),the elevation-dependent chlo-ride mass balance model (Russell and Minor,2002;Russell,2004;Minor et al.,2007)and the distributed parameter wa-tershed model (Hevesi et al.,2003).It is unclear to scien-tists working in the DVRFS which recharge model should be used for groundwater flow and contaminant transport modeling.As recharge is the major driving force of ground-water flow,and thus contaminant transport,in the arid environment of the DVRFS,it is important to understand re-charge model uncertainty.Our ultimate goal is to incorpo-rate the recharge model uncertainty in our uncertainty analysis of DVRFS groundwater models.It is worth pointing out that recharge model uncertainty is prevalent and not limited to the DVRFS.Recharge is a fun-damental component of groundwater systems,andwithFigure 1(a)Boundaries of the Death Valley regional flow system,the Nevada Test Site,the proposed Yucca Mountain nuclear waste repository,and recharge rate estimates (m/d)of models (b)MME (modified Maxey–Eakin model),(c)NIM1(net infiltration model with runon–runoff component),(d)NIM2(net infiltration model without runon–runoff component),(e)CMB1(chloride mass balance model with alluvial mask),and (f)CMB2(chloride mass balance model with alluvial and elevation masks).104M.Ye et al.multiple recharge estimation methods(or models)avail-able,it is nontrivial to select the recharge estimation meth-od appropriate for a given environment(see review articles of Scanlon et al.,2002;Scanlon,2004).Scanlon et al.(2002) suggested using multiple methods to enhance reliability of recharge estimates.This is in line with the new concept of model averaging discussed above.The second section of this paper introduces the recharge models considered in the expert elicitation.Recharge esti-mates of the models are briefly compared in terms of their values,spatial distributions and statistical characteristics. 
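As a concrete illustration of how elicited prior probabilities would enter Eqs. (1)-(3), the sketch below combines hypothetical priors for the five recharge models with hypothetical information-criterion values to obtain posterior model probabilities and a model-averaged prediction. All numbers are invented for illustration and are not results of this study.

```python
import math

def posterior_model_probabilities(priors, kic):
    """Eq. (2) with the likelihood approximated by p(D|Mk) ~ exp(-KIC_k / 2)."""
    # Subtracting the minimum KIC before exponentiating avoids numerical underflow;
    # the constant factor cancels in the normalization.
    kmin = min(kic.values())
    weights = {m: priors[m] * math.exp(-(kic[m] - kmin) / 2.0) for m in priors}
    total = sum(weights.values())
    return {m: w / total for m, w in weights.items()}

def model_averaged_prediction(posteriors, predictions):
    """Eq. (1) for a point prediction: expectation of the prediction over the model set."""
    return sum(posteriors[m] * predictions[m] for m in posteriors)

# Hypothetical example with the five recharge models; the priors sum to one (Eq. (3)).
priors = {"MME": 0.23, "NIM1": 0.26, "NIM2": 0.19, "CMB1": 0.17, "CMB2": 0.15}
kic = {"MME": 104.2, "NIM1": 101.8, "NIM2": 103.5, "CMB1": 106.0, "CMB2": 107.3}          # invented
recharge = {"MME": 2.8e-5, "NIM1": 3.4e-5, "NIM2": 3.1e-5, "CMB1": 2.2e-5, "CMB2": 1.9e-5}  # m/d, invented

posteriors = posterior_model_probabilities(priors, kic)
print(posteriors)
print(model_averaged_prediction(posteriors, recharge))
```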
In particular, we explain the reasons for treating recharge uncertainty as conceptual model uncertainty, rather than as parametric uncertainty. The process of expert elicitation is listed in the third section, followed by discussion of elicitation results in the fourth section. Our conclusions are summarized in the fifth section.

Description of the five alternative recharge models

The five recharge models considered for the DVRFS are described briefly below; details of the models can be found in their original publications. Additional comparison of the models can be found in Rehfeldt (2004) and Pohlmann et al. (2007). Description of the geologic, hydrologic and hydrogeologic conditions of the DVRFS is beyond the scope of this paper, and the reader is referred to D'Agnese et al. (1997) and Belcher (2004) for further information on these topics.

Modification of the Maxey–Eakin method (MME)

Maxey and Eakin (1949) presented an empirical method (known as the Maxey–Eakin method) for estimating groundwater recharge as a function of precipitation. Since its inception, the Maxey–Eakin method has become the predominant technique used for estimating annual groundwater recharge in Nevada. The method estimates recharge via

$$R = \sum_{i=1}^{N} C_i P_i \qquad (4)$$

where $R$ is the estimated recharge, $C_i$ are the percentage adjustment coefficients, $P_i$ are the annual precipitation values within zones of precipitation and $N$ is the number of precipitation zones. Maxey and Eakin (1949) utilized the precipitation map for Nevada developed by Hardman (1936) that includes hand-drawn contours based on weather station records and topography. The precipitation is distributed among five isohyets (N = 5) of 5, 8, 12, 15 and 20 in. Assuming a steady-state basin flow condition in which discharge from a basin is approximately the same as recharge into the basin, the coefficients, $C_i$, were developed through a trial-and-error method to attain a general agreement between the volumes of estimated recharge and measured discharge for 13 basins in eastern and central Nevada. The coefficients, listed in Table 1, increase in magnitude as the amount of precipitation increases while evapotranspiration and surface water runoff presumably decline. Note that the precipitation zone receiving less than 8 in./yr rainfall does not contribute to groundwater recharge.

Given the incomplete coverage of the DVRFS domain by the Hardman precipitation map, Epstein (2004) modified the Maxey–Eakin model, hereinafter referred to as the modified Maxey–Eakin model (MME). The method uses the PRISM map (Precipitation Estimation on Independent Slopes Model) (Daly et al., 1994) so that the recharge is estimated in a consistent way over both the Nevada and California portions of the DVRFS. Considering uncertainty in the PRISM estimates of precipitation, the MME evaluates uncertainty of the recharge coefficients, $C_i$, using an automated calibration method based on 91 basins. Table 1 lists the mean coefficients of four precipitation zones (thus N = 4 in MME) used to estimate recharge of the DVRFS. Different from the Maxey–Eakin method, the coefficient for the lowermost precipitation zone is allowed to be nonzero. Although the MME model is more complicated than the original ME model, it is still the simplest model in the model set. The recharge map of the DVRFS estimated using the MME (with the mean coefficients) is shown in Fig. 1b.

Two net infiltration models (NIM)

Hevesi et al. (2003) developed a distributed-parameter watershed model, INFILv3, for estimating temporal and spatial distribution of net infiltration and potential recharge in the Death Valley
Two net infiltration models (NIM)

Hevesi et al. (2003) developed a distributed-parameter watershed model, INFILv3, for estimating the temporal and spatial distribution of net infiltration and potential recharge in the Death Valley region, including the DVRFS. The estimates of net infiltration quantify downward drainage of water across the lower boundary of the root zone, and are used as an indication of potential recharge under current climate conditions. Based on the daily average water balance at the root zone, the model comprehensively represents the processes controlling net infiltration and potential recharge. The daily water balance includes the major components of the water balance for arid to semiarid environments, including precipitation; infiltration of rain, snowmelt and surface water into soil or bedrock; runoff (excess rainfall and snowmelt); surface water runon (overland flow and streamflow); bare-soil evaporation; transpiration from the root zone; redistribution or changes in water content in the root zone; and net infiltration across the lower boundary of the root zone. Various techniques were developed to estimate these quantities and their spatial and temporal variability, which renders this method comprehensive but complicated. The model parameters (e.g., bedrock and soil saturated hydraulic conductivity and root density) were adjusted through model calibration by comparing simulated and observed streamflow as well as basin-wide average net infiltration and previous estimates of basin-wide recharge.

Two alternative net infiltration models, with and without the runon–runoff component (Hevesi et al., 2003), are considered in this paper to represent the two opposite conceptualizations. Fig. 1c and d depicts the averaged annual net infiltration estimates of the two models. Groundwater recharge can be estimated from the net infiltration estimates by multiplying the net infiltration with coefficients related to rock hydraulic conductivity at the water table, since the net infiltration distribution only accounted for surficial characteristics of the system. For more details about the determination of the coefficients, the reader is referred to Belcher (2004). For convenience in this discussion, the two net infiltration models are also referred to as recharge models.

Two elevation-dependent chloride mass balance models (CMB)

The chloride mass balance (CMB) method estimates recharge in basins (or any hydrologic systems) based on a balance between chloride mass within hydrologic input and output components. The method assumes that chloride in groundwater within the basins originates from chloride in precipitation in mountain uplands and dry-fallout and is transported to adjacent valleys by steady-state groundwater flow (Dettinger, 1989). At its most fundamental level, the method requires only estimates of annual precipitation in the recharge areas, total chloride input (chloride concentrations in precipitation and recharge water) and total chloride output (chloride concentrations in adjacent basin groundwater). The rate of recharge, R, can be calculated as (Maurer et al., 1996)

R = \frac{C_p P}{C_r} - \frac{C_{sw} S_w}{C_r}          (5)
where C_p is the combined wet-fall and dry-fall atmospheric chloride concentration normalized to precipitation, P is the mean annual precipitation rate, C_r is the chloride concentration in recharge water, and C_sw is the chloride concentration in surface water runoff S_w. For individual basins, the recharge rate can be estimated from this information if the following assumptions are met (Dettinger, 1989): (1) there are no other major sources or sinks for chloride in the system; (2) surface runoff is small in comparison to groundwater flow; and (3) the recharge areas are correctly delineated. Russell and Minor (2002) extended the chloride mass balance approach to account for the elevation of precipitation, the limited quantities of recharge that are thought to occur on low-elevation alluvial surfaces, and uncertainty inherent in the data. This elevation-dependent chloride mass balance approach was applied by Russell and Minor (2002) to a 7900-km2 region of the Nevada Test Site (NTS) and vicinity within the DVRFS.

Although this recharge/elevation relationship simulates recharge at all elevations, several studies suggest that significant groundwater recharge does not occur through low-elevation alluvial sediments in southern Nevada. Russell and Minor (2002) thus developed two models to address this uncertain conceptualization of low-elevation recharge. The first model assumes that all land surface areas covered by alluvial sediments receive negligible recharge, based on the results of previous studies and soil-water chloride profiles of 40 boreholes completed in unsaturated alluvium within the NTS (Russell and Minor, 2002). This model is called the CMB model with alluvial mask. The second model assumes that the elevation of the lowest perennial spring that discharges from a perched groundwater system in the study area represents the lowest elevation at which significant recharge occurs. This spring is Cane Spring, which is located at an elevation of 1237 m above mean sea level. Coincidentally, this is approximately the same elevation (1200 m) that Harrill (1976) and Dettinger (1989) consider to be the minimum at which precipitation makes a significant contribution to recharge in desert basins of central and southern Nevada. Using the concept of a recharge cut-off elevation, Russell and Minor (2002) define a zone of zero recharge that encompasses all elevations below 1237 m plus elevations above 1237 m that are covered by alluvium. This model is called the CMB model with both elevation and alluvial masks. To assess uncertainty in the model parameters and measurements (e.g., precipitation and chloride concentration in spring water), Russell and Minor (2002) developed a Monte Carlo method to estimate multiple realizations of the recharge estimates. The two models were further extended in Russell (2004) and this study to include more basins in Nevada and cover the DVRFS. Fig. 1e and f depicts the mean recharge estimates of the two CMB models.
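Eq. (5) is a simple mass-balance ratio, and a small numerical sketch may help make the units concrete. The snippet below is illustrative only; the input values are invented for the example and are not data from Russell and Minor (2002).

```python
# Minimal sketch of the chloride mass balance estimate, Eq. (5):
#   R = (C_p * P) / C_r - (C_sw * S_w) / C_r
# All concentrations share one unit (e.g., mg/L), so the ratios are dimensionless
# and R inherits the units of P and S_w (e.g., mm/yr). The numbers are hypothetical.

def cmb_recharge(c_p: float, precip: float, c_r: float,
                 c_sw: float = 0.0, runoff: float = 0.0) -> float:
    """Chloride mass balance recharge rate.

    c_p    : chloride concentration in precipitation (wet-fall plus dry-fall)
    precip : mean annual precipitation rate
    c_r    : chloride concentration in recharge water (basin groundwater)
    c_sw   : chloride concentration in surface-water runoff
    runoff : surface-water runoff rate
    """
    if c_r <= 0.0:
        raise ValueError("recharge-water chloride concentration must be positive")
    return (c_p * precip - c_sw * runoff) / c_r

if __name__ == "__main__":
    # Hypothetical upland basin: 0.4 mg/L chloride in precipitation, 250 mm/yr
    # precipitation, 12 mg/L chloride in groundwater, negligible runoff.
    print(f"recharge ~ {cmb_recharge(0.4, 250.0, 12.0):.1f} mm/yr")
```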
Summary and discussion

The five recharge models are summarized as follows:

MME (Fig. 1b): modified Maxey–Eakin model using the mean coefficients.
NIM1 (Fig. 1c): net infiltration model with runon–runoff component.
NIM2 (Fig. 1d): net infiltration model without runon–runoff component.
CMB1 (Fig. 1e): chloride mass balance model with alluvial mask (mean estimates only).
CMB2 (Fig. 1f): chloride mass balance model with alluvial and elevation masks (mean estimates only).

Fig. 1 illustrates similarities and differences of the recharge rate estimates (m/d) of the five models, and Table 2 lists the total recharge estimates (m3/d) for the entire DVRFS by each method. The MME gives the highest recharge estimate, and the CMB models give higher estimates than the NIM models. Due to the runon–runoff component considered in NIM1, the recharge estimate of NIM1 is higher than that of NIM2, while the spatial patterns of the recharge estimates are similar in the two models. Because of the extra elevation mask considered in CMB2, the recharge estimate of CMB2 is lower than that of CMB1; for the same reason, the spatial patterns of the recharge estimates are different in the two models (less recharge is estimated in southern Nevada in CMB2). The recharge estimate of the MME has the smoothest spatial distribution, due to the four precipitation zones. The different recharge estimates are viewed as a result of conceptual model uncertainty, rather than parametric uncertainty, since they are caused by simplification and inadequacy/ambiguity in describing the recharge process and not by uncertainty in recharge measurements themselves (Wagener and Gupta, 2005).

Table 2  Recharge estimates (m3/d) of the five recharge models in the DVRFS

  Recharge model   DVRFS (m3/d)
  MME              596,190.8
  NIM1             341,930.6
  NIM2             282,223.1
  CMB1             385,213.7
  CMB2             365,647.2

Given the five recharge models, which model (or models) should be used for groundwater modeling? Is it reasonable and justifiable to select a single model and to discard the others based on expert judgment? How should uncertainty of the recharge models be assessed? The expert elicitation is used to answer these questions, and the ultimate results of this expert elicitation are the prior model probabilities essential to the BMA for assessing the conceptual model uncertainty.
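To show where the elicited prior probabilities eventually enter the analysis, the sketch below applies a generic BMA-style update (posterior model probability proportional to prior times model likelihood). It is a schematic stand-in for Eq. (2), with invented prior and likelihood values; the actual likelihoods in the DVRFS study would come from calibrating the groundwater models against data, which is beyond this fragment.

```python
# Schematic BMA-style update of model probabilities:
#   p(M_k | D) = p(D | M_k) p(M_k) / sum_j p(D | M_j) p(M_j)
# The priors stand in for elicited prior model probabilities; the likelihoods are invented.

def posterior_model_probabilities(priors: dict, likelihoods: dict) -> dict:
    """Normalize prior * likelihood over the model set."""
    weights = {m: priors[m] * likelihoods[m] for m in priors}
    total = sum(weights.values())
    return {m: w / total for m, w in weights.items()}

if __name__ == "__main__":
    priors = {"MME": 0.2, "NIM1": 0.2, "NIM2": 0.2, "CMB1": 0.2, "CMB2": 0.2}
    likelihoods = {"MME": 0.8, "NIM1": 1.2, "NIM2": 0.6, "CMB1": 1.5, "CMB2": 1.0}  # hypothetical
    for model, prob in posterior_model_probabilities(priors, likelihoods).items():
        print(f"{model}: {prob:.3f}")
```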
Process of the expert elicitation

While several processes of expert elicitation have been suggested in the literature (e.g., Hora and Iman, 1989; Bonano et al., 1990), the process proposed by Keeney and von Winterfeldt (1991) was followed, since it is closely pertinent to eliciting probability from experts and has been applied to model probability elicitation (Zio and Apostolakis, 1996). The formal process consists of the seven steps listed below. Implementation of the process for the recharge models is also described.

Step 1: Identification and selection of elicitation issues

The elicitation issues are the questions posed to the experts that require their answers. The following three issues are considered for assessing the recharge model uncertainty:

(1) Is the model set complete, given the objective of the analysis? BMA requires that the alternative models are comprehensively exhaustive (all alternative models are included in the model set). Since this requirement cannot be satisfied in an absolute sense, we elicit from the experts whether there are other alternative models that are comparable in importance to the five models and should be considered.

(2) What are the plausibility ranks of these models, given the objective of the analysis? Whereas ranking of model plausibility is qualitative and the ranks cannot directly give the prior model probability, the model ranking helps experts evaluate the relative plausibility of the models before they estimate prior model probability.

(3) What is the probability value that best represents the confidence you would place on each recharge model, given the objective of the analysis? Model probabilities are the ultimate goal of the expert elicitation, and will be used directly in the BMA to calculate the posterior model probability through Eq. (2).

Step 2: Identification and selection of experts

Expert elicitation requires three types of experts: generalists, specialists and normative experts. In this study, the generalists should be knowledgeable about various aspects of the recharge models and the broader study goals (in this case, assessing groundwater flow and contaminant transport in the DVRFS). They typically have substantive knowledge in one discipline (e.g., geology or hydrology) and a general understanding of the technical aspects of the problem. While the generalists are not necessarily at the forefront of any specialty within their main discipline, the specialists should be at the forefront of one specialty relevant to the recharge models. The specialists often do not have the generalists' knowledge about how their expertise contributes to the broader study with respect to recharge model uncertainty analysis. Normative experts typically have training in probability theory, psychology and decision analysis. They assist generalists and specialists in articulating their professional judgments and thoughts so that they can be used in a meaningful way in the conceptual model uncertainty assessment. A high-quality elicitation requires the teamwork of all three types of experts.

Selecting experts is a time-consuming process, and may take more than a year for a full-scale elicitation (e.g., having international nomination of experts and forming an expert panel of international scientists, as in Hora and Jensen, 2005). With practical limitations, we selected national and state experts, who were believed to be well qualified owing to their familiarity with the hydrogeologic conditions of the DVRFS and their research at the forefront of recharge estimation in semi-arid environments of the southwestern US.
Five specialists, two generalists and one normative expert were identified. The normative expert had an advisory role and was not involved in evaluating the recharge model uncertainty.

Step 3: Discussion and refinement of elicited issues

This step allows discussion and refinement, if necessary, of the issues and quantities that will be elicited. While Keeney and von Winterfeldt (1991) suggest completing this step in a one-day meeting of all experts, such a meeting was considered unnecessary for this project. Instead, one month before the elicitation, the experts received the three clearly stated elicitation issues, as well as the original publications of the five recharge models and references about conceptual model uncertainty, BMA, prior model probability and expert judgment. The experts studied these materials, and some discussed details of the models with us and requested more reading materials.

Step 4: Training for the elicitation

Led by the normative expert, the training was conducted in two meetings during the first half day of the elicitation. In the first training meeting, the normative expert introduced the
Automated Knowledge Elicitation and Flowchart Optimization for Problem Diagnosis

Alina Beygelzimer, Mark Brodie, Jonathan Lenchner, Irina Rish
IBM Watson Research Center
19 Skyline Drive
Hawthorne, NY 10532
{beygel,mbrodie,lenchner,rish}@

Abstract

The established procedure for problem diagnosis in a wide variety of systems is often embodied in a flowchart or decision tree. These procedures are usually authored manually, which is extremely expensive and results in flowcharts that are difficult to maintain and often quite inefficient. A better diagnostic procedure would be one that automatically modifies itself in response to the frequency with which symptoms and underlying problems occur, in order to minimize the average cost of diagnosis.

We describe an approach to constructing a Bayesian network representation of diagnostic flowcharts, and demonstrate a system to support call center diagnostics based on this representation. One of the advantages of our approach is that it allows automated knowledge elicitation from "legacy" flowcharts. Using the new representation, knowledge is easier to author and maintain. By using information gain as a search heuristic, nearly-optimal flowcharts can be generated in response to data about the frequency of system faults or symptoms. The approach allows both prior expert knowledge and training data to be used to automatically generate and maintain flowcharts that respond flexibly to changing circumstances.

1 Introduction

The established procedure for diagnosing a problem in a faulty system is often embodied in a flowchart or decision tree. The system can be a piece of hardware, a piece of software, or a combination of hardware and software components. The diagnostic procedure may be executed by support personnel at a call center, by a voice response unit, by a web application when the system user seeks self-help, or it may even be executed automatically by the system itself in self-healing environments.

A flowchart is a natural way of representing the knowledge needed to diagnose problems. If the flowchart is comprehensive, it is usually easy for a human, even a non-expert, to follow the flowchart and diagnose the problem. At each node the human (or machine) can elicit the necessary information and decide which branch to follow, until a leaf is reached at which no more information is needed and the diagnosis is obtained. Flowcharts are a very good way of documenting the knowledge developed over time by people in resolving complex problems using their experience and expertise.

However, flowcharts suffer from a number of difficulties that restrict their utility for many applications. Firstly, they are quite difficult to author manually. It is quite expensive to obtain the necessary knowledge from human authors, because a large number of possible branches need to be considered to create a comprehensive flowchart. Even if a good initial flowchart can be manually built, maintaining the flowchart is an endless source of further difficulty. For example, every time a new type of fault is discovered, it needs to be added to the flowchart. A human being will at best typically add a new fault so that it does not take too long to diagnose and does not cause too much modification of the underlying flowchart structure. This can quickly result in the flowchart becoming unmanageably complex and incomprehensible.

A second major problem with manually authored flowcharts is that they are almost always sub-optimal.
An optimal flowchart is one that minimizes the average cost of diagnosis, e.g., the average number of questions: common problems should be diagnosed more quickly by asking about them first, before asking about less common problems. To maintain optimality, a diagnostic flowchart must necessarily modify itself in response to changes in the frequency with which symptoms and problems occur. Manually authored flowcharts tend to become increasingly sub-optimal over time because of the difficulty of maintaining them. A particularly annoying example of this is when customer help personnel ask many unnecessary questions when trying to diagnose a problem.

It is of course possible to construct a flowchart using traditional decision-tree learning from training data. However, training data can be difficult to obtain (a working diagnostic system must already be in place before any training data can be collected). Furthermore, this approach fails to take advantage of the knowledge of human experts. People often have a very good understanding of what information should be elicited to perform the diagnosis, but they are usually unable to arrange the questions in the optimal order, given the frequency of the problems and symptoms and the complexity of considering all possible diagnostic paths.

In this work we describe an approach that automatically builds an alternative representation of the knowledge underlying diagnostic flowcharts. Namely, it creates a simple Bayesian network that is consistent with the available "legacy" flowcharts without asking an expert to go through the Bayesian net creation process. This representation has a number of advantages:

1. Knowledge is easier to maintain and update - authors simply need to specify the new questions or tests they might want to ask and, if available, what answers each test might yield, depending on the state of the system. No ordering information on the tests is needed, although ordering constraints can be provided if desired. Faults, tests and symptoms can be easily added, deleted and modified.

2. An efficient flowchart can be generated using a simple greedy algorithm that selects the order of tests, taking into account the frequency with which symptoms and problems occur. This flowchart can be shown to be close to the optimal flowchart obtained by exhaustive search.

3. The generated flowchart changes automatically as data about the frequency of faults and symptoms is obtained from use of the diagnostic system. Thus both prior human knowledge and training data are leveraged to allow for continuous learning of efficient diagnostic procedures.

4. Any pre-existing ("legacy") flowchart can be easily converted into the new representation. This allows us to take advantage of human expertise by creating an optimized version of any existing flowchart. Two types of optimization are particularly common: unnecessary tests are removed, and the order of questions may change as the problem frequency changes.

We also describe a system we have built to support call center diagnostics based on this new authoring paradigm. The system is called FLOAT, for Flowchart Learning, Optimization, Authoring and Testing. Authors simply describe their knowledge of states and symptoms (or they can convert an already existing flowchart). Once a flowchart or decision tree is generated, authors can test out the flowchart interactively and modify the knowledge if needed. We provide examples illustrating the use of the system to improve existing flowcharts.

2 Problem formulation
Given a set S of n possible states of a system, such as possible faults, we want to distinguish between those states by probing for symptoms that differentiate between the various possible states as efficiently as possible. We refer to such probes variously as questions or tests, and denote the set of all available probes by Q. Some q ∈ Q may be more expensive to administer, or ask, than others. It is in fact more precise to speak of the cost of answering a question or responding to a probe, and it may in fact be that the cost of asking a question is dependent in part on the answer given. In what follows, we ignore this nuance and assume that the cost of answering a question is the same for all answers. Thus each question q has an associated cost c(q).

Each test q ∈ Q corresponds to a disjoint collection of subsets U_1, ..., U_k ⊂ S, with the interpretation that the question q can be answered in one of k possible ways a_1, ..., a_k, and given the answer a_i the actual state s of the system may either be contained in U_i - the collection of states for which we know the answer to be a_i - or in S \ U_i - those states for which we are not certain which of the answers to q applies.

We seek an efficient sequence of tests that is guaranteed to distinguish between every two elements of S. We can find such a sequence only if the questions in Q are capable of differentiating all elements of S. Q is then said to be separating. If Q is not separating, at the end of asking a sequence of questions we will necessarily be left with a probability distribution over the remaining states.

3 Flowcharts

A flowchart, or decision tree, T for the set S of possible states is a tree with the elements of S as leaves, a separating subset of Q as internal nodes, and, given an internal node q ∈ Q, the edges below q are given by the possible answers a_i to q. With some abuse of notation we shall sometimes write that q = a_i, meaning that in the given context the question q was given the answer a_i. It is possible that an element of S can appear as multiple leaves in a decision tree T, and equally that an element of Q can exist at multiple nodes.

A path p through T is given by a sequence of questions along with a final state: p = \langle q_{i_1}, \ldots, q_{i_k}, s \rangle. The cost of a path p through T is then given by c(p) = \sum_{j=1}^{k} c(q_{i_j}). Given a probability distribution on the states (or paths, in the event that some states appear multiple times as leaves), the cost of the decision tree T is the expected cost of diagnosis, i.e. c(T) = \sum_{p} \Pr(p) c(p).

It is also possible to consider trees which do not necessarily terminate in states, but rather, in some cases, in probability distributions over sets of remaining states. We call such a tree a non-separating tree. The set of all internal nodes of a non-separating tree may or may not be a separating set of questions. In what follows we assume we always have a separating set of questions, and we require that all proper decision trees T be separating.

The basic structure of T can be represented somewhat differently by collapsing into a single node any question or state that appears multiple times in the tree, yielding a directed acyclic graph or DAG. There is also the notion that one graph can call another graph, and in so doing even create cycles. In this most general conception, the underlying graph is called a flowchart. We shall take the point of view that cycles are never beneficial, and so all flowcharts can be represented as DAGs. Further, when a question can be asked via multiple paths, we replace the question with multiple instances, one per path, and similarly when a state can be reached via multiple paths. Thus, in all cases we equate a flowchart with a decision tree.
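Since the expected cost c(T) = \sum_p Pr(p) c(p) is what the rest of the paper tries to minimize, a short sketch of that bookkeeping may be useful. The tree encoding, the function names, and the example numbers below are illustrative assumptions, not part of the FLOAT system.

```python
# Minimal sketch of path cost and expected diagnosis cost for a decision tree.
# A leaf is a state name (str); an internal node is (question, {answer: subtree}).
# Question costs default to 1, so c(T) is the expected number of questions asked.

def expected_cost(node, leaf_probs, cost=lambda q: 1.0, depth_cost=0.0):
    """Return the sum over leaves of Pr(leaf) * (total question cost on its path)."""
    if isinstance(node, str):                      # a leaf / terminal state
        return leaf_probs.get(node, 0.0) * depth_cost
    question, branches = node
    return sum(
        expected_cost(child, leaf_probs, cost, depth_cost + cost(question))
        for child in branches.values()
    )

if __name__ == "__main__":
    # Hypothetical three-state tree: ask q1; on answer "b", ask q2 as well.
    tree = ("q1", {"a": "s1", "b": ("q2", {"yes": "s2", "no": "s3"})})
    probs = {"s1": 0.5, "s2": 0.25, "s3": 0.25}
    print(expected_cost(tree, probs))              # 0.5*1 + 0.25*2 + 0.25*2 = 1.5
```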
4 Alternative representations: Bayesian Networks and Dependency Matrices

The flowchart representation of expert knowledge in diagnostic applications seems to be quite traditional, but it suffers from several drawbacks, such as the complexity of modifying flowcharts when states, questions and their outcomes are added, deleted or changed, as well as the sub-optimality of the diagnosis process when using manually constructed decision trees. We propose below an alternative representation that eliminates the above drawbacks.

A Bayesian network approach to diagnosis, where the state corresponds to a combination of various faults in a system, was previously addressed in [7, 9], where a bipartite Bayesian network was used to model the dependencies between the component states and the probe outcomes, assuming that the probe outcomes are independent given the system state. This assumption actually holds in many diagnostic cases, and allows for very efficient knowledge representation and inference. In this paper, we use a simpler approach, treating the system state as one multi-valued variable, which effectively reduces the bipartite graph to a naive Bayes model (a state variable pointing to various probe variables).

Given the naive Bayes assumption, a Bayesian network can be represented explicitly by a so-called dependency matrix [7]. A dependency matrix D for a set S of n states and a set Q of m questions is an m x n matrix with rows corresponding to questions and columns corresponding to states. The entry D_ij corresponds to the probability distribution of the answer(s) expected to question q_i, given that the system is actually in state s_j. If a question is not relevant to a state, the corresponding matrix entry is left blank (shown by an asterisk).

Dependency matrices can easily be modified incrementally, for example by simply adding a column in response to a new state, and as statistics are captured about the relative frequency of states or symptoms, the matrix values are automatically modified. A symptom is identified with an answer to a question. Dependency matrices have been widely used for problem diagnosis (see [4], [6] and [7]). They are much more convenient for knowledge authoring and incremental modification, and moreover can be used to create almost optimal flowcharts or decision trees. They are also well adapted to learning from experience. As data accumulates about the prevalence of various symptoms or states, this information can easily be added to the dependency matrix.

Unlike a decision tree, a dependency matrix does not directly include information as to what question to ask given a set of remaining states, and possibly a probability distribution over those states. In order to exploit the advantages of the dependency matrix representation, we need to be able to convert dependency matrices into flowcharts.
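As a concrete, purely illustrative reading of this definition, a dependency matrix can be held as a nested mapping from question to state to an answer distribution, with missing entries playing the role of the asterisks. The class name, method names and example entries below are assumptions made for this sketch; they are not taken from the FLOAT implementation.

```python
# Illustrative dependency-matrix container: rows are questions, columns are states,
# and each defined cell holds a distribution over answers ({answer: probability}).
# An absent cell corresponds to the "*" entry (question not relevant / answer unknown).

class DependencyMatrix:
    def __init__(self):
        self.cells = {}          # question -> {state -> {answer: prob}}
        self.state_priors = {}   # state -> prior weight (e.g., from frequency data)

    def set_cell(self, question, state, answer_dist):
        self.cells.setdefault(question, {})[state] = dict(answer_dist)

    def add_state(self, state, prior=0.0):
        """Adding a new state is just adding a column; its cells stay '*' until filled."""
        self.state_priors.setdefault(state, prior)

    def record_state_observation(self, state, count=1):
        """Fold frequency data into the (unnormalized) priors as cases are resolved."""
        self.state_priors[state] = self.state_priors.get(state, 0.0) + count

    def answer_dist(self, question, state):
        """Return the cell distribution, or None for an unknown ('*') entry."""
        return self.cells.get(question, {}).get(state)

if __name__ == "__main__":
    dm = DependencyMatrix()
    dm.add_state("bad_heater_core", prior=1.0)
    dm.set_cell("What does it smell like?", "bad_heater_core", {"Maple Syrup": 1.0})
    print(dm.answer_dist("What does it smell like?", "bad_heater_core"))
```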
5 Flowchart Optimization

First assume that all questions in Q have the same cost. We begin with a prior distribution over the set S of states; more generally, at any point in the construction of the flowchart, we are at a node with a probability distribution over the set of remaining states \hat{S}. A natural greedy algorithm, which we shall refer to as GREEDY, is to choose that q ∈ Q which maximizes the expected information gain, or expected decrease in entropy, from answering q. If

H(\hat{S}) = \sum_{s \in \hat{S}} \left(-\Pr(s) \log \Pr(s)\right)

is the entropy associated with \hat{S}, then the expected information gain from asking q is

I(\hat{S} \mid q) = H(\hat{S}) - \sum_{i=1}^{k} \Pr(q = a_i) \, H(\hat{S}, q = a_i),

where {a_i} are the possible answers to q and H(\hat{S}, q = a_i) = \sum_{s \in \hat{S},\, q = a_i} \left(-\Pr(s) \log \Pr(s)\right).

We select the q ∈ Q which maximizes I(\hat{S} | q), add one edge for each of its possible answers a_i, create new nodes with updated probability distributions over the states, one for each answer a_i, and then repeat the process, selecting the most informative question at each of the new nodes. In this way all paths through the flowchart are constructed in parallel. A path ends either when only one state remains with non-zero probability or when there are no remaining questions with non-zero information gain. If questions have different costs, GREEDY is easily modified to select the q ∈ Q which maximizes I(\hat{S} | q)/c(q).

The flowchart optimization problem is NP-hard [2] and has no constant factor approximation algorithm [8]. The above GREEDY algorithm was shown to be within a factor O(log n) of optimal in [5]. The authors showed that a slight re-weighting of leaf probabilities (provably needed if the probabilities are exponentially unbalanced) results in a tree whose cost is within a factor of O(log n) from optimal, for any distribution on the n leaves.
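The greedy criterion above is short enough to sketch directly. The following fragment is an illustrative implementation of one step of GREEDY over a dependency matrix, not the FLOAT code: it assumes every (question, state) cell is a defined distribution over answers (no '*' entries), and the function names and example data are invented for the illustration.

```python
import math

# One step of the GREEDY criterion: pick the question with the largest expected
# information gain I(S|q) = H(S) - sum_i Pr(q=a_i) H(S | q=a_i), optionally divided
# by the question cost. Every (question, state) cell is assumed to be a defined
# answer distribution; '*' (unknown) entries are not handled in this sketch.

def entropy(probs):
    return -sum(p * math.log(p, 2) for p in probs if p > 0.0)

def information_gain(question_cells, state_probs):
    """question_cells: {state: {answer: Pr(answer | state)}}."""
    gain = entropy(state_probs.values())
    answers = {a for dist in question_cells.values() for a in dist}
    for a in answers:
        # Pr(q = a) and the posterior over states given that answer.
        joint = {s: state_probs[s] * question_cells[s].get(a, 0.0) for s in state_probs}
        p_a = sum(joint.values())
        if p_a > 0.0:
            gain -= p_a * entropy([w / p_a for w in joint.values()])
    return gain

def select_question(matrix, state_probs, cost=lambda q: 1.0):
    """matrix: {question: {state: {answer: prob}}}; return the most informative question."""
    return max(matrix, key=lambda q: information_gain(matrix[q], state_probs) / cost(q))

if __name__ == "__main__":
    matrix = {
        "q1": {"s1": {"a": 1.0}, "s2": {"b": 1.0}, "s3": {"c": 1.0}},   # fully separating
        "q2": {"s1": {"yes": 1.0}, "s2": {"yes": 1.0}, "s3": {"no": 1.0}},
    }
    uniform = {"s1": 1 / 3, "s2": 1 / 3, "s3": 1 / 3}
    print(select_question(matrix, uniform))   # q1 separates all three states at once
```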
6 Knowledge Elicitation: Constructing Dependency Matrices from Flowcharts

Given a flowchart (decision tree) T, we now address the question of constructing a corresponding dependency matrix M that would incorporate the knowledge from T about possible system states and test outcomes. One natural approach would be to consider the flowchart as a set of constraints imposed on the joint distribution over the states and tests that will be represented by a corresponding dependency matrix M, and use a maximum-entropy approach to reconstruct M from T. Namely, every path from the root to a leaf in T implies a constraint on the tuples (t, s), where t is a set of all tests and s is the system state, requiring that the subset of tests along the path, as well as the system state variable s, take the values specified by that path. One interpretation of the maximum-entropy approach would assume a uniform distribution over all (t, s) consistent with (at least) one such path constraint. Alternatively, we could assume a uniform distribution over each set of tuples corresponding to a particular path. Yet another approach is simply to assume, for every test not on the current path, a uniform distribution of test outcomes conditioned on a given state specified by the leaf of the path. While there are different arguments for and against each set of assumptions (which we hope to explore further in future work), our current approach uses the last assumption and is briefly described below.

We assume the nodes of T to be labelled with their associated questions, and the edges to be labelled by the associated answers or symptoms. Since states are represented by leaves in T and columns in M, we populate the associated cells of M by traversing the path from the root of T to the associated leaf and reading off the answers to each question encountered. All other cells of the matrix are filled in with an asterisk indicating that the cell value is "unknown." The only trick is to assign a probability distribution to the states, and various approaches are possible, as we mentioned above. For now, we assume at each stage that each answer to each question is equiprobable; then to obtain the probability of a state s we just multiply the probabilities of each answer along the path from root to s. If we are able to assume that all leaves are associated with unique states, then GREEDY applied to M will precisely yield T (or a flowchart completely equivalent to T), giving full "complementarity." A simple information-theoretic computation shows that GREEDY(M) = T except in the case in which there are two questions q_1, q_2 which are completely interchangeable. q_1 and q_2 are interchangeable if first q_1 is asked and, regardless of the answer to q_1, q_2 is asked. Figure 1 illustrates the complementarity between the decision tree T and the matrix M. M is assumed to be equipped with a complete set of priors, as illustrated.

[Figure 1: Complementarity between decision trees and dependency matrices - a small decision tree T over questions q_1, ..., q_5 and states s_1, ..., s_6, shown alongside the corresponding dependency matrix M with its row of leaf probabilities.]

7 Implementation

We briefly describe a system we have created to support call-center diagnostics that utilizes the complementarity between decision trees and dependency matrices. The application is called FLOAT, for Flowchart Learning, Optimization, Authoring and Testing. The idea is that authors describe their knowledge of states and symptoms (i.e. answers to questions) using a dependency matrix. Alternatively, a dependency matrix can be created from an already existing flowchart. Then the GREEDY algorithm described above is used to generate a flowchart from the dependency matrix, taking into account whatever information is available about the frequency of the different states. Authors can then test out the flowchart interactively and modify the dependency matrix if necessary. We show how the system is used to improve existing flowcharts as well as provide dynamic flowcharts that automatically rearrange the order of questions in response to changes in problem frequency.

[Figure 2: A piece of a real flowchart.]

7.1 Optimizing An Existing Flowchart

We begin by showing how the system is used to improve existing flowcharts. Figure 2 shows a small piece of a flowchart used for printer diagnostics. As described above, this flowchart can be converted directly into a dependency matrix, shown in Figure 3. Then the greedy algorithm is used to convert the dependency matrix into a new, optimized, flowchart, shown in Figure 4. The new flowchart contains only 6 tests, compared with 8 tests in the original flowchart. It is easy to see that, assuming all states are equally likely and all tests are equally costly, the expected cost of diagnosis using the new flowchart is 2.5 tests, compared with 4.375 tests in the original flowchart, a savings of 43%. These numbers indicate roughly the kinds of improvements that can be obtained with legacy flowcharts.

[Figure 3: Converting the flowchart to a dependency matrix - rows are the eight elicited questions, from "(1) How can we help you?" through "(8) Does the printer print?", columns are the flowchart leaves (e.g., show_bm_specs, show_bm_contact_info, goto bm_3, show_swl2, goto bm_2_scan_glass_autofeed, done, goto bm_2_copy_scan_LED_error), and asterisks mark answers that are unknown for a given leaf.]

[Figure 4: Converting the dependency matrix to an optimized flowchart.]
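The flowchart-to-matrix direction used in this example follows the traversal described in Section 6, and it too is easy to sketch. The fragment below is an illustrative reading of that construction, not the FLOAT implementation: the tree encoding, names, and the equiprobable-answer prior are assumptions of the sketch, and it produces a matrix in the {question: {state: {answer: prob}}} form used earlier.

```python
# Illustrative Section-6 construction: walk every root-to-leaf path of a decision
# tree, record the answer each question was given for that leaf's state, and give
# each leaf a prior equal to the product of (1 / number of answers) at each node.
# Cells never visited on a leaf's path are simply absent, playing the role of '*'.

def tree_to_dependency_matrix(node):
    matrix = {}        # question -> {state -> {answer: 1.0}}
    priors = {}        # state -> prior probability (equiprobable-answer assumption)

    def walk(subtree, answers_so_far, prob):
        if isinstance(subtree, str):               # a leaf holding a state name
            for question, answer in answers_so_far:
                matrix.setdefault(question, {})[subtree] = {answer: 1.0}
            priors[subtree] = priors.get(subtree, 0.0) + prob
            return
        question, branches = subtree
        for answer, child in branches.items():
            walk(child, answers_so_far + [(question, answer)], prob / len(branches))

    walk(node, [], 1.0)
    return matrix, priors

if __name__ == "__main__":
    flowchart = ("q1", {"a": "s1", "b": ("q2", {"yes": "s2", "no": "s3"})})
    m, p = tree_to_dependency_matrix(flowchart)
    print(p)   # {'s1': 0.5, 's2': 0.25, 's3': 0.25}
    print(m)   # q1 defined for all three states, q2 only for s2 and s3
```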
7.2 Automatically Changing the Order of Questions

We now illustrate the advantage of generating flowcharts from dependency matrices by showing how the order of questions in the flowchart automatically changes to reflect changes in information, such as the probability of different symptoms or problems. In contrast, a manually authored flowchart cannot be easily changed in this way.

7.2.1 Example - Automobile Fault Diagnosis

Our example is a simplified version of a real-life situation - diagnosing an automotive problem based on information about an unusual smell. First assume that drivers are able to faultlessly detect the exact smell. For example, if a fuel-injection problem occurs, generating a rotten-eggs smell, the driver does not confuse this with the sulfur-like smell generated by leaking gear lubricant. Under this assumption the smell is almost completely diagnostic of the problem. However, there are two different states that have the same "Maple Syrup" smell. In this case additional information, such as whether the smell occurs primarily inside or outside the car, is needed to resolve the problem. The complete dependency matrix is shown in Table 1. Given a prior distribution over the states, a flowchart is then generated using the GREEDY algorithm described in section 5. If a uniform prior is assumed, the resulting flowchart is the one shown in Figure 5. The flowchart first asks "What does it smell like?". Only if the smell is "Maple Syrup" is another question needed, "Where do you smell it?". The third question, "When do you smell it?", is not needed.

Note from the flowchart that if "Maple Syrup" is smelled both inside and outside the car, the diagnosis is "Radiator Leaking Coolant" - the rightmost node in Flowchart 1. This is an example where multiple states are possible, but no further questions are available to distinguish between them - the most likely state is shown. Clicking on any node in the flowchart shows the state probabilities at that node. For example, for the rightmost "Radiator Leaking Coolant" node there is a 2/3 probability that the problem is "Radiator Leaking Coolant" and a 1/3 probability that the problem is "Bad Heater Core". This reflects the fact that if "Maple Syrup" is smelled both inside and outside the car it is twice as likely (according to the dependency matrix) that the cause is "Radiator Leaking Coolant" rather than "Bad Heater Core".

7.2.2 Scenario 1 - Changes in Symptom Probabilities

The assumption of "perfect smelling" is obviously unrealistic. For example, many drivers may confuse a smell of rotten eggs with a smell of sulfur. Table 2 shows a more realistic dependency matrix that reflects this - the true smell is the most likely to be reported, but it may be confused with similar smells.

The flowchart generated by the greedy algorithm no longer asks "What does it smell like?" first. It first asks "Where do you smell it?", and the next question changes depending on the answer to the first question - see Figure 6. If the smell is inside the car, the second question is "What does it smell like?", but if it is both inside and outside then the most informative question to ask next is "When do you smell it?" This flexible modification of the question depending on the answers to previous questions illustrates the advantages of a learning approach over an expert's hard-coded flowchart. Note that Figure 6 does not show the complete flowchart - expanding the "What does it smell like?" node is needed, but results in a rather large and messy diagram.

7.2.3 Scenario 2 - Changes in State Priors or Test Costs

If the state priors change - for example, a shipment of faulty gear
lubricant results in a sudden rise in the occurrence of leaking gear lubricant problems - the flowchart will automatically respond by optimizing the order of the questions, and so the question "When do you smell it?", which is highly diagnostic of leaking gear lubricant, will be placed earlier in the flowchart, enabling the correct diagnosis to be made as quickly as possible. Similarly, if different tests have different costs and a previously expensive test becomes cheaper, it will rise earlier in the flowchart. Thus the flowchart dynamically takes into account all the relevant information to optimize the overall cost of diagnosis.

7.3 Exploration of New Questions

When a new problem or state appears it is sometimes the case that the new state will have symptoms that exactly match those of an existing state, but with one new differentiating factor. For example, a new device driver may have been released that has an obscure bug that causes the same set of symptoms as another known problem with the device. The new, as yet unknown, variable is the device driver level. In adding the new state, we would add a new question asking for the device driver level. In a large and complicated matrix, specifying the answer to this question for the two states we must differentiate is easy, but specifying the answer for all states may be tedious, and in fact a highly trained member of the support staff may not always be able to come up with such answers, or have the time to do so. When we add such a question q, it starts out offering very little information gain, since it can only be used to differentiate between two states. The trick is to occasionally do intelligent exploration to see if q is relevant to other states. A principled approach is to occasionally ask q in cases where we already have a state signature that is close to, but not the same as, that of the states for which the answer for q is known. It is also possible to introduce a policy where q is asked only once a given problem is resolved, so as not to burden problem diagnosis too directly - such a policy may be viewed more favorably by customers. There are a number of reasonable prospecting strategies, and this general problem has been extensively studied, for example in [1] and [3].

7.4 Weighting of Historical Data

Another important issue is the fact that older historical data on the frequency of state or symptom occurrence is typically less valuable than more recent data. Thus it is generally a good idea to put an aging policy in place that decays the value of older information; however, the correct decay function is likely to be

Table 1: DM1: Dependency Matrix for Automobile Diagnosis Example.

  States:                   Fuel Injection Problem | Mildew in A/C Evaporator | Gear Lube Leaking | Radiator Leaking Coolant | Bad Heater Core
  What does it smell like?  Rotten Eggs | Dirty Socks | Sulfur | Maple Syrup | Maple Syrup
  Where do you smell it?    Both Inside and Outside Car | Inside Car | Both Inside and Outside Car | Outside Car (0.8), Both Inside and Outside Car (0.2) | Inside Car (0.9), Both Inside and Outside Car (0.1)
  When do you smell it?     Engine is Running | Engine is Running | All the time | Engine is Running | Engine is Running

[Figure 5: Flowchart 1 - Automobile Diagnosis Example.]
Table 2: DM2: Dependency matrix for Scenario 1.

  States:                   Fuel Injection Problem | Mildew in A/C Evaporator | Gear Lube Leaking | Radiator Leaking Coolant | Bad Heater Core
  What does it smell like?  Rotten Eggs (0.5), Sulfur (0.4), Dirty Socks (0.1) | Dirty Socks (0.5), Rotten Eggs (0.2), Sulfur (0.3) | Sulfur (0.5), Rotten Eggs (0.4), Dirty Socks (0.1) | Maple Syrup | Maple Syrup
  Where do you smell it?    Both Inside and Outside Car | Inside Car | Both Inside and Outside Car | Outside Car (0.8), Both Inside and Outside Car (0.2) | Inside Car (0.9), Both Inside and Outside Car (0.1)
  When do you smell it?     Engine is Running | Engine is Running | All the time | Engine is Running | Engine is Running

[Figure 6: Flowchart 2 - Flexible Question Ordering.]
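To close the loop on this example, the short, self-contained sketch below encodes Table 1 (DM1) and applies the information-gain criterion from Section 5 under a uniform prior; with these entries it selects "What does it smell like?" as the first question, which is consistent with Figure 5. The encoding of the probabilistic cells as answer distributions, and the helper names, are assumptions of the sketch.

```python
import math

# Encode DM1 (Table 1) and pick the first question by expected information gain
# under a uniform prior, as in Section 5. Cells are {answer: Pr(answer | state)}.

DM1 = {
    "What does it smell like?": {
        "Fuel Injection Problem": {"Rotten Eggs": 1.0},
        "Mildew in A/C Evaporator": {"Dirty Socks": 1.0},
        "Gear Lube Leaking": {"Sulfur": 1.0},
        "Radiator Leaking Coolant": {"Maple Syrup": 1.0},
        "Bad Heater Core": {"Maple Syrup": 1.0},
    },
    "Where do you smell it?": {
        "Fuel Injection Problem": {"Both Inside and Outside Car": 1.0},
        "Mildew in A/C Evaporator": {"Inside Car": 1.0},
        "Gear Lube Leaking": {"Both Inside and Outside Car": 1.0},
        "Radiator Leaking Coolant": {"Outside Car": 0.8, "Both Inside and Outside Car": 0.2},
        "Bad Heater Core": {"Inside Car": 0.9, "Both Inside and Outside Car": 0.1},
    },
    "When do you smell it?": {
        "Fuel Injection Problem": {"Engine is Running": 1.0},
        "Mildew in A/C Evaporator": {"Engine is Running": 1.0},
        "Gear Lube Leaking": {"All the time": 1.0},
        "Radiator Leaking Coolant": {"Engine is Running": 1.0},
        "Bad Heater Core": {"Engine is Running": 1.0},
    },
}

def entropy(probs):
    return -sum(p * math.log(p, 2) for p in probs if p > 0.0)

def information_gain(cells, priors):
    answers = {a for dist in cells.values() for a in dist}
    gain = entropy(priors.values())
    for a in answers:
        joint = {s: priors[s] * cells[s].get(a, 0.0) for s in priors}
        p_a = sum(joint.values())
        if p_a > 0.0:
            gain -= p_a * entropy([w / p_a for w in joint.values()])
    return gain

if __name__ == "__main__":
    uniform = {s: 0.2 for s in DM1["What does it smell like?"]}
    best = max(DM1, key=lambda q: information_gain(DM1[q], uniform))
    print(best)   # expected: "What does it smell like?"
```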