The role of domain knowledge in a large scale Data Mining project

Discovery of General Knowledge in Large Spatial Databases

Wei Lu & Jiawei Han†
School of Computing Science
Simon Fraser University
Burnaby, British Columbia, Canada V5A 1S6

and

Beng Chin Ooi
Department of Information Systems and Computer Science
National University of Singapore
Lower Kent Ridge, Singapore 0511

Abstract

Extraction of interesting and general knowledge from large spatial databases is an important task in the development of spatial data- and knowledge-base systems. In this paper, we investigate knowledge discovery in spatial databases and develop a generalization-based knowledge discovery mechanism which integrates attribute-oriented induction on nonspatial data and spatial merge and generalization on spatial data. The study shows that knowledge discovery has wide applications in spatial databases, and relatively efficient algorithms can be developed for discovery of general knowledge in large spatial databases.

1. Introduction

Spatial reasoning using data and knowledge stored in large spatial databases is a crucial task in the development of geographical information systems, medical imaging and robotics systems. Because of the huge amount (usually terabytes) of spatial data obtained from satellites, video cameras, medical equipment, etc., it is costly and often unrealistic for users to examine the spatial data in detail and extract interesting knowledge or general characteristics from spatial databases. This motivates the study and development of knowledge discovery mechanisms for large spatial databases.

Knowledge discovery in spatial databases is the extraction of interesting spatial patterns and features, general relationships between spatial and nonspatial data, and other general data characteristics not explicitly stored in spatial databases. Such discovery may play an important role in understanding spatial data, capturing intrinsic relationships between spatial and nonspatial data, presenting data regularity in a concise manner, and reorganizing spatial databases to accommodate data
semantics and achieve high performance.

† The work was supported in part by the Natural Sciences and Engineering Research Council of Canada under Grant A-3723 and a research grant from the Centre for System Science of Simon Fraser University.

There are different philosophical considerations on knowledge discovery in databases [7,16], which may lead to different methodologies in the development of knowledge discovery techniques. First, we assume that a spatial DB stores a large amount of information-rich, relatively reliable and stable data. Furthermore, the following assumptions are made as the first step in the development of mechanisms for knowledge discovery in spatial DBs.

Assumption 1. A knowledge discovery process is initiated by a user's learning request.

Ideally, one may expect that a knowledge discovery system will perform interesting discovery autonomously without human interaction. However, since learning can be performed in many different ways on any subset of data in the database, a huge amount of knowledge may be generated from even a medium-size database by unguided, autonomous discovery, whereas much of the discovered knowledge could be outside the user's interests. In contrast, a command-driven discovery may lead to the discovery of what one wants to discover and therefore represents a relatively constrained search for the desired knowledge. Thus, command-driven discovery is adopted in this study.
Assumption 2. Background knowledge is available for the knowledge discovery process.

Discovery may be performed with the assistance of relatively strong background knowledge (such as conceptual hierarchy information, etc.) or with little support of background knowledge. Obviously, the discovery of conceptual hierarchy information itself can be treated as a part of the knowledge discovery process. However, the availability of relatively strong background knowledge not only improves the efficiency of the discovery process but also expresses the user's preference for guided generalization, which may lead to an efficient and desirable generalization process.

Following these assumptions, our mechanism for knowledge discovery in spatial DBs adopts a learning-from-examples approach which treats the task-relevant data as examples for learning processes and relies mainly on the generalization process.

There have been many studies on machine learning [5,6] and some recent studies on knowledge discovery in large databases [3,7,9,10,12,16]. These studies set up the foundation for knowledge discovery in spatial databases. Recently, an attribute-oriented approach has been developed for discovery of different kinds of knowledge rules in relational databases [9]. Moreover, a multi-resolution relational data model has been developed [13] for performance improvement in image database applications. Studies on data abstraction in spatial databases, such as spatial data abstraction using picture indexing and feature clustering [4] and geometric abstraction [2], are closely related to knowledge discovery in spatial databases.

In this study, the attribute-oriented induction technique is extended to knowledge discovery in spatial databases. Two kinds of concept hierarchies, thematic concept hierarchies and spatial hierarchies, are constructed for the learning process. Induction can be performed by ascending these hierarchies and summarizing general relationships between spatial and nonspatial attributes at a high concept level. The method
can discover interesting interrelationships between spatial and nonspatial data and can be applied to analyzing correlations between different spatial features based on different thematic maps.

The paper is organized as follows. Section 2 presents spatial learning primitives, which include spatial data representations, spatial hierarchies, and expected representation of learning results. Section 3 presents an algorithm for nonspatial-data-dominated spatial learning. Section 4 presents an algorithm for spatial-data-dominated spatial learning. Section 5 discusses the extension of the two algorithms to interleaved generalization and other related issues, and Section 6 summarizes the study.

2. Primitives for Knowledge Discovery in Spatial Databases

There are different philosophies for knowledge discovery in spatial databases based on different kinds of databases and different kinds of rules to be extracted from databases. To confine our study to a well-defined domain, the following assumptions are made in this study. First, we assume that the rules to be extracted are general data characteristics and/or relationships, and the learning process is triggered explicitly by a learning request (or query). Secondly, we assume that a spatial database consists of both spatial and nonspatial data, where the nonspatial data is relational and stored in a relational database, and the spatial data is two-dimensional and stored in spatial data structures.
Spatial objects and their associated nonspatial information are linked to each other as in the SAND architecture [1].

There are different representations for spatial data. In many applications, spatial information is stored as thematic maps. Each map contains specific features of spatial objects, e.g., forest type and coverage. There are two representations of thematic maps: raster and vector.

(1) In a raster image, an attribute value is associated with each pixel. For example, a geomorphological map may have its height coded in color (or grey level).

(2) In a vector representation, an object is specified by its geometry, such as the boundary representation, and its associated thematic attributes. For example, a lake is specified by a sequence of points sampled at the boundary and the elevation value.

These two types of data can be represented as a set of spatially ordered objects, each of which has its spatial and nonspatial components. In the following discussion, a spatial object obj_i is assumed to be denoted as <geo_i, attri_i>. For example, each pixel in raster data is represented by <(x, y), intensity>, where (x, y) is the spatial location and intensity the nonspatial attribute value.

An important aspect of learning from data is to cluster data into groups with similar characteristics. Different from relational data clustering, which is usually based on the concept hierarchy of each single attribute, spatial data clustering is two dimensional.
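The <geo_i, attri_i> notation introduced above can be made concrete with a small sketch. The Python below is illustrative only: the class name SpatialObject, its field names, and all sample coordinates and values are assumptions made for this example and are not part of the system described in the paper.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class SpatialObject:
    """One spatial object obj_i = <geo_i, attri_i> (hypothetical names)."""
    geo: Any    # spatial component: a pixel location or sampled boundary points
    attri: Any  # nonspatial component: e.g. an intensity or an elevation value


# Raster data: each pixel is <(x, y), intensity>.
pixel = SpatialObject(geo=(12, 40), attri=187)

# Vector data: a lake given by points sampled at its boundary plus an
# elevation value (made-up numbers).
lake = SpatialObject(
    geo=((0.0, 0.0), (4.0, 0.5), (3.5, 3.0), (0.5, 2.5)),
    attri={"elevation": 350.0},
)

print(pixel)
print(lake)
```

Keeping both components in one record means that generalizing one component can carry the other along, which is what the later generalization steps rely on.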
Spatial aggregation may be obtained by constructing a spatial hierarchy or consolidating neighboring spatial objects. Quad-trees and R-trees are typical spatial hierarchical structures [8,14], where the former is frequently used for raster data and the latter for vector data. Some spatial functions, such as adjacent_to, are useful for clustering neighboring spatial objects. In a stable environment, spatial hierarchies and adjacency relationships can often be computed and stored for efficient data retrieval and knowledge discovery [8,11,14].

In order to represent general characteristics at a high concept level, attribute concept hierarchies should be provided by domain experts or constructed automatically or semi-automatically by statistical data analysis [15]. In our algorithms, an attribute hierarchy is represented by a function c_parent(attri_val), which returns a parent (high-level) concept for a given attribute value. A spatial hierarchy may be represented by two functions: s_parent(obj), which returns the parent node of the object obj, and s_children(obj), which returns the set of all of the children nodes of obj.

Semantic concepts in a concept hierarchy satisfy upward consistency. A high-level concept represents information which is more general than but consistent with the lower-level concepts. For example, Figure 1 represents an agriculture hierarchy. The region which grows corn, wheat, rice, etc. can be generalized to a grain-production area according to the hierarchy. Many high-level concepts for numerical values can be represented by their summary data and serve as generalized concepts. For instance, a precipitation measurement between 2.0 and 5.0 inches can be either represented by its rainfall range or generalized to "wet", etc.

Discovered knowledge should be concise, informative, and represented by high-level concepts with a small number of disjuncts (with each representing one case in the generalized rule). A generalization threshold, which represents the expected maximum number of
disjuncts in the generalized rule, or a desired concept level can be used in the generalization process.

[Figure 1. An agriculture hierarchy, with agriculture at the root, plants and food below it, then the categories beverage (coffee, tea, ...), fabric (flax, cotton, ...), fruit (pear, apple, ...), vegetable (lettuce, broccoli, ...) and grain (corn, wheat, ...).]

To make a knowledge discovery process focus on a set of data of interest and extract the desired knowledge, a learning request should be used to trigger the discovery process. Similar to DBLEARN [9], a learning request can be specified in a syntax similar to SQL. One such example is presented here.

Example-1. Given a large set of climate data (monthly mean temperature and monthly precipitation) obtained from over 500 weather stations scattered in British Columbia (B.C.), our task is to find the general weather pattern related to different areas in B.C. in the summer of 1990. There are over 18,000 data records per year. It is impossible to find the general weather pattern by simple data retrieval. A generalization process can be initiated by the following query.

    extract characteristic rule
    from precipitation-map, temperature-map
    where province = "B.C." and period = "summer" and year = 1990
    in relevance to region and precipitation and temperature

Notice that precipitation and temperature are thematic data related to thematic maps, and summer is a general concept (higher than month) in a concept hierarchy period.

In general, learning requests provide the following primitives for knowledge discovery: the set of relevant data, concept hierarchies, desired rule forms and the learning request. Two learning algorithms are introduced based on the availability of those primitives: one is (nonspatial) attribute-oriented induction, which performs the generalization on nonspatial data first; the other is spatial hierarchy directed induction, which performs generalization on spatial data first.

3. Nonspatial-Data-Dominated Generalization

A spatial database stores both spatial data and their associated nonspatial data.
Spatial data is often obtained by preprocessing image data and is stored at high resolution in large volumes. To extract general knowledge from spatial databases, generalization usually needs to be performed on both spatial and nonspatial data. When one of the components is generalized, the other component is adjusted accordingly. Depending on which component, spatial or nonspatial, is generalized first, different algorithms (nonspatial-data-dominated generalization vs. spatial-data-dominated generalization) can be derived for different applications. The high-level precipitation concepts and the concept hierarchy for season periods are provided in Table 1 and Figure 2, respectively.

Table 1. High-level precipitation concepts.

concept           abbreviation   range
very dry          v.d.           [0, 0.1]
dry               d.             (0.1, 0.3]
moderately dry    m.d.           (0.3, 1.0]
fair              f.             (1.0, 1.2]
moderately wet    m.w.           (1.2, 2.0]
wet               w.             (2.0, 5.0]
very wet          v.w.           5.0 & up

Suppose that the spatial database stores a map of British Columbia with a set of weather stations scattered around the province, as shown in Figure 3, where the sample stations D.C., Kam., Nan., Pen., P.G., P.R., Van. and Vic. are abbreviations for Dawson Creek, Kamloops, Nanaimo, Penticton, Prince George, Prince Rupert, Vancouver and Victoria, respectively. Climate data is collected from these weather stations. The data contains average monthly precipitation and minimum, maximum, and average temperatures for each regional station. Table 2 shows sample precipitation data.

[Figure 2. A year-season-month hierarchy.]

[Figure 3. A map of British Columbia showing the stations Nan., D.C., Vic., Van., Pen., Kam., P.R. and P.G.]

Table 2. Sample precipitation data (in inches) for 1990.

city            Jan     Feb    Mar    Apr    May    June   July   Aug    Sept   Oct     Nov     Dec     year total
Nanaimo         6.37    4.36   3.99   2.50   1.47   1.55   0.91   1.01   1.73   4.19    6.06    7.11    41.25
Vancouver       8.6     6.1    5.3    3.3    3.0    2.7    1.3    1.7    4.1    5.9     10.0    7.8     59.8
Victoria        11.12   9.74   5.15   2.68   2.51   1.07   0.42   2.42   0.95   2.69    2.64    4.36    45.75
Prince Rupert   9.8     7.6    8.4    6.7    5.3    4.1    4.7    5.2    7.7    12.2    12.3    11.3    95.16
......
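The concept hierarchies of Table 1 and Figure 2 can be sketched as simple lookup functions. The Python below is an illustration only and is not taken from the paper: precip_concept, UPPER_BOUNDS, CONCEPTS, SEASONS and PARENT are hypothetical names, c_parent mirrors the function named in the text, the boundary value 5.0 is assigned to "wet" (Table 1 lists it under both "wet" and "very wet"), and the grouping of months into seasons is an assumption based on the usual calendar seasons.

```python
import bisect
from typing import Optional

# Inclusive upper bounds of the first six ranges in Table 1 (inches).
UPPER_BOUNDS = [0.1, 0.3, 1.0, 1.2, 2.0, 5.0]
CONCEPTS = ["very dry", "dry", "moderately dry", "fair",
            "moderately wet", "wet", "very wet"]


def precip_concept(inches: float) -> str:
    """Map a precipitation value to its high-level concept from Table 1."""
    if inches < 0:
        raise ValueError("precipitation cannot be negative")
    # bisect_left returns the index of the first bound >= inches, so a value
    # equal to an upper bound stays in the lower range, e.g. 2.0 -> (1.2, 2.0].
    return CONCEPTS[bisect.bisect_left(UPPER_BOUNDS, inches)]


# Assumed month grouping for the year-season-month hierarchy of Figure 2.
SEASONS = {
    "winter": ["Dec", "Jan", "Feb"],
    "spring": ["Mar", "Apr", "May"],
    "summer": ["June", "July", "Aug"],
    "autumn": ["Sept", "Oct", "Nov"],
}

# Child -> parent links: month -> season, season -> year.
PARENT = {month: season for season, months in SEASONS.items() for month in months}
PARENT.update({season: "year" for season in SEASONS})


def c_parent(concept: str) -> Optional[str]:
    """Return the parent concept, or None for the root or an unknown value."""
    return PARENT.get(concept)


print(precip_concept(3.43))  # wet
print(c_parent("Apr"))       # spring
print(c_parent("spring"))    # year
```

Storing the hierarchy as child-to-parent links keeps concept ascension a single dictionary lookup per level, which is all that the generalization steps below require.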
Example-2. Given the above information, the query is to report general precipitation pattern zones in spring 1990, which is represented as below.

    extract region
    from precipitation-map
    where province = "B.C." and period = "spring" and year = 1990
    in relevance to precipitation and region.

The learning process can be carried out by generalizing the nonspatial attribute precipitation first, which consists of the following steps.

(1) Collect related nonspatial data. The execution of the SQL query on nonspatial data extracts the precipitation records relevant to the province, months and year. Notice that "period = spring" is a piece of generalized data, which is decomposed into "month = March or month = April or month = May" by consulting the generalization hierarchy.

(2) Perform attribute-oriented generalization on the collected nonspatial data. This merges the three months into "spring" and generalizes the precipitation attribute values by averaging the precipitation values of the three months. A portion of the table is shown in Table 3. Since the average precipitation in the nonspatial table contains many distinct values and does not reach a desired concept level, further generalization needs to be performed on the nonspatial data. In this case, the average precipitation value is generalized to an even higher level, such as "wet", "very wet", etc., by consulting the concept hierarchy of precipitation. During the generalization and merging of identical nonspatial tuples, spatial object pointers are collected in the generalized nonspatial data entry.

(3) Perform spatial generalization. When the nonspatial data is generalized to the desired level or to a small number of disjuncts, neighboring areas with the same high-level attribute values can be merged together based on the spatial function adjacent_to. In order to generalize and merge spatial objects into a small number of regions, it is often necessary to perform approximation. Within a spatial region, if there is only a small portion of the area carrying some attribute values different
from that of the majority of the area, the small portion can be omitted in the high-level description. For example, if poplars occupy only 3% of the area in a pine forest, the generalized description may ignore this small portion of poplars and generalize the area to pine forest (with 97% certainty).

In general, the learning process described above can be summarized in the following nonspatial-data-dominated generalization algorithm, which generalizes nonspatial data using concept hierarchies and then merges the corresponding spatial objects accordingly. The judgement of whether the current generalization on nonspatial data is sufficient can be based either on the number of generalized tuples in the generalized relation (which can be specified as a generalization threshold) or on an appropriate concept level, which can be specified explicitly by users or experts.

To demonstrate the learning process, our example data is gathered from three Georgia Strait regions: Vancouver, Victoria and Nanaimo. The relevant precipitation data are collected in columns 2 to 4 of Table 3. The average monthly rainfall for each region is given in column 5. It is then generalized in the last column using the high-level precipitation concepts provided in Table 1. Since the three neighboring regions carry the same generalized precipitation attribute value "wet", the three regions are merged into one, which can be assigned a meaningful geographic name, such as "Georgia Strait", by a user or an expert. The remaining generalization processes are similar. The learning result is reported in tabular form, as shown in Table 4. Figure 4 shows the learning result for precipitation in spring 1990 for the whole province.

Table 3. The relevant precipitation data of the regions and its generalization.

city        Mar    Apr    May    Avg    high-level concept
Nanaimo     3.99   2.50   1.47   2.85   wet
Vancouver   5.3    3.3    3.0    4.1    wet
Victoria    5.15   2.68   2.51   3.43   wet

[Figure 4. A sample B.C. spring precipitation diagram. Region labels: w. (I), v.w. (II), m.w. (III), f. (IV), m.d. (V), v.d. (VI), m.d. (VII).]

The
learning process is summarized in the following algorithm, which generalizes nonspatial attributes by concept hierarchy ascension and consolidates adjacent spatial objects with similar attribute values. Generalization terminates when the generalized concept level reaches the desired concept level, or when the number of disjuncts is within a prespecified threshold.

Table 4. General precipitation information.

Region                     Rainfall
Georgia Strait (I)         wet
Coastal (II)               very wet
Okanagan-Thompson (III)    moderately wet
Columbia-Kootenay (IV)     fair
Central Interior (V)       moderately dry
Peace-Liard (VI)           very dry
Northern Interior (VII)    moderately dry

Algorithm-1. Nonspatial-data-dominated generalization.

Input. (i) A spatial database consisting of a set of nonspatial data and a spatial map, (ii) a learning request which indicates the particular set of data of interest and the desired threshold (or concept level), and (iii) a set of concept hierarchies.

Output. A rule which characterizes the general properties and/or relationships of spatial objects.

Method.
(1) Collect the set of task-relevant nonspatial data by an SQL query.
(2) Perform attribute-oriented induction repeatedly on the collected nonspatial data by (i) concept hierarchy ascension, (ii) attribute removal, and (iii) merging of identical tuples, until the number of tuples is within the generalization threshold or every attribute has been generalized to a desired concept level. The spatial object pointers are collected as a set of pointers and put into the generalized nonspatial data entry during the merging of identical tuples.
(3) Generalize the spatial data: for every generalized nonspatial tuple, follow its spatial pointers to retrieve the spatial objects, and perform spatial merge and approximation until the resulting set of generalized spatial objects is reduced to a small set.
(4) Output the generalized rule or the relationship between the generalized nonspatial and spatial data.

Theorem-1. The complexity of Algorithm-1 is O(N log N).

Proof. Given a database with
nonspatial components of N spatial objects, the retrieval of one component takes O(log N). In the worst case, Step 1, the retrieval of relevant data, may take O(N log N). Step 2, nonspatial attribute generalization, takes O(N log N) [9]. The retrieval of relevant spatial objects using the set of pointers obtained from nonspatial data generalization takes at most O(N). With the availability of spatial indices, the adjacent objects of a given object can be found in O(log N) time. Since the maximum number of spatial merges is N, Step 3 takes O(N log N). Thus, the overall complexity of the algorithm is O(N log N).

4. Spatial-Data-Dominated Generalization

In some applications, generalization may instead be performed first on spatial data, based on spatial hierarchical information. This involves partitioning the regions stored in spatial data structures, generalizing spatial data to a certain level, then generalizing the corresponding nonspatial components, and merging/grouping the generalized concepts to derive general and concise relationships between nonspatial and spatial data at a high level.

Spatial-data-dominated generalization relies on spatial generalization hierarchies, which can be obtained based on (i) the semantics of spatial data, e.g. hierarchical administrative regions: county, city, province, etc.; (ii) clustering of spatial objects, e.g.
based on densely clustered spatial objects; and (iii) spatial indexing structures, such as R-trees, Quad-trees, etc. We examine one example.

Example-3. Given regional temperature data and high-level temperature concepts (Table 5), the learning task is to find general temperature information in prespecified administrative regions for summer 1990. The learning request can be written as an SQL-like query as follows.

    extract characteristic rule
    from temperature-map
    where province = "B.C." and period = "summer" and year = 1990
    in relevance to region and temperature.

Table 5. High-level temperature concepts.

concept            range
very cold          −5 & below
cold               [−5, 10)
moderately cold    [10, 32)
mild               [32, 50)
moderately hot     [50, 70)
hot                [70, 90)
very hot           90 & up

The major learning steps are as follows.
(1) Collect task-relevant data by an SQL query (on nonspatial data) and the corresponding spatial data retrieval.
(2) Generalize the spatial database by clustering spatial data objects according to their regions, and merge the corresponding nonspatial pointers, until the desired concept level is reached or the number of generalized spatial objects is within the threshold.
(3) For each region, perform generalization on the nonspatial objects (e.g. by taking the average or mean of numerical values) until a small number of concepts remain which subsume all of the concepts existing in each subregion.

To illustrate the spatial-oriented learning process, we examine Figure 5, the south-central region of the province.
[Figure 5. South-Central region of British Columbia. Station labels: Kam., A.L., McL., C.C., K.L., S.B., M.C., Lum., Kelo., Merr., S.L., Harr., Prin., Hope, Pen., Vern.]

Table 6 shows the relevant temperature data for that region in summer 1990. The spatial hierarchy is shown by the grid in Figure 5. In the spatial generalization, objects in each quadrant are first merged according to the first level of the spatial hierarchy, and the results are in turn merged according to higher levels of the spatial hierarchy. The average of the temperatures in these regions can be computed and in turn generalized to its corresponding high-level concept, such as moderately hot. The learning result is shown in Figure 6 and is mapped to Table 7.

Table 6. Relevant temperature data for the south-central region.

city                    June   July   Aug   Avg
Adams Lake (A.L.)       62     67     64    64.3
Criss Creek (C.C.)      68     65     64    65.7
Harrison (Harr.)        60     64     63    62.3
Hope                    61     65     65    63.3
Kamloops (Kam.)         64     70     68    67.4
Kelowna (Kelo.)         61     66     64    63.3
Kelly Lake (K.L.)       57     61     61    59.7
Lumby (Lum.)            55     62     61    59.3
McLure (McL.)           57     63     62    60.7
Merritt (Merr.)         57     62     61    60
Mont Creek (M.C.)       55     63     58    58.7
Penticton (Pen.)        63     74     67    68
Princeton (Prin.)       58     64     62    61.3
Spences Bridge (S.B.)   57     61     60    59.3
Summerland (Sum.)       64     70     68    67.3
Vernon (Vern.)          57     65     62    61.3

Notice that averaging may not be the most desirable way to generalize nonspatial data in many cases, since averaging may hide exceptions or smooth the data excessively. When the values to be generalized (such as temperatures) differ significantly, generalization could instead return a small number of disjuncts (possibly with statistical information associated with each disjunct). In this case, generalization on the nonspatial data can be performed by clustering and generalizing only those nonspatial components which carry similar data values, as is done in attribute-oriented induction on nonspatial data.

[Figure 6. A sample B.C. summer temperature diagram.]

The example can be summarized into a spatial-oriented learning algorithm which utilizes the spatial hierarchy to obtain
generalized objects. The generalized attribute value of the new object is obtained by climbing up the attribute concept hierarchy to find a minimal concept which subsumes the attribute values of the corresponding sub-objects. When the map attribute is numeric, the new attribute value can be determined by a weighted average. For example, when the attribute is precipitation, the precipitation of a large region can be computed from the precipitation of its sub-regions, weighted by the areas of the sub-regions. This method can also be applied to generating multi-resolution images.

Table 7. Generalized temperature information.

Region           Temperature
North-West       mild
North-Central    moderately cold
North-East       mild
Mid-West         mild
Central          moderately hot
Mid-East         hot
South-West       mild
South-Central    mild
South-East       very hot

Algorithm-2. Spatial-data-dominated generalization.

Input. (i) A spatial database consisting of a set of nonspatial data and a spatial map, (ii) a spatial hierarchy, (iii) a learning request which indicates the set of data of interest and the desired threshold or concept level, and (iv) a set of concept hierarchies.

Output. A rule which characterizes the general properties and/or relationships of spatial objects.

Method.
(1) Collect the set of task-relevant spatial data by an SQL query.
(2) Perform spatial-oriented induction on the collected spatial data by spatial hierarchy ascension to create high-level spatial objects, until either the number of spatial objects is within the generalization threshold or the generalized concepts reach the desired generalization level. The nonspatial data entry pointers of each generalized spatial object are collected during the generalization.
(3) Retrieve nonspatial data using the nonspatial data pointers and generalize the nonspatial data for each spatial object using the attribute-oriented approach [9].
(4) Output the generalized rule or the discovered relationship.

Theorem-2. The complexity of Algorithm-2 is O(N log N).

Proof. Given N objects in the database, Step 1, the retrieval of
related spatial data objects using the spatial hierarchy, takes O(N log N). The maximum number of merges for N spatial objects is O(N). Step 3 is also O(N log N). Therefore, the overall complexity is O(N log N).
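Steps (2) and (3) of Algorithm-1 can be sketched on the Table 3 data as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: SPRING_1990, ADJACENT_PAIRS, precip_concept and generalize are hypothetical names, the adjacency pairs are an assumed stand-in for the spatial function adjacent_to, the averages are recomputed from the monthly values rather than read from the Avg column, and the simple pairwise merge shown here does not use a spatial index, so it does not achieve the O(N log N) bound of Theorem-1.

```python
from collections import defaultdict

# Spring 1990 precipitation in inches (Mar, Apr, May), as printed in Table 3.
SPRING_1990 = {
    "Nanaimo": (3.99, 2.50, 1.47),
    "Vancouver": (5.3, 3.3, 3.0),
    "Victoria": (5.15, 2.68, 2.51),
}

# Assumed adjacency relation; a real system would query a spatial index.
ADJACENT_PAIRS = {
    frozenset(pair)
    for pair in [("Nanaimo", "Vancouver"),
                 ("Vancouver", "Victoria"),
                 ("Nanaimo", "Victoria")]
}


def adjacent_to(a, b):
    """Stand-in for the spatial function adjacent_to named in the text."""
    return frozenset((a, b)) in ADJACENT_PAIRS


def precip_concept(inches):
    """Concept hierarchy ascension using the ranges of Table 1."""
    ranges = [(0.1, "very dry"), (0.3, "dry"), (1.0, "moderately dry"),
              (1.2, "fair"), (2.0, "moderately wet"), (5.0, "wet")]
    for upper, concept in ranges:
        if inches <= upper:
            return concept
    return "very wet"


def generalize(records):
    # Step 2: average the months, climb the concept hierarchy, and merge
    # identical tuples while collecting pointers to the spatial objects.
    pointers = defaultdict(list)
    for city, monthly in records.items():
        average = sum(monthly) / len(monthly)
        pointers[precip_concept(average)].append(city)

    # Step 3: for each generalized tuple, merge adjacent spatial objects.
    rule = []
    for concept, cities in pointers.items():
        regions = []
        for city in cities:
            touching = [r for r in regions
                        if any(adjacent_to(city, other) for other in r)]
            merged = {city}.union(*touching)
            regions = [r for r in regions if r not in touching] + [merged]
        rule.extend((sorted(region), concept) for region in regions)
    return rule


print(generalize(SPRING_1990))
# [(['Nanaimo', 'Vancouver', 'Victoria'], 'wet')]
```

The single merged region corresponds to the "Georgia Strait (I)" row of Table 4; naming the region is left to a user or expert, as the text describes.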

An English Essay on Bookstores

Bookstores are a treasure trove of knowledge and a haven for book lovers. They provide a space where individuals can escape into the world of literature, learn new things, and find inspiration. Here's a detailed look at what makes bookstores such an essential part of our culture and society.

The Ambiance of Bookstores

The atmosphere in a bookstore is unique. The scent of paper and ink, the quiet rustling of pages, and the soft murmur of readers create a calming environment that is conducive to reading and reflection. The layout of a bookstore is often designed to encourage browsing, with shelves arranged in a way that guides customers through different genres and sections.

The Variety of Books

Bookstores offer an incredible variety of books, catering to all tastes and interests. From classic literature to the latest bestsellers, from textbooks to graphic novels, there's something for everyone. This diversity is not limited to fiction; bookstores also stock a wide range of nonfiction, including biographies, history books, self-help guides, and academic texts.

The Role of Bookstores in Education

Bookstores play a crucial role in education. They provide students with access to textbooks and supplementary materials, helping them succeed in their studies. For lifelong learners, bookstores are a source of knowledge and self-improvement, offering books on a wide range of subjects that can be explored at one's own pace.
Community Spaces

Many bookstores serve as community spaces, hosting events such as book signings, author talks, and reading groups. These events not only promote literacy and a love for reading but also foster a sense of community among book enthusiasts. They provide a platform for authors to connect with their readers and for readers to engage in discussions about literature.

The Future of Bookstores

In the digital age, the future of bookstores has been a topic of debate. With the rise of e-books and online shopping, some have questioned the relevance of physical bookstores. However, many bookstores have adapted by offering a more personalized shopping experience, focusing on customer service, and creating a welcoming environment that cannot be replicated online.

The Importance of Supporting Local Bookstores

Supporting local bookstores is important for several reasons. They contribute to the local economy, provide jobs, and help maintain the cultural fabric of a community. Local bookstores often carry books that reflect the interests and history of the community, making them a valuable resource for understanding and appreciating local culture.

Conclusion

In conclusion, bookstores are more than just places to buy books; they are cultural institutions that enrich our lives in many ways. They provide a space for learning, inspiration, and community engagement. Whether you're a student looking for textbooks, a parent searching for children's books, or a reader in search of your next favorite novel, a bookstore is a place where you can find what you're looking for and discover something new along the way.

An English Essay on Science and Technology

Science and technology are the driving forces of modern society, shaping the way we live, work, and interact with each other. Here are some key points to consider when writing an essay on this topic:

1. Introduction to Science and Technology: Begin by defining what science and technology are and their significance in contemporary life. Mention how they have transformed various aspects of society, from communication to healthcare.

2. Historical Development: Discuss the evolution of science and technology over time. Highlight key inventions and discoveries that have had a profound impact on human civilization, such as the printing press, the steam engine, electricity, the internet, and artificial intelligence.

3. Impact on Daily Life: Elaborate on how science and technology have made everyday tasks more efficient and convenient. For instance, smartphones have become an integral part of our lives, allowing us to communicate, access information, and perform various tasks with ease.

4. Advancements in Medicine: Describe the role of science and technology in medical advancements. Mention how they have contributed to the development of new treatments, diagnostic tools, and the understanding of diseases, leading to improved health outcomes.

5. Environmental Impact: Address the dual role of technology in environmental issues. On one hand, it has contributed to environmental degradation through pollution and resource depletion. On the other hand, it has also provided solutions for sustainable development, such as renewable energy sources and eco-friendly technologies.

6. Education and Knowledge Dissemination: Discuss how technology has revolutionized education. The internet and digital platforms have made knowledge more accessible, allowing people to learn and collaborate across geographical boundaries.

7. Economic Implications: Explore the impact of science and technology on the economy.
They have created new industries and job opportunities, and have also led to the automation of certain jobs, which has economic implications for the workforce.

8. Challenges and Ethical Considerations: Discuss the challenges posed by rapid technological advancements, such as privacy concerns, cybersecurity threats, and the digital divide. Also, touch upon ethical considerations in areas like genetic engineering and artificial intelligence.

9. Future Prospects: Contemplate the future of science and technology. Predict how they might continue to evolve and the potential benefits and challenges that may arise.

10. Conclusion: Summarize the main points of your essay, emphasizing the importance of a balanced approach to scientific and technological development. Encourage the responsible use of technology and the pursuit of innovation that benefits all of humanity.

Remember to use clear and concise language, provide examples to support your points, and maintain a logical flow throughout your essay.

English Essay: Applying What You Learn Is the Mark of Mastering Knowledge

English answer:

Proficient use of knowledge is a hallmark of mastery in any field of study. To truly comprehend and retain information, it is not enough to simply memorize facts and concepts. Instead, individuals must engage in active and meaningful application of their understanding.

One key aspect of proficient knowledge use is the ability to apply concepts and theories to real-world situations. This involves being able to identify relevant knowledge, analyze complex issues, and develop innovative solutions. By engaging in this type of critical thinking, individuals demonstrate a deep comprehension of the subject matter and the ability to transfer their knowledge to various contexts.

Another crucial aspect is the capacity for knowledge integration. Proficient knowledge users can connect ideas from multiple sources and perspectives, thereby expanding their understanding and broadening their perspectives. This involves recognizing relationships, identifying patterns, and synthesizing information to create a more comprehensive and nuanced view of the subject at hand.

Furthermore, effective knowledge use requires a strong foundation in communication skills. Individuals must be able to articulate their ideas clearly and persuasively, both in written and oral form. This involves being able to present complex information in a logical and accessible manner, persuading others of their views, and engaging in meaningful discussions.

In addition to these cognitive skills, proficient knowledge use also entails a range of metacognitive abilities. These include the ability to reflect on one's own learning, identify areas for improvement, and develop strategies for continuous growth. By engaging in metacognitive processes, individuals become more self-aware and better equipped to enhance their knowledge and skills.

Overall, proficient use of knowledge is the culmination of a multifaceted set of cognitive, metacognitive, and communication skills.
It reflects a deep understanding of the subject matter, the ability to apply knowledge to real-world situations, the capacity for knowledge integration, and proficiency in communication and metacognition. Individuals who possess these qualities are well-equipped to succeed in their chosen fields and make meaningful contributions to society.

Chinese answer:

Proficient use of knowledge is the mark of mastering it.

College Business English Test Questions and Answers

I. Multiple choice (2 points each, 20 points total)

1. The term "B2B" refers to:
A. Business to Business  B. Business to Consumer  C. Consumer to Consumer  D. None of the above
Answer: A

2. Which of the following is not a function of a marketing mix?
A. Product  B. Place  C. Price  D. Profit
Answer: D

3. In international trade, a Letter of Credit (L/C) is used to:
A. Secure payment  B. Reduce costs  C. Increase sales  D. Promote products
Answer: A

4. What is the meaning of "FOB" in shipping terms?
A. Free on Board  B. Freight on Board  C. Full of Belief  D. First of Business
Answer: A

5. The process of negotiating a contract is known as:
A. Contracting  B. Contractualization  C. Contracting out  D. None of the above
Answer: B

6. Which of the following is a type of business agreement?
A. Memorandum of Understanding (MOU)  B. Memorandum of Disagreement (MOD)  C. Memorandum of Non-Compliance (MNC)  D. Memorandum of Agreement (MOA)
Answer: A

7. The acronym "ROI" stands for:
A. Return on Investment  B. Risk of Investment  C. Rate of Interest  D. Regulation of Industry
Answer: A

8. What is the role of a "wholesaler" in the supply chain?
A. They sell products directly to consumers  B. They buy products in bulk and sell them to retailers  C. They manufacture products  D. They provide raw materials for production
Answer: B

9. The term "SWOT analysis" refers to:
A. Strengths, Weaknesses, Opportunities, Threats  B. Skills, Workforce, Opportunities, Technology  C. Systems, Workflows, Operations, Technology  D. None of the above
Answer: A

10. In business, "due diligence" is the process of:
A. Paying taxes  B. Conducting a thorough investigation  C. Preparing financial statements  D. Hiring new employees
Answer: B

II. Fill in the blanks (1 point per blank, 10 points total)

1. The four Ps of marketing are Product, Price, ______, and Promotion.
Answer: Place

2. A ______ is a document that outlines the terms and conditions of a business agreement.
Answer: Contract

3. The acronym "BOPIS" stands for Buy Online, Pick up In-Store, which is a strategy used by retailers to enhance the ______ experience.
Answer: Customer

4.
In business, a "________" is a situation where a company has more liabilities than assets.
Answer: Insolvency

5. The process of identifying and analyzing the target market is known as ______.
Answer: Market Segmentation

6. A "________" is a financial statement that provides a snapshot of a company's financial position at a specific point in time.
Answer: Balance Sheet

7. The term "________" refers to the process of identifying, anticipating, and assessing risks in a business.
Answer: Risk Management

8. A "________" is a type of financial instrument that represents ownership in a company.
Answer: Stock

9. The "________" is a measure of the value of goods and services produced by an economy over a specific time period.
Answer: Gross Domestic Product (GDP)

10. A "________" is a form of indirect taxation levied on the sale of goods and services.
Answer: Value Added Tax (VAT)

III. Short answer (5 points each, 20 points total)

1. What are the key elements of a business plan?
Answer: A business plan typically includes an executive summary, company description, market analysis, organization and management structure, product or service line, marketing and sales strategy, funding request, and financial projections.

2. Explain the concept of "branding" in business.
Answer: Branding is the process of creating a unique name, symbol, or design that identifies and differentiates a company's products or services from those of its competitors. It aims to create a lasting impression in the minds of consumers and establish a strong brand identity.

3. What is the purpose of a SWOT analysis in business strategy?
Answer: A SWOT analysis helps a business to identify its Strengths, Weaknesses, Opportunities, and Threats. It is used to assess the internal and external factors that can

Doctoral Entrance English Test Questions and Answers

I. Reading comprehension (20 points total, 4 points each)

Read the following passage, then answer questions 1-5.

The Impact of Technology on Education

The rapid development of technology has greatly influenced the field of education. It has brought about a significant change in the way educators teach and students learn. With the advent of the internet, online learning platforms have become increasingly popular, allowing students to access educational resources from anywhere and at any time.

1. What is the main topic of the passage?
A. The history of technology in education.
B. The influence of technology on education.
C. The advantages of online learning.
D. The future of education with technology.

2. According to the passage, what has technology done to education?
A. It has made education more traditional.
B. It has limited access to educational resources.
C. It has changed the teaching and learning methods.
D. It has reduced the popularity of online learning platforms.

3. What is the role of the internet in education as mentioned in the passage?
A. It has replaced traditional classroom teaching.
B. It has made educational resources less accessible.
C. It has facilitated access to educational resources.
D. It has hindered the development of technology in education.

4. What can students do with online learning platforms?
A. They can only access resources at specific times.
B. They can access educational resources from anywhere.
C. They can only learn from traditional textbooks.
D. They are restricted to learning within a classroom setting.

5. What is the overall tone of the passage?
A. Critical.
B. Optimistic.
C. Neutral.
D. Pessimistic.

Answers: 1-5 B C C B B

II. Cloze (15 points total, 1.5 points each)

Read the passage below and, from the four options given for each question after the passage, choose the best option to fill in each blank.

Sample English Essay: Knowledge Management and Acquisition in the Information Age

In the information age, knowledge management and acquisition have become increasingly important in our lives. With the rapid development of technology and the internet, we have access to an overwhelming amount of information at our fingertips. However, it also presents a challenge in terms of managing and utilizing this wealth of knowledge effectively.

Knowledge management involves the systematic management of information and knowledge within an organization or by an individual. It encompasses processes such as knowledge creation, organization, sharing, and utilization. In today's digital age, there are various tools and technologies available to facilitate knowledge management, such as content management systems, knowledge bases, and collaborative platforms.

One of the key aspects of knowledge management is knowledge acquisition. In the past, acquiring knowledge often involved attending lectures, reading books, or conducting research in libraries. However, with the advent of the internet, knowledge acquisition has become much more convenient and accessible. Online courses, academic databases, and educational websites have made it easier for people to acquire new knowledge and skills from the comfort of their own homes.

In addition to formal education and training, informal learning has also become an integral part of knowledge acquisition in the information age. Social media, online forums, and discussion groups provide platforms for individuals to engage in knowledge-sharing and collaborative learning. This has democratized the process of knowledge acquisition, allowing people from diverse backgrounds to learn from each other.

However, the abundance of information available online also poses a challenge in terms of discerning credible sources and managing information overload.
With the proliferation of fake news and misinformation, it is important to develop critical thinking skills and information literacy to accurately evaluate the reliability of sources.

Furthermore, knowledge management and acquisition are not limited to individuals or organizations. Governments and policy-makers also play a crucial role in managing knowledge at a societal level. This includes promoting access to education, supporting research and development, and ensuring the dissemination of accurate and reliable information to the public.

In conclusion, the information age has revolutionized the way we manage and acquire knowledge. While technology has made knowledge more accessible than ever before, it also presents challenges in terms of discerning credible sources and managing information overload. Effective knowledge management and acquisition require a combination of technological tools, critical thinking skills, and a commitment to lifelong learning. As we continue to navigate the complexities of the digital era, it is essential to adapt our approaches to knowledge management and acquisition to meet the evolving needs of our society.

Sample Essay: "Some Say the Larger the Island of Knowledge"

English answer:

The size of the island of knowledge is a fascinating topic to explore. In my opinion, the island of knowledge can indeed become larger as we acquire more knowledge. This is because knowledge is not limited to a specific field or subject. It is like an ocean with countless islands waiting to be discovered.

For instance, let's imagine that I am standing on the island of mathematics. From here, I can see the island of literature, the island of history, the island of science, and so on. Each island represents a different field of knowledge. As I explore each island, I gain more knowledge and expand the size of my own island of knowledge.

Furthermore, the island of knowledge can also grow through the connections between different fields. Just like bridges connecting islands, interdisciplinary studies allow us to combine knowledge from various domains. This not only expands our understanding but also enables us to make new discoveries and innovations.

To illustrate this, let's say I am on the island of psychology. By studying the principles of psychology and then connecting them with the island of marketing, I can develop a deeper understanding of consumer behavior. This knowledge can then be applied to create more effective marketing strategies.

In addition, the island of knowledge can also grow through the collective efforts of individuals. When people share their knowledge and insights with others, it creates a ripple effect that expands the boundaries of the island. This is why education and collaboration are crucial in expanding the island of knowledge.

To sum up, the size of the island of knowledge can indeed become larger as we acquire more knowledge. Through exploration, interdisciplinary studies, and collaboration, we can continuously expand our understanding and make new discoveries. The island of knowledge is vast and ever-growing, waiting to be explored.

Chinese answer:

The extent of the island of knowledge is a fascinating topic.

The Organization of Information (信息组织), Nanjing University, China University MOOC: end-of-chapter answers and final exam question bank, 2023

1. When handling resource heterogeneity, the best way to prevent problems of scope and scale is through standardization.
   Reference answer: True
2. The amount of resource description is always shaped by the currently available technology for capturing, storing, and making use of it.
   Reference answer: True
3. All resources in a collection require the same degree of description.
   Reference answer: False
4. Personal and cultural categories and organizing systems are highly biased. And creating institutional categories using more systematic processes can prevent them from being biased.
   Reference answer: False
5. A category is a group, collection, category, or set sharing characteristics or attributes.
   Reference answer: False
6. The organizing principles of organizing systems depend on ____.
   Reference answer: The types of domains being organized; The types of resources; The personal, social, or institutional setting
7. We can unpack the degree of organization into three dimensions, including ______.
   Reference answer: The overall extent to which interactions in and between organizing systems are shaped by resource description and arrangement; The amount of organization of resources into classes or categories; The amount of description detail or organization applied to each resource
8. Which of the following is the fundamental interaction in any collection of resources?
   Reference answer: Access
9. Which of the following “Category Categories” has flexible boundaries?
   Reference answer: Cultural categories
10. Wisdom is the ability to solve problems. It is not unique to human beings.
   Reference answer: False
11. The organization principles and ways are the same in different fields.
   Reference answer: False
12. Which description about “Resource” below is NOT true?
   Reference answer: A resource must be a physical thing.
13. Which of the following is the bottom layer of the DIKW model?
   Reference answer: Data
14. Which of the following can NOT be thought of as an organizing system?
   Reference answer: The piles of debris left after a tornado
15. What is the ultimate purpose of organizing?
   Reference answer: Creating capabilities
16. The expected lifetime of the organizing system is the same as the expected lifetime of the resources it contains.
   Reference answer: False
17. Big data collections are often large, so their scale is their most important challenge from an organizing system perspective.
   Reference answer: False
18. Which of the following is the central discipline of Knowledge Organization in its narrow sense?
   Reference answer: Library and Information Science
19. Which of the following is NOT a benefit of Knowledge Organization?
   Reference answer: Focus on the latest software program
20. Which description about “knowledge” is NOT true?
   Reference answer: Tacit knowledge is much more easily shared than Explicit knowledge.
21. How many key actions should we focus on when creating a knowledge management plan?
   Reference answer: 6
22. _____ and _____ are two main types of knowledge.
   Reference answer: Tacit knowledge; Explicit knowledge
23. Which of the following ways can be used to test information integrity on the Internet?
   Reference answer: Credibility; Authorship; Objectivity; Timeliness
24. Knowledgebases attempt to capture almost every imaginable Tacit intellectual asset that an organization possesses.
   Reference answer: False
25. The effective date of resources is the moment they are created.
   Reference answer: False
26. A resource can only have one identifier.
   Reference answer: False
27. Which of the following descriptions about “Passive Resource” are NOT true?
   Reference answer: Passive resources serve as verbs that cause and carry out actions; Passive resources initiate effects or create value on their own.
28. For information resources, we more often distinguish domains based on ______ properties.
   Reference answer: semantic
29. The distinction of Resource Format is most important in ________.
   Reference answer: Resource storage or preservation
30. Which of the following is usually the most important property of information resources?
   Reference answer: Content
31. Which of the following is NOT in Dublin Core Metadata Element Set?
   Reference answer: Author
32. It is a real problem if your organizing system is only designed for yourself with a limited lifetime.
   Reference answer: False
33. Backwards traceability includes what is the implementation design of this requirement.
   Reference answer: False
34. Most of the specific decisions that must be made for an organizing system are strongly shaped by the initial decisions about its domain, scope, and scale.
   Reference answer: True
35. The scope of a collection largely determines the extent and complexity of the resource descriptions needed by organizing principles and interactions. The impact of broad scope arises more from the _________ of the resources in a collection than its absolute scale.
   Reference answer: Heterogeneity
36. Which of the following is the dominant factor in the design of an organizing system?
   Reference answer: The scope of a collection
37. In organizing systems that contain digital resources, the logical boundary between the resources and their interactions is clear and it is easy to distinguish the interactions supported by the organizing system.
   Reference answer: False
38. Sampling is important when large numbers of resources need to be selected to satisfy functional requirements. A good sample for statistical purposes is one in which the selected resources are very different in the important ways from the ones that were not selected.
   Reference answer: False
39. Libraries often emphasize intrinsic value, scarcity, or uniqueness as resource selection criteria.
   Reference answer: False
40. The specifications that guide selection are precise and measurable for any resource.
   Reference answer: False
41. When resources are unique or rare, organizing activities typically occur after selection takes place.
   Reference answer: True
42. The purpose of Selection is determining whether resources are “Fitness for use”.
   Reference answer: True
43. Which of the following is FALSE about descriptive statistics?
   Reference answer: Range and Mode are commonly used measures of central tendency.
44. Which of the following is the most fundamental decision for an organizing system?
   Reference answer: Determining its resource domain


The role of domain knowledge in a large scale Data Mining project

Ioannis Kopanas, Nikolaos M. Avouris, Sophia Daskalaki
University of Patras, 26500 Rio Patras, Greece
(ikop@ee.upatras.gr, N.Avouris@ee.upatras.gr, sdask@upatras.gr)*

Abstract. Data Mining techniques have been applied in many application areas. A Data Mining project has often been described as a process of automatic discovery of new knowledge from large amounts of data. However, the role of domain knowledge in this process, and the forms that it can take, is an issue that has been given little attention so far. Based on our experience with a large scale Data Mining industrial project, we present in this paper an outline of the role of domain knowledge in the various phases of the process. This project has led to the development of a decision support expert system for a major Telecommunications Operator. The data mining process is described in the paper as a continuous interaction between explicit domain knowledge and knowledge that is discovered through the use of data mining algorithms. The role of the domain experts and data mining experts in this process is discussed. Examples from our case study are also provided.

1 Introduction

Knowledge discovery in large amounts of data (KDD), often referred to as data mining, has been an area of active research and great advances during recent years. While many researchers consider KDD a new field, many others identify in this field an evolution and transformation of the applied AI sector of expert systems or knowledge-based systems. Many ideas and techniques that have emerged from the realm of knowledge-based systems in the past are applicable in knowledge discovery projects. There are, however, considerable differences between the traditional knowledge-based systems and knowledge discovery approaches.
The fact that today large amounts of data exist in most domains, and that knowledge can be induced from these data using appropriate algorithms, brings the KDD techniques into prominence and facilitates the building of knowledge-based systems. According to Langley and Simon [1] data mining can provide increasing levels of automation in the knowledge engineering process, replacing much time-consuming human activity with automatic techniques that improve accuracy or efficiency by discovering and exploiting regularities in stored data. However, the claim that data mining approaches eventually will automate the process and lead to discovery of knowledge from data with little intervention or support of domain experts and domain knowledge is not always true.

* I. Kopanas, N.M. Avouris, S. Daskalaki, The role of knowledge modeling in a large scale Data Mining project, in I.P. Vlahavas, C.D. Spyropoulos (eds), Methods and Applications of Artificial Intelligence, Lecture Notes in AI, LNAI no. 2308, pp. 288-299, Springer-Verlag, Berlin, 2002.

The role of the domain experts in KDD projects has been given little attention so far. In contrast to old knowledge-based systems approaches, where the key roles were those of the domain expert and the knowledge engineer, today more disciplines are involved that seem to play key roles (e.g. data base experts, data analysts, data warehouse developers etc.), with the consequence that the domain experts receive less prominence. Yet, as admitted in Brachman & Anand [2], the domain knowledge should lead the KDD process. Various researchers have made suggestions on the role of domain knowledge in KDD. Domingos [3] suggests use of domain knowledge as the most promising approach for constraining knowledge discovery and for avoiding the well-known problem of data overfitting by the discovered models. Yoon et al.
[4], referring to the domain knowledge to be used in this context, propose the following classification: inter-field knowledge, which describes relationships among attributes; category domain knowledge, which represents useful categories for the domains of the attributes; and correlation domain knowledge, which suggests correlations among attributes. In a similar manner Anand et al. [5] identify the following forms of domain knowledge: attribute relationship rules, hierarchical generalization trees and constraints. An example of the latter is the specification of degrees of confidence in the different sources of evidence. These approaches can be considered special cases of the ongoing research activity in knowledge modeling, ontologies and model-based knowledge acquisition, see for instance [6], [7], with special emphasis on cases of data-mining driven knowledge acquisition.

However, these studies concentrate on the use of domain knowledge in the main phase of data mining, as discussed in the next section, while the role of domain knowledge in other phases of the knowledge discovery process is not covered. In this paper we attempt to explore our experience with a large-scale data-mining project, to identify the role of the domain knowledge in the various phases of the process. Through this presentation we try to demonstrate that a typical KDD project is mostly a multi-stage knowledge modelling experiment in which domain experts play a role as crucial as in any knowledge-based system building exercise.

2 Identification of key roles and key phases of a KDD project

According to Langley and Simon [1] the following five stages are observed in the development of a knowledge-based system using inductive techniques: (a) problem formulation, (b) determination of the representation for both training data and knowledge to be learned, (c) collection of training data and knowledge induction, (d) evaluation of learned knowledge, (e) fielding of the knowledge base.
In Fayyad et al. [8] the process of KDD is described through the following nine steps: (a) defining the goal of the problem, (b) creating a target dataset, (c) data cleaning and pre-processing, (d) data transformation, e.g. reduction and projection in order to obtain secondary features, (e) matching the goals of the project to an appropriate data mining method (e.g. clustering, classification etc.), (f) choosing the data mining algorithm to be used, (g) data mining, (h) interpretation of identified patterns, (i) using discovered knowledge. By comparing the two processes one should notice the emphasis of the first frame on knowledge and of the second on data analysis and processing. However, in reality, while the stages proposed by Fayyad et al. do occur in most cases, the role of domain knowledge in them is also important, as discussed in this paper, while the final stage of building the knowledge base and fielding the system is also knowledge-intensive, often involving multiple knowledge representations and demanding many knowledge evaluation and knowledge visualization techniques.

The subject of our case study was the development of a knowledge-based decision support system for customer insolvency prediction in a large telecommunication industry. During the initial problem definition phase, the observation that large amounts of data existed in the industry concerned led to the decision to make extensive use of KDD techniques during this project. However, this extensive dataset did not cover all aspects of the problem. While the high degree of automation in modern telephone switching centres means that telephone usage by the customers of the company was well monitored, information on the customers' financial situation and credit levels, which is particularly important for this problem, was missing. This is a problem that often occurs in real problems; that is, different levels of automation in different aspects of the problem domain lead to non-uniform data sets.
Also techniques to infer knowledge, based on assumptions, observations and existing data, need to be used extensively during the problem definition and modeling phase. So, for instance, if information on the credit levels of a customer is missing, this can be inferred from information on regularity of payment of the telephone bills, based on the assumption that irregular payments are due to financial difficulties of the customers involved. This is a typical example of use of domain knowledge in the so-called ‘data transformation phase’.

From early stages it was deduced that a number of domain experts and sources of data had to be involved in the process. Domain experts, e.g. executives involved in tackling the problem of customer insolvency and salesmen who deal with the problem on a day-by-day basis, were interviewed during the problem formulation phase and their views on the problem and its main attributes were recorded. An investigation of the available data was also performed, and this involved executives of the information systems and the corporate databases who could provide an early indication on sources and quality of data. Other key actors were data analysts, who were also involved together with knowledge engineers and data mining experts.

3 Business knowledge and KDD

In many KDD projects, like the case study discussed here, the domain knowledge takes the form of business knowledge, as this represents the culture and rules of practice of the particular company that has requested the knowledge-based system. Business knowledge has been a subject of interest for management consultant firms and business administration researchers [9]. Business process re-engineering (BPR) is a keyword that has been extensively used during the last years, while special attention has been paid to building the so-called “institutional memory”, “lessons learned data bases” and “best practice repositories”.
While there are but few examples of successful full-scale repositories of business knowledge in large companies today, the widespread application of these techniques makes it worth investigating their existence. The relevance of these approaches to KDD projects, and their importance as sources of domain knowledge for data mining efforts, is evident, and for this reason they should be taken into consideration. It should also be noticed that a side effect of a major data-mining project could be the adaptation of a business knowledge base with many rules and practices which resulted from the KDD process. This is also the case with tacit knowledge and implicit knowledge, that is, the undocumented business knowledge, which is often discovered during such a project.

The distinction between domain knowledge and business knowledge is that the former relates to a general domain while the latter relates to a specific business; thus both are required in the case of a specific knowledge-based system that is to be commissioned to a specific company. A special case of business knowledge that affects the KDD process relates to the business objectives, as they become explicit and relate to problem definition. These can influence the parameters of the problem and measures of performance, as discussed by Gur Ali and Wallace [10]. In the following, an example of such a mapping of business objectives to measures of system performance is described for our case study.

4 Use of domain knowledge in an insolvency prediction case study

In this section, some typical examples of applying domain and business knowledge in the case study of the customer insolvency problem are provided. The examples are presented according to their order of appearance in the different phases of the KDD process. In the following section a classification of the domain and business knowledge used is attempted. The discussion included in this section does not provide a full account of the knowledge acquisition and modeling case study.
It attempts rather, through examples, to identify the role of domain knowledge in the various phases of the project. A more detailed account would have included details on the modelling process, which involved many iterations and revisions of discovered knowledge.

A detailed description of the problem of customer insolvency in the telecommunications industry is beyond the scope of this paper. For more information on the problem, the approach used and the performance of the developed system, see Daskalaki et al. [11].

4.1 Problem definition

In this phase the problem faced by a telecommunications organization was defined and requirements relating to its solution were set. The role of domain experts and the importance of domain and business knowledge in this phase are evident. For instance, the billing process of the company, the rules concerning overdue payments and the currently applied measures against insolvent customers need to be explicitly described by domain experts.

4.2 Creating target data set

Formulation of the problem as a classification problem was performed at this stage. Available data were identified. As often occurs in KDD projects, the available data were not located in the same database, while discrepancies were observed among the entities of these databases. This phase was not focused on the specific features to be used as parameters for training data, but rather on broad data sets that were considered relevant, to be analyzed in subsequent steps. So the sources of data were: (a) telephone usage data, (b) financial transactions of customers with the company (billing, payments etc.), (c) customer details derived from contracts and phone directory entries (customer occupation, address etc.). As discussed earlier, more details of customer credit conditions were not available in the corporate databases and could not become available from outside sources.
The role of domain and business knowledge in this stage concerned the structure of the available information and its semantic value, so this knowledge was offered mostly by the data processing department, in particular by employees involved in data entry for the information systems involved. Serious limitations of the available data were identified during this process. For instance, it was discovered that the information systems of the organization did not make reference to the customer as an individual in recorded transactions, but rather as a phone number owner. This made identification of an individual as an owner of multiple telephone connections particularly difficult.

4.3 Data preprocessing and transformation

This phase is the most important preparation phase for the data mining experiments. The domain knowledge during this stage has been used in many ways:
(i) elimination of irrelevant attributes
(ii) inferring more abstract attributes from multiple primary values
(iii) determination of missing values
(iv) definition of the time scale of the observation periods
(v) supporting data reduction by sampling and transaction elimination

In all the above cases the domain knowledge contributes to reduction of the search space and creation of a data set in which data mining of relevant patterns could be subsequently performed.
Examples of usage of domain knowledge are:

Example of case i: The attribute “billed amount” was considered irrelevant, since it is known that not only insolvent customers relate to high bills, but also very good solvent customers.

Examples of case ii: Large fluctuation of the amounts in consecutive bills is considered an important indication of insolvency, so these fluctuations should be estimated and taken into consideration. Overdue payments have been inferred by comparison of due and payment dates of bills. Considerable reduction of data was achieved by aggregating transactional data in the time dimension according to certain aggregation functions (sum, count, avg, stddev) and deduced attributes. Domain-related hypotheses of relevance of these deduced attributes have driven this process. An example was the DiffCount attribute, which represents the number of different telephone numbers called in a given period of time, and the deviation of this attribute from a moving average in consecutive time periods. Definition of this attribute is based on the assumption that if the diversity of called numbers fluctuates, this is an entity related to possible insolvency.

Example of case iii: In many cases the missing values were deduced through interrelated attributes, e.g.
Directory entries were correlated with customer records in order to determine the occupation of a customer, payments were related to billing periods by checking the amount of the bill, etc.

Examples of case iv: The transaction period under observation was set to 6 months prior to the unpaid bill, while the aggregation period of phone call data was set to a fortnight.

Example of case v: Transactions related to inexpensive calls (charging less than 0.3 euros) were considered not interesting and were eliminated, resulting in a reduction of about 50% of transaction data. Sampling of data with reference to representative cases of customers in terms of area, activity and class (insolvent or solvent) was performed. This resulted in a data set concerning 2% of the transactions and customers of the company.

4.4 Feature and algorithm selection for data mining

At this phase the data mining algorithms to be used are defined (in our case decision trees, neural networks and discriminant analysis) and the transformed dataset of the previous phase is further reduced by selecting the most useful features in adequate form for the selected algorithm. This feature selection is based mostly on automatic techniques; however, domain knowledge is used for interpretation of the selected feature set. This process is also used for verification of the assumptions of the previous phase, so if certain features do not prove to be discriminating factors then new attributes should be deduced and tested. It should also be mentioned that this feature selection process is often interleaved with the data mining process, since many algorithms select the most relevant features during the training process. In our case a stepwise discriminant analysis was used for feature selection.

4.5 Data Mining

Training a classifier using the cases of the collected data is considered the most important phase of the process.
Depending on the mining algorithm selected, the derived knowledge can be interpreted by domain experts. For instance, the rules defined by a decision tree can be inspected by domain experts. Also the weights related to the input variables of a neural network reflect their relative importance in a specific network. This is related to the performance of the model.

Extensive experiments often take place using a trial and error approach, in which the contribution of the classes in the training dataset and the input features, as well as the parameters of the data mining algorithm used, can vary. The performance of the deduced models indicates which of the models are most suitable for the knowledge-based system.

In an extensive experimentation that took place in the frame of our case study, 62 features were included in the original data set. Subsequently, 5 different datasets were created, characterised by different distributions of the classes (S)olvent / (I)nsolvent customer. These distributions (I/S) were the following: 1:1, 1:10, 1:25, 1:50, 1:100.

A ten-fold validation of each data mining experiment was performed, by redistributing the training/testing cases in the corresponding data sets. This way 50 classifying decision trees were obtained. By inspecting the features that have been used in these experiments, we selected the 20 most prominent, shown in Table 1.

Table 1. Most popular features used in the 50 classifiers

Feature             Feature description                                                n
NewCust             Identification of a new connection                                 50
Latency             Count of late payments                                             50
Count_X_charges     Count of bills with extra charges                                  50
CountResiduals      Count of times the bill was not paid in full                       50
StdDif              Std Dev. of different numbers called                               50
TrendDif11          Discrepancy from the mov. avg. of four previous periods of the     50
                    count of different numbers called, measured on the 11th period
TrendDif10          Idem for the 10th period                                           50
TrendDif7           Idem for the 7th period                                            50
TrendDif6           Idem for the 6th period                                            50
TrendDif3           Idem for the 3rd period                                            50
TrendUnitsMax       Maximum discrepancy from the moving average in units charged       45
                    over the fifteen 2-week periods
TrendDif5           Idem for the 5th period                                            43
TrendDif8           Idem for the 8th period                                            40
Average_Dif         Average # of different numbers called over the fifteen 2-week      39
                    periods
Type                Type of account, e.g. business, domestic etc.                      33
MaxSec              Maximum duration of the calls in any 2-week period during the      31
                    study period
TrendUnits5         Discrepancy from the moving average of the units charged,          28
                    measured on the 5th period
AverageUnits        Average # of units charged over the fifteen 2-week periods         23
TrendCount5         Discrepancy from the moving average of the total # of calls,       21
                    over the fifteen 2-week periods
CountInstallments   Count of times the customer requested payment by instalments       18

In Table 1, one may observe that the time-dependent feature most frequently used was the one related to the dispersion of the telephone numbers called (TrendDif, StdDif etc., 9 occurrences). This is a derived feature, proposed by the domain experts as discussed above, that could not possibly be defined without the domain experts' contribution.
This table demonstrates the important role of the domain experts in suggesting meaningful features during this phase.

Case distribution 1:1
If (StdDif<0.382952541) And (MaxSec<1086) Then INSOLVENT (confidence 1.4%)
If (StdDif<0.382952541) And (MaxSec>1086) And (ExtraDebt>=1.5) Then INSOLVENT (confidence 100%)
If (StdDif>=0.382952541) And (TrendCountMax>=4.625) Then INSOLVENT (confidence 5.36%)

Case distribution 1:10
If (CountXCharges<1.5) And (NewCust<0.5) And (TrendDif11<-0.625) And (TrendSec3<-1863.75) Then INSOLVENT (confidence 0%)
If (CountXCharges<1.5) And (NewCust>=0.5) And (CountResiduals>=0.5) And (TrendDif7<-0.625) Then INSOLVENT (confidence 12.5%)
If (CountXCharges<1.5) And (NewCust>=0.5) And (CountResiduals>=0.5) And (TrendDif7>=-0.625) And (StdDif<0.487950027) Then INSOLVENT (confidence 10.93%)
If (CountXCharges>=1.5) And (StdDif>=0.305032313) And (TrendUnitsMax>=121.25) And (TrendUnits6<-2.375) And (TrendDif10<-0.125) Then INSOLVENT (confidence 12.26%)
If (CountXCharges>=1.5) And (StdDif>=0.305032313) And (TrendUnitsMax>=121.25) And (TrendUnits6>=-2.375) Then INSOLVENT (confidence 7.6%)

Case distribution 1:25
If (CountXCharges<2.5) And (NewCust>=0.5) And (CountResiduals>=0.5) And (TrendCount5>=-1.25) And (TrendDif6<-0.375) Then INSOLVENT (confidence 25.8%)
If (CountXCharges<2.5) And (NewCust>=0.5) And (CountResiduals>=0.5) And (TrendCount5>=-1.25) And (TrendDif6>=-0.375) And (TrendCount5<1.375) And (Type<55.5) Then INSOLVENT (confidence 55.03%)
If (CountXCharges>=2.5) And (TrendDif3<-0.125) And (TrendUnitsMax>=222.625) Then INSOLVENT (confidence 9.49%)

Fig. 1 Knowledge in form of rules, determining Customer Insolvency

4.6 Evaluation and interpretation of learned knowledge

Evaluation of the learned knowledge usually involves measuring the performance using a test data set. However, this also involves knowledge interpretation, as discussed in the previous section, which involves domain experts.
Knowledge interpretation can be based on the performance on test cases and on inspection of the derived knowledge, if an adequate knowledge representation formalism has been used. The evaluation criteria for the learned knowledge performance may be related to business objectives as defined by domain experts. An example of evaluation criteria is described in this section.

In Figure 1 the knowledge, in the form of rules classifying the minority class cases (INSOLVENT customers), is exposed. It may be noticed that there is a considerable deviation in the parameters contributing to each of the rules, while the performance of the rules varies considerably, as indicated by the confidence measure expressing the rule performance in the test data set.

The criteria used for quantitative evaluation of learned knowledge in our case, as suggested by the domain experts, were different from the usual overall success rate and the specific class success rate indices usually applied in this kind of experiment. The domain experts suggested the following two criteria in our case study:
• The precision of the classifier, which is defined as the percentage of the actually insolvent customers among those predicted as insolvent by the classifier.
• The accuracy of the classifier, which is defined as the percentage of the correctly predicted insolvent customers out of the total cases of insolvent customers in the data set.

These measures seem more appropriate for measuring the effectiveness of the induced knowledge in problems of imbalanced class distributions, like our case, in which the incidents of insolvent customers are very rare compared to those of solvent ones. By introducing these criteria, we discovered that the learned knowledge, despite the fact that it had very high success rates both overall and in specific classes, did not meet the business objectives as these were defined by the Telecommunication Company (i.e.
the requested measure of success was precision > 80% and accuracy > 50%).

An example of such a classifier is presented in Table 2, which shows the performance of the classifier on the testing data set. From this table one can see that the performance of this particular classifier is over 90% in the majority class and over 83% in the minority class. However, the precision is 113/2844 = 4.0% and the accuracy is 113/136 = 83%, thus making the performance, in terms of the business objective set, not acceptable.

Table 2 Performance of classifier C1-3 for the insolvency prediction problem

                      Predicted cases
Actual cases          Insolvent (0)     Solvent (1)
Insolvent (0)         113 (83.1 %)      23 (16.9 %)
Solvent (1)           2731 (9.8 %)      25081 (90.2 %)

4.7 Fielding the knowledge base

This stage is essential in a knowledge-based system development project, while it is often omitted in data mining projects, as it is considered outside the scope of the data mining experiment. During this phase the learned knowledge is combined with other domain knowledge in order to become part of an operational decision support system, used by the company that commissioned the KDD project. The domain knowledge plays an important role during this stage. Usually the learned knowledge is just a part of this knowledge-based system, while heuristics or other forms of knowledge are often used as pre- or post-processors of the learned knowledge. In our case, the domain experts have suggested that the customers classified as insolvent should be examined in more detail in terms of the amount due, the percentage of this amount that is due to third telecommunication operators, previous history of the customer etc.; these are attributes that did not participate in the classification algorithm's decision, yet are important for taking measures against the suspected insolvency.

In the fielded knowledge-based system, important aspects are also the available means for convincing the decision-maker of the provided advice. This can be achieved by providing explanation on the proposed suggestion or visualizing the data and the knowledge used, as suggested by many researchers, see Ankerst et al. [12], Brachman & Anand [2], etc.

5 Conclusion

This paper has focused on the role of domain knowledge in a data-mining project. Eight distinct phases have been identified in the process and the role of the domain experts in each one of them has been discussed. In summary this role is shown in Table 3.
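
As a minimal sketch, the two evaluation criteria of Section 4.6 can be recomputed from the counts reported in Table 2 for classifier C1-3. The function and variable names below are illustrative assumptions and are not taken from the original system. Note that the measure the paper calls "accuracy" (correctly predicted insolvent cases out of all actually insolvent cases) is what is more commonly called recall.

```python
# Counts from Table 2 (classifier C1-3, testing data set).
# Variable and function names are hypothetical, chosen for illustration only.
ACTUAL_INSOLVENT_PREDICTED_INSOLVENT = 113
ACTUAL_INSOLVENT_PREDICTED_SOLVENT = 23
ACTUAL_SOLVENT_PREDICTED_INSOLVENT = 2731
ACTUAL_SOLVENT_PREDICTED_SOLVENT = 25081


def precision(true_pos: int, false_pos: int) -> float:
    """Share of actually insolvent customers among those predicted insolvent."""
    return true_pos / (true_pos + false_pos)


def insolvent_accuracy(true_pos: int, false_neg: int) -> float:
    """The paper's 'accuracy': correctly predicted insolvent cases out of all
    actually insolvent cases (commonly called recall)."""
    return true_pos / (true_pos + false_neg)


def overall_success_rate(tp: int, fn: int, fp: int, tn: int) -> float:
    """Share of all test cases that were classified correctly."""
    return (tp + tn) / (tp + fn + fp + tn)


def meets_business_objectives(prec: float, acc: float) -> bool:
    """Thresholds stated in Section 4.6: precision > 80% and accuracy > 50%."""
    return prec > 0.80 and acc > 0.50


prec = precision(ACTUAL_INSOLVENT_PREDICTED_INSOLVENT,
                 ACTUAL_SOLVENT_PREDICTED_INSOLVENT)
acc = insolvent_accuracy(ACTUAL_INSOLVENT_PREDICTED_INSOLVENT,
                         ACTUAL_INSOLVENT_PREDICTED_SOLVENT)
overall = overall_success_rate(ACTUAL_INSOLVENT_PREDICTED_INSOLVENT,
                               ACTUAL_INSOLVENT_PREDICTED_SOLVENT,
                               ACTUAL_SOLVENT_PREDICTED_INSOLVENT,
                               ACTUAL_SOLVENT_PREDICTED_SOLVENT)

print(f"precision            = {prec:.1%}")
print(f"accuracy (insolvent) = {acc:.1%}")
print(f"overall success rate = {overall:.1%}")
print("meets objectives     =", meets_business_objectives(prec, acc))
```

With a class ratio this skewed, the overall success rate stays high even though only a small fraction of the customers flagged as insolvent actually are, which is why the domain experts' criteria are the more informative ones here.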