H. Approximating description logic classification for semantic web reasoning
- Format: PDF
- Size: 170.10 KB
- Pages: 15
C28x Control Law Accelerator Math Library (CLAmath) Module User's Guide (SPRC993)
C28x Foundation Software V2.00
December 11, 2009

IMPORTANT NOTICE
Texas Instruments and its subsidiaries (TI) reserve the right to make changes to their products or to discontinue any product or service without notice, and advise customers to obtain the latest version of relevant information to verify, before placing orders, that information being relied on is current and complete. All products are sold subject to the terms and conditions of sale supplied at the time of order acknowledgement, including those pertaining to warranty, patent infringement, and limitation of liability.

TI warrants performance of its semiconductor products to the specifications applicable at the time of sale in accordance with TI's standard warranty. Testing and other quality control techniques are utilized to the extent TI deems necessary to support this warranty. Specific testing of all parameters of each device is not necessarily performed, except those mandated by government requirements.

Customers are responsible for their applications using TI components. In order to minimize risks associated with the customer's applications, adequate design and operating safeguards must be provided by the customer to minimize inherent or procedural hazards.

TI assumes no liability for applications assistance or customer product design. TI does not warrant or represent that any license, either express or implied, is granted under any patent right, copyright, mask work right, or other intellectual property right of TI covering or relating to any combination, machine, or process in which such products or services might be or are used. TI's publication of information regarding any third party's products or services does not constitute TI's approval, license, warranty or endorsement thereof.

Reproduction of information in TI data books or data sheets is permissible only if reproduction is without alteration and is accompanied by all associated warranties, conditions, limitations and notices. Representation or reproduction of this information with alteration voids all warranties provided for an associated TI product or service, is an unfair and deceptive business practice, and TI is not responsible or liable for any such use.

Resale of TI's products or services with statements different from or beyond the parameters stated by TI for that product or service voids all express and any implied warranties for the associated TI product or service, is an unfair and deceptive business practice, and TI is not responsible or liable for any such use.

Also see: Standard Terms and Conditions of Sale for Semiconductor Products. /sc/docs/stdterms.htm

Mailing Address:
Texas Instruments
Post Office Box 655303
Dallas, Texas 75265

Copyright © 2002, Texas Instruments Incorporated

Contents
1. Introduction
2. Other Resources
3. Installing the Macro Library
   3.1. Where the Files are Located (Directory Structure)
4. Using the CLAmath Macro Library
   4.1. The C28x CPU and the CLA are Friends!
   4.2. C28x C Code
   4.3. CLA Assembly Code
   4.4. Math Lookup Tables
   4.5. Linker File
5. CLAmath Macro Summary (CLAcos, CLAdiv, CLAisqrt, CLAsin, CLAsincos, CLAsqrt, CLAatan, CLAatan2)
6. Revision History

Trademarks
TMS320 is a trademark of Texas Instruments Incorporated. Code Composer Studio is a trademark of Texas Instruments Incorporated. All other trademarks mentioned herein are the property of their respective companies.

1. Introduction
The Texas Instruments TMS320C28x Control Law Accelerator math library is a collection of optimized floating-point math functions for controllers with the CLA. This source code library includes CLA assembly macros of selected floating-point math functions. All source code is provided so it can be modified for your particular requirements.

2. Other Resources
There is a live Wiki page with answers to CLA frequently asked questions (FAQ). Links to other CLA references, such as training videos, will be posted there as well:
/index.php/Category:Control_Law_Accelerator_Type0
Also check out the TI Piccolo page: /piccolo
And don't forget the TI community website: /

Building CLA code requires Codegen Tools V5.2 or later.

Debugging in Code Composer Studio V4: Debugging CLA code requires CCS V4.0.2. V4.0.2 can be installed by going to the update manager and upgrading your CCS 4.0 install.

Debugging in Code Composer Studio V3.3: C2000 evaluation version 3.3.83.19 or later supports CLA debug. For updating other versions of CCS, please check the Wiki and community website for updates.
3. Installing the Macro Library

3.1. Where the Files are Located (Directory Structure)
As installed, the C28x CLAmath Library is partitioned into a well-defined directory structure. By default, the library and source code are installed into the following directory:
C:\ti\controlSUITE\libs\math\CLAmath
Table 1 describes the contents of the main directories used by the library.

Table 1. C28x CLAmath Library Directory Structure
Directory – Description
<base> – Base install directory. By default this is C:\ti\controlSUITE\libs\math\CLAmath\v100a. For the rest of this document <base> will be omitted from the directory names.
<base>\doc – Documentation, including the revision history from any previous release.
<base>\lib – The macro library include file, the macro library header file, and the lookup tables used by the macros (assembly file).
<base>\2803x_examples – Example code for the 2803x family of devices.
<base>\2803x_examples\examples – Code Composer Studio V3.3 based examples.
<base>\2803x_examples\examples_ccsv4 – Code Composer Studio V4 based examples.

4. Using the CLAmath Macro Library
The best place to start is to open up and run some of the example programs provided. This section walks you through the framework the examples use. Open one of the projects and follow along.

4.1. The C28x CPU and the CLA are Friends!
Also known as: How to Use Header Files to Make Sharing Easy

In each of the examples provided, you will find a header file called CLAShared.h. This file includes the header files, variables, and constants that we want both the C28x and the CLA code to know about. If possible, open one of these examples and follow along in the CLAShared.h file as we go through some suggested content:
• IQmath: If your project uses IQmath, then include it in CLAShared.h. Now you can define variables as _iq and the CLA will also know what your GLOBAL_Q value is.
• C28x header files: These make accessing peripherals easy and can be used in both C and assembly code. In the header file software downloads we provide a file called DSP28x_Project.h. This will include all of the peripheral header files along with some example header files with useful definitions.
• The CLAmath header file: CLAmath_type0.h. All of the symbols used by the CLAmath library are included in this header file.
• Any variables or constants that both the C28x and CLA should know about. For example, if they will both access a variable called X, you can add:
  extern float32 X;
  The variable X will be created in the C28x C environment, but now the CLA will also know about it.
• Symbols in the CLA assembly file. These might include start/end symbols for each task. The main CPU can then use these symbols to calculate the vector address for each task.
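To make the sharing mechanism concrete, the following is a minimal sketch of what a shared header along these lines might contain. It is illustrative only: the variable names (X, Y) and the task symbols (Cla1Prog_Start, Cla1Task1_Start, Cla1Task1_End) are placeholders, not the exact contents of the shipped example projects, and it assumes the standard C28x header-file typedefs (float32, Uint32).

    //
    // CLAShared.h -- illustrative sketch of a shared C28x/CLA header
    //
    #ifndef CLA_SHARED_H
    #define CLA_SHARED_H

    #include "DSP28x_Project.h"   // peripheral and example headers (per the text)
    #include "IQmathLib.h"        // optional: only if the project uses IQmath
    #include "CLAmath_type0.h"    // CLAmath library symbols (named in Section 4.1)

    // Variables shared by the C28x CPU and the CLA. They are defined once in
    // the C28x C code; these externs make them visible to the CLA assembly
    // through the .cdecls directive.
    extern float32 X;             // example shared input
    extern float32 Y;             // example shared result

    // Symbols exported by the CLA assembly file; the main CPU uses them to
    // compute the task vector (MVECT) addresses.
    extern Uint32 Cla1Prog_Start;
    extern Uint32 Cla1Task1_Start;
    extern Uint32 Cla1Task1_End;

    #endif // CLA_SHARED_H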
4.2. C28x C Code
Now open the main.c file to see the system setup performed by the C28x main CPU. Notice that the C28x C code includes the CLAShared.h header file discussed in the previous section. The C28x C code is responsible for:
• Declaring all the C28x and CLA shared variables.
• Assigning these variables to linker sections by using the DATA_SECTION #pragma statement. This will be used by the linker file to place the variables in the proper memory block or message RAM.
• Turning on the clock to the CLA.
• Initializing the CLA data and program memory. Note that, as the examples are provided, the CLA data and program are loaded directly by Code Composer Studio. This is a great first step while debugging CLA code. Later, if you move the load address of these sections to flash, the main CPU will need to copy them to the CLA data and program memory.
• Assigning the CLA program and data memory to the CLA module. At reset these memories belong to the C28x CPU so they can be initialized like any other memory. After initialization, they can then be assigned to the CLA for use.
• Enabling CLA interrupts.
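The fragment below sketches these steps in one place. It assumes the standard DSP2803x header files; the register fields shown (PCLKCR3.CLA1ENCLK, MVECT1, MMEMCFG, MCTL, MIER) and the symbol names Cla1Prog_Start / Cla1Task1_Start are examples for illustration, not a copy of the shipped example projects.

    // main.c (fragment) -- illustrative sketch of the C28x-side setup
    #include "CLAShared.h"

    #pragma DATA_SECTION(X, "CpuToCla1MsgRAM");   // CPU -> CLA message RAM
    #pragma DATA_SECTION(Y, "Cla1ToCpuMsgRAM");   // CLA -> CPU message RAM
    float32 X;
    float32 Y;

    void init_cla(void)
    {
        EALLOW;
        SysCtrlRegs.PCLKCR3.bit.CLA1ENCLK = 1;    // turn on the clock to the CLA

        // Task 1 vector: offset of the task entry point from the CLA program base.
        Cla1Regs.MVECT1 = (Uint16)((Uint32)&Cla1Task1_Start -
                                   (Uint32)&Cla1Prog_Start);

        // Hand the program and data RAM blocks over to the CLA. At reset they
        // belong to the C28x CPU, so they can be initialized beforehand.
        Cla1Regs.MMEMCFG.bit.PROGE = 1;
        Cla1Regs.MMEMCFG.bit.RAM0E = 1;

        Cla1Regs.MCTL.bit.IACKE = 1;              // allow software task triggering
        Cla1Regs.MIER.all = 0x0001;               // enable the task 1 interrupt
        EDIS;
    }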
4.3. CLA Assembly Code
The CLA assembly code for each example is in the file CLA.asm. Open this file and notice the following:
• The CLAShared.h header file discussed previously is included by using the assembly directive:
  .cdecls C,LIST,"CLAShared.h"
• The CLAmath macro library is included as well. This will allow you to create instances of the macros in the CLA assembly file:
  .include "CLAmathLib_type0.inc"
• CLA code is assigned to its own assembly section. In this case the section is called CLAProg. Later we will use this section in the linker command file to indicate where in the memory map the CLA code should be loaded and where it will run.
• A symbol indicates the start of CLA program memory and a symbol indicates the start of each task. These can be used by the main CPU to calculate the contents of the vector register (MVECT) for each CLA task.
• In one or more of the tasks, a CLA macro is instanced. The input and output of each macro instance are tied to variables you will create in the C28x code and reference in the CLAShared.h header file. A single task can include one or more macros. The macros themselves do not include the MSTOP instruction. Notice that an MSTOP instruction has been inserted to indicate the end of each task.

4.4. Math Lookup Tables
Many of the functions in the library use look-up tables to increase performance. These tables are located in the CLAmathTables assembly section and are available in the file called CLAsincosTable_type0.asm. Make sure to add this file to your project if you are using the sin, cos or sincos macros.

4.5. Linker File
The linker file needs to specify the load and run memory locations for the following sections:
• CLAProg – This contains the CLA program code. The examples load this code directly into CLA program memory for debug. Later you will want to load this section into flash and copy it to CLA program SARAM just as you would any time-critical section of code.
• Cla1ToCpuMsgRAM – The examples place all of the CLA-to-CPU message traffic in this section using the DATA_SECTION #pragma in the main C28x code. This section must be placed in the CLA-to-CPU message RAM.
• CpuToCla1MsgRAM – Likewise, the examples place all of the CPU-to-CLA message traffic in this section using the DATA_SECTION #pragma in the main C28x code. This section must be placed in the CPU-to-CLA message RAM.
• CLAmathTables – This table is used by the sin, cos and sincos macros. The contents of this section should be placed in the CLA data memory at runtime. The examples load this section directly into CLA data memory for debug. Later you will want to load this section into flash and copy it to CLA data SARAM just as you would any time-critical data table.

5. CLAmath Macro Summary
The following functions are included in this release of the CLAmath Library. All of the macros are provided in the CLAmathLib_type0.inc file and can be modified as required for a particular application:
cos, division, isqrt, sin, sincos, sqrt, atan, atan2

Single-Precision Floating-Point COS (radians)
Description: Returns the cosine of a 32-bit floating-point argument "rad" (in radians) using table look-up and Taylor series expansion between the look-up table entries.
Header File: C28x C code: #include "CLAmathLib_type0.h"; CLA code: .cdecls C,LIST,"CLAmathLib_type0.h"
Macro Declaration: CLAcos .macro y, rad ; y = cos(rad)
  Input: rad = radians in 32-bit floating-point
  Output: y = cos(rad) in 32-bit floating-point
Lookup Tables: This function requires the CLAsincosTable located in the file called CLAsincosTable_type0.asm. By default, this table is in the assembly section named CLAmathTables. Make sure this table is loaded into the CLA data space.

Single-Precision Floating-Point Division
Description: Performs a 32-bit/32-bit = 32-bit single-precision floating-point division. This function uses a Newton-Raphson algorithm.
Header File: None
Macro Declaration: CLAdiv .macro Dest, Num, Den
  Input: Num = numerator in 32-bit floating-point; Den = denominator in 32-bit floating-point
  Output: Dest = Num/Den in 32-bit floating-point
Lookup Tables: None

Single-Precision Floating-Point 1.0/Square Root
Description: Returns 1.0/square root of a floating-point argument using a Newton-Raphson algorithm.
Header File: None
Macro Declaration: CLAisqrt .macro y, x
  Input: x = 32-bit floating-point input
  Output: y = 1/sqrt(x) in 32-bit floating-point
Lookup Tables: None
Special Cases:
  If x = FLT_MAX or FLT_MIN, CLAisqrt will set the LUF flag.
  If x = -FLT_MIN, CLAisqrt will set both the LUF and LVF flags.
  If x = 0.0, CLAisqrt sets the LVF flag.
  If x is negative, CLAisqrt will set LVF and return 0.0.

Single-Precision Floating-Point SIN (radians)
Description: Returns the sine of a 32-bit floating-point argument "rad" (in radians) using table look-up and Taylor series expansion between the look-up table entries.
Header File: C28x C code: #include "CLAmathLib_type0.h"; CLA code: .cdecls C,LIST,"CLAmathLib_type0.h"
Macro Declaration: CLAsin .macro y, rad ; y = sin(rad)
  Input: rad = radians in 32-bit floating-point
  Output: y = sin(rad) in 32-bit floating-point
Lookup Tables: This function requires the CLAsincosTable located in the file called CLAsincosTable_type0.asm. By default, this table is in the assembly section named CLAmathTables. Make sure this table is loaded into the CLA data space.

Single-Precision Floating-Point SIN and COS (radians)
Description: Returns the sine and cosine of a 32-bit floating-point argument "rad" (in radians) using table look-up and Taylor series expansion between the look-up table entries.
Header File: C28x C code: #include "CLAmathLib_type0.h"; CLA code: .cdecls C,LIST,"CLAmathLib_type0.h"
Macro Declaration: CLAsincos .macro y1, y2, rad, temp1, temp2
  Input: rad = radians in 32-bit floating-point
  Output: y1 = sin(rad) in 32-bit floating-point; y2 = cos(rad) in 32-bit floating-point
  Temporary: temp1, temp2. Note: the temp1 and temp2 locations are used for temporary storage during the function execution.
Lookup Tables: This function requires the CLAsincosTable located in the file called CLAsincosTable_type0.asm. By default, this table is in the assembly section named CLAmathTables. Make sure this table is loaded into the CLA data space.

Single-Precision Floating-Point Square Root
Description: Returns the square root of a floating-point argument x using a Newton-Raphson algorithm.
Header File: None
Macro Declaration: CLAsqrt .macro y, x
  Input: x = 32-bit floating-point input
  Output: y = sqrt(x) in 32-bit floating-point
Lookup Tables: None
Special Cases:
  If x = FLT_MAX or FLT_MIN, CLAsqrt will set the LUF flag.
  If x = -FLT_MIN, CLAsqrt will set both the LUF and LVF flags.
  If x = 0.0, CLAsqrt sets the LVF flag.
  If x is negative, CLAsqrt will set LVF and return 0.0.

Single-Precision Floating-Point ATAN (radians)
Description: Returns the arc tangent of a floating-point argument x. The return value is an angle in the range [-π/2, π/2] radians.
Header File: C28x C code: #include "CLAmathLib_type0.h"; CLA code: .cdecls C,LIST,"CLAmathLib_type0.h"
Macro Declaration: CLAatan .macro y, x ; y = atan(x)
  Input: x in 32-bit floating-point
  Output: y = atan(x) in 32-bit floating-point
Lookup Tables: This function requires the CLAatan2Table located in the file called CLAatanTable_type0.asm. By default, this table is in the assembly section named CLAmathTables. Make sure this table is loaded into the CLA data space.

Single-Precision Floating-Point ATAN2 (radians)
Description: Returns the four-quadrant arc tangent of floating-point arguments y/x. The return value is an angle in the range [-π, π] radians.
Header File: C28x C code: #include "CLAmathLib_type0.h"; CLA code: .cdecls C,LIST,"CLAmathLib_type0.h"
Macro Declaration: CLAatan2 .macro z, y, x ; z = atan(y/x)
  Inputs: x in 32-bit floating-point; y in 32-bit floating-point
  Output: z = atan2(y,x) in 32-bit floating-point
Lookup Tables: This function requires the CLAatan2Table located in the file called CLAatanTable_type0.asm. By default, this table is in the assembly section named CLAmathTables. Make sure this table is loaded into the CLA data space.

6. Revision History
V2.00: Moderate update. Two more functions, atan and atan2, have been added to the list of available CLA math macros.
V1.00a: Minor update. This update does not change any of the source code. The changes were made to prepare the package for inclusion in controlSUITE and to improve usability in Code Composer Studio V4.
• Changed the default location for the controlSUITE release of this package. The new release location is: C:\ti\controlSUITE\libs\math\CLAmath
• Added example projects ported to work with Code Composer Studio V4.
  CCS 3.3 examples can be found in the directory: C:\ti\controlSUITE\libs\math\CLAmath\v100a\2803x_examples\examples
  CCS 4 examples can be found in the directory: C:\ti\controlSUITE\libs\math\CLAmath\v100a\2803x_examples\examples_ccsv4
V1.00: Initial release.
Approximate Symbolic Pattern Matching for Protein Sequence Data
Bill C. H. Chang and Saman K. Halgamuge
Mechatronics Research Group, Department of Mechanical and Manufacturing Engineering, University of Melbourne, Melbourne, Victoria 3010, Australia

Abstract
In protein sequences, two sequences that share similar substrings often have similar functional properties. Learning the characteristics and properties of an unknown protein is much easier if its likely functional properties can be predicted by finding substrings already known from other protein sequences. The sequence pattern search algorithm proposed in this paper searches for similar matches between a pattern and a sequence by using fuzzy logic and calculates the degree of similarity through a sequence inference step. Proteins from 11 domain families are used for simulation, and the results show that the proposed algorithm is capable of identifying sequences that have a similar pattern compared to their family protein motifs.

Keywords: sequence pattern matching, approximate sequence matching, protein sequences, fuzzy, data mining

1. Introduction
Mining of sequence data has many real-world applications [1][2]. The transaction history of a bank customer, the product order history of a company, the performance of the stock market [3] and biological DNA data [4] are all sequence data to which sequence data mining techniques are applied. In contrast to ordinary data sets, sequence data are dynamic and order dependent. An example of a sequential pattern is "A customer who bought a Pentium PC nine months ago is likely to order a new CPU chip within one month" [1].

For symbolic sequential data, pattern matching can be considered as either (1) exact matching or (2) approximate matching [2]. Approximate matching is the finding of the most similar match of a particular pattern within a sequence. Quite often in real-world data mining applications, exact patterns do not exist due to the large number of possible sequence combinations, and therefore an approximate matching algorithm is required. Especially in the field of molecular biology, sequence patterns are described in an approximate way.

The analysis of protein and DNA sequence data has been one of the most active research areas in the field of computational molecular biology [5]. A DNA sequence contains genetic information, which includes genes, regulatory regions, and many other unknown regions. Four different "bases" – A, T, C and G – are the "building blocks" of DNA sequences, and a DNA sequence is composed of a combination of these four bases in a linear form [6]. Computational molecular biology research on DNA sequences mainly concentrates on gene identification, regulatory region identification and species-to-species comparisons. When a gene in a DNA sequence is "activated", a process called transcription starts and mRNA (a genetic material) is produced. The sequence of mRNA is the complement (i.e. A becomes T, C becomes G, G becomes C and T becomes A) of the gene it was transcribed from. mRNA is then translated (every 3 DNA bases are translated into 1 amino acid, which is the building block of proteins) into a protein sequence, which is then folded into a functional three-dimensional structure [6].

The discovery of patterns within biological sequences can lead to significant biological discoveries. Sequence motif discovery algorithms can generally be categorized into 3 types: (1) string alignment algorithms, (2) exhaustive enumeration algorithms, and (3) heuristic methods.
Motif discovery is outside the scope of this paper, and the motif patterns used for simulation are obtained from the PROSITE database [7]. Please refer to [8] for a more detailed discussion of motif discovery algorithms.

It is important to have a pattern matching technique to identify the biologically significant patterns within an unknown biological sequence. The presence of significant patterns within an unknown sequence gives some indication of the likelihood of its functional properties. This paper discusses the possibility of implementing an approximate pattern matching algorithm based on a fuzzy inference technique. Background on sequence searching in protein databases is briefly presented in Section 2. Section 3 summarises the method of the fuzzy sequence pattern searching algorithm between two sequences. Simulation results on molecular biology data and their discussion are presented in Section 4 and Section 5. Finally, Section 6 discusses some of the possible future developments.

2. Sequence Searching in Protein Databases
Biologists often perform protein sequence searches because similar sequences usually have similar functional properties [2]. When an unknown protein is sequenced, the scientist usually tries to get a "feel" for its functional properties by doing database searching. The functional properties of a protein can be scientifically determined using biological tests; however, the testing time period would be quite long if the scientist does not have any idea about the particular protein sequence. By doing a sequence search through protein sequence databases (such as PROSITE [7], BLOCKS [9], PRINTS [10] and PFAM [11]), and if some similar sequences exist, the scientist would often test for the possibility of similar functional properties for the unknown sequence and hence fast-track the research.

Generally, there are two approaches to sequence searching in proteins: sequence alignment and sequence motif searching. Sequence alignment methods have two variations: local alignment and global alignment methods [2]. The local alignment method aims to align two sequences so that the similarity between regions of the two sequences is maximised. Local alignment such as BLAST [12] is useful to determine the functional similarity between two sequences. On the other hand, global alignment methods aim to align two sequences so that the similarity between the two sequences is maximised (globally). The global alignment method is useful to determine the relevance between two sequences in terms of their inheritance. Alignment algorithms are usually implemented based on a dynamic programming technique, and variations of alignment methods can be achieved by manipulating the scores for matches, mismatches, gaps, etc. [2]. Sequence motif searching techniques identify the existence of motifs within an unknown protein sequence. A protein sequence motif, signature or consensus pattern is a short sequence that is found within sequences of the same protein family [13]. PROSITE [7] is one of the protein motif databases where a scientist can search for occurrences of protein motifs in his/her unknown protein sequences.
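To make the dynamic-programming remark concrete, here is a small C sketch of a global alignment score in the Needleman-Wunsch style. It is only an illustration of the idea referred to above, not code from this paper or from BLAST; the match, mismatch and gap scores are arbitrary example values, and varying them is precisely the score manipulation the text mentions.

    /* align.c -- basic global alignment score by dynamic programming */
    #include <stdio.h>
    #include <string.h>

    #define MAXLEN   128
    #define MATCH      2
    #define MISMATCH  -1
    #define GAP       -2

    static int max3(int a, int b, int c)
    {
        int m = a > b ? a : b;
        return m > c ? m : c;
    }

    /* Global alignment score of sequences a and b (lengths up to MAXLEN). */
    static int align_score(const char *a, const char *b)
    {
        static int dp[MAXLEN + 1][MAXLEN + 1];
        size_t n = strlen(a), m = strlen(b);

        for (size_t i = 0; i <= n; i++) dp[i][0] = (int)i * GAP;
        for (size_t j = 0; j <= m; j++) dp[0][j] = (int)j * GAP;

        for (size_t i = 1; i <= n; i++)
            for (size_t j = 1; j <= m; j++) {
                int s = (a[i - 1] == b[j - 1]) ? MATCH : MISMATCH;
                dp[i][j] = max3(dp[i - 1][j - 1] + s,  /* align a[i-1] with b[j-1] */
                                dp[i - 1][j] + GAP,    /* gap in b */
                                dp[i][j - 1] + GAP);   /* gap in a */
            }
        return dp[n][m];
    }

    int main(void)
    {
        printf("score = %d\n", align_score("HEAGAWGHEE", "PAWHEAE"));
        return 0;
    }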
Apart from searching for similar sequences, protein sequence searching can also potentially obtain structural information about an unknown protein. A protein conformation is often described in terms of three structural levels: (1) the primary structure, which is the linear sequence of the polypeptide chain (a series of linked amino acids); (2) the secondary structure, which describes the path that the polypeptide backbone of the protein follows in space; and (3) the tertiary structure, which describes the organisation in three dimensions of all the atoms in the polypeptide chain [6]. A protein's sequence, which is also known as the primary structure of the protein, can often be used to predict its secondary structure [14]. However, the prediction of a protein's tertiary structure is still difficult at this stage.

In real-world biological applications, most relevant sequences are "similar" instead of exactly the same. It is therefore useful to search sequences using fuzzy logic, where approximate pattern searching can be implemented. Current approximate searching algorithms (such as Prosite Scan [15]) are rigid in the sense that their definition of "similarity" is fixed. In the proposed algorithm, the user is able to define the meaning of "similarity" by adjusting the membership functions for the sequence search. This way, a scientist is able to identify an unknown sequence's functional properties using past experience and expertise.

3. Approximate Sequence Pattern Searching Algorithm
In exact sequence pattern matching problems, we aim to find a substring in a text T that is exactly the same as the searching pattern P. In biological sequence data applications, exact patterns are rare, but sequences belonging to the same functional family usually have "similar" substrings within each of the sequences. Hence, there is a requirement for an approximate sequence pattern searching algorithm for biological data analysis.

The proposed algorithm aims to find a substring, P', within a text, T, that is "most similar" to a searching pattern, P. A sequence can be interpreted as a series of events, E_i, separated by their event intervals, I_i,j. A sequence can be described as:

E_1 – I_1,2 – E_2 – I_2,3 – E_3 – … – I_(n-1),n – E_n

For example, the sequence ATG has three events: A, T and G. The event intervals between A and T, and between T and G, are both zero (i.e. I_1,2 = I_2,3 = 0). The concept of event intervals is important when the searching pattern, P, contains wild cards. A wild card, usually represented by the letter "X" in the molecular biology literature, can match any other symbol. For instance, the sequences AXC and ABC are considered to be an "exact match", as X can be matched to B (in this case) or any of the possible symbols/events. Since X is a wild card and not an identified event for the search algorithm, the sequence AXC has only two events, A and C, separated by an event interval of one.

A classifying-type fuzzy system without defuzzification [16] is used for the proposed algorithm. There are four main steps in the proposed algorithm. Firstly, a searching pattern (string) P is decomposed to obtain events and event intervals. Then, the obtained events and event intervals are fuzzified in the Sequence Fuzzification step. A Sequence Inference step follows to determine the sequences that are "similar" to the searching pattern P. Finally, a sequence search is conducted to determine the "similarity" between a text T and the pattern P. An overview of the algorithm procedure is shown in Figure 1.

Figure 1. Overview of Approximate Sequence Searching Algorithm.

3.1 Sequence Decomposition
In this step, the searching pattern P is decomposed to obtain events and event intervals.
Events in the sequence are identified and stored in an event distribution matrix. From this event distribution matrix, the event intervals for all events can be calculated.

An event can consist of one or more symbols/characters, and the event width is the number of symbol(s)/character(s) of an event. "C" is an event with an event width equal to one, whereas "CT" is an event with an event width of two. The number of decomposition levels needed corresponds to the event width of an event. First-level decomposition identifies each character as an event, and k-th level decomposition identifies events with an event width of k. An example of event identification for the sequence CTGACAG and its event distribution matrix is shown in Figure 2.

Figure 2. Event identification and event distribution matrix for sequence CTGACAG.

3.2 Sequence Fuzzification
The searching pattern P is fuzzified by applying fuzzification techniques to the events and event intervals obtained from the previous step. In the fuzzification step, fuzzy membership functions of events and event intervals are generated. There are three fuzzy variables: event content, event interval, and total number of events.

3.2.1 Event Content Fuzzy Membership Functions
An event can be fuzzified based on its content, or the character(s)/symbol(s) present. The assignment of the fuzzy membership functions is dependent on the requirements of the specific task or on expert knowledge. For example, suppose the event TG in pattern P is considered to be important even if only one of the symbols exists. We can assign XG and TX a value of 0.5, where the symbol X can match any other symbol. A fuzzy membership function for this event can be generated as shown in Figure 3.

Figure 3. Fuzzy membership function for event TG.

3.2.2 Event Interval Fuzzy Membership Functions
The length of an event interval can be fuzzified to represent Long, Medium, Short, or any other linguistic terms. This concept is useful in biological applications since the number of wild cards varies for many protein family motifs. An example of fuzzy membership functions for the event interval is shown in Figure 4.

Figure 4. Fuzzy membership functions for event interval: Long, Medium, and Short.

3.2.3 Total Event Fuzzy Membership Functions
In some applications or tasks, we would like to detect a sequence even if some of the events in the searching pattern are non-existent in the sequence data, especially in biological data, where two proteins that share "enough similarity" in their sequences may have similar functional properties. This is also known as the "first fact of biological sequence analysis" [2]: "In bio-molecular sequences (DNA, RNA, or amino acid sequences), high sequence similarity usually implies significant functional or structural similarity" [2]. For example, the sequences ATGCA and ATGCC may have the same functional property even though ATGCC only has four events of ATGCA. An example of a total event fuzzy membership function for P = ATGCA is shown in Figure 5. Here the variable Tot is used to describe the total number of events.

Figure 5. Total event fuzzy membership function for P = ATGCA.

3.3 Sequence Inference
This step generates an array of sequences P' that are "similar" to the searching pattern P.
The degree of similarity is determined by the fuzzy rule:

R_i: IF event E_1 occurs AND event E_2 occurs AND the event interval between E_1 and E_2 is I_1 AND … AND event E_(n-1) occurs AND event E_n occurs AND the event interval between E_(n-1) and E_n is I_(n-1), AND the total number of events is Tot, THEN pattern P'_i is similar to P with a degree Y_i,

where Y_i = T-norm(µ(E_1), µ(E_2), …, µ(E_n), µ(I_1), µ(I_2), …, µ(I_(n-1)), µ(Tot)).

The T-norm used can be multiplication, and the choice of functions will depend on the needs of specific applications.

3.4 Sequence Searching
The array of similar sequences P' obtained from the previous step is then used for the determination of the similarity between a sequence text T and a searching pattern P. Each P'_i is compared with the sequence T as an exact matching problem, and if P'_i exists in T, then the similarity between P and T is Y_i. Since the sequence T can match many of the sequences in P', the similarity between T and the searching pattern P is determined as:

Y = F(Y_j), for P'_j exists in T,

where the function F is Maximum.
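To show how the decomposition, fuzzification, inference and searching steps combine, the following C sketch scores the artificial pattern A-x(0,6)-A (used in the simulation below) against a text. It is not the authors' implementation: the interval membership values and the test strings are placeholders, the event memberships are crisp (exact character match), the total-event membership is omitted, and, as described above, the T-norm is multiplication and the aggregation function F is the maximum.

    /* fuzzy_match.c -- illustrative sketch of the approximate pattern search */
    #include <stdio.h>
    #include <string.h>

    /* Membership of the event interval (number of wild cards between the two
     * A's).  The values are made-up placeholders for a "Short" membership. */
    static double mu_interval(int gap)
    {
        static const double mu[] = {1.0, 1.0, 0.8, 0.6, 0.4, 0.2, 0.1};
        return (gap >= 0 && gap <= 6) ? mu[gap] : 0.0;
    }

    /* Similarity between the pattern A-x(0,6)-A and a sequence: every
     * candidate substring A...A is scored with the multiplication T-norm
     * (here just the interval membership, since both events match exactly),
     * and the aggregation function F is the maximum over all candidates. */
    static double similarity(const char *seq)
    {
        double best = 0.0;
        size_t n = strlen(seq);
        for (size_t i = 0; i < n; i++) {
            if (seq[i] != 'A') continue;
            for (int gap = 0; gap <= 6 && i + gap + 1 < n; gap++) {
                if (seq[i + gap + 1] != 'A') continue;
                double y = 1.0 * 1.0 * mu_interval(gap); /* mu(E1)*mu(E2)*mu(I12) */
                if (y > best) best = y;
            }
        }
        return best;
    }

    int main(void)
    {
        const char *examples[] = {"KFDAFEFADFAGAFGADATD", "ZMVKFIEJKFAVCGG"};
        for (int k = 0; k < 2; k++)
            printf("%s -> %.2f\n", examples[k], similarity(examples[k]));
        return 0;
    }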
4. Simulation Results
In this simulation, firstly, artificially created sequence data are used to demonstrate the use of the proposed algorithm. Sequence data of eleven protein families from the Swiss-Prot Protein Database [15] are then used to demonstrate the applicability of the proposed algorithm to real biological applications.

4.1 Simulation with Artificial Data
Artificially generated sequences, shown in Figure 6, are used to demonstrate the use and scope of the proposed algorithm. In this simulation, we will demonstrate the use of the proposed algorithm by using the event interval membership functions, event content membership functions, and the total event membership functions in our sequence searching pattern P. We start our simulation by searching for sequences that contain the pattern P, where:

P = A, then some small number of wild cards, and then another A (i.e. A-x(0,6)-A).

The event interval between the two A's is short, with a membership function as shown in Figure 4. This search found 14 valid sequences with a membership function value greater than zero. The result of the search is presented as "membership function value/sequence number": [0.8/2; 0.1/3; 1/4; 1/7; 0.8/8; 1/9; 1/10; 0.8/11; 0.2/13; 0.8/14; 1/15; 0.8/17; 1/18; 1/20].

Figure 6. 20 artificially generated sequences.

Let us define an arbitrary symbol "$" which has a membership function shown in Figure 7. The searching pattern P is extended to include two more "$" symbols after the second A. So the searching pattern becomes:

P = A, then some small number of wild cards, and then another A, then two "$" symbols.

Here, we try to demonstrate the use of event membership functions. This search yielded 11 valid sequences: [0.2/2; 0.05/3; 0.4/4; 0.4/7; 0.2/8; 0.5/9; 0.2/11; 0.05/14; 0.25/15; 0.25/18; 0.25/20]. Then, we add another symbol F at the end of the searching pattern P:

P = A, then some small number of wild cards, and then another A, then two "$" symbols, then F.

With the new search pattern P, there are 5 valid sequences: [0.2/2; 0.2/8; 0.2/11; 0.25/18; 0.25/20]. This pattern has five events: A, A, $, $, and F. If we want to detect sequences that have the pattern P with at least four of the five events present, the total number of valid sequences becomes 16. These are: [0.2/2; 0.025/3; 0.2/4; 0.25/5; 0.2/7; 0.2/8; 0.25/9; 0.2/11; 0.25/13; 0.025/14; 0.125/15; 0.125/16; 0.25/17; 0.25/18; 0.125/19; 0.25/20].

Figure 7. Fuzzy membership function for event "$".

4.2 Simulation with Protein Sequence Data
C2H2 Zinc Finger proteins are used for demonstration of the proposed algorithm, and this simulation is done using PROSITE database release 39 [17]. "C2H2 are nucleic acid-binding protein structures first identified in the Xenopus transcription factor TFIIIA. These domains have since been found in numerous nucleic acid-binding proteins. A zinc finger domain is composed of 25 to 30 amino-acid residues. There are two cysteine or histidine residues at both extremities of the domain, which are involved in the tetrahedral coordination of a zinc atom" [17]. In PROSITE, the motif pattern for C2H2 Zinc Finger proteins is:

C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H

The symbol x(i,j) represents the existence of i to j wild cards, whereas x(i) means that there are exactly i wild cards. The section [LIVMFYWC] means that one of the symbols "LIVMFYWC" is present. The process of fuzzy pattern searching starts with the Sequence Decomposition step described in Section 3.1.

4.2.1 Sequence Decomposition
Since our searching pattern P = C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H does not contain substrings of multiple characters, first-level decomposition is used. Five events and four event intervals are identified:
Events: E_1 = C; E_2 = C; E_3 = [LIVMFYWC]; E_4 = H; E_5 = H.
Event intervals: I_1,2 = 2,3,4; I_2,3 = 3; I_3,4 = 8; I_4,5 = 3,4,5.

4.2.2 Sequence Fuzzification
The events and event intervals are fuzzified according to the method described in Section 3.2. The membership functions implemented are shown in Figure 8. The event contents for the single-character symbols (C and H) are not fuzzified in this simulation. For event E_3, since no preference among symbols is given, all symbols inside the bracket are given a membership degree of 1. These membership functions can be modified according to the specific needs of the user.

4.2.3 Sequence Inference
The inference rule used for this pattern is:
IF events E_1, E_2, E_3, E_4 and E_5 occur, AND their event intervals are I_1,2, I_2,3, I_3,4 and I_4,5, AND the total number of events is Tot, THEN pattern P'_i is similar to P with a degree Y_i,
where Y_i = T-norm(µ(E_1), µ(E_2), µ(E_3), µ(E_4), µ(E_5), µ(I_1,2), µ(I_2,3), µ(I_3,4), µ(I_4,5), µ(Tot)).
The T-norm used is multiplication.

4.2.4 Sequence Searching
Since a protein sequence T can match many of the sequences in P', the similarity between T and the searching pattern P is determined as:
Y = F(Y_j), for P'_j exists in T,
where the function F is Maximum.

The simulation results show that the proposed algorithm identified 416 out of 418 Zinc Finger Protein sequences.
The two protein sequences already experimentally identified as Zinc Finger Proteins but not detected by the proposed algorithm are:

!YMDFVAAQCLVSISNRAAPEHGVAPDAERLRLPEREVTKEHGDPGDTWKDYCTLVTIAKSLLDLNKYRPIQTPSVCSDSLESPDEDMGSDSDVTTESGSSPSHSPEERQDPGXAPSPLSLLHPGVAAKGKHASEKRHK

!HIAHHTLPCK C PI C GKP F APWLLQG H IRT H TGESPSVCQHCNRAFA

The first sequence does not seem to have a subsequence that is similar to this common zinc finger motif, whereas the second sequence can be easily identified by modifying the membership function for I_3,4. Simulation results for the other 10 randomly selected protein families (from PROSITE) are shown in Table 1.

5. Discussion
The simulation shows that the proposed algorithm is capable of performing approximate sequence pattern searching with a high success rate. From the simulation of C2H2 Zinc Finger Proteins, we found that 1 out of the 418 sequences does not have similarity with the motif C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H. This shows the variety of biological sequences. Although they are all categorised as C2H2 Zinc Finger proteins, they may still have different sequence structures [18]. One way to overcome this problem may be to adopt a more general motif sequence. However, this strategy may result in a higher percentage of false positives.

The simulation results in Table 1 show that the proposed algorithm can detect motif patterns as well as the PrositeScan program or better (PS00028, PS00150, PS00605). Detection of all protein sequences in a protein family is sometimes not possible because some protein sequences only have part of the motif.

Figure 8. Fuzzy membership functions for Zinc Finger Protein Motif.

The fuzzy membership functions can be customised to suit a specific need. In the simulation, they are designed so that "similar" sequences will have membership function degrees greater than zero. Of course, the term "similar" is fuzzy, and its definition can differ from one user to another. For example, in a sequence motif detection application for an unknown sequence where no motifs are found, the membership functions can be designed so that all "similar" patterns have a membership function degree close to one. This way, a remotely similar pattern would have a membership degree greater than zero and this pattern can be detected.

The proposed algorithm has been successfully applied to a number of protein sequence motif searches with motifs presented in the form of "regular expressions". This algorithm can also be extended to tackle searching problems where the sequence pattern is defined in an approximate way; for example, the pattern for promoter sequences in Escherichia coli (E. coli) has been described as [19]:
1. all known E. coli promoters that use Eσ70 have at least two of the three most conserved bases in the –10 region (TataaT, where capital letters represent "most conserved" bases);
2. all promoters have at least one of the most highly conserved TTG residues in the –35 region;
3. those promoters with poor homology to the consensus in the –35 regions are frequently positively controlled by dissociable activators; and
4. the promoters used by E. coli Eσ32 during the heat shock response have similar –35 region sequences, but very different –10 region sequences.

These four rules provide the basic searching pattern for promoter sequences, and the identification of promoters in a DNA sequence is still one of the most difficult tasks in molecular biology research due to its "approximate" nature.

6. Conclusion
This paper presents an approximate sequence pattern searching algorithm, and it was successfully implemented to perform a protein sequence search for 11 randomly selected protein families. Simulation results show that the proposed algorithm is useful in identifying patterns with variable-length wild cards and sequence symbol substitutions. The number of symbols present in a pattern can also be fuzzified to adjust for the variations in real-world applications.

The author is currently working on an adaptive algorithm which "tunes" the membership functions to improve the classification performance. An extension of the proposed algorithm is expected to search for multiple patterns generated from the protein motif extraction algorithm proposed in [8]. The author is also looking to apply this algorithm to the problem of promoter sequence identification.

7. References
[1] J. Han and M. Kamber. Data Mining: Concepts and Techniques. Academic Press, 2001.
[2] D. Gusfield. Algorithms on Strings, Trees, and Sequences. Cambridge University Press, 1997.
[3] E. Gately. Neural Network for Financial Forecasting. John Wiley & Sons Inc, 1996.
[4] S. Misener and S. A. Krawetz. Bioinformatics: Methods and Protocols. Humana Press Inc, 2000.
[5] S. Salzberg, D. Searls and S. Kasif. Computational Methods in Molecular Biology. Elsevier, 1998.
[6] B. Lewin. Genes V. Oxford University Press, 1994.
[7] K. Hoffman, P. Bucher, L. Falquet and A. Bairoch. The PROSITE database, its status in 1999. Nucleic Acids Research, 27:215-219, 1999.
[8] Bill C. H. Chang and Saman K. Halgamuge. Protein Motif Extraction with Neuro-Fuzzy Optimisation. Bioinformatics, 18(7), 2002.
[9] S. Henikoff, J. Henikoff and S. Pietrokovski. Blocks: a non-redundant database of protein alignment blocks derived from multiple compilations. Bioinformatics, 15:471-479, 1999.
[10] T. Attwood, D. Flower, A. Lewis, J. Mabey, S. Morgan, P. Scordis, J. Selley, and W. Wright. PRINTS prepares for the new millennium. Nucleic Acids Research, 27:220-225, 1999.
[11] A. Bateman, E. Birney, R. Durbin, S. Eddy, R. Finn and E. Sonnhammer. Pfam 3.1: 1313 multiple alignments and profile HMMs match the majority of proteins. Nucleic Acids Research, 27:260-262, 1999.
[12] S. Altschul et al. Basic Local Alignment Search Tool. Journal of Molecular Biology, 215:403-410, 1990.
[13] P. Bork and E. Koonin. Protein sequence motifs. Curr. Opin. Struct. Biol., 6:366-376, 1996.
[14] B. Berger and M. Singh. An Iterative Method for Improved Protein Structural Motif Recognition. In Proceedings of RECOMB 1997, p. 37-46, New Mexico, USA, 1997.
[15] Prosite Scan Internet site. http://npsa-pbil.ibcp.fr/cgi-bin/npsa_automat.pl?page=npsa_prosite.html
[16] S. K. Halgamuge. Self Evolving Neural Network for Rule Based Data Processing. IEEE Transactions on Signal Processing, November 1997.
[17] Swiss-Prot Protein Database. /sprot/
[18] S. Bohm, D. Frishman and H. Mewes. Variations of the C2H2 zinc finger motif in the yeast genome and classification of yeast zinc finger proteins. Nucleic Acids Research, 25(12):2464-2469, June 1997.
[19] W. McClure. Mechanism and Control of Transcription Initiation in Prokaryotes. Annual Review of Biochemistry, 54:171-204, 1985.
Journal of Software, ISSN 1000-9825, CODEN RUXUEW. E-mail: jos@
Journal of Software, 2011,22(7):1524−1537 [doi: 10.3724/SP.J.1001.2011.03869]. Tel/Fax: +86-10-62562563
© Institute of Software, the Chinese Academy of Sciences. All rights reserved.

Decidable Temporal Dynamic Description Logic

CHANG Liang 1,+, SHI Zhong-Zhi 2, GU Tian-Long 1, WANG Xiao-Feng 2
1 (Guangxi Key Laboratory of Trusted Software, Guilin University of Electronic Technology, Guilin 541004, China)
2 (Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, The Chinese Academy of Sciences, Beijing 100190, China)
+ Corresponding author: E-mail: changl@

Chang L, Shi ZZ, Gu TL, Wang XF. Decidable temporal dynamic description logic. Journal of Software, 2011,22(7):1524−1537. /1000-9825/3869.htm

Abstract: The dynamic description logic DDL (dynamic description logic) provides a kind of action theory based on description logics and is useful for representing dynamic application domains in the Semantic Web environment. In order to bring the representation capability of branching temporal logic into the dynamic description logic, this paper treats the time slices of temporal logics as the executions of atomic actions, so that the temporal dimension and the dynamic dimension can be unified. Based on this idea, a temporal dynamic description logic named TD ALCQIO, constructed over the description logic ALCQIO, is presented. A Tableau decision algorithm is provided for TD ALCQIO, and both the termination and the correctness of this algorithm have been proved. The logic TD ALCQIO not only inherits the representation capability provided by the dynamic description logic constructed over ALCQIO (attributive language with complements, qualified number restrictions, inverse roles and nominals), but it also has the ability to describe and reason about temporal features such as the reachability property and the safety property of the whole dynamic application domains.
Therefore, TD ALCQIO provides further support forknowledge representation and reasoning in the environment of the Semantic Web.Key words: dynamic description logic; branching temporal logic; knowledge representation and reasoning;action theory; Tableau decision algorithm摘要: 动态描述逻辑DDL(dynamic description logic)提供了一种基于描述逻辑的动作理论,适用于语义Web环境下对动态领域知识的刻画和推理.为了将分支时序逻辑的刻画能力引入到动态描述逻辑中,将时间的进展体现为原子动作的执行,从而将时序维与动态维统一起来.在此基础上,从描述逻辑ALCQIO出发构建了一个时序动态描述逻辑TD ALCQIO,给出了TD ALCQIO的Tableau判定算法,并证明了算法的可终止性和正确性.TD ALCQIO不仅兼容了构建在描述逻辑ALCQIO基础上的动态描述逻辑的刻画和推理能力,而且还可从可达性、安全性等角度对整个动态领域的时序特征进行刻画和推理,从而为语义Web环境下对动态领域知识的刻画和推理提供了进一步的逻辑支持.∗基金项目: 国家自然科学基金(60903079, 60775035, 60963010, 60803033); 国家高技术研究发展计划(863)(2007AA01Z132);国家重点基础研究发展计划(973)(2007CB311004); 广西自然科学基金(0832006Z)收稿时间:2009-11-03; 修改时间: 2010-03-05; 定稿时间: 2010-05-05常亮等:可判定的时序动态描述逻辑1525关键词: 动态描述逻辑;分支时序逻辑;知识表示和推理;动作理论;Tableau判定算法中图法分类号: TP181文献标识码: A描述逻辑在语义Web中扮演着关键角色,是W3C推荐的Web本体语言OWL(Web ontology language)的逻辑基础.描述逻辑的主要特点在于提供了命题逻辑所无法比拟的刻画能力,同时又保证了相关推理问题的可判定性,并且具有有效的判定算法和推理机制作为支撑.针对Web环境的各种特征和应用需求,研究者提出描述逻辑的各种扩展形式,以期为语义Web提供更为充分的逻辑支持.例如,针对Web环境的开放性提出了非单调描述逻辑[1]和多值描述逻辑[2],针对Web应用中需要处理的不确定信息提出了模糊描述逻辑和粗糙描述逻辑等[3,4].针对描述逻辑只能处理静态领域知识的局限,文献[5]将动态逻辑的刻画成分引入到描述逻辑中,构建了一种动态描述逻辑DDL(dynamic description logic).文献[6]针对描述逻辑ALCO(attributive language with complements and nominals),ALCQO(attributive language with complements, qualified number restrictions and nominals)和ALCQIO(attributive language with complements, qualified number restrictions, inverse roles and nominals)依次给出了相应的动态描述逻辑系统及其Tableau判定算法.文献[7]在DDL中引入了对动作执行过程进行刻画和推理的机制.DDL提供了一类基于描述逻辑的动作理论,为语义Web服务的建模和推理以及在此基础上的服务发现和组合等提供了有效的途径和工具[8].与动态逻辑相对应,时序逻辑是研究动态系统的另外一种有效工具.在对动态系统进行刻画时,动态逻辑主要关注于各个动作的输入与输出、前提条件与执行效果等,而时序逻辑则关注于整个动态系统所表现出来的时序特征.因此,在对描述逻辑进行动态扩展的基础上,如果进一步引入时序逻辑的刻画成分,则将为动态领域的知识刻画和推理提供更强的逻辑支持.但存在的挑战是:动态描述逻辑已经在描述逻辑的基础上引入了一个动态维,形成了具有二维特征的逻辑系统.如果再引入一个时序维,则可能会得到具有三维特征的逻辑系统,而具有三维特征的逻辑系统一般来说是不可判定的[9].针对上述问题,本文借鉴了动态线性时序逻辑DLTL(dynamic linear time temporal logic)中对时间的处理方式[10],将时间的进展体现为原子行动的执行,从而将动态维与时序维统一起来.在此基础上,本文在动态描述逻辑中引入分支时序逻辑的刻画成分,将动作执行过程中蕴含的各种时序关系通过路径量词和时态算子刻画出来.最终,本文在描述逻辑ALCQIO的基础上构建了一个时序动态描述逻辑TD ALCQIO,并为其提供了相应的Tableau判定算法.1 时序动态描述逻辑TD ALCQIOTD ALCQIO的基本符号包括由概念名组成的集合N C、由角色名组成的集合N R、由个体名组成的集合N I以及由原子动作名组成的集合N A.从这些符号出发,可以通过各种构造符递归地生成角色、概念、公式和动作.R−|R i∈N R}.定义1. R是TD ALCQIO中的角色当且仅当R∈N R∪{i定义2. TD ALCQIO中的概念由如下产生式生成:C,D::=C i|{u}|¬C|C D|∀R.C|≤nR.C,其中,C i∈N C,u∈N I,R为角色,n为非负整数.此外,与描述逻辑ALCQIO相同,可以引入形如C D,∃R.C,≥nR.C, 以及⊥的概念,分别作为¬(¬C ¬D),¬(∀R.¬C),¬(≤(n−1)R.C),C ¬C和¬(C ¬C)的缩写.令C i为概念名,D为概念,则称C i≡D为概念定义式.对于由概念定义式组成的任一有限集合T,如果每个概念名最多在T中概念定义式的左边出现1次,则称T为TBox.相对于某个TBox T,如果某个概念名C i出现在T中概念定义式的左边,则称C i为被定义的概念名,否则称其为简单概念名.定义3. TD ALCQIO中的公式由如下产生式生成:ϕ,ψ::=C(u)|R(u,v)|〈π〉ϕ|E(ϕUπψ)|A(ϕUπψ)|¬ϕ|ϕ∨ψ,1526 Journal of Software 软件学报 V ol.22, No.7, July 2011其中,u ,v ∈N I ,C 为概念,R 为角色,π为动作.此外,与动态描述逻辑D-ALCQIO [6]相同,可以引入形如[π]ϕ,ϕ∧ψ, ϕ→ψ,true 以及false 的公式,分别作为¬〈π〉¬ϕ,¬(¬ϕ∨¬ψ),¬ϕ∨ψ,ϕ∨¬ϕ和¬(ϕ∨¬ϕ)的缩写.给定某个TBox T ,如果C 为简单概念名,u ,v ∈N I ,R 为角色,则将形如C (u ),¬C (u ),R (u ,v )以及¬R (u ,v )的公式都称为简单公式.定义4. TD ALCQIO 中的动作由如下产生式生成:π,π′::=α|ϕ?|π⋃π′|π;π′|π*,其中,α∈N A ,ϕ为公式.相对于某个TBox T ,将形如α≡(P ,E )的表达式称为原子动作定义式,其中:(1) α∈N A 表示所定义的原子动作;(2) P 是由公式组成的有限集合,称为该原子动作的前提条件;(3) E 是由简单公式组成的有限集合,称为该原子动作的执行效果.对于由原子动作定义式组成的任一有限集合Ac ,如果每个原子动作名最多在Ac 中某个定义式的左边出现1次,则称Ac 为ActBox.相对于某个ActBox Ac ,如果原子动作α出现在Ac 中某个定义式的左边,则称α是相对于Ac 被定义的.如果公式ϕ中出现的所有原子动作都是相对于Ac 被定义的,则称公式ϕ是相对于Ac 被定义的.TD ALCQIO 的语义结构从整体上体现为由多个可能世界构成的空间.在其中的每个可能世界下都分别对概念名、角色名和个体名进行解释;每个动作被解释为由关于可能世界的序列组成的集合,体现了动作执行过程中可能形成的所有轨迹.形式定义如下:定义5. 
TD ALCQIO 结构是一个三元组M =(Σ,ΔM ,I ),其中:(1) Σ是形如Σ=(W ,⋅T )的框架,在该框架中:W 是由可能世界组成的非空集合,函数⋅T 将N A 中的每个原子动 作名αi 映射为W 上的某个二元关系T i α⊆W ×W ;(2) ΔM 是由个体组成的非空集合,作为该结构的论域;(3) 函数I 对W 中每个可能世界w 赋予一个解释I (w )=(ΔM ,⋅I (w )),其中的解释函数⋅I (w )满足以下条件: (i) 将N C 中的每个概念名C i 解释为ΔM 的某个子集()I w i C ⊆ΔM ;(ii) 将N R 中的每个角色名R i 解释为ΔM 上的某个二元关系()I w i R ⊆ΔM ×ΔM ;(iii) 将N I 中的每个个体名p i 解释为ΔM 中的某个元素()I w i p ∈ΔM ,并且对于W 中任一可能世界w ′都有()().I w I w i i p p ′=由于p i 的解释与可能世界无关,因而也将()I w i p 简记为.I i p定义6. 给定某个结构M =(Σ,ΔM ,I )以及由该结构中的可能世界构成的任一序列(w 1,w 2,...),如果对于每一对 可能世界w i 和w i +1(其中i ≥1)都存在某个αi ∈N A 使得(w i ,w i +1)∈,T i α则称该序列是M 中的一条轨迹.对于任一轨 迹τ,分别用|τ|和τ[i ]表示τ的长度以及τ中的第i 个可能世界.定义7. 令T 1,T 2是由轨迹组成的两个集合,对它们的混合积T 1×T 2定义如下:T 1×T 2:={(τ1[1],τ1[2],…,τ1[|τ1|],τ2[2],…,τ2[|τ2|])|τ1∈T 1,τ2∈T 2,τ1的长度有限并且τ1[|τ1|]=τ2[1]},即,对于分别来自T 1和T 2的每一对轨迹τ1,τ2,如果τ1的长度有限并且τ1的最后一个可能世界与τ2的第1个可能世界相同,则将这两条轨迹串联起来并且去掉其中重复的可能世界τ2[1].定义8. 令M =(Σ,ΔM ,I )为TD ALCQIO 结构,其中,Σ=(W ,⋅T ).对TD ALCQIO 中的概念、公式和动作的语义归纳定义如下.首先,对于W 中的任一可能世界w ,将任一角色R 解释为ΔM 上的某个二元关系R I (w ),将任一概念C 解释为ΔM 的某个子集C I (w ).归纳定义如下:(1) (R −)I (w ):={(y ,x )|(x ,y )∈R I (w )};(2) {u }I (w ):={u I },其中,u ∈N I ;(3) (¬C )I (w ):=ΔM \C I (w ),其中的“\”为集合差运算;(4) (C D )I (w ):=C I (w )∪D I (w ),其中的“∪”为集合并运算;(5) (∀R .C )I (w ):={x |对于任一y ∈ΔM :如果(x ,y )∈R I (w ),则必然有y ∈C I (w )};常亮 等:可判定的时序动态描述逻辑1527(6) (≤nR .C )I (w ):={x |#{y |(x ,y )∈R I (w )并且y ∈C I (w )}≤n },其中的“#”表示集合的元素个数. 其次,对于W 中的任一可能世界w ,用(M ,w ) ϕ表示公式ϕ在结构M 中的可能世界w 下成立.根据ϕ的结构归 纳定义如下:(7) (M ,w ) C (u ) iff u I ∈C I (w );(8) (M ,w ) R (u ,v ) iff (u I ,v I )∈R I (w );(9) (M ,w ) ¬ϕ iff (M ,w ) ϕ;(10) (M ,w ) ϕ∨ψ iff (M ,w ) ϕ或者(M ,w ) ψ;(11) (M ,w ) 〈π〉ϕ iff 存在M 中某条长度有限的轨迹τ,使得τ∈πT ,τ [1]=w 和(M ,τ [|τ|]) ψ;(12) (M ,w ) E (ϕU πψ) iff 存在M 中的某条轨迹τ以及某个正整数k ,使得τ∈πT ,τ[1]=w ,1≤k ≤|τ|, (M ,τ[k ])ψ,并且对于任意1≤i <k 都有(M ,τ[i ]) ϕ;(13) (M ,w ) A (ϕU πψ) iff 对于M 中的任一轨迹τ :如果τ∈πT 并且τ[1]=w ,则必然存在某个正整数k 使得 1≤k ≤|τ|,(M ,τ[k ]) ψ以及对于任意1≤i <k 都有(M ,τ[i ]) ϕ.最后,对于任一动作π,将其映射为由M 中的轨迹组成的某个集合πT .根据π的结构归纳定义如下:(14) (ϕ?)T :={(w ,w )|(M ,w ) ϕ};(15) (π⋃π′)T :=πT ∪π′T ;(16) (π;π′)T :=πT ×π′T ,其中的“×”为关于轨迹集合的混合积运算;(17) (π*)T :=(π0)T ∪(π1)T ∪(π2)T ∪…,其中,(π0)T :={(w ,w )|w ∈W },并且对于任意i ≥1都有(πi )T :=(πi –1;π)T .称某个TD ALCQIO 结构M =(Σ,ΔM ,I )(其中,Σ=(W ,⋅T ))是某个TBox T 的模型,表示为M T ,当且仅当对于任一C i ≡D ∈T 和任一w ∈W 都有()I w i C =D I (w ).称某个TD ALCQIO 结构M =(Σ,ΔM ,I )(其中,Σ=(W ,⋅T ))是某个ActBox Ac 的模型,表示为M Ac ,当且仅当对于任一α≡(P ,E )∈Ac 都有αT ={(w ,w ′)∈W ×W |① 对于任一ψ∈P 都有(M ,w ) ψ;② 对于任一简单概念名C 都有C I (w ′)= (C I (w )∪{u I |C (u )∈E })\{u I |¬C (u )∈E };以及③ 对于任一角色名R 都有R I (w ′)=(R I (w )∪{(u I ,v I )|R (u ,v )∈E })\{(u I ,v I )|¬ R (u ,v )∈E }}.公式的可满足性问题是TD ALCQIO 中最基本的推理问题,定义如下:定义9. 
令T ,Ac 分别为TBox 和ActBox,ϕ为公式,称ϕ相对于T 和Ac 是可满足的,当且仅当存在某个结构 M =(Σ,ΔM ,I )以及该结构中的某个可能世界w 使得M T ,M Ac 和(M ,w ) ϕ.2 TD ALCQIO 的描述和推理能力从上一节关于概念、公式、动作、TBox 以及ActBox 的语法和语义定义可以看出,TD ALCQIO 兼容了构建在描述逻辑ALCQIO 上的动态描述逻辑D-ALCQIO [6]的描述能力.此外,由于TD ALCQIO 中引入了形如E (ϕU πψ)和A (ϕU πψ)的公式,使其进一步具有了分支时序逻辑的刻画能力.具体来说,我们可以引入路径量词E ,A 和时态算子X ,U ,构造出形如EX ϕ,E (ϕU ψ)和A (ϕU ψ)的时序断言,并 将它们定义为TD ALCQIO 中已有公式的缩写形式,即:EX ϕ 〈α1⋃…⋃αn 〉ϕE (ϕU ψ) E (ϕ*1(...)n a a ∪∪Uψ) A (ϕU ψ) A (ϕ*1(...)n a a ∪∪U ψ)其中的α1,…,αn 是N A 中所有的原子动作.将这些缩写带入定义8之后,可以得到如下语义解释:(1) (M ,w ) EX ϕ iff 存在某个原子动作α∈N A 和某个可能世界w ′∈W 使得(w ,w ′)∈αT 并且(M ,w ′) ψ;(2) (M ,w ) E (ϕU ψ) iff 存在M 中的某条轨迹τ以及某个正整数k ,使得τ[1]=w ,1≤k ≤|τ|,(M ,τ[k ]) ψ以及对于任意1≤i <k 都有(M ,τ[i ]) ϕ;(3) (M ,w ) A (ϕU ψ) iff 对于M 中的任一轨迹τ:如果τ[1]=w ,则必然存在某个正整数k 使得1≤k ≤|τ|,1528 Journal of Software软件学报 V ol.22, No.7, July 2011(M,τ[k]) ψ以及对于任意1≤i<k都有(M,τ[i]) ϕ.可见,时序断言EXϕ,E(ϕUψ)和A(ϕUψ)在TD ALCQIO中的语义解释与在CTL中的语义解释[11]类似;区别在于CTL语义结构中无限长的状态序列在这里被限制成与具体动作相关的轨迹.在这些断言的基础上,我们还可以进一步引入时态算子F,G,构造出形如EFϕ,AFϕ,EGϕ,AGϕ和AXϕ的时序断言,分别作为E(true Uϕ),A(true Uϕ),¬AF(¬ϕ),¬EF(¬ϕ)以及¬EX(¬ϕ)的缩写.下面通过例子进一步考察TD ALCQIO的刻画和推理能力.假设某个Web服务系统可以让用户通过信用卡在线购买图书和CD,我们应用TD ALCQIO对其刻画和推理如下:首先给出该系统中涉及的基本符号:• 概念名集合N C={person,creditCard,cd,book,customer,VIPCustomer,instore};• 角色名集合N R={has,bought};• N I是由个体名Tom,Jack,Visa,HarryPotter,BackStreetBoys等组成的有限集合;• N A是由原子动作名buyBook Tom,Har,buyBook Jack,Har,buyCD Tom,Bac,buyCD Jack,Bac,order Har,order Bac等组成的有限集合.在此基础上,可以将该系统中相关的静态领域知识刻画为如下概念定义式:customer≡person ∃has.creditCard,VIPCustomer≡customer ≥3 bought.(cd book).这些概念定义式的直观含义为:顾客是持有信用卡的人员;VIP顾客是购买了至少3本CD或图书的顾客.用T shop表示由这些概念定义式组成的TBox.同时,可以将该系统中提供的原子Web服务刻画为以下原子动作定义式:buyBook Tom,Har≡({customer(Tom),book(HarryPotter),instore(HarryPotter)},{¬instore(HarryPotter),bought(Tom,HarryPotter)}),buyCD Jack,Bac≡({customer(Jack),cd(BackStreetBoys),instore(BackStreetBoys)},{¬instore(BackStreetBoys),bought(Jack,BackStreetBoys)}),order Har≡({(book cd)(HarryPotter),¬instore(HarryPotter),(¬∃bought−.person)(HarryPotter)},{instore(HarryPotter)}).这3个原子服务依次表示:顾客Tom购买图书HarryPotter,顾客Jack购买CD BackStreetBoys,以及商家在库存中没有图书或CD HarryPotter的情况下进行进货.其中,关于order Har的原子动作定义式表示:如果HarryPotter是一本图书或CD,库存中没有HarryPotter,并且HarryPotter没有被任何人买走,则相应的原子服务可以被执行,并且在执行之后产生的效果是使得HarryPotter存在于库存中.其他两个原子动作定义式的直观含义可以类似地得到.类似地,对原子动作buyBook Jack,Har,buyCD Tom,Bac,order Bac等也可以进行相应的定义,分别表示顾客Jack购买图书HarryPotter,顾客Tom购买CD BackStreetBoys,以及商家对BackStreetBoys进行进货.用Ac shop表示由上述原子动作定义式组成的ActBox.在原子动作定义式的基础上,还可以通过顺序、测试、选择、迭代等动作构造符对Web服务组合方案进行刻画.例如,该Web服务系统可能为VIP顾客提供某个Web服务组合方案,用来保证VIP顾客能够买到库存中没有的商品.其中,针对顾客Tom和图书HarryPotter,可以将相应的组合方案VIPbuy Tom,Har刻画为VIPCustomer(Tom)?;((instore(HarryPotter)?;buyBook Tom,Har)∪(¬instore(HarryPotter)?;order Har;buyBook Tom,Har)).该组合方案首先要求VIPCustomer(Tom)成立.接下来,如果instore(HarryPotter)成立,则直接调用Web服务buyBook Tom,Har;否则,先调用Web服务order Har然后再调用buyBook Tom,Har.通过与动态描述逻辑相同的方式,Web 服务组合方案中经常用到的Sequence,Choice,Any-Order,If-Then-Else,Iterate,Repeat-While,Repeat-Until等控制结构都可以在TD ALCQIO中得到刻画[8].常亮 等:可判定的时序动态描述逻辑1529最后,可以通过一些不含有动作的公式对Web 服务系统的当前状态进行刻画,例如, VIPCustomer (Tom ),person (Jack ),creditCard (Visa ),has (Jack ,Visa ),book (HarryPotter ),cd (BackStreetBoys ),¬instore (HarryPotter ),instore (BackStreetBoys ).这些公式表达了以下信息:Tom 是一名VIP 顾客,Jack 是一名持有Visa 信用卡的人员,HarryPotter 是一本没在库存中的图书,BackStreetBoys 是一张在库存中的CD.用A shop 表示由这些公式组成的集合.上面刻画的各部分知识一起形成了一个基于描述逻辑ALCQIO 的动作理论[8],实现了对我们所考察的Web 服务系统的建模.接下来,我们可以分别从动态特征和时序特征两个方面对该系统的性质进行考察.一方面,借助形如〈π〉ϕ和[π] ϕ的动作断言,可以对动作的可执行性问题、投影问题、规划问题等进行刻画.在此基础上,与文献[12]类似,可以借助公式的可满足性问题实现对这些问题的推理.例如,公式〈VIPbuy Tom ,Har 〉true 的直观含义是“动作VIPbuy Tom ,Har 
可以被成功地执行”.如果该公式可以按照文献[12]的方式从上面构造的动作理论推导出来,则表明动作VIPbuy Tom ,Har 在Web 服务系统的当前状态下是可以被执行的.再如,公式[VIPbuy Tom ,Har ]bought (Tom ,HarryPotter )的直观含义是“只要动作VIPbuy Tom ,Har 执行结束,公式bought (Tom ,HarryPotter )就一定成立”.同样,如果该公式可以从上面构造的动作理论推导出来,则表明Web 服务系统在当前状态下具有相应的投影性质.最后,如果上面两个公式所刻画的性质都成立,则表明动作VIPbuy Tom ,Har 是Web 服务系统在当前状态下实现目标公式bought (Tom ,HarryPotter )的一个规划.类似地还可以验证,动作“order Har ;buyBook Tom ,Har ”也是在当前状态下实现目标公式bought (Tom ,HarryPotter )的一个规划.另一方面,借助时序动作断言E (ϕU πψ),A (ϕU πψ)以及在其基础上定义的EF ϕ,E (ϕU ψ),AG ϕ等时序断言, 可以对Web 服务系统的可达性和安全性进行刻画和推理.可达性是指系统在运行过程中可以到达某个状态或出现某种情况.例如:公式EF bought (Tom ,HarryPotter )表示Web 服务系统存在某种执行方式使得Tom 买到HarryPotter ; 公式E ((¬∃bought −.person )(HarryPotter ) U bought (Tom ,HarryPotter ))表示存在某种执行方式使得Tom 买到HarryPotter ,且在此之前HarryPotter 没有被任何人购买过;公式AG (EF (∃bought −.customer )(HarryPotter ))表示Web 服务系统总可以达到让HarryPotter 被顾客买走的情况.此外,借助时序动作断言E (ϕU πψ),还可以针对某个具体的动作π来考察相应的可达性.例如,公式E (¬instore (HarryPotter ),Tom Har VIPbuy Uinstore (HarryPotter ))表示存 在动作VIPbuy Tom ,Har 的某种执行方式,使得在执行过程中会出现库存中有HarryPotter 的情况,并且在此之前的库存中一直没有HarryPotter .为了借助公式可满足性问题来实现对可达性的推理,我们需要引入一个前提公式∑来保证Web 服务系统中的各个原子服务在前提条件得到满足的情况下都将被执行.具体来说,对于上面刻画的Web 服务系统,令 α1,…,αn 是TBox T shop 中定义的各个原子动作,令Conj (n a P )表示将原子动作αi 的前提条件n a P 中所有元素合取后得到的合取式.那么相应的前提公式∑为[(α1⋃…⋃αn )*]((Conj (n a P )→〈α1〉true)∧...∧(Conj (n a P )→〈αn 〉true)).在此 基础上,对于任何一个关于可达性的公式ϕ来说,Web 服务系统具有相应的可达性特征当且仅当公式¬(∑→ (Conj (A shop )→ϕ))相对于TBox T shop 和ActBox Ac shop 是不可满足的.应用本文下一节给出的关于公式可满足性的判定算法,可以验证本文考察的Web 服务系统具有前面列举的3个可达性特征.安全性是指在系统运行过程中某些“不好的”情况永远不会发生.例如:公式AG ¬(bought (Jack ,HarryPotter )∧bought (Tom ,HarryPotter ))表示在任何时候图书HarryPotter 都不会既被Jack 购买了又被Tom 购买了;公式AG ((∃bought −.customer )(BackStreetBoys )∨instore (BackStreetBoys ))则表示在任何时候,CD BackStreetBoys 要么被顾客购买了,要么还在库存中.对安全性的推理同样可以借助TD ALCQIO 中公式的可满足性问题来实现.具体来说,对于任何一个关于安全性的公式ψ,Web 服务系统具有相应的安全性特征当且仅当公式¬(Conj (A shop )→ψ)相对于TBox T shop 和ActBox Ac shop 是不可满足的.应用本文下一节给出的判定算法,可以验证这里的Web 服务系统具有上面列举的两个安全1530 Journal of Software软件学报 V ol.22, No.7, July 2011性特征.然而,假如我们将Web服务中的原子动作order Har改写为如下形式:order Har≡({(book cd)(HarryPotter),¬instore(HarryPotter)},{instore(HarryPotter)}),则可以验证,此时的Web服务系统将不再具有公式AG¬(bought(Jack,HarryPotter)∧bought(Tom,HarryPotter))刻画的安全性质.总之,除了具有动态描述逻辑中对动态领域进行刻画和推理的能力之外,TD ALCQIO还可以从可达性、安全性等角度对动态领域的整体特征进行刻画和推理.在对语义Web服务进行建模的基础上,这些刻画和推理能力将为Web服务的发现、组合以及对组合方案的验证等提供有效的支持[8].3 TD ALCQIO的Tableau判定算法令T,Ac分别为TBox和ActBox,ϕinit是相对于Ac被定义的任一公式.本节给出TD ALCQIO的Tableau判定算法,对ϕinit相对于T和Ac的可满足性进行判定.判定算法的基本思路与文献[6,7]相同.为了表述简便,与文献[7]相同,我们首先引入符号nf T,Ac(ϕinit),用来表示将ϕinit中出现的每个被定义的概念名以及每个被定义的原子动作根据T和Ac中的定义式进行替换后得到的公式,从而将ϕinit中涉及到的与T和Ac相关的信息都编码到nf T,Ac(ϕinit)中;由于我们要求TBox T和ActBox Ac 之间不存在循环定义[13],因而这种替换过程总是可以终止的.其次,对于由简单公式组成的任一集合E,用E¬表示集合{ϕ¬|ϕ∈E},其中的ϕ¬表示与公式¬ϕ逻辑等价的简单公式.最后,对于任一个体名u和任一概念C,引入形如@u C的概念;相应的语义定义为:对于任一TD ALCQIO结构M=(Σ,ΔM,I)以及其中的任一可能世界w,如果u I∈C I(w),则(@u C)I(w)=ΔM;否则,(@u C)I(w)=∅.在此基础上,对前缀以及带前缀的公式定义如下:定义10. 
任一前缀σ.ε是由动作σ以及关于简单公式的集合ε组成的二元组,并且满足如下产生式规则:σ.ε::= (∅,∅).∅| σ;(P,E).(ε\E¬)∪E,其中,(P,E)是表示成二元组形式的原子动作,σ;(P,E)是新生成的顺序动作,(ε\E¬)∪E是新生成的由简单公式组成的集合.也将(∅,∅).∅称为初始前缀,在下文中表示为σ0.ε0.令σ.ε为前缀,ϕ为公式,则称σ.ε:ϕ为带前缀的公式.在动态描述逻辑D-ALCQIO[6]和EDDL(ALCQO)[7]的判定算法的基础上,本文算法的关键是对TD ALCQIO中形如E(ϕUπψ),¬E(ϕUπψ),A(ϕUπψ)和¬A(ϕUπψ)的公式进行处理,难点在于对这些公式中动作π的执行过程进行监控.针对该难点,下面将这些公式改写为基于有穷自动机的表示形式.首先,对于任一动作π,令∑π是由出现在π中的所有原子动作和测试动作组成的有限集合.在此基础上,将∑π中的每个元素看作一个字符,用L(π)表示通过以下规则定义的由字符串组成的最小集合:①如果α为原子动作或测试动作,则L(α):={α};②L(π⋃π′):=L(π)∪L(π′);③L(π;π′):={l1l2|l1∈L(π)并且l2∈L(π′)};④L(π*):=L(π 0)∪L(π 1)∪L(π 2)∪…,其中,L(π 0)为空串,并且对于任意i≥1都有L(π i):=L(π i−1;π).显然,L(π)中的各个字符串分别对应于动作π的一种执行情况;另一方面,动作π的所有执行情况都包括在L(π)中.对于任一字符串l∈L(π),用|l|表示该字符串的长度,用l[i]表示其中的第i个字母(即出现在第i个位置上的原子动作或测试动作).借助这些符号,可以将定义8中时序动作断言的语义定义改写为以下形式: (12′) (M,w) E(ϕUπψ) iff 存在某个字符串l∈L(π,M中的某条轨迹τ以及某个正整数k,使得:|τ|=|l|+1,对于任意1≤i≤|l|都有(τ[i],τ[i+1])∈l[i]T,τ[1]=w,1≤k≤|τ|,(M,τ[k]) ψ以及对于任意1≤j<k都有(M,τ[j]) ϕ;(13′) (M,w) A(ϕUπψ) iff 对于任一字符串l∈L(π)和M中的任一轨迹τ:如果|τ|=|l|+1,τ[1]=w以及对于任意1≤i≤|l|都有(τ[i],τ[i+1])∈l[i]T,则必然存在某个正整数k使得1≤k≤|τ|,(M,τ[k]) ψ以及对于任意1≤j<k都有(M,τ[j]) ϕ.显然,上述定义分别与定义8(12)和定义8(13)等价,区别在于这里突出了引起状态转换的原子动作和测试动作.常亮等:可判定的时序动态描述逻辑1531其次,从形式语言的角度看,TD ALCQIO中对动作的语法定义满足形式语言中对正规表达式的要求;任一动作π都是字母表∑π上的正规表达式,相应的L(π)则是由π表达的正规集.因此,基于形式语言与自动机理论,对任一动作π都可以相应地构造一个非确定有穷自动机A=(Q,∑π,δ,q0,F),使得该自动机接受的语言(用L(A)表示)恰好为L(π);其中,Q为有穷状态集,∑π是自动机A的输入字符表,δ:Q×∑π→2Q为转移函数,q0∈Q为初始状态,F⊆Q为终结状态集.该构造过程可以在多项式时间内完成,并且自动机A中的状态个数与动作π的长度呈线性关系[14].例如,如果动作π为α1;(ϕ?;(α2⋃α3))*,其中的α1,α2和α3为原子动作,则构造后得到的自动机为A=(Q,Σπ,δ,q0, F),其中,Q={q0,q1,q2},Σπ={α0,α1,α2,ϕ?},F={q1},并且有δ(q0,α1)=q1,δ(q1,ϕ?)=q2,δ(q2,α2)=q1和δ(q2,α3)=q1.对于任一非确定有穷自动机A=(Q,Σπ,δ,q0,F)以及其中的任一状态q∈Q,本文用A(q)表示将A的初始状态置为q之后得到的状态机,即A(q)=(Q,Σπ,δ,q,F).接下来,我们引入形如E(ϕU A(q)ψ)和A(ϕU A(q)ψ)的带自动机的公式,并且在定义8的基础上将它们的语义定义为以下形式:(1) (M,w) E(ϕU A(q)ψ) iff 存在某个字符串l∈L(A(q))、M中的某条轨迹τ以及某个正整数k,使得:|τ|=|l|+1,对于任意1≤i≤|l|都有(τ[i],τ[i+1])∈l[i]T,τ[1]=w,1≤k≤|τ|,(M,τ[k]) ψ以及对于任意1≤j<k都有(M,τ[j]) ϕ;(2) (M,w) A(ϕU A(q)ψ) iff 对于任一字符串l∈L(A(q))以及M中的任一轨迹τ:如果|τ|=|l|+1,τ[1]=w并且对于任意1≤i≤|l|都有(τ[i],τ[i+1])∈l[i]T,则必然存在某个正整数k使得1≤k≤|τ|,(M,τ[k]) ψ以及对于任意1≤j<k都有(M,τ[j]) ϕ.最后,从以上语义定义可以比较直接地得到以下性质:性质1. 对于任一动作π和任一非确定有穷自动机A(q)=(Q,Σ,δ,q,F),如果L(π)=L(A(q)),则公式E(ϕUπψ)和A(ϕUπψ)分别与公式E(ϕU A(q)ψ)和A(ϕU A(q)ψ)逻辑等价.性质2. 对于任一非确定有穷自动机A(q)=(Q,Σ,δ,q,F),存在下面3组关于公式的逻辑等价关系:(A1) A(ϕU A(q)ψ)≡ψ∨(ϕ∧¬(∨α∈Σ,q′∈δ(q,α)〈α〉¬A(ϕU A(q′)ψ)));(A2) 如果q∈F,则E(ϕU A(q)ψ)≡ψ∨(ϕ∧(∨α∈∑,q′∈δ(q,α)〈α〉E(ϕU A(q′)ψ)));(A3) 如果q∉F,则E(ϕU A(q)ψ)≡(ψ∧E(true U A(q)true))∨(ϕ∧(∨α∈Σ,q′∈δ(q,α)〈α〉E(ϕU A(q′)ψ))).需要指出的是,性质2中针对公式E(ϕU A(q)ψ)分别考察了q∈F和q∉F两种情况.其中,当q∉F时,如果当前状态下公式ψ已经成立,则还要求公式E(true U A(q)true)也成立,目的是保证E(ϕU A(q)ψ)语义定义中的“存在某个字符串l∈L(A(q))和M中的某条轨迹τ,使得|τ|=|l|+1并且对于任意1≤i≤|l|都有(τ[i],τ[i+1])∈l[i]T”.与文献[6,7]相同,TD ALCQIO中Tableau判定算法的基本操作是应用扩展规则对分枝逐步进行扩展.下面先给出对分枝及相关术语的定义.定义11. 任一分枝B是由以下4类元素组成的有限集合:(1) 带前缀的公式;(2) 形如X≡〈π*〉ϕ的可能性标记,其中的〈π*〉ϕ是由迭代动作π*和公式ϕ构成的动作断言,X为符号串;(3) 形如u=v的关于个体名的等式,其中, u,v∈N I;(4) 形如u≠v的关于个体名的不等式,其中,u,v∈N I.对于任一分枝B,用B Ind表示由B中关于个体名的所有等式和不等式组成的集合.在此基础上,与文献[7]相同,用*IndB表示B Ind关于等价关系“=”和对称关系“≠”的闭包.此外,对于任一个体名u,用[u]=B表示u在B中关于等价关系“=”的等价类,即[u]=B:={v|u=v∈*Ind }.B对于任一分枝B,如果满足表1~表5中某条扩展规则的前件部分,则可以根据该规则相应地在B中加入带前缀的公式、关于个体名的等式或者关于个体名的不等式.定义12. 如果表1~表5中所有扩展规则都不能对B进行扩展,则称B为饱和的.定义13. 对于任一σ.ε:E(ϕU A(q)ψ)∈B(其中,A(q)=(Q,Σ,δ,q,F)),如果存在某两个前缀σ1.ε1,σ2.ε2以及自动机A(q)中的某个状态q′,使得ε1=ε2,q′∈F,σ1.ε1:E(ϕU A(q′)ψ)∈B和σ2.ε2:ψ∈B,则称σ.ε:E(ϕU A(q)ψ)在B中被实现了;否则,称σ.ε:E(ϕU A(q)ψ)为在B中未被实现的带自动机的存在性时序动作断言.。
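The automaton construction sketched above (each action π, read as a regular expression over its atomic and test actions, is compiled into a nondeterministic finite automaton A with L(A) = L(π)) can be made concrete in a few lines. The sketch below simply encodes the example automaton given in the text for π = α1;(φ?;(α2∪α3))* and checks which action sequences it accepts; the Python encoding and the `accepts` helper are illustrative and are not part of the original text.

```python
# Illustrative encoding (not from the original text) of the example automaton
# for the action  pi = alpha1 ; (phi? ; (alpha2 U alpha3))* :
# states Q = {q0, q1, q2}, start q0, final states F = {q1}, and transitions
# delta(q0, alpha1) = q1, delta(q1, phi?) = q2, delta(q2, alpha2) = q1,
# delta(q2, alpha3) = q1.

DELTA = {
    ("q0", "alpha1"): {"q1"},
    ("q1", "phi?"):   {"q2"},
    ("q2", "alpha2"): {"q1"},
    ("q2", "alpha3"): {"q1"},
}
START, FINALS = "q0", {"q1"}

def accepts(word, delta=DELTA, start=START, finals=FINALS):
    """Standard NFA subset simulation: does the automaton accept this action sequence?"""
    current = {start}
    for symbol in word:
        current = set().union(*(delta.get((q, symbol), set()) for q in current)) \
                  if current else set()
    return bool(current & finals)

# Strings of L(pi) correspond to possible executions of the action pi:
print(accepts(["alpha1"]))                       # True  (no loop iteration)
print(accepts(["alpha1", "phi?", "alpha3"]))     # True  (one loop iteration)
print(accepts(["alpha1", "phi?"]))               # False (stops mid-iteration)
```

In the tableau algorithm, automata of this kind are what the expansion rules use to track how far the execution of π has progressed: Property 2 unfolds E(ϕU_A(q)ψ) and A(ϕU_A(q)ψ) one transition of A(q) at a time.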
FPGA Implementation of a FaithfulPolynomial Approximation for Powering Function ComputationJ.A.Pi˜n eiro,J.D.Bruguera,J.M.MullerDepartment of Electrical and Computer EngineeringUniv.Santiago de Compostela,Spain.CNRS,Project CNRS/ENSL/INRIA/ARENAIRELIP,Ecole Normale Superieure de Lyon,France.e-mail:alex,bruguera@c.es,jmmuller@ens-lyon.frAbstractA FPGA implementation of a method for the calculationof faithfully rounded single-precisionfloating-point power-ing()is presented in this paper.A second-degree min-imax polynomial approximation is used,together with theemployment of table look-up,a specialized squaring unitand a fused accumulation tree.The FPGA implementationof an architecture with a latency of3cycles and a through-put of one result per cycle has been performed using a Xil-inx XC4036XL device.The implemented unit has an opera-tion frequency over33MHz.1.IntroductionFPGAs are re-programmable devices which can imple-ment very complex computations in a single chip,and becustomized nearly instantaneously.Their short develop-ment time and cost-effectiveness for low-volume produc-tion make them suitable devices for rapid-prototyping andtesting.But theirflexibility,improving capacity and per-formance are opening up a new line in high-performancecomputation:reconfigurable computing.Reconfigurable computing[2]is intended tofill thegap between hardware and software,achieving muchhigher performance than standard microprocessors hard-ware,while retaining much of theflexibility of a softwaresolution.Reconfigurable systems usually consist of a com-bination of reconfigurable logic(FPGAs or other customFunction Look–Up Table size square root()8reciprocal()9inverse square()10Figure1.Pipelined unit scheme(computation;)the coefficients,and leads to an important reduction both in the look-up tables to employ and in the combinational hard-ware which performs the computation of the expression(1).An exhaustive error computation,already taking into ac-count thefinite wordlength of the stored coefficients and the error introduced on the later computation of the second–degree polynomial,has been performed[11]to guarantee that thefinal results are faithfully rounded to the nearest. 
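To make the evaluation scheme concrete, the following software sketch mimics the table-driven computation C0 + C1·x2 + C2·x2²: the upper bits of the input select per-segment coefficients and the lower bits form the offset x2. This is only an illustration of the idea, not the paper's design: a per-segment least-squares fit stands in for the true minimax (Remez) coefficients, the choice of 1/√x on [1, 2) and of 2⁸ segments is an assumption made here, and no fixed-point word-length analysis is performed.

```python
# Software model (illustrative, not the paper's implementation) of a
# second-degree polynomial approximation with table look-up: split the operand,
# use the upper M bits as a table index and evaluate C0 + C1*x2 + C2*x2**2.

import numpy as np

M = 8                       # number of index bits (2**M segments); illustrative
SEGMENTS = 1 << M

def build_tables(f, lo=1.0, hi=2.0, samples=64):
    """Fit one quadratic in the local offset x2 for every segment of [lo, hi)."""
    width = (hi - lo) / SEGMENTS
    tables = []
    for i in range(SEGMENTS):
        x = np.linspace(lo + i * width, lo + (i + 1) * width, samples)
        x2 = x - (lo + i * width)              # offset within the segment
        c2, c1, c0 = np.polyfit(x2, f(x), 2)   # least-squares stand-in for minimax
        tables.append((c0, c1, c2))
    return tables, width

def approx_powering(x, tables, width, lo=1.0):
    idx = min(int((x - lo) / width), SEGMENTS - 1)   # "upper bits" select the segment
    x2 = x - (lo + idx * width)                      # "lower bits" form the offset
    c0, c1, c2 = tables[idx]
    return c0 + c1 * x2 + c2 * x2 * x2               # second-degree evaluation

tables, width = build_tables(lambda x: 1.0 / np.sqrt(x))
xs = np.random.uniform(1.0, 2.0, 10000)
worst = max(abs(approx_powering(float(x), tables, width) - 1.0 / np.sqrt(float(x)))
            for x in xs)
print(f"max abs error over random test points: {worst:.2e}")  # well below the 2**-23 single-precision ulp
```

In the hardware described next, the same three coefficients come from look-up tables addressed by the upper bits of the operand, x2² is produced by the specialized squaring unit, and the two products are summed together with C0 in the fused accumulation tree.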
We perform this rounding scheme by adding a1to the in-termediate result in position,and then truncating the intermediate result at position.The specific value of parameter depends on the range of the output results,and so on the function to compute.This addition can be per-formed with no cost,by storing=instead of in the corresponding look-up table.Summarizing,there is a minimum value of which al-lows the achievement of faithfully rounded results for each function.Once is set,the minimax approximation for the specific function is computed using Maple,minimizing the wordlengths of the coefficients and within the error bound.A Maple program can automatically generate the binary values of the coefficients to store in the look-up ta-bles,whose sizes are determined by the value of parameter and by the exact wordlengths of,and.The look-up table size is different for each function,as shown in Table1.Once the coefficients have been obtained,and the computation of has been performed in parallel,we propose to compute the expression(1)by means of an ac-cumulation tree with the partial products both of and as inputs,together with the coefficient.The method presented here can easily be generalized.It allows to get small order and coefficients(or higher or-der coefficients for polynomials of degree more than), which results in smaller and faster multipliers,and also smaller tables.3.ArchitectureWe describe in this section an architecture with a latency of3cycles and a throughput of one result per cycle im-plementing our method to faithfully compute the powering function()in a single–precisionfloating–point format. Wefirst describe the composition and operationalflow of the three stages of our architecture,and then outline the in-ternal structure of the main logic blocks employed.Figure1 shows this pipelined unit scheme for the specific case of in-verse square root computation(=;=8).There are three stages in our architecture.In thefirst one, three tables are addressed using the upper part of the input operand,,to look-up the coefficients,and, which are then stored in their corresponding registers.The computation of producing a result in carry-save repre-sentation is performed by a specialized squaring unit in par-allel with the table look-ups,and the recoding to signed–digit radix4of this result is later performed.This value, which is also stored in a register,is going to be employed as a multiplier in the next stage to generate the partial products of.A binary to SD radix4recoding of is also per-formed in parallel with the table look-ups and the com-putation,and the result of this recoding is also stored.This value is going to be employed as a multiplier for the partial products generation in the next stage.The critical path of this stage is going to be the slower one between the path containing the table look-ups and registers and the path containing the specialized squaring unit,the CS to SD-4re-coding unit and the corresponding registers.The second stage is the accumulation tree stage,begin-ning with the partial products generation(both ofand of),and then performing the accumulation of all these products plus the coefficient.The reduction on the coefficients wordlength due to the employment of the min-imax approximation allows an important reduction on the area of the accumulation tree.As shown in Figure1,4:2 compressors are employed in this tree.The critical path of this stage comprises a partial product generation and three levels of4:2carry-save adders(CSAs).The last stage of this pipelined 
architecture consists of the assimilation from redundant arithmetic(carry-save)to a non-redundant(binary)representation of the result,which is guaranteed to be faithfully rounded.This assimilation is carried out by a carry-prefix adder(CPA).It is possible to design architectures for the computation of any two different functions(or)by only du-plicating the table size and inserting multiplexers to select between the corresponding coefficients,requiring only mi-nor modifications on the accumulation tree.Look-up tablesThe three polynomial coefficients of the minimax approx-imation are looked-up in tables addressed by the=8bit word,and their minimum wordlength is set by the error computation,in order to guarantee faithful rounding to the nearest of the results.For the inverse square root function, ,and wordlengths are26,14and7,respectively, which leads to a total table size of,as shown in Table1.For a second–degree Taylor approximation,coefficients of wordlengths 29,19and11are necessary to guarantee faithfully rounded results.These coefficients wordlengths not only lead to the employment of bigger tables,but also increase the size of the unit which performs the computation of the second–degree polynomial(1).Specialized squaring unitSome strategies can be employed to reduce the amount of CLBs and delay of the unit which computes the squaring of the lower part of the input operand,.These strategies[5] go from re-arranging the partial product matrix to taking advantage of the leading zeros of.Since,the partial product matrix is sym-metric with respect to the anti-diagonal,which allows the reduction of the number of partial products to be accumulated by using the identitiesand.The original matrix(Fig-ure2(a))can be replaced by an equivalent matrix(Fig-ure2(b))consisting of the partial products on the anti-diagonal,plus the partial products above it shifted one position to the left.To further reduce the size and delay of the equiva-lent matrix,it is also important to notice that,and therefore has m lead-ing zeros.This means that will have at least2m leading zeros.The exhaustive error computation per-formed[11]shows that the least significant bits ofare not important to thefinal result.Therefore,it is1a 1a a 2a 2a 2a 2a 2a 2a 2a 22a a 34a a 56a 7a 8a 8a 7a 6a a 54a a 3a 23a 3a 3a 3a 3a 3a 3a 3a 4a 4a 4a 4a 4a 4a 4a 4a a 5a 5a 5a 5a 5a 5a 5a 56a 6a 6a 6a 6a 6a 6a 6a 7a 7a 7a 7a 7a 7a 7a 7a 8a 8a 8a 8a 8a 8a 8a 8a 2a 2a 2a 2a 2a 2a a 3a 3a 3a 3a 3a 31a 4a 4a 4a 4a 4a 4a a 5a 5a 5a 5a 5a 56a 6a 6a 6a 6a 6a 7a 7a 7a 7a 7a 7a 8a 8a 8a 8a 8a 8a 1a 1a 1a 1a 1a 1a 1a 1a 1a 1a 1a 1a 1a (a)Original matrix1a a 2a 34a a 56a 7a 8a a 2a 34a a 56a 7a 8a 8a 8a 8a 8a 8a 8a a 2a 2a 2a 2a 27a 7a 7a 7a 7a a 3a 3a 36a 6a 6a 4a 4a a 5a 5a 54a a 2a 3a 34a 4a a 5a 5a 54a a 36a 6a 7a 6a 01a 1a 1a 1a 1a 1a 1a (b)Equivalent matrixFigure 2.squaring matricesFigure 3.Specialized squaring unit operation ()C 1C 2x 2C 2x 21111111111116 pp8 pp11112-15-20-25-30-352222-52-1022Figure 4.Partial products to be accumulated ()possible to truncate the partial products of the squar-ing computation at position .The error intro-duced by this truncation (slightly corrected by adding a1at position,as shown in Figure 3)has been proved to be bounded byThe partial products on the right of the dashed vertical line are not accumulated,so only two levels of adders (the first one,with 3:2CSAs;the second one,with 4:2CSAs),plus an initial and stage (),are required to generate the carry–save (CS)representationof.Fused accumulation treeOne of the main 
features of our method is the employmentof a unified accumulation tree to perform the computation of (1).The total delay and number of CLBs needed for this unit are smaller than the corresponding to employ two small multipliers and a final adder for performing this com-putation.This is due to our having taken advantage of thenumber of leading zeros of(leading zeros)and of (leading zeros),and we have employed SD radix 4rep-resentation to reduce the number of partial products to be accumulated.For the inverse square root (=),has 15signif-icant bits.Therefore,has 16leading zeros,and a wordlength of 12bits.Employing SD radix 4recoding,these wordlengths lead to a generation of 8partial prod-ucts to computeand 6partial products to compute .The total number of products to be accumulated is 15(;the last one is the coefficient ),and so only3levels of 4:2CSA adders are needed to perform the whole computation.The critical path of this stage (which is the critical path of the unit,as will shown in Section 4)consists of the partial product generation (ppThe arrangement of the matrix of partial products to beaccumulated is shown in Figure 4.This figure shows the effect of the reduction on the wordlengths of coefficients and on the total area of the accumulation tree due to the employment of the minimax approximation (affectsto 6partial products,whileis involved in the generation of 8).The circles at the beginning of each word mean a complement of the sign bit,and the ones must be added at the specific positions shown to avoid the sign-extension of the partial products to be accumulated [7],allowing an important area reduction.Final assimilationThe final assimilation of the result from redundant arith-metic (carry-save representation)to binary representation is performed by a carry prefix adder (CPA ).This unit has an internal structure [6]consisting of 5levels of depth,where advantage is taken from the fact that there is no input carry.The prefix–adder internal structure,as shown in Figure 5,consists of an initial stage where (carry–generate )and(carry–propagate )bits are generated from andFigure5.Internal structure of a CPA with8-bit input wordsbits:with,being the wordlength of the input operands.After this initial stage,levels of and group bits are generated:with meaning here not single bits,but groups(from bit to bit,as shown in Figure5).Thefinal stage performs the computation of the output bits(the result of the addition):This can be done because and,so the addition operation()is being performed.The employment of a CPA structure allows a significant reduction on the delay of this unit,which is proportional to .Ripple–carry adders have a delay directly related to.4.FPGA Implementation and ResultsThe pipelined architecture we have described in the last section has been synthesized following a VHDL–based de-signflow[12][13].First of all,the logic blocks have been described in VHDL,and simulated to guarantee a correct behaviour.Then a single VHDLfile has been written,hi-erarchically connecting all the blocks,and inserting the se-quential logic needed to pipeline the architecture.After ex-haustive simulation,this architecture has been synthesized employing a Synopsys FPGA Compiler tool.The resultingfile of this pre-layout synthesis has been then introduced in Design Manager and Flow Engine com-puter tools from Xilinx,where it has been mapped,placed and routed into a Xilinx XC4036XL FPGA device[16].A cycle time of30.2ns has been achieved as a result of this implementation,which leads to an operation 
frequency over33MHz,employing a total number of1130config-urable logic blocks(CLBs).Table2shows the results of implementing each stage into a XC4036XL FPGA device, and the total number of CLBs and minimum clock cycle for the complete pipelined architecture.This table illustrates the importance of employing a second–degree approximation:the look-up tables(first stage)require twice the number of CLBs than stages2and 3together;if afirst–degree approximation had been em-ployed(tables with11input bits:8times the size of ourPipeline stage clock period(ns)764Second stage30.2811130。
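The carry-prefix adder used in the final assimilation stage (described in the "Final assimilation" subsection above) can be modelled bit by bit as in the sketch below: the standard per-bit generate and propagate terms (gᵢ = aᵢ·bᵢ, pᵢ = aᵢ⊕bᵢ) are combined into group terms over log₂(n) levels, and the sum bits are formed from pᵢ and the group generate over the lower positions, exploiting the absence of an input carry. This is a behavioural illustration only; the Kogge-Stone combination pattern used here is an assumption, since the text specifies only a prefix structure of logarithmic depth.

```python
# Behavioural sketch of carry-prefix addition (generate/propagate + log-depth
# prefix combine + final XOR).  The Kogge-Stone network below is one possible
# realisation; it is not claimed to be the structure used in the paper.

def prefix_add(a: int, b: int, width: int = 32) -> int:
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(width)]   # g_i = a_i AND b_i
    p = [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(width)]   # p_i = a_i XOR b_i

    # (G, P) for the group ending at bit i; combined level by level.
    G, P = g[:], p[:]
    dist = 1
    while dist < width:
        newG, newP = G[:], P[:]
        for i in range(dist, width):
            # (G,P)_hi o (G,P)_lo:  G = G_hi OR (P_hi AND G_lo),  P = P_hi AND P_lo
            newG[i] = G[i] | (P[i] & G[i - dist])
            newP[i] = P[i] & P[i - dist]
        G, P = newG, newP
        dist <<= 1

    # With no input carry, the carry into bit i is the group generate over [0..i-1].
    s = [p[i] ^ (G[i - 1] if i > 0 else 0) for i in range(width)]
    return sum(bit << i for i, bit in enumerate(s))

# Quick check against ordinary integer addition (mod 2**width).
for a, b in [(0x1234ABCD, 0x0FEDC321), (0xFFFFFFFF, 1), (123456789, 987654321)]:
    assert prefix_add(a, b) == (a + b) & 0xFFFFFFFF
```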
Approximating Description Logic Classification forSemantic Web ReasoningPerry Groot1,Heiner Stuckenschmidt2,and Holger Wache21Radboud University Nijmegen,Toernooiveld1,6500GL Nijmegen,The NetherlandsPerry.Groot@science.ru.nl,2Vrije Universiteit Amsterdam,de Boelelaan1081a,1081HV Amsterdam,The Netherlands{holger,heiner}@cs.vu.nl.Abstract.In many application scenarios,the use of the Web ontology languageOWL is hampered by the complexity of the underlying logic that makes reason-ing in OWL intractable in the worst case.In this paper,we address the questionwhether approximation techniques known from the knowledge representation lit-erature can help to simplify OWL reasoning.In particular,we carry out experi-ments with approximate deduction techniques on the problem of classifying newconcept expressions into an existing OWL ontology using existing Ontologies onthe web.Our experiments show that a direct application of approximate deduc-tion techniques as proposed in the literature in most cases does not lead to animprovement and that these methods also suffer from some fundamental prob-lems.1Introduction and MotivationA strength of the current proposals for the foundational languages of the Semantic Web is that they are all based on formal logic.This makes it possible to formally reason about information and derive implicit knowledge.However,this reliance on logics is not only a strength but also a weakness.Traditionally,logic has always aimed at modelling idealised forms of reasoning under idealised circumstances.Clearly,this is not what is required under the practical circumstances of the Semantic Web.Instead,the following are all needed:–reasoning under time-pressure–reasoning with other limited resources besides time–reasoning that is not‘perfect’but instead‘good enough’for given tasks under given circumstancesIt is tempting to conclude that symbolic,formal logic fails on all these counts,and to abandon that paradigm.Our aim is to keep the advantages of formal logic in terms of definitional rigour and reasoning possibilities,but at the same time address the needs of the Semantic Web.Research in the past few years has developed methods with the above properties while staying within the framework of symbolic,formal logic.However,many of thosepreviously developed methods have never been considered in the context of the Seman-tic Web.Some of them have only been considered for some very simple underlying description languages [1].As the languages proposed for modelling Ontologies in the Semantic Web are becoming more and more complex,it is an open question whether those approximation methods are able to meet the practical demands of the Semantic Web.In this article,we look at approximation methods for Description Logics (DLs),which are closely related to some of the currently proposed Semantic Web languages,e.g.,OWL.The remainder of the article is structured as follows.Section 2gives an overview of various approximation approaches and techniques useful in the context of the Semantic Web.Thereafter,the article focuses on the investigation of a particular approximation technique in the context of a particular reasoning task.Section 3describes the approx-imation approach used.Section 4describes the reasoning task focused on.Section 5gives experimental results of the approximation method applied to the classification of concepts in a number of Ontologies.Section 6gives conclusions,discusses the results and the applicability of the analysed approximation approach to the Semantic Web.2Approximation 
ApproachesFig.1.Architecture of a KR system based on Description Logic together with possi-ble approximation approaches.A typical architecture for a KR system based on DLs can be sketched as in Fig-ure 1[2],which contains three compo-nents that can be approximated to obtain a simplified system that is more robust andmore scalable.These components are:(1)the underlying description language,(2)the knowledge base,and (3)the reasoner.The knowledge base itself comprises twocomponents (TBox and ABox),whichcan also be approximated as a whole or separately.Some general approximation techniques that can be applied to one or more of these components are the following:Language Weakening:The idea of language weakening is based on the well-known trade-off between the expressiveness and the reasoning complexity of a logical lan-guage.By weakening the logical language in which a theory is encoded,we are able to trade the completeness of reasoning against run-time.For example,[3]shows how hierarchical knowledge bases can be used to reason approximately with disjunctive in-formation.The logic that underlies OWL Full for example is known to be intractable,reasoners can use a slightly weaker logic (e.g.,OWL Lite)that still allows the com-putation of certain consequences.This idea can be further extended by starting with a very simple language and iterating over logics of increasing strength supplementing previously derived facts.Knowledge Compilation:In order to avoid complexity at run-time,knowledge compi-lation aims at pre-processing the ontology off-line such that on-line reasoning becomes faster.For example,this can be achieved by explicating hidden knowledge.Derived facts are added to the original theory as axioms,avoiding the need to deduce them again.In the case of ontological reasoning,implicit subsumption and membership rela-tions are good candidates for compilation.For example,implicit subsumption relations in an OWL ontology could be identified using a DL reasoner,the resulting more com-plete hierarchy could be encoded e.g.,in RDF schema and used by systems that do not have the ability to perform complex reasoning.This example can be considered to be a transformation of the DL language.When one transforms an ontology into a less expressive DL language[4,5],this often results in an approximation of the original ontology.Approximate Deduction:Instead of modifying the logical language,approximations can also be achieved by weakening the notion of logical consequence[1,6].The approximated consequences are usually characterised as sound but incomplete,or complete but unsound.Only[1]has made some effort in the context of DLs.The approximation method focused on in this article belongs to the last category, however there is not always a clear classification of one method to the three categories defined above.In the following section we discuss in more detail what is meant by approximating DLs in the remainder of this paper.3Approximating Description LogicsThe elements of a DL are concept expressions and determining their satisfiability is the most basic task.Other reasoning services(e.g.,subsumption,classification,instance re-trieval)can often be restated in terms of satisfiability checking[2].With approximation in DLs,we mean determining the satisfiability of a concept expression through some other means than computing the satisfiability of the concept expression itself.This use of approximation differs with other work on approximating DLs[4,5]in which a con-cept expression is translated to another 
concept expression,defined in a second typically less expressive DL.In our approach(originally proposed in[1]),in a DL only other,somehow‘related’, concept expressions can be used that are in some way‘simpler’when determining their satisfiability.For example,a concept expression can be related to another concept ex-pression through its subsumption relation,and a concept expression can be made sim-pler by either forgetting some of its subconcepts or by replacing some of its subconcepts with simpler concepts.In particular,there are two ways that a concept expression C can be approximated by a related simpler concept expression D.Either the concept expres-sion C is approximated by a weaker concept expression D(i.e.,less specific,C D) or by a stronger concept expression D(i.e.,more specific,D C).When C D,un-satisfiability of D implies unsatisfiability of C.When D C,satisfiability of D implies satisfiability of C.Note that this is similar to set theory.For two sets C,D,when C⊆D holds,emptiness of D implies emptiness of C,and when D⊆C holds,non-emptiness of D implies non-emptiness of C.In[1]Cadoli and Schaerf propose a syntactic manipulation of concept expressions that simplifies the task of checking their satisfiability.The method generates two se-quences of approximations,one sequence containing weaker concepts and one sequence containing stronger concepts.The sequences of approximations are obtained by substi-tuting a substring D in a concept expression C by a simpler concept.More precisely,for every substring D they define the depth of D to be‘the number of universal quantifiers occurring in C and having D in its scope’[1].The scope of ∀R.φisφwhich can be any concept term containing ing the definition of depth a sequence of weaker approximated concepts can be defined,denoted by C i,by replacing every existentially quantified subconcept,i.e.,∃R.φwhereφis any concept term,of depth greater or equal than i by .Analogously,a sequence of stronger approximated concepts can be defined,denoted by C⊥i,by replacing every existentially quantified subconcept of depth greater or equal than i by⊥.The concept expressions are assumed to be in negated normal form(NNF)before approximating them.These definitions lead to the following result:Theorem1.For each i,if C i is unsatisfiable then C j is unsatisfiable for all j≥i, hence C is unsatisfiable.For each i,if C⊥i is satisfiable then C⊥j is satisfiable for all j≥i,hence C is satisfiable.These definitions are illustrated by the following concept expression in NNF taken from the Wine ontology3Merlot≡Wine ≤1madeFromGrape. ∃madeFromGrape.{MerlotGrape},which states that a Merlot wine is a wine that is made from the Merlot grape and no other grape.This concept expression contains no∀-quantifiers.Therefore the depth of the only existentially quantified subconcept‘∃madeFromGrape.{MerlotGrape}’is0. Substituting either or⊥leads to the following approximations for level0:Merlot 0≡Wine (≤1madeFromGrape. ) ,Merlot⊥0≡Wine (≤1madeFromGrape. 
) ⊥.No subconcepts of level1appear in the concept expression for Merlot.Therefore, Merlot 1and Merlot⊥1are equivalent to Merlot.The nesting of existential and uni-versal quantifiers is an important measure of the complexity of satisfiability checking when considered from a worst case complexity perspective[8].This is a motivation for Cadoli and Schaef to make their specific substitution choices.Furthermore,they are able to show a relation between C i-and C⊥i-approximation and their multi-valued logic based on S-1-and S-3-interpretations[1].Therefore,properties obtained for S-1-and S-3-approximation also hold for C i-and C⊥i-approximation.These properties include the following:(1)Semantically well founded,i.e.,there is a relation with a logic that can be used to give meaning to approximate answers;(2)Computationally attractive, i.e.,approximate answers are cheaper to compute than the original problem;(3)Dual-ity,i.e.,both sound but incomplete and complete but unsound approximations can be 3A wine and food ontology which forms part of the OWL test suite[7].constructed;(4)Improvable,i.e.,approximate answers can be improved while reusing previous computations;(5)Flexible,i.e.,the method can be applied to various problem domains.These properties were identified by Cadoli and Schaerf to be necessary for any approximation method.Although the proposed method by Cadoli and Schaerf[1]satisfies the needs of the Semantic Web identified in Section1in theory,little is known about the applicability of their method to practical problem solving.Few results have been obtained for S-1-and S-3-approximation when applied to propositional logic[9–11],but no results are currently known to the authors when their proposed method is applied to DLs.Current work focuses on empirical validation of their proposed method.Furthermore,DLs have changed considerably in the last decade.Cadoli and Schaerf proposed their method for approximating the language ALE(they also give an extension for ALC),but ALE has a much weaker expressivity then the languages that are currently proposed for ontology modeling on the Semantic Web such as OWL.The applicability of their method to a more expressive language like OWL is an open question.Current work takes the method of Cadoli and Schaerf as a basis and focuses on extending it to more expressive DLs. 4Approximating ClassificationThe problem of classification is to arrange a complex concept expression into the sub-sumption hierarchy of a given TBox.We choose this task for two reasons.First,the worst-case complexity of a classification algorithm for expressive representation lan-guages like OWL-DL is known to be intractable.Efficient alternatives have only been proposed for subsets of DLs[12].Second,classification is a very important part of many other reasoning services and applications.For example,classification is used to generate the subsumption hierarchy of the concept descriptions in an ontology.Furthermore,classification is used in the task of retrieving instances.From a theoretical point of view,checking whether an instance i is member of a concept Q can be done by proving the unsatisfiability of¬Q(i).Doing this for all existing instances,however,is intractable.Therefore,most DL systems use a process that reduces the number of instance checks.It is assumed that the ontology is classified and all instances are assigned to the most specific concept they belong to. 
Instance retrieval is then done byfirst classifying the query concept Q in the subsump-tion hierarchy and then selecting the instances of all successors of Q and of all direct predecessors of Q that pass the membership test in Q.We conclude that there is a lot of potential for approximating the classification task.In the following,wefirst describe the process of classification in DL systems.Af-terwards we explain how the approximation technique introduced in Section3can be used to approximate(part of)this problem.For classifying a concept expression Q into the concept hierarchy(Algorithm1)a number of subsumption tests are required for comparing the query concept with other concepts C i in the hierarchy.As the classification hierarchy is assumed to be known, the number of subsumption tests can be reduced by starting at the highest level of the hierarchy and to move down to the children of a concept only if the subsumption test is positive.The most specific concepts w.r.t.the subsumption hierarchy which passed theAlgorithm1classificationRequire:A query concept QV ISITED:=∅R ESULT:=∅G OALS:={ }while Goals=∅doC∈Goals where{direct parents of C}⊆VisitedG OALS:=Goals\{C}V ISITED:=Visited∪{C}if subsumed-by(Q,C)thenG OALS:=Goals∪{direct children of C}R ESULT:=(Result∪{C})\{all ancestors of C}end ifend whileif|Result|=1∧subsumed-by(C,Q)thenE QUAL:=‘yes’elseE QUAL:=‘no’end ifreturn Equal,Resultsubsumption test are collected for the results.At the end of the algorithm,we check if the result is subsumed by Q as this implies that both are equal.Algorithm1contains more than one step that can be approximated.For example, the subsumption tests,represented by subsumed-by(X,Y)in the algorithm,can be approximated using the method of Cadoli and Schaerf.The subsumption test Q C can be reformulated into the unsatisfiability test of Q ¬C(Algorithm2).The idea is to replace standard subsumption checks by a series of approximate checks of increasing exactness.In particular,we use weaker approxima-tions C i in the C -subsumption algorithm(Algorithm3)and stronger approxima-tions C⊥i in the C⊥-subsumption algorithm(Algorithm4).(Note the difference be-tween Algorithm2and Algorithms3and4.)The approximations are easily constructed in a linear way.When the approximation at a certain level I does not lead to a conclusion (based on Theorem1)the level I is increased by one.This is repeated until the original concept expression is obtained,i.e.,the exact subsumption test has to be performed.Al-gorithm5integrates both approximations in one procedure.The approximate versions, Algorithm2subsumptionRequire:A complex concept expression CRequire:A Query QC URRENT:=Q ¬CR ESULT:=unsatisfiable(Current)return ResultAlgorithm3C -subsumption Require:A complex concept expression C Require:A Query QI:=0repeatC URRENT:=(Q ¬C) IR ESULT:=unsatisfiable(Current)if Result=‘true’thenbreakend ifI:=I+1until Current=Q ¬Creturn ResultAlgorithm4C⊥-subsumption Require:A Query QI:=0repeatC URRENT:=(Q ¬C)⊥IR ESULT:=unsatisfiable(Current)if Result=‘false’thenbreakend ifI:=I+1until Current=Q ¬Creturn ResultAlgorithm5C⊥I-C I-subsumption Require:A complex concept expression C Require:A Query QI:=0repeatC URRENT:=(Q ¬C)⊥IR ESULT:=unsatisfiable(Current)if Result=‘false’thenbreakend ifC URRENT:=(Q ¬C) IR ESULT:=unsatisfiable(Current)if Result=‘true’thenbreakend ifI:=I+1until Current=Q ¬Creturn Resulti.e.,C -subsumption ,C ⊥-subsumption ,and C ⊥I -C I -subsumption will re-place the method subsumed-by in Algorithm 1in the forthcoming experiments.FactQueryRacer 
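The replacement of subsumed-by by the approximate variants can be summarised in a single routine. The sketch below follows Algorithm 5: at each level i it first tries the stronger approximation (Q ⊓ ¬C)⊥i, whose satisfiability already shows that Q is not subsumed, then the weaker approximation (Q ⊓ ¬C)⊤i, whose unsatisfiability already shows that Q is subsumed, and only falls back to the exact test when no level is conclusive. The callables `unsat` and `approx` are stand-ins for a call to an external DL reasoner and for the C⊤i/C⊥i substitution of Section 3; the tuple-based concept encoding is purely illustrative.

```python
def subsumed_by(query, concept, unsat, approx, max_level):
    """Approximate subsumption test in the spirit of Algorithm 5.

    unsat(expr)            -- stand-in for a DL reasoner call deciding unsatisfiability
    approx(expr, i, which) -- stand-in for the C-top/C-bot substitution of Section 3,
                              with `which` in {"top", "bottom"}
    Returns True iff `query` is subsumed by `concept`.
    """
    conj = ("and", query, ("not", concept))              # Q ⊓ ¬C as a tiny AST
    for i in range(max_level):
        lower = approx(conj, i, "bottom")                # stronger approximation
        if lower != conj and not unsat(lower):           # satisfiable   =>  Q is not subsumed
            return False
        upper = approx(conj, i, "top")                   # weaker approximation
        if upper != conj and unsat(upper):               # unsatisfiable =>  Q is subsumed
            return True
        # Levels at which the substitution leaves the expression unchanged are
        # skipped, so no reasoner call is spent on them.
    return unsat(conj)                                   # exact test as a last resort
```

In Algorithm 1 this routine would simply take the place of subsumed-by(Q, C) during the downward traversal of the subsumption hierarchy.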
Fig.2.Architecture of experimental setup.While these approximate versions can in principle be applied to all occurrencesof subsumption tests,we restricted the useof approximations to the first part of thealgorithm where the query concept is clas-sified into the hierarchy.Each DL reasoner (e.g.,Fact [13],Racer [14,15])implements the classifi-cation functionality internally.In orderto obtain comparable statements aboutapproximate classification,independentlyfrom the implementation of a particularDL reasoner,which may use highly opti-mised heuristics,we implement our ownand independent classification method.The classification procedure was built ontop of an arbitrary DL reasoner accord-ing to Algorithm 1,which can call the various approximation forms stated in Al-gorithms 3,4,and 5The satisfiabilitytests are propagated to the DL reasonerthrough the DIG interface [16]as depicted in Figure 2.5ExperimentsThe main question focused on in the experiments is which form of approximation,i.e.,C i ,C ⊥i ,or their combination,can be used to reduce the complexity of the classification task.The focus of the experiments will not be on the overall computation time,but on the number of operations needed.The goal of approximation is to replace costly reasoning operations by a (small)number of cheaper approximate reasoning operations.The suitability of the method of Cadoli and Schaerf therefore depends on the number of classical reasoning operations that can be replaced by their approximate counterparts without changing the result of the computation.In the experiments queries are generated automatically.The system randomly se-lects a number of concept descriptions from the loaded ontology.These definitions are used as queries and are reclassified into the subsumption hierarchy.Note that the queries are first randomly selected,then they are used in the experiments with all forms of ap-proximation.The first experiments were made with the TAMBIS ontology in which we (re)classified 16unfoldable concept definitions.4Only the approximation method orig-inally suggested by Cadoli and Schaerf [1]for ALE (described in Section 3)was used.4A biochemistry ontology developed in the TAMBIS project [17].The results of the first experiments are shown in Table 1,which is divided into four columns.Each column reports the number of subsumption tests when using a certain form of approximation.The first column reports results for the experiment with normal classification (i.e.,without approximation),the second column for C ⊥i -approximation,the third column for C i -approximation,and the fourth column for a combination of C ⊥i -and C i -approximation.Each column of Table 1is divided into a number of smaller rows and columns.The rows represent the level of the approximation used,where N denotes normal subsump-tion testing,i.e.,without approximation.The columns represent whether the subsump-tion test resulted in true or false.5This distinction is important,because Theorem 1tells us that only one of those two results will immediately lead to a reduction in complexity,while for the other result approximation has to continue at the next level.This continues until no more approximation steps can be done.Table 1.Subsumption tests for the reclassification of 16concepts in TAMBIS.normalC ⊥i C i C ⊥i &C i true falsetrue false true false true false C ⊥015732C 08181C ⊥015732Tambis (16)C ⊥100C 100C 08149N 24279N 24247N 16279N 16247The first column shows that for the reclassification of 16concepts in the TAMBIS ontology,24true subsumption tests 
and 279 false subsumption tests were needed. The second column shows that C⊥i-approximation leads to a change in normal subsumption tests. Compared to the normal case, the number of false subsumption tests is reduced from 279 to 247. However, the 24 true subsumption tests are not reduced. Note that 32 (279-247) false subsumption tests are replaced by 157 true C⊥0-subsumption tests and 32 false C⊥0-subsumption tests.6

The third column shows that C⊤i-approximation also leads to a change in normal subsumption tests, but quite different when compared to C⊥i-approximation. With C⊤i-approximation we reduce the true subsumption tests from 24 to 16. However, the 279 false subsumption tests are not reduced. Note that 8 (24-16) true subsumption tests are replaced by 8 true C⊤0-subsumption tests and 181 false C⊤0-subsumption tests. Analogously to C⊥i-approximation, no C⊤i-approximation was used when this would not lead to a change in the subsumption expression.

The fourth column shows the combination of C⊥i- and C⊤i-approximation by using the approximation sequence C⊥0, C⊤0, C⊥1, C⊤1, ..., C⊥n−1, C⊤n−1, normal. This combination leads to a reduction of normal subsumption tests, which is the combination of the reductions found when using C⊥i- or C⊤i-approximation by itself. The true subsumption tests are reduced from 24 to 16 and the false subsumption tests are reduced from 279 to 247. Note that the reduction of 8 (24-16) true subsumption tests and 32 (279-247) false subsumption tests are now replaced by 157 true C⊥0-subsumption tests, 32 false C⊥0-subsumption tests, 8 true C⊤0-subsumption tests, and 149 false C⊤0-subsumption tests.

5 We will use the shorthand 'true subsumption test' and 'false subsumption test' to indicate these two distinct results.
6 Note that the numbers do not add up. The reason for this is that approximation is not used when there is no change in the subsumption expression after approximation, i.e., when C⊥i = C the DL reasoner is not called and no subsumption check for C⊥i is performed.

5.1 Analysis of C⊥i-/C⊤i-Approximation

The approximation of concept classification in the TAMBIS ontology using the method of Cadoli and Schaerf reveals at least four points of interest. First, Table 1 shows that using C⊥i-approximation can only lead to a reduction of the false subsumption tests and C⊤i-approximation can only lead to a reduction of the true subsumption tests. These results could be expected as they follow from Theorem 1 and are reflected by Algorithms 3, 4, and 5. Using Theorem 1 we have the following reasoning steps for C⊥i-approximation:

Query ⋢ Concept ⇔ (Query ⊓ ¬Concept) is satisfiable ⇐ (Query ⊓ ¬Concept)⊥i is satisfiable.

Hence, when (Query ⊓ ¬Concept)⊥i is not satisfiable, nothing can be concluded and approximation cannot lead to any gain. Using Theorem 1 we have the following reasoning steps for C⊤i-approximation:

Query ⊑ Concept ⇔ (Query ⊓ ¬Concept) is not satisfiable ⇐ (Query ⊓ ¬Concept)⊤i is not satisfiable.
Hence, when (Query ⊓ ¬Concept)⊤i is satisfiable, nothing can be concluded and approximation cannot lead to any gain.

Second, no approximations are used on a level higher than zero. This is a direct consequence of the TAMBIS ontology containing no nested concept definitions. Further on, we show this to be the case for most ontologies found in practice.

Third, both C⊥i- and C⊤i-approximation are not applied in all subsumption tests that are theoretically possible. With normal classification 303 (24+279) subsumption tests are needed. However, with C⊥i-approximation in only 189 (157+32) cases approximation was actually used. In the remaining 114 (303-189) cases approximation had no effect on the concept definitions, i.e., C⊥i = C, and no test was therefore performed. Hence, in 38% of the subsumption tests, approximation was not used. Similar observations hold for C⊤i-approximation. This observation indicates that C⊥i-/C⊤i-approximation is not very useful (at least for the TAMBIS ontology) for approximating classification in an ontology.

Fourth, apart from the successful reduction of normal subsumption tests, we must also consider the cost for obtaining the reduction. For example, with C⊥i-approximation we obtained a reduction in 32 false subsumption tests, i.e., 32 normal false subsumption tests could be replaced by 32 cheaper false C⊥0-subsumption tests, however it also cost an extra 157 true C⊥0-subsumption tests that did not lead to any reduction. As nothing can be deduced from these 157 true C⊥0-subsumption tests, these computations are wasted and reduce the gain obtained with the 32 reduced false subsumption tests considerably. Obviously, these unnecessary true C⊥0-subsumption tests should be minimised. No final verdict can be made however, because it all depends on the computation time needed to compute the normal subsumption tests and C⊥0-subsumption tests, but 157 seems rather high. Similar observations hold for C⊤i-approximation.

Analysing the high amount of unnecessary subsumption tests, we discovered a phenomenon, which we call term collapsing. We illustrate term collapsing through an example taken from the Wine ontology. Suppose that during a classification the subsumption test Query ⊑ WhiteNonSweetWine is generated. The definition for WhiteNonSweetWine is:

Wine ⊓ ∃hasColor.{White} ⊓ ∀hasSugar.{OffDry,Dry}.

The subsumption query is first transformed into a satisfiability test, i.e.,

Query ⊑ WhiteNonSweetWine ⇔ Query ⊓ ¬WhiteNonSweetWine is unsatisfiable,

because C⊥i-/C⊤i-approximation is defined in terms of satisfiability checking. The definition of ¬WhiteNonSweetWine is

≡ ¬(Wine ⊓ ∃hasColor.{White} ⊓ ∀hasSugar.{OffDry,Dry})
≡ ¬Wine ⊔ ∀hasColor.¬{White} ⊔ ∃hasSugar.¬{OffDry,Dry},

and therefore the approximation (¬WhiteNonSweetWine)⊤0 is

≡ (¬Wine ⊔ ∀hasColor.¬{White} ⊔ ∃hasSugar.¬{OffDry,Dry})⊤0
≡ (¬Wine)⊤0 ⊔ (∀hasColor.¬{White})⊤0 ⊔ (∃hasSugar.¬{OffDry,Dry})⊤0
≡ ¬Wine ⊔ ∀hasColor.¬{White} ⊔ ⊤
≡ ⊤.

Therefore, approximating the expression Query ⊓ ¬WhiteNonSweetWine results in checking unsatisfiability of Query⊤0, i.e.,

(Query ⊓ ¬WhiteNonSweetWine)⊤0 ⇔ Query⊤0 ⊓ ⊤ ⇔ Query⊤0 is unsatisfiable.

This test most likely fails, because in a consistent ontology Query will be satisfiable and as Query is more specific than Query⊤0, i.e., Query ⊑ Query⊤0, the latter will be satisfiable. Analogously, applying C⊥i-approximation may result in a collapse of the Query to ⊥. This occurs whenever Query contains a conjunction with at least one ∃-quantifier.
In this case, the entire subsumption test is collapsed into checking the satisfiability of ⊥. As ⊥ can never be satisfied, this results in an unnecessary subsumption test. For the TAMBIS ontology we counted the number of occurrences of term collapsing in approximated concept expressions. With C⊤i-approximation 65 terms out of 181 collapsed. In other words, 35.9% of the approximated false subsumption tests are obviously not needed and should be avoided. With C⊥i-approximation the collapse is even more drastic: 157 terms out of 157 collapsed. With C⊥i- and C⊤i-approximation combined, 190 terms out of 306 collapsed. In other words, 62.1% of the approximated subsumption tests are not needed.
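Term collapsing is easy to reproduce with a small symbolic model of the substitution from Section 3. In the sketch below (an illustration under assumptions: the tuple-based concept encoding and the simplification rules are not the paper's implementation), every existentially quantified subconcept at depth ≥ i is replaced by ⊤ or ⊥ and the result is then simplified. Applying the level-0 ⊤-substitution to ¬WhiteNonSweetWine collapses the whole disjunction to ⊤, and a query that conjoins any ∃-restriction collapses to ⊥ under the ⊥-substitution, exactly as observed above.

```python
# A small symbolic model (illustrative, not the paper's code) of the
# C-top_i / C-bot_i substitution and of term collapsing.  Concepts are nested
# tuples in NNF; `depth` counts the enclosing universal quantifiers, and every
# existentially quantified subconcept at depth >= level is replaced by TOP
# (weaker approximation) or BOTTOM (stronger approximation).

TOP, BOTTOM = "TOP", "BOTTOM"

def approx(concept, level, placeholder, depth=0):
    if isinstance(concept, str) or concept[0] in ("not", "oneof"):
        return concept                                    # atoms, nominals, NNF literals
    op = concept[0]
    if op in ("and", "or"):
        return (op,) + tuple(approx(c, level, placeholder, depth) for c in concept[1:])
    if op == "forall":                                    # ("forall", role, filler)
        return (op, concept[1], approx(concept[2], level, placeholder, depth + 1))
    if op == "exists":                                    # ("exists", role, filler)
        return placeholder if depth >= level else \
               (op, concept[1], approx(concept[2], level, placeholder, depth))
    raise ValueError(f"unknown constructor: {op}")

def simplify(concept):
    """TOP absorbs a disjunction, BOTTOM absorbs a conjunction; neutral elements drop."""
    if isinstance(concept, str) or concept[0] in ("not", "oneof"):
        return concept
    op = concept[0]
    if op in ("forall", "exists"):
        return (op, concept[1], simplify(concept[2]))
    args = [simplify(c) for c in concept[1:]]
    absorb, neutral = (TOP, BOTTOM) if op == "or" else (BOTTOM, TOP)
    if absorb in args:
        return absorb
    args = [a for a in args if a != neutral]
    if not args:
        return neutral
    return args[0] if len(args) == 1 else (op,) + tuple(args)

# NNF of the negated WhiteNonSweetWine definition from the example above:
not_wnsw = ("or",
            ("not", "Wine"),
            ("forall", "hasColor", ("not", ("oneof", "White"))),
            ("exists", "hasSugar", ("not", ("oneof", "OffDry", "Dry"))))
print(simplify(approx(not_wnsw, 0, TOP)))      # -> 'TOP': the level-0 approximation collapses

# Any query that conjoins an existential restriction collapses under the bot-substitution:
query = ("and", "Wine", ("exists", "madeFromGrape", ("oneof", "MerlotGrape")))
print(simplify(approx(query, 0, BOTTOM)))      # -> 'BOTTOM': the approximate test is wasted
```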