

Relation Extraction for Semantic Intranet Annotations
Technical Report kmi-06-17
August 2006
Lucia Specia, Claudio Baldassarre, Enrico Motta
Short versions:
- A hybrid approach for extracting semantic relations from texts. 2nd Workshop on Ontology Learning and Population (OLP2) at COLING/ACL 2006, pp. 57-64. July 22, Sydney.
- A hybrid approach for relation extraction aimed to semantic annotations. 7th Flexible Query Answering Systems, FQAS-2006, LNAI 4027, pp. 564-576. June 7-10, Milan.

Relation Extraction for Semantic Intranet Annotations
Lucia Specia, Claudio Baldassarre, and Enrico Motta
Knowledge Media Institute & Centre for Research in Computing
The Open University, Walton Hall, MK7 6AA, Milton Keynes, UK
{L.Specia, C.Baldassarre, E.Motta}@https://www.doczj.com/doc/668000460.html
Abstract
We present an approach for ontology-driven extraction of relations from texts, aimed mainly at producing enriched semantic annotations for the Semantic Web. The approach exploits linguistic and empirical strategies by means of a pipeline method involving components such as a parser, a part-of-speech tagger, a named entity recognition system, and a pattern-based classifier, and resources including an ontology, a knowledge base, and a lexical database. A preliminary evaluation with 25 sentences showed that the use of knowledge-intensive resources and strategies, together with corpus-based techniques to process the input data, allows identifying and discovering relevant relations between known and new entity pairs mentioned in the text. Besides Semantic Web annotations, the system can be used for other tasks, including ontology population, since it identifies new instantiations of existent relations and entities, and ontology learning, since it discovers new relations which are not part of the ontology.
1. Introduction
Relation Extraction is generally concerned with the identification of semantic relations between pairs of terms in unstructured or semi-structured natural language documents. In our context, we use an ontology-oriented form of relation extraction, following the notion of Ontology Driven Information Extraction suggested by (Bontcheva and Cunningham, 2003), i.e., the research area in which ontologies are used to guide the process of information extraction, in our case relation extraction. In this report, we use the term "relation extraction" to refer both to (1) the identification of relations between two terms in texts when these relations are already "known", that is, they belong to our domain ontology as possible relations between the concepts underlying those terms; and (2) the discovery of new relations between two concepts underlying terms in the text, that is, relations that do not belong to our ontology. Semantic relations extracted from texts are useful for several applications, including the acquisition of terminological data, the development and extension of lexical resources or ontologies, question answering, information retrieval, semantic web annotation, etc. We focus on the application of both known and new relations to semantically annotate knowledge coming from raw text, within a framework to automatically acquire high quality semantic metadata for the Semantic Web. In that framework, applications such as a semantic web portal (Lei et al., 2006) acquire, integrate and manage heterogeneous data coming from texts, databases, domain ontologies, and knowledge bases. Known entities occurring in the text, i.e., entities that are included in the instantiation of our domain ontology, which we call the "knowledge base", are semantically annotated with their properties, also available in that knowledge base or in other databases. New entities, as given by a named entity recognition system, are annotated without any additional information. In that context, the goal of the relation extraction approach presented here is to extract relational knowledge about entities, i.e., to identify the semantic relations between pairs of entities in the input texts. Entities can be both known and new, since named

entity recognition is also performed on the input texts. As previously mentioned, relations include those already predicted as possible by the domain ontology and new (unpredicted) relations. The relation extraction approach makes use of the domain ontology and its corresponding instantiated knowledge base as the main resources to guide the extraction, but also uses other knowledge-based and empirical resources and strategies. These include a lexical database, a lemmatizer, a syntactic parser, a part-of-speech tagger, a named entity recognition system, and a pattern matching model. We experiment with this approach to annotate part of our intranet data, more specifically, texts from the Knowledge Media Institute (KMi)1 internal newsletter. Our hypothesis is that by integrating corpus-based and knowledge-based techniques and using rich linguistic processing strategies in an automated fashion, we can achieve effective results, accurately acquiring relevant relational knowledge. For semantic annotation tasks, this helps produce a rich representation of the input data. Moreover, it can be used to populate the already existent ontology, since new instances of concepts and new instances of relations between concepts are identified. As emphasized by Magnini et al. (2005), although the process of identifying new instances of concepts can be thought of as the well-known task of Named Entity Recognition, in ontology population we are interested in classifying a term independently of the context in which it appears, differently from the named entity classification task, where all the occurrences of the recognized terms have to be classified. It can also be used as a first step for ontology learning (relation learning, but not concept learning), since new, potentially useful, relations are also extracted. The rest of the report is organized as follows. We first describe related work on relation extraction aiming at various applications, including semantic annotations and ontology learning (Section 2). We then present our approach in detail, showing its architecture and describing each of its main components (Section 3). We also present the results of a preliminary evaluation on 25 sentences (Section 4). Finally, we discuss our conclusions and the next steps (Section 5).
2. Related work
Several approaches have been proposed for the extraction of relations from unstructured sources. Recently, they have focused on the use of supervised or unsupervised corpus-based techniques in order to automate the task. A very common approach is based on pattern matching: given a previously defined set of patterns, usually composed of triples including two nominal expressions and one verbal expression, mainly SVO (subject-verb-object) triples, relations are extracted by matching the new text against the patterns, using strategies varying from exact matching to more elaborate metrics to identify the similarity between the terms/verbs in the pattern and in the text. The basic idea of these metrics is to somehow generalize the patterns so that they can be applied to new cases. The way the patterns are generalised can vary considerably: some approaches remove restrictions on certain elements of the patterns and then search the corpus for instances which match the relaxed patterns (e.g. Soderland, 1999; Yangarber, 2000). Others (e.g. Chai and Biermann, 1999; Catala et al., 2000; Stevenson, 2004) use more linguistically motivated strategies, e.g., based on WordNet. In this process, problems such as sense ambiguity arise. Most of the approaches keep the ambiguity or require the user to identify the correct sense during the definition of the extraction patterns (e.g., Catala et al., 2000), while a few others use word sense disambiguation methods for that (e.g., Chai and Biermann, 1999). The patterns can be defined manually or automatically, in an unsupervised way. Among the unsupervised approaches based on pattern-matching, some employ a small number of seed
1 https://www.doczj.com/doc/668000460.html/

patterns of relations as a starting point to bootstrap a pattern learning process, using similarity measures to extract new patterns based on the initial ones. The approach proposed by (Yangarber et al., 2000), e.g., extracts relations by means of a process that generalizes a set of seed patterns. The seed patterns are composed of SVO triples with named entities as subject and object (e.g.: Company-hire-Person). Entities in the patterns are grouped in pairs, in order to provide sufficient frequency to obtain reliable statistics, and each pair is considered a pattern. These pairs are then used to gather the words filling the missing role. A new SVO triple, also annotated with named entities, is classified as relevant or not by means of a probability-based metric. At each iteration, patterns in the corpus matching one of the already existent patterns are evaluated and one of them is chosen to be added to the set of patterns. (Stevenson, 2004) also used a set of relevant seed patterns to learn other patterns to extract relations, based on the similarity among words in those patterns and in the text, given by a WordNet-based semantic similarity metric. SVO patterns are produced by a parser and those considered similar to already existent patterns are added to the set of patterns. The similarity measure employed is that defined by (Lin, 1998). In subsequent work (Stevenson and Greenwood, 2005), different similarity measures based on vector space models are employed. Similarly to these two approaches, several others rely on the mapping of syntactic relations/dependencies onto semantic relations, but use other strategies instead of (or together with) pattern matching, mostly empirical strategies. The supervised approach proposed by (Soderland, 1999), e.g., learns extraction patterns of relations described within a single sentence from shallow parsed or non-annotated texts already marked with named entities. The tagging process is interactive and interleaved with the learning: the system presents the user a batch of instances to tag, and, based on the user's choices, induces a set of new rules from the expanded training set. The work proposed by (Miller et al., 2000) treats relation extraction as a form of probabilistic parsing where parse trees are augmented to identify entities and the relations between them. Once this augmentation is made in a set of treebanks used as training examples, the parser is trained using a supervised learning algorithm and can then be used with new sentences to extract their relations. (Gamalho et al., 2002) employ an unsupervised strategy for clustering semantically similar syntactic dependencies, according to their selectional restrictions. A set of interpretation rules is then applied to classify the syntactic dependencies in order to extract semantic relations. The semantic relations are organized according to a hierarchical structure similar to the one used in WordNet. The supervised approach proposed by (Chieu and Ng, 2002) uses a maximum entropy learning algorithm as the classification strategy. A score is computed for each pair of entities which occur in the same sentence (based on parsing information) to determine whether or not they represent a "true" relation. (Roth and Yih, 2002) propose a probabilistic framework with supervised classifiers to recognize entities and relations.
Classifiers are first trained separately for entities and relations, and then their output (conditional probabilities for each entity and relation) is used together with constraints induced between relations and entities, such as selectional restrictions of verbs established in terms of types of entities, in order to make global inferences for the most probable assignment for all entities and relations under consideration. In the supervised approach proposed by (Zelenko et al., 2003), each occurrence of a pair of entities in a sentence is classified as a positive or negative instance of the relation. Instances are expressed by a pair of entities and their position in a shallow parse tree representing the sentence containing the pair. Classification uses both support vector machines and voted perceptron as learning algorithms with a specialized kernel model for that shallow parse representation.

(Zhao and Grishman, 2005) propose a supervised relation detection approach that combines clues from three different levels of syntactic processing - tokenization, sentence parsing and deep dependency analysis - using support vector machines as the learning algorithm. Each source of information is represented by kernel functions. Composite kernels are then developed to integrate and extend individual kernels, allowing information from different levels to contribute to the process in an integrated way. Focusing on complex relations, that is, n-ary relations that can involve n entities that are not in the same sentence, (McDonald et al., 2005) propose a two-stage method for extracting relations between named entities in biomedical texts. In the first stage, a supervised classifier based on maximum entropy recognizes all pairs of related entities, and a graph is built from the output of this classifier, having the entities as nodes and drawing edges between those which are related. The second stage scores maximal cliques in that graph as potential complex relation instances, that is, subsets in which the binary classifier believes that all entities involved are pairwise related. Similarly to our proposal, the framework presented by (Iria and Ciravegna, 2005) aims at the automation of semantic annotations according to ontologies. Several supervised algorithms can be used on the training data, represented through a canonical graph-based data model. The framework includes a shallow linguistic processing step, in which corpora are analyzed and a representation is produced according to the data model, and a classification step, where classifiers run on the datasets produced by the linguistic processing step. (Magnini et al., 2005) also present a proposal for annotating textual data according to a domain ontology, within the ONTOTEXT Project. Their idea is very similar to ours, in the sense that they also aim to identify mentions of known entities and relations in texts, as well as new entities and relations, in order to annotate a corpus of news and populate their ontology. This is an ongoing project and the particular strategy to extract relations has not been reported so far. However, the overall architecture is based on supervised learning from manually annotated texts. Recently, many relation extraction approaches have been proposed focusing on the particular task of ontology development (learning, extension, population). These approaches aim to learn taxonomic or non-taxonomic relations between concepts, instead of between lexical items. However, in essence, they can employ techniques similar to those of the other approaches described here to extract the relations. Additional strategies can be applied to determine whether the relations can be lifted from lexical items to concepts, as well as to determine the most appropriate level of abstraction to describe a relation. For example, (Maedche, 2002) proposes an approach for the extraction of non-taxonomic relations for ontology learning aiming at the Semantic Web. The approach uses an algorithm to discover generalised association rules and an already existent taxonomy as background knowledge in order to extract relations with the appropriate level of abstraction. The approach first uses a linguistic component to find out which pairs of lexical entries co-occur in the text, deriving statistical data indicating co-occurrences of concepts. The taxonomy is analysed to check whether taxonomic relations hold between these concepts.
The generalised association rule algorithm determines confidence and support measures for the relations between the concepts and for relations at higher levels of abstraction. Finally, the algorithm determines the most appropriate level of abstraction to describe a conceptual relation by pruning the less adequate ones. The approach proposed by (Reinberger et al., 2004) aims to extract semantic relations from text in an unsupervised way, with the ultimate goal of ontology development from scratch. Given a pre-processed corpus, SVO triples produced by a shallow parser are considered as potential relations between terms. Relevant relations are extracted by analysing simple statistical measures on the prepositional structures of the sentences. Both arguments in the triple (i.e., S and O) must be part of a prepositional structure considered relevant by the statistical analysis.

Given a large amount of data, (Reinberger and Spyns, 2004) use unsupervised statistical methods based on frequency information over linguistic dependencies in order to establish relations between concepts from a corpus of the biomedical domain. Their main linguistic assumptions are the principle of selectional restrictions and the notion of co-composition. Clustering and pattern matching algorithms are used on a syntactic context to discover relations; however, these are not labelled. (Schutz and Buitelaar, 2005) also rely on the verb as the element connecting concepts. Their goal is to identify highly relevant triples over concepts from an ontology, in order to extend this ontology with such relations. The system extracts relevant verbs and their grammatical arguments from a domain-specific text collection, and then computes the corresponding relations via a combination of linguistic and statistical processing. In the linguistic phase, dependency structure analysis and named-entity/concept tagging are carried out. During the statistical phase, to identify the most relevant triples, several computations are accomplished: relevance ranking, cross-referencing of relevant nouns and verbs with predicate-argument pairs, and computation of co-occurrence scores. A comprehensive description of several other approaches for conceptual relation extraction aiming at ontology learning can be found in (Maedche, 2002) and (Gomez-Perez and Manzano-Macho, 2003). In general, these approaches focus only on learning new relations that will constitute or extend an ontology. Therefore, they do not explore already existent relations, which, besides providing straightforward semantic annotations for our purposes, can be used as seeds to boost the relation learning process. Among the other approaches previously described in this section, apart from the frameworks proposed by (Iria and Ciravegna, 2005) and (Magnini et al., 2005), both ongoing projects, none is aimed at semantic annotations. In fact, most relation extraction approaches are not designed for specific applications. Instead, they are designed for "general purpose" relation extraction. Therefore, it is hard to tell how well they would perform in real applications. In most of the evaluations that are presented, the systems usually extract simple and very well-known relations (such as "company-hires-person"), but there is no concern about the relevance of these relations to a certain application and/or domain. Finally, among all the described approaches, only a few explore both knowledge-based and empirical resources and strategies, which can potentially convey both accurate and comprehensive results. In the next section we describe our approach to relation extraction, which (1) was designed for the specific goal of producing intranet semantic annotations; and (2) merges, in an integrated way, knowledge-based and empirical features that have already been shown to be effective in many previous works, in order to achieve more comprehensive and accurate results.
3. A hybrid approach for relation extraction
The proposed approach for relation extraction is hybrid in the sense that it employs knowledge-based (knowledge-intensive) and (weakly-supervised) corpus-based techniques. Our core strategy consists of mapping linguistic components with a certain syntactic relationship (a linguistic triple) into their corresponding semantic components. This includes mapping not only the relations, but also the linguistic terms linked by those relations. The detection of the linguistic triples involves a series of linguistic processing steps. The mapping between terms and concepts is guided by a domain ontology and a named entity recognition system. The identification of the relations relies on the knowledge available in the domain ontology, on lexical databases, and on a pattern-based classification model.
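To make this mapping concrete, the sketch below shows one possible way of representing the two kinds of triples the system manipulates; the class and field names are purely illustrative and not taken from the actual implementation:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class LinguisticTriple:
    """Syntactic triple extracted from text, e.g. <peter_scott, attend, public_services_summit>."""
    term1: str      # noun phrase acting as subject
    relation: str   # verbal expression linking the two terms
    term2: str      # noun phrase acting as object

@dataclass
class OntologyTriple:
    """Semantic counterpart of a linguistic triple, expressed over the domain ontology."""
    instance1: str                # KB instance (or new entity) mapped from term1
    conceptual_relation: str      # ontology relation, e.g. "has-project-member"
    instance2: str                # KB instance (or new entity) mapped from term2
    class1: Optional[str] = None  # ontology class of instance1, used in patterns
    class2: Optional[str] = None  # ontology class of instance2, used in patterns
```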

[Figure 1 (flowchart): raw text is processed by the Linguistic Component (with Minipar) into syntactic triples; terms are mapped through RSS for Terms, using the Term Lexicon and the Knowledge Base, or through ESpotter++; relations are mapped through RSS for Relations, using the ontology, the Triples Lexicon and WordNet, falling back on pattern-based classification; the output is an annotation and a new pattern added to the set of patterns]

Figure 1: Architecture of the relation extraction approach

As previously mentioned, the main goal of this approach is to provide enriched information for the Semantic Web. Other potential applications include: 1) Ontology population: terms are mapped into new instances of concepts of the ontology, and instances of ontology relations between concepts are identified. 2) Ontology learning: new relations between concepts are identified, which can be used as a first step to extend an existent ontology (e.g., (Schutz and Buitelaar, 2005)). Certainly, a subsequent step to lift relations between instances to an adequate level of abstraction would be

necessary (e.g., (Maedche, 2002)). Alternatively, the relations extracted for instances could be validated by a domain expert. The overall architecture of our system is illustrated in Figure 1. The main components are explained in detail in the next sections.
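For readers who prefer code to diagrams, the control flow of Figure 1 can be summarised by the sketch below. Every function name is a placeholder standing for one of the components described in Sections 3.2-3.4, so this is an illustration of the flow rather than the actual implementation:

```python
def extract_annotations(raw_text):
    """Illustrative rendering of the main flow in Figure 1 (all called functions are placeholders)."""
    annotations = []
    # Two parallel routes produce linguistic triples (Section 3.2).
    triples = jape_linguistic_component(raw_text) + minipar_triples(raw_text)
    for triple in triples:
        # Map both terms to KB instances (RSS for Terms) or new entities (ESpotter++), Section 3.3.1.
        entity1 = map_term(triple.term1)
        entity2 = map_term(triple.term2)
        if entity1 is None or entity2 is None:
            continue  # at least one term has no conceptual mapping: stop, no annotation
        # Map the verbal expression to a known ontology relation (RSS for Relations), Section 3.3.2.
        relation = map_relation(triple.relation, entity1, entity2)
        if relation is None:
            # Fall back on the pattern-based classification model (Section 3.4).
            relation = classify_with_patterns(triple.relation, entity1, entity2)
        if relation is not None:
            annotations.append((entity1, relation, entity2))  # produce the annotation
            add_to_pattern_set(entity1, relation, entity2)    # reuse later as a pattern
    return annotations
```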
3.1 Context and resources

As shown in Figure 1, the input to the system consists of electronic newsletter texts (KMi Planet2). These are texts describing news of several kinds related to KMi members: projects, publications, events, awards, etc. The archive contains several hundred newsletters published since 01/1997. The size of the texts can vary. For example, in a sample of 10 news texts taken from 10/12/2005 to 20/03/2006, it varied from 97 to 295 words. On average, the number of words per text in that sample was 178. The genre of the texts we are addressing is therefore journalistic, while the domain is that of KMi. Our domain ontology is called KMi-basic-portal-ontology and was designed based on the AKT reference ontology3 to include concepts relevant to the KMi domain. All the instantiations of concepts in this ontology are stored in a knowledge base (KB) called KMi-basic-portal-kb. The other two resources used in our architecture are the lexical database WordNet (Fellbaum, 1998) and the set of Patterns, described in Section 3.4.
3.2 Identifying linguistic triples

Given a newsletter text, the first step of the relation extraction approach is to process the natural language text in order to identify linguistic triples, that is, sets of three elements with a syntactic relationship, which can indicate potentially relevant semantic relations between two entities. In our architecture, this is accomplished by two modules, the Linguistic Component (LC) and the parser Minipar. LC is based on an adaptation of the linguistic component defined in Aqualog (Lopez et al., 2005). Aqualog is a question answering system which also uses the triple approach to identify and map linguistic triples into ontology triples. It uses GATE (Cunningham et al., 1992) infrastructure and resources, by means of GATE's API. The main processing resources from GATE employed in our architecture are: 1) ANNIE Tokenizer; 2) ANNIE Sentence splitter; 3) ANNIE Part-of-speech (POS) tagger; 4) ANNIE VP Chunker. On top of these processing resources, which produce a series of syntactic annotations for the input text, LC uses a grammar, also executed under a GATE controller, in order to identify important linguistic triples. This grammar is implemented in Jape (Cunningham et al., 2000) and interpreted by a Jape transducer. A Jape grammar allows the definition of pattern rules to recognize regular expressions over the annotations provided by GATE. One example of a pattern to extract potential relations between nominal terms is illustrated in Figure 2.

((VERB)+ (RB)? NOUNA PREPS (VERB)?)

Figure 2: Example of Jape pattern
2 https://www.doczj.com/doc/668000460.html/kmiplanet/
3 https://www.doczj.com/doc/668000460.html/projects/akt/ref-onto/

The terms used in the patterns are macros previously defined to generalise the POS tags produced by the POS tagger. For example, "VERB" is a generic term representing any kind of verb tag given by the POS tagger: VBN, VBZ, VBG, VBD, VBP, VB, and FVG. "RB" represents tags for adverbs, NOUNA represents tags for nouns and certain adjectives and pronouns, and PREPS tags for certain prepositions. Thus, this pattern matches constructions with one or more verbs, optionally followed by an adverb, with a mandatory sequence of a NOUNA and a preposition, optionally followed by another verb. The main type of construction our linguistic component aims to identify involves a verb as an indicator of a potential relation and two nouns as the terms linked by that relation. However, our patterns can also account for other types of constructions, including, for example, the use of a comma to implicitly indicate a relation, as in sentence (1): "Enrico Motta, in AKT now, …". In this case, having identified that "AKT" is a project and "Enrico Motta" is a person, it is possible to guess the relation indicated by the comma (for example, "work"). Some example triples identified by our patterns, taking the newsletter text in Figure 3 as input, are given in Figure 4.
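Before turning to those examples, note that the effect of the pattern in Figure 2 can be approximated by a regular expression over the POS-tag sequence of a sentence. The macro expansions below are an illustrative simplification, not GATE's actual Jape machinery:

```python
import re

# Illustrative expansions of the macros used in Figure 2 (simplified tag sets).
VERB = r"(?:VBN|VBZ|VBG|VBD|VBP|VB|FVG)"
RB = r"RB"
NOUNA = r"(?:NNPS|NNS|NNP|NN|JJ|PRP)"
PREPS = r"(?:IN|TO)"

# ((VERB)+ (RB)? NOUNA PREPS (VERB)?), matched against a space-separated tag sequence.
PATTERN = re.compile(rf"(?:{VERB} )+(?:{RB} )?{NOUNA} {PREPS}(?: {VERB})?")

# Simplified tags for "Dawei Song has joined KMi as a Senior Lecturer":
tags = "NNP NNP VBZ VBN NNP IN DT NNP NNP"
print(bool(PATTERN.search(tags)))  # True: "VBZ VBN NNP IN" matches (VERB)+ NOUNA PREPS
```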
Nobel Summit on ICT and public services Peter Scott 10.12.05 Peter Scott attended the Public Services Summit in Stockholm, during Nobel Week 2005. The theme this year was Responsive Citizen Centered Public Services. The event was hosted by the City of Stockholm and Cisco Systems Thursday 8 December - Sunday 11 December 2005. The Nobel Week Summit provides an unusual venue to explore the possibilities of the Internet with top global decision-makers in education, healthcare and government and to honor the achievements of the 2005 Nobel Peace Prize Laureate.
Figure 3: Example of news

Figure 4: Examples of linguistic triples for the text in Figure 3

Since Jape patterns are based only on shallow syntactic information, they are not able to capture certain potentially relevant triples. To overcome this limitation, we also employ a parser as a complementary resource to produce linguistic triples: Minipar (Lin, 1993). Minipar produces functional dependencies/relations for the components in a sentence, including subject and object relations with respect to a verb. This allows capturing some implicit relations, such as indirect objects and some long distance dependencies, which could not be identified by the Jape patterns. In fact, the use of Minipar complements the linguistic patterns, allowing a larger number of linguistic triples to be obtained. We use the standalone version of Minipar, instead of GATE's version, since it performs differently in that framework. We then extract the syntactic relations of interest for each sentence, i.e., subject-verb-object (and modifiers) dependency triples. For example, some triples extracted for the text in Figure 3 are shown in Figure 5.

subject[peter_scott]+verb[attend]+verb_mod[during_nobel_week_2005]+object[public_services _summit]+object_mod[in_stockholm] subject[theme]+verb[be]+object[responsive] subject[city]+subj_mod[of_stockholm]+verb[host]+object[event] subject[nobel_week_summit]+verb[provide]+object[venue] subject[nobel_week_summit]+verb[explore]+object[possibility]+object_mod[of_internet]
Figure 5: Examples of tuples extracted from Minipar's dependency trees

We convert Minipar's representation into a triple format, replicating the verb when it is related to more than one subject or object. Therefore, the intermediate representations provided both by GATE and the Jape grammars and by Minipar consist of triples of the type <term, verbal expression, term>.
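A minimal sketch of this conversion step, assuming the dependency output has already been read into a small dictionary whose keys mirror the field names in Figure 5 (the coordinated subject in the example is added purely for illustration):

```python
def to_triples(dep):
    """Convert one Minipar-style dependency record into <term, relation, term> triples,
    replicating the verb when it governs more than one subject or object."""
    subjects = dep.get("subject", [])
    objects = dep.get("object", [])
    verb = dep["verb"]
    return [(s, verb, o) for s in subjects for o in objects]

# E.g., a simplified record for "The event was hosted by the City of Stockholm and Cisco Systems":
dep = {"subject": ["city", "cisco_systems"], "verb": "host", "object": ["event"]}
print(to_triples(dep))
# [('city', 'host', 'event'), ('cisco_systems', 'host', 'event')]
```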

3.3 Identifying ontological entities and relations

Given a linguistic triple, the next step is to verify whether the verbal expression in that triple conveys a relevant semantic relationship between entities (given by the terms) potentially belonging to an ontology. This is the most important phase of our approach and is represented by a series of modules in our architecture in Figure 1. As a first step, we try to map the linguistic triple into an ontology triple by using an adaptation of the Relation Similarity Service (RSS) developed in Aqualog (Lopez et al., 2005). RSS tries to make sense of the linguistic triple by looking at the structure of the domain ontology and the information stored in its corresponding KB. Besides looking for an exact matching between the components of the linguistic triple and the components of the ontology and KB, RSS also considers partial matchings by using a set of resources in order to account for minor lexical or conceptual discrepancies between these two elements. These resources include metrics for string similarity matching, synonym relations given by WordNet, and a lexicon of previous mappings between the two types of triples. One important feature of the RSS in Aqualog is that it is meant to be used in an interactive fashion. When there is no matching between the linguistic and ontology triples, the user is expected to point out the appropriate mapping, so that the system can process the question. Additionally, the user is expected to disambiguate among several options of mappings for terms or relations. The goal in our approach, however, is to automate the annotation process as much as possible. We use RSS in a slightly different way to identify partial matchings for terms (nominal phrases) and relations (verbal expressions) (modules RSS for Terms and RSS for Relations, respectively, in the architecture). As we explain in Sections 3.3.1 and 3.3.2, we first try to map terms, and if the mapping succeeds, we deal with relations. Additionally, we employ other components to find a matching with an ontology triple when there is no matching according to RSS, as discussed in Section 3.4.

3.3.1 Mapping terms

RSS for Terms checks if each of the terms, i.e., noun phrases, in the linguistic triple is an entity potentially belonging to our domain, according to the following attempts: 1) Search the lexicon of previously mapped terms (Term Lexicon) for an exact matching of the term with any of the entries.

2) Search the KB for an exact matching of the term with any instance. 3) If there is not an exact matching in the KB or in the lexicon, apply string similarity metrics4 to calculate the similarity between the given term and each instance of the KB. A hybrid scheme combining three metrics is used: Jaro-Winkler, jlevelDistance and wlevelDistance. This scheme checks different combinations of threshold values for the three measures. It has proved to work reasonably well in experiments with many variations of metrics and thresholds. 3.1) If there is more than one possible matching, check whether any of them is a substring of the term. For example, the term "Motta" matches the instance name "Enrico Motta" as a substring, and thus this instance should be preferred to any other. As an illustration, similarity values returned for the term "vanessa" with instances potentially relevant for the mapping are given in Figure 6. In this case, the combination of thresholds specified for the three string metrics is met for a given instance of the KB, "Vanessa Lopez", and thus the mapping is accomplished.
Checking similarities for term "vanessa"
jaroDistance for "vanessa" and "vanessa-lopez" = 0.8461538461538461
wlevel for "vanessa" and "vanessa-lopez" = 1.0
jWinklerDistance for "vanessa" and "vanessa-lopez" = 0.9076923076923077
Figure 6: String similarity measures for the term "vanessa" and the instance "Vanessa Lopez"

3.2) If there is still more than one possible matching for the term (no substring matching or more than one substring matching), ask the user (who is assumed to be a domain expert) to choose one of the options. Here we could have assumed that there was not enough evidence to map that term, and therefore that the linguistic triple should be discarded. However, initial experiments showed that in many cases we would be discarding relevant triples with valid possible matchings returned by the similarity metrics. In fact, in many cases the input linguistic term contains noise due to the use of patterns to recognize the terms (both by the LC and by a named entity recognition system, discussed later in this section). For example, a modifier appearing at the beginning of a sentence (and thus starting with a capital letter), when followed by a proper name, is usually considered part of the proper name. For example, in a sentence starting with "Director Enrico Motta …", "director-enrico-motta" can be extracted as a term. This prevents the system from finding an exact matching, but we expect the string similarity metrics to help. However, since they do not explore the semantics of the words, they also capture other unrelated terms, and thus it is necessary to rely on the user to filter these terms out. The matching chosen by the user is added to the Term Lexicon so that it can be used in future mappings of the same term (step (1)). At this stage, given a linguistic triple with two terms, we will have four possible situations for the terms, representing total or partial matching with instances of the ontology, or no matching at all. 4) If none of the previous attempts succeeds for a certain term (i.e., the last three cases presented above), it can be the case that the term in the linguistic triple expresses a new entity, which is not
4 Available at https://www.doczj.com/doc/668000460.html/projects/simmetrics/.

part of the KB. In order to check if it in fact conveys a new entity that is relevant to our domain, or if it instead has no corresponding conceptual representation in our domain, we use a named entity recognition system to identify the class of the term (if any), according to the classes of the domain ontology. The system, ESpotter++, is an extension of the named entity recognition system ESpotter (Zhu et al., 2005). ESpotter is an adaptive named entity recognition system based on a mixture of lexicons (gazetteers) and patterns. It can be adapted to particular domains by means of the customization of its lexicon and patterns of entities. In our approach, we extended ESpotter by including new entities (extracted from several gazetteers), new types of entities, and a small set of efficient patterns for the new types of entities. In ESpotter++, all types of entities correspond to generic classes of the domain ontology under consideration. These types include: person, organization, event, publication, location, project, research-area, technology, date, time, etc. For example, for the text in Figure 3, part of the output produced by ESpotter++ is shown in Figure 7.
- - Example of input and output - -
Figure 7: Example of entities recognised by ESpotter++ for the news in Figure 3

If ESpotter++ is not able to identify the type of a term, it might either be the case that the term does not have any conceptual mapping (for example "it"), or the case that the conceptual mapping is not part of our domain ontology. Since we are not concerned with potential new concepts, which represent a complex issue in ontology learning, we do not attempt to map such terms, and thus the process stops without any annotation. It is important to emphasize that each of the two terms in the linguistic triple can be mapped according to different strategies (exact or partial matching or ESpotter++). However, if at least one of the terms is not mapped according to the mentioned steps, the process stops and no annotation is produced.

3.3.2 Mapping relations

In order to map the verbal expression of the linguistic triple into a conceptual relation, we assume that the terms of that triple have already been mapped either into instances of certain classes in the KB, by the component RSS for Terms, or into potential new instances of ontology classes, by ESpotter++, as explained in the last section. RSS for Relations then checks if the verbal expression in the triple matches any already existent possible relation between the classes (and superclasses) of those instances in the ontology. The following attempts are then made: 1) Search the ontology for an exact matching of the verbal expression with any existent relation between the classes (and superclasses) of the instances under consideration.

2) If there is not an exact matching with the ontology classes, apply the string similarity metrics to calculate the similarity between the given verbal expression and the possible relations between classes corresponding to the terms in the linguistic triple (as explained for terms). 3) If there is not an acceptable similarity or there is more than one matching considered similar, search for similar relation mappings for the classes under consideration in a lexicon of mappings previously accomplished according to users’ choices in the on-line version of the question answering system Aqualog5. This lexicon contains a set of complete ontology triples with the original verbal expression which was mapped to the conceptual relation (alias). The use of this lexicon represents a simplified form of pattern matching, in which only exact matching is considered. Examples of entries in the lexicon are shown in Table 1.
class1     alias (relation)    conceptual relation    class2
project    works               has-project-member     person
project    cite                has-publication        publication
Table 1: Examples of the lexicon of patterns

4) If there is not an exact matching with entries in the lexicon of patterns and/or there are multiple possibilities of matchings coming from the string similarity metrics (step (2)), search for synonyms of the given verbal expression in WordNet, in order to verify whether there is a synonym which matches (completely or partially, using string similarity metrics) any existent relation for the classes (or superclasses) of the terms under consideration. Three situations may arise from these attempts to map the linguistic triple into an ontology triple: (1) matching with one single relation of the ontology; (2) matching with more than one conceptual relation; and (3) no matching at all. If the matching attempt succeeds with only one conceptual relation, then the triple can be formalized into a semantic annotation. This allows the annotation of an already possible relation (according to the ontology) for two instances of the KB or new instances identified by ESpotter++. The produced triple, generalized to the classes of the entities (i.e., <class1, conceptual_relation, class2>), is added to the set of Patterns of ontology triples, which is used to identify new relations (see Section 3.4). If there is more than one possible matching for the conceptual relation, the system gives the options to the user (assumed to be a specialized user). At a previous stage, we tried to use a disambiguation module to choose among multiple possible relations. This module, SenseLearner (Mihalcea and Csomai, 2005), is a supervised word sense disambiguation (WSD) system already trained on a corpus tagged with WordNet senses. Therefore, the only way it could be effectively used in our system was when the WordNet component used by RSS returned more than one synonym for the verb in a linguistic triple and these matched different conceptual relations. In that case, the WSD module could identify in which sense the verb was being used in the sentence and therefore allow the system to choose one among the possible matchings. However, since the disambiguation system had been trained on another corpus, of a different domain, in most cases the relation to be disambiguated was not present in the set of examples, and thus no sense could be retrieved for it. In order to benefit from this system (or any effective disambiguation system), we would have to train it on our corpus of newsletter texts, but this would require sense-tagging this corpus with the appropriate senses, an effort that we consider would not pay off.
5 At https://www.doczj.com/doc/668000460.html:8080/aqualog/index.html.

Finally, if there is no matching for the relation, or if, given multiple “possible” relations, the user states that none of them is appropriate, the system triggers an alternative strategy to find out whether the verbal expression in the triple represents a relevant relation not covered by the ontology (or expressed in a different way): the Pattern-based Classification model (Section 3.4).
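Summarising Section 3.3.2, the cascade applied by RSS for Relations can be sketched as follows. The resource accessors, the similarity function and the threshold value are placeholders, and the interactive disambiguation with the user is omitted:

```python
def map_relation(verb, class1, class2, ontology_relations, string_similarity,
                 triples_lexicon, wordnet_synonyms, threshold=0.9):
    """Try to map a verbal expression onto a known ontology relation between class1 and class2."""
    candidates = ontology_relations(class1, class2)  # relations defined for the classes/superclasses
    # 1) Exact matching against the ontology.
    if verb in candidates:
        return verb
    # 2) String similarity against the candidate relations.
    similar = [r for r in candidates if string_similarity(verb, r) >= threshold]
    if len(similar) == 1:
        return similar[0]
    # 3) Lexicon of previous mappings (Aqualog users' choices), exact matching only.
    alias = triples_lexicon.get((class1, verb, class2))
    if alias is not None:
        return alias
    # 4) WordNet synonyms of the verb, matched exactly or by string similarity against candidates.
    for synonym in wordnet_synonyms(verb):
        for r in candidates:
            if synonym == r or string_similarity(synonym, r) >= threshold:
                return r
    return None  # no match: hand over to the pattern-based classification model (Section 3.4)
```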
3.4 Identifying new relations – the pattern-based classification (PBC) model

The process described in Section 3.3 for the identification of relations accounts only for the relations already predicted as possible in the domain ontology. However, we are also interested in the additional information that can be provided by the text, in the form of new types of relations for known or new entities. In order to discover these relations, we employ a pattern matching strategy to identify relevant relations between classes of entities. As previously mentioned, pattern matching strategies constitute the basis of many relation extraction approaches. This strategy has proved to be an efficient way to extract semantic relations, but in general has the drawback of requiring the possible relations to be previously defined. In order to overcome this limitation, we employ a Pattern-based Classification model, which can identify similar patterns based on a small initial number of patterns. To identify the relations, we rely on the patterns already produced by the linguistic component, which are mostly SVO triples. Although this is not a highly expressive representation, it covers a significant number of relations in the text, with low complexity. We
consider patterns of relations between classes of entities, instead of the entities themselves, since we believe that it would not be possible to accurately judge the similarity between patterns over the kinds of entities we are addressing (names of people, locations, etc.). Thus, our patterns consist of triples of the type <class1, relation, class2>. These are compared against a given triple, also using the classes of its terms, in order to classify the relation in that triple as relevant or non-relevant.
Our pattern-based classification model is similar to the one proposed by (Stevenson, 2004) (Section 2). It is a weakly-supervised corpus-based module which takes as examples a small set of relevant SVO patterns, called seed patterns, and compares the pattern to be classified against all the relevant ones, using a WordNet-based semantic similarity metric. In our case, the initial set of seed patterns mixes patterns extracted from the lexicon generated by Aqualog's users, as described in Section 3.3.2, and a small number of manually defined relevant patterns. Currently, the set contains 65 patterns. This set of patterns is then enriched with new patterns as our system annotates relevant relations for given entities: the system adds new ontology triples to the initial set of Patterns. Some examples of seed patterns are illustrated in Table 2: each pattern consists of the ontology triple <class1, conceptual_relation, class2>, while alias (relation) is the relation used in the linguistic triple which was mapped into the conceptual_relation.
class1          alias (relation)    conceptual relation    class2
project         has-member          has-project-member     person
project         publish             has-publication        publication
person          publish             has-publication        conference
person          work                work-for               project
person          implement           develop                technology
person          participate         attend-by              event
organization    hire                has-employee           person
Table 2: Examples of seed patterns

Our similarity metric requires that the terms and relations in both the pattern and the triple being mapped belong to WordNet. Since WordNet is not meant to contain multi-word expressions, such as "has-project-member", in order to compute the similarity for these multi-word expressions we defined a set of simple heuristics to identify the most representative word in the expression. If the multi-word expression is the conceptual relation of the pattern, we use the alias of the relation as the most representative word. For example, for the multi-word conceptual relation "work-for", we use the alias "work". However, if the multi-word expression is one of the classes, e.g., "organization-unit", or is the verbal expression in the linguistic triple we want to map, i.e., cases for which there is no alias, the heuristics use the governance relations given by Minipar to identify the syntactic head of the expression. They also state that the head has to be a verb for the relation, but a noun for the terms. For example, the head of the verbal expression "has-publication" is "has", while the head of the term "organization-unit" is "unit". Having identified the most significant word of each multi-word expression, we use these words to calculate the similarity score as if they were single words in the pattern/triple. Like Stevenson, we use the semantic similarity metric proposed by (Lin, 1998), which has been shown to be suitable for this purpose. This metric assigns a numerical value to each node in the WordNet hierarchy representing the amount of information it contains, IC, which is derived from corpus probabilities: IC(s) = -log(Pr(s)). To calculate the similarity between two senses s1 and s2, it takes the lowest common subsumer, lcs(s1, s2), which is defined as the sense with the highest information content which subsumes both senses in the WordNet hierarchy, that is:

sim(s_1, s_2) = \frac{2 \times IC(lcs(s_1, s_2))}{IC(s_1) + IC(s_2)}

Stevenson adapts this metric to compute the similarity between two words, w1 and w2, instead of two senses, by analysing the set of senses for each of the words, S(w), and choosing the pair of senses which maximizes the similarity score, according to the formula:

word\_sim(w_1, w_2) = \max_{1 \le i \le |S(w_1)|,\ 1 \le j \le |S(w_2)|} sim(s_{1i}, s_{2j})
Additionally, the author extends the metric to compute the similarity between two patterns, p1 and p2, consisting of m and n elements, respectively (in our case, m and n are always 3):
psim(p_1, p_2) = \frac{\sum_{i=1}^{n} word\_sim(p_{1i}, p_{2i})}{\max(m, n)}
We consider relevant the triples for which the score given by the formula above is greater than a threshold of 0.90. In that case, a new annotation is produced for the entities in the linguistic triple and the new relation. Additionally, as previously mentioned, if the triple is not part of the set of patterns, it is added to it for future use. The level of novelty of the extracted (new) relations is determined as a consequence of the value of the threshold for the formula above (psim). A high threshold means that a relation is considered relevant only if the triple is very similar to at least one of the existent patterns. It also guarantees that even if the two classes in the triple completely match classes in a pattern (score = 1 for each of the classes), the relation will not be considered relevant unless it has some similarity with the relation in the pattern (if the relation similarity is 0, psim = 0.67). A slightly smaller value for the threshold could allow other relevant relations to be acquired, but since we add each mapped triple to the set of patterns, relaxing the threshold to allow less closely

related relations to be learned can be tricky: it will possibly result in noise, which will be propagated and increased as the system is used. One example of a relation "learned" by the pattern-based classification model for a given sentence is shown in Figure 8. The figure also shows the linguistic triple, the partial ontology triple (with only the terms mapped), the triple submitted to the system (with the classes of the mapped terms), the most similar pattern, the similarity value achieved for that pattern, and finally the new pattern added to the set of patterns for future use.
Sentence: KMI is headed by Enrico Motta
Linguistic triple:
Partial ontology triple:
Pattern given to PBC:
Most similar pattern:
Similarity value: 1
Relation learned: headed-by
New pattern:
Figure 8: Example of use of the PBC model

It is important to notice that, although WordNet is also used in the RSS component, in that case only synonyms are checked, while here the similarity metric explores deeper information in WordNet, considering the meaning (senses) of the words and the hierarchical structure of WordNet. It is also important to distinguish the semantic similarity metric employed here from the string metrics used in RSS. String similarity metrics simply try to capture minor variations in the strings representing terms/relations; they do not account for the meaning of those strings. Alternative semantic similarity metrics exploiting other information in WordNet (Pedersen et al., 2004) can be investigated in future work.
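As an illustration of how word_sim and psim can be computed in practice, the sketch below uses NLTK's WordNet interface and Brown-corpus information content (Synset.lin_similarity implements Lin's metric). The head-word heuristics for multi-word expressions are omitted, and the example triple merely echoes Figure 8, so the printed score is illustrative rather than a reported result:

```python
from nltk.corpus import wordnet as wn, wordnet_ic  # requires the 'wordnet' and 'wordnet_ic' data

brown_ic = wordnet_ic.ic("ic-brown.dat")

def word_sim(w1, w2, pos):
    """Lin similarity maximised over all pairs of senses of the two words."""
    scores = []
    for s1 in wn.synsets(w1, pos):
        for s2 in wn.synsets(w2, pos):
            try:
                score = s1.lin_similarity(s2, brown_ic)
            except Exception:  # e.g. no common subsumer in the hierarchy
                score = None
            if score is not None:
                scores.append(score)
    return max(scores, default=0.0)

def psim(pattern, triple):
    """Similarity between two <class1, relation, class2> triples (m = n = 3)."""
    (c1, r, c2), (d1, q, d2) = pattern, triple
    return (word_sim(c1, d1, wn.NOUN) + word_sim(r, q, wn.VERB) + word_sim(c2, d2, wn.NOUN)) / 3.0

# A candidate triple is accepted as a new, relevant relation only if psim exceeds the 0.90 threshold:
seed = ("organization", "hire", "person")       # one of the seed patterns in Table 2
candidate = ("organization", "head", "person")  # e.g. from "KMI is headed by Enrico Motta"
print(psim(seed, candidate) > 0.90)
```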
3.5 Annotating relevant relations

In order to formalize the relations extracted, we use OWL representations, as specified by the Semantic Web framework. For example, the representation of the entity "John Domingue" and the relations extracted from the sentences in Figure 9 is given in Figure 10.
John Domingue is member of AKT John Domingue is the Scientific Director for DIP
Figure 9: Example of input sentences

Figure 10: OWL annotations produced for the news in Figure 9
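The OWL markup of Figure 10 did not survive the conversion of this report. Purely to illustrate the kind of output meant here, the sketch below builds a comparable annotation with rdflib; the namespace and property names are hypothetical, not the identifiers actually used in the KMi-basic-portal ontology:

```python
from rdflib import Graph, Namespace
from rdflib.namespace import RDF

# Hypothetical namespace standing in for the KMi-basic-portal ontology.
KMI = Namespace("http://example.org/kmi-basic-portal-ontology#")

g = Graph()
john = KMI["john-domingue"]
g.add((john, RDF.type, KMI["person"]))
g.add((john, KMI["member-of"], KMI["akt"]))               # "John Domingue is member of AKT"
g.add((john, KMI["scientific-director-of"], KMI["dip"]))  # "... is the Scientific Director for DIP"
print(g.serialize(format="xml"))                          # RDF/XML (OWL-compatible) annotation
```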

4. Evaluation and discussion
In order to evaluate our system, we manually selected a set of 25 sentences from the newsletters that mentioned entities belonging to our domain (people, technologies, projects, etc.), i.e., sentences from which instances and relations could be extracted. We restricted the selection to relatively simple sentences, having one potential triple per sentence. At this stage, we are more concerned with the precision of the system than with its coverage. Complex sentences cannot be processed by the system, due to limitations inherited from the components extracting the linguistic triples (the linguistic component and the Minipar parser). For example, Minipar is not able to process sentences with more than 1000 characters, and cannot capture very long distance dependencies (involving subject and object elements, in this case). These are, however, expected limitations even for state-of-the-art linguistic processing tools. Examples of sentences used in our evaluation are shown in Figure 11.
(1) Adam Ingram visited the Open University today.
(2) Sun Microsystems hosts an internal KMi Stadium Webcast, on March 12th 1997.
(3) Hon Roy MacLaren visited Milton Keynes today.
(4) Dr Hans Geiser visited the Open University on Wednesday 21st May, 1997.
(5) Dr Geiser, Director of the UN Staff College in Turin, visited the OU.
(6) Roxana is currently working in KMi on the SUPER project.
(7) KMi's Deputy Director John Domingue is the Scientific Director for DIP.
(8) The DIP team at KMi incorporates Roxana Belecheanu.
(9) KMi hosts the 9th Computational Linguistics UK Research Colloquium (CLUK 06).
(10) Peter Scott attended the Public Services Summit in Stockholm, during Nobel Week 2005.
(11) Simon Buckingham Shum has been working with the ILO's HIV/AIDS Education in the Workplace programme.
(12) Dr Dawei Song has joined KMi as a Senior Lecturer on September 12th, 2005.
Figure 11: Examples of news sentences used in the evaluation

As a measure of overall performance, we calculated coverage, precision and recall for the produced ontology triples, given the input sentences, as shown in Table 3. Since we have two parallel modules processing the input sentences to produce the intermediate linguistic triples (i.e., the Linguistic Component – LC – and Minipar), we can have twice as many triples as input sentences. However, as we discuss later, when both modules are able to produce correct linguistic triples, the corresponding resulting ontology triples are consistent, i.e., they are exactly the same for linguistic triples generated by both LC and Minipar. On the other hand, when LC or Minipar produce incorrect linguistic triples, these are not mapped into ontology triples. Therefore, here we consider the intersection of the produced ontology triples, which amounts to 17.
# Sentences    Coverage          Recall            Precision
25             (17/25) = 0.68    (15/25) = 0.60    (15/17) = 0.88
Table 3: Overall figures for ontology triples generated from input sentences

Since we have different components processing the various steps of the system, we also present, in Tables 4-7, the results of submitting our test set of 25 sentences to the system according to each of these components. Essentially, we have:
- 2 modules producing linguistic triples: Linguistic Component (LC) and Minipar;
- 2 modules identifying terms: Relation Similarity Service (RSS) and ESpotter++;
- 2 modules identifying relations: RSS and Pattern-based Classification (PBC).
In Table 4 we focus on the performance of the modules in identifying linguistic triples from the sentences. Given the total of input sentences n (n = 25), with one potential linguistic triple each, the column “# Ling. triples” shows the number (m) of linguistic triples actually identified by the modules. The column “Coverage” shows the proportion of identified triples given the total of potential linguistic triples n, i.e., expresses the coverage (m/n) of LC and Minipar in identifying linguistic triples (correctly or incorrectly). The column “Recall” shows the proportion of correctly identified linguistic triples given the total n. Finally, the last column, “Precision”, shows the proportion of correctly identified linguistic triples given the number of identified linguistic triples m.
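In other words, writing c for the number of correctly identified linguistic triples, the three measures in Table 4 are computed as follows (LC's row shown as a worked example):

coverage = \frac{m}{n} = \frac{12}{25} = 0.48, \qquad recall = \frac{c}{n} = \frac{11}{25} = 0.44, \qquad precision = \frac{c}{m} = \frac{11}{12} \approx 0.92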
           # Ling. triples (m)    Coverage          Recall            Precision
LC         12                     (12/25) = 0.48    (11/25) = 0.44    (11/12) = 0.92
Minipar    21                     (21/25) = 0.84    (19/25) = 0.76    (19/21) = 0.90
Table 4: Figures for linguistic triples extracted by LC and Minipar

Regarding the overlap in the triples identified by both components in Table 3, looking at the linguistic triples correctly identified by each of them, we see that each identifies a different subset of triples. In this case, LC identifies 2 triples that were not covered by Minipar, while Minipar identifies another 11 triples that were not covered by LC. Therefore, taking the non-overlapping set of linguistic triples provided by both components, we obtain 23 triples. In Table 5 we show the performance of the system in mapping the correct linguistic triples identified by each of the modules (LC and Minipar) into ontology triples, without going into details about which internal modules were involved in this mapping. Ideally, each linguistic triple should produce an ontology triple. The column "# Ont. triples" gives the number of ontology triples created from correct linguistic triples. The column "Coverage" shows the percentage of mapped triples over the total of correct linguistic triples given by LC and Minipar (11 by LC and 19 by Minipar), i.e., expresses the coverage of the system in identifying ontology triples from correct linguistic triples (correctly or incorrectly). The column "Recall" shows the percentage of correctly identified ontology triples over the total of correct linguistic triples, i.e., expresses the recall of the system in identifying ontology triples. Finally, the column "Precision" shows the ratio of correctly identified ontology triples to the number of identified ontology triples.
                                # Ont. triples    Coverage          Recall            Precision
Linguistic triple by LC         10                (10/11) = 0.91    (9/11) = 0.82     (9/10) = 0.90
Linguistic triple by Minipar    14                (14/19) = 0.74    (12/19) = 0.63    (12/14) = 0.86
Table 5: Figures for ontology triples produced from LC's and Minipar's linguistic triples

The fact that we are using two modules to identify linguistic triples allowed us to identify more ontology triples, since, as previously mentioned, LC and Minipar were able to recognize linguistic triples for different sentences. In fact, triples produced by Minipar allowed 6 different ontology triples to be mapped, when compared to the mappings for linguistic triples produced by LC. On the other hand, linguistic triples produced by LC gave rise to 3 additional ontology triples. Regarding the consistency of the ontology triples produced from linguistic triples provided by both components, it is important to say that when the same linguistic triple was mapped by both components, the resulting ontology triples were identical. The incorrect mappings that did occur were due to the incorrect mapping of either one of the terms or the verbal expression in the linguistic triple. Similarly, null mappings were due to the lack of mapping for either one of the terms or the verbal expression.

In Tables 6 and 7 we analyze each of these elements, i.e., terms and verbal expressions, individually, also taking into account which module was used to map each of them. Therefore, still focusing on the performance of the system in mapping linguistic triples into ontology triples, we first analyze the performance of the system in mapping terms into conceptual entities (Table 6), and then the performance in mapping verbal expressions into conceptual relations (Table 7). Here we do not distinguish whether the triples were identified by LC or Minipar; instead, we consider the set of all 23 (non-overlapping and correct) linguistic triples. Therefore, we can have at most 46 terms and 23 verbal expressions mapped.

In Table 6, given the total number of terms in the correctly identified linguistic triples (each linguistic triple has two terms), the column "# Terms" shows the number of terms that were mapped to a conceptual representation (instance or new entity) by both RSS and ESpotter++. The column "Coverage" shows the proportion of (correctly or incorrectly) mapped terms over the total of terms to be mapped. The column "Recall" shows the proportion of correctly mapped terms over the total of terms to be mapped. Finally, the column "Precision" shows the proportion of correctly mapped terms over the number of mapped terms.
                       # Terms    Coverage         Recall           Precision
Mapped by RSS          35         (35/46) = 0.76   (32/46) = 0.70   (32/35) = 0.91
Mapped by ESpotter++   8          (8/11) = 0.72    (8/11) = 0.72    (8/8) = 1
Table 6: Figures for conceptual terms mapped by RSS and ESpotter++

It is important to note that ESpotter++ is only triggered when RSS does not find a possible direct mapping for the term and the user, not being satisfied with the mapping options given by RSS, chooses to use ESpotter++. Therefore, the figures here reflect the use of ESpotter++ in those cases only, that is, for the 11 terms out of the total of 46, since 35 were mapped by RSS.

Finally, in Table 7, given the total number of verbal expressions in the 23 correctly identified linguistic triples, the column "# Relations" shows the number of verbal expressions that were mapped into conceptual relations (known or new) by both RSS and PBC. The column "Coverage" shows the proportion of (correctly or incorrectly) mapped verbal expressions over the total of verbal expressions to be mapped, i.e., 23. The column "Recall" shows the proportion of correctly mapped verbal expressions over the total of verbal expressions to be mapped. The column "Precision" shows the ratio of correctly mapped verbal expressions to the number of mapped verbal expressions.

It is worth recalling that the system only attempts to map verbal expressions once both terms in the linguistic triple are mapped. So, in some cases, the relation was not found because at least one of the terms was not mapped. Out of the possible number of relations to be mapped (23), 2 were not mapped for this reason, while 1 case was not mapped because one of the terms was incorrectly mapped (by RSS). Another 3 cases were not mapped because of the low coverage of our resources (KB, set of patterns). For the remaining 17 cases, a relation was found. In only 2 of these cases was the relation found (by PBC) incorrect, and this was always due to incorrect mappings of the terms.
                # Relations    Coverage         Recall           Precision
Mapped by RSS   6              (6/23) = 0.26    (6/23) = 0.26    (6/6) = 1
Mapped by PBC   11             (11/17) = 0.65   (9/17) = 0.53    (9/11) = 0.81
Table 7: Figures for conceptual relations mapped by RSS and PBC

Again, it is worth noting that the PBC model is only triggered when RSS cannot find any relation to directly map the verbal expression (the threshold is not reached) and the user, not being satisfied with any of the mapping options given by RSS, chooses to "learn" the relation. Therefore, here we are only taking into account the use of PBC in those cases, rather than for mapping all the verbal expressions (i.e., for the 17 cases out of the total of 23, since 6 were mapped by RSS).
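To make the interaction between the modules more concrete, the sketch below outlines the cascading strategy just described: RSS is always tried first, and ESpotter++ (for terms) or PBC (for relations) is only used when RSS yields no mapping that the user accepts. This is a simplified, illustrative Python sketch under our own naming; the toy dictionaries, example strings and single helper stand in for the actual knowledge base, modules and user interaction:

# Simplified sketch of the mapping cascade (illustrative stand-ins only).
def map_with_fallback(item, primary, fallback, user_accepts):
    """Try the primary module first; fall back only if it returns nothing
    or the user rejects its candidate mappings."""
    candidates = primary(item)
    if candidates and user_accepts(candidates):
        return candidates[0]
    return fallback(item)

# Toy stand-ins for RSS, ESpotter++ and PBC:
known_instances = {"Enrico Motta": ["instance:enrico-motta"]}
known_relations = {"works in": ["ontology:works-in-project"]}
rss_term = lambda term: known_instances.get(term, [])
espotter = lambda term: "new-entity:" + term       # named entity recognition fallback
rss_rel  = lambda verb: known_relations.get(verb, [])
pbc      = lambda verb: "new-relation:" + verb     # pattern-based classification fallback
accept   = lambda candidates: True                 # the user accepts in this toy run

subject = map_with_fallback("Enrico Motta", rss_term, espotter, accept)
obj = map_with_fallback("Project X", rss_term, espotter, accept)
# A relation is only mapped once both terms have a conceptual mapping:
relation = (map_with_fallback("works in", rss_rel, pbc, accept)
            if subject is not None and obj is not None else None)
print(subject, relation, obj)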

The coverage (and consequently the recall) of RSS was low, particularly due to the lack of knowledge in our KB. In such cases, we have to rely on the PBC module. However, the coverage of this module was also not very high, due to the very high threshold specified to determine whether a relation is similar to the ones in the set of patterns. As a consequence, so far the "new" relations being learned are very closely related to the ones already existing in our ontology or to the manually defined seed patterns. In order to capture relevant relations which are not so close to the existing ones, we can try different thresholds (and metrics) to learn these relations. Alternatively, we can enlarge the set of seed patterns.
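As a rough illustration of this trade-off, the snippet below uses a crude token-overlap measure as a stand-in for the actual semantic similarity metric (the function, seed patterns and candidate expressions are all made up): a very high threshold only admits expressions that are almost identical to the seed patterns, while lower thresholds capture more candidate relations, at the cost of looser matches.

# Hypothetical illustration of the effect of the similarity threshold used by PBC;
# the similarity function, seed patterns and candidate expressions are made up.
def similarity(expression, pattern):
    """Crude token-overlap similarity, standing in for the real semantic metric."""
    a, b = set(expression.split()), set(pattern.split())
    return len(a & b) / len(a | b)

seed_patterns = ["works in project", "is author of", "has research interest in"]
candidates = ["collaborates in project", "wrote paper about", "leads project on"]

for threshold in (0.9, 0.5, 0.2):
    matched = [c for c in candidates
               if any(similarity(c, p) >= threshold for p in seed_patterns)]
    print(threshold, matched)
# 0.9 -> []                                            (almost nothing is similar enough)
# 0.5 -> ['collaborates in project']
# 0.2 -> ['collaborates in project', 'leads project on']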
5. Conclusions and future work
We presented a hybrid approach for the extraction of semantic relations from texts. It is mainly aimed at producing enriched semantic annotations for the Semantic Web, but it can also be used for ontology population and ontology learning. Initial evaluation experiments on a small dataset of news texts yielded promising results. We believe these are due to the use of multiple components to tackle certain problems, mainly the use of both shallow (JAPE) patterns and a syntactic parser to identify linguistic triples, the use of several resources to accomplish the mapping of these triples into ontology concepts and relations, and the use of a machine learning-based technique (PBC) to capture relations that are not explicit in the ontology. In fact, the evaluation showed that these different modules complement each other in producing correct annotations.

In future work we intend to improve the Linguistic Component so that it can cover more non-conventional relations (beyond the standard SVO type). In particular, we are interested in addressing complex (n-ary) relations, instead of relations between two elements only. This will improve the linguistic coverage of the approach.

In order to improve the linguistic-conceptual mapping coverage, we plan to add more metadata and their corresponding ontologies to RSS (both RSS for Terms and RSS for Relations), and possibly use Swoogle (Ding et al., 2004) to gather these ontologies and metadata from the web. We also plan to add other lexical resources to improve the mapping of relations, such as FrameNet (Baker et al., 1998), which has a richer and more structured description of lexical items. The inclusion of these resources in our architecture is illustrated in Figure 12. This will also decrease the need for user interaction when mapping terms and relations, since more instances and classes will be analyzed, potentially covering the terms and relations in the linguistic triple with exact matches. Certainly, problems may arise, such as the same term or verbal expression appearing in more than one ontology and representing different concepts or relations. We can use other textual sources (Wikipedia, the web) to gather more information to disambiguate entities (e.g., Bunescu and Pasca, 2006).

Even with the use of multiple resources to provide broader mapping coverage, we will certainly encounter linguistic triples with new relations, which do not belong to any of our ontologies. We intend to carry on using the pattern-based classification model to cope with these new relations; however, we plan to extend the set of seed patterns and also to experiment with different thresholds for the semantic similarity metric. In order to prevent the system from learning non-relevant relations, we will have to perform some additional reasoning on the learned relations to verify whether they are really relevant to our domain.