A Method for Surveying the Long-RNA Landscape of Exosomes
- 格式:docx
- 大小:837.74 KB
- 文档页数:10
A Method for Converting Thesauri toRDF/OWLMark van Assem1,Maarten R.Menken1,Guus Schreiber1,Jan Wielemaker2,and Bob Wielinga21Vrije Universiteit Amsterdam,Department of Computer Science{mark,mrmenken,schreiber}@cs.vu.nl2University of Amsterdam,Social Science Informatics(SWI){wielemaker,wielinga}@swi.psy.uva.nlAbstract.This paper describes a method for converting existing the-sauri and related resources from their native format to RDF(S)andOWL.The method identifies four steps in the conversion process.Ineach step,decisions have to be taken with respect to the syntax or se-mantics of the resulting representation.Each step is supported througha number of guidelines.The method is illustrated through conversions oftwo large thesauri:MeSH and WordNet.1IntroductionThesauri are controlled vocabularies of terms in a particular domain with hi-erarchical,associative and equivalence relations between terms.Thesauri such as NLM’s Medical Subject Headings(MeSH)are mainly used for indexing and retrieval of articles in large databases(in the case of MeSH the MED-LINE/PubMed database containing over14million citations3).Other resources, such as the lexical database WordNet,have been used as background knowl-edge in several analysis and semantic integration tasks[2].However,their native format,often a proprietary XML,ASCII or relational schema,is not compati-ble with the Semantic Web’s standard format,RDF(S).This paper describes a method for converting thesauri to RDF/OWL and illustrates it with conversions of MeSH and WordNet.The main objective of converting existing resources to the RDF data model is that these can then be used in Semantic Web applications for annotations. Thesauri provide a hierarchically structured set of terms about which a commu-nity has reached consensus.This is precisely the type of background knowledge required in Semantic Web applications.One insight from the submissions to the Semantic Web challenge at ISWC’034was that these applications typically used simple thesauri instead of complex ontologies.Although conversions of thesauri have been performed,currently no accepted methodology exists to support these efforts.This paper presents a method that 3/entrez/4http://www-agki.tzi.de/swc/swc2003submissions.html2Van Assem,Menken,Schreiber,Wielemaker and Wielingacan serve as the starting point for such a methodology.The method and guide-lines are based on the authors’experience in converting various thesauri.This paper is organized as follows.Section2provides introductory information on thesauri and their structure.In Sect.3we describe our method and the ratio-nale behind its steps and guidelines.Sections4and5each discuss a case study in which the conversion method is applied to MeSH and WordNet,respectively. Additional guidelines that were developed during the case studies,or are more conveniently explained with a specific example,are introduced in these sections. Related research can be found in Sect.6.Finally,Sect.7offers a discussion.2Structure of ThesauriMany thesauri are historically based on the ISO2788and ANSI/NISO Z39.19 standards[3,1].The main structuring concepts are terms and three relations between terms:Broader Term(BT),Narrower Term(NT)and Related Term (RT).Preferred terms should be used for indexing,while non-preferred terms are included for use in searching.Preferred terms(also known as descriptors) are related to non-preferred terms with Use For(UF);USE is the inverse of this relation.Only preferred terms are allowed to have BT,NT and RT relations.The Scope Note(SN)relation is used to provide a definition of a term(see Fig.1).Fig.1.The basic thesaurus relations.Scope note is not shown.Two other constructs are qualifiers and node labels.Homonymous terms should be supplemented with a qualifier to distinguish them,for example“BEAMS (radiation)”and“BEAMS(structures)”.A node label is a term that is not meant for indexing,but for structuring the hierarchy,for example“KNIVES By Form”.Node labels are also used for organizing the hierarchy in eitherfields or facets.The former divides terms into areas of interest such as“injuries”and “diseases”,the latter into more abstract categories such as“living”and“non-living”[3].The standards advocate a term-based approach,in which terms are related directly to one another.In the concept-based approach[7],concepts are inter-related,while a term is only related to the concept for which it stands;i.e.a lexicalization of a concept[12].The concept-based approach may have advan-tages such as improved clarity and easier maintenance[6].A Method for Converting Thesauri to RDF/OWL3 3Method DescriptionThe method is divided into four steps:(0)a preparatory step,(1)a syntactic conversion step,(2)a semantic conversion step,and(3)a standardization step. The division of the method into four steps is an extension of previous work[15].3.1Step0:PreparationTo perform this step(and therefore also the subsequent steps)correctly,it is essential to contact the original thesaurus authors when the documentation is unclear or ambiguous.An analysis of the thesaurus contains the following:–Conceptual model(the model behind the thesaurus is used as background knowledge in creating a sanctioned conversion);–Relation between conceptual and digital model;–Relation to standards(aids in understanding the conceptual and digital model);–Identification of multilinguality issues.Although we recognize that multilinguality is an important and complicating factor in thesaurus conversion(see also[8]),it is not treated in this paper.3.2Step1:syntactic conversionIn this step the emphasis lies on the syntactic aspects of the conversion pro-cess from the source representation to RDF(S).Typical source representations are(1)a proprietary text format,(2)a relational database and(3)an XML representation.This step can be further divided into two substeps.Step1a:structure-preserving translation.In Step1a,a structure-preserving translation between the source format and RDF format is performed, meaning that the translation should reflect the source structure as closely as pos-sible.The translation should be complete,meaning that all semantically relevant elements in the source are translated into RDF.Guideline1:Use a basic set of RDF(S)constructs for the structure-preserving translation.Only use constructs for defining classes,sub-classes,properties(with domains and ranges),human-readable rdfs:label s for class and property names,and XML datatypes.These are the basic build-ing blocks for defining an RDF representation of the conceptual model.The remaining RDF(S)and OWL constructs are used in Step2for a semantically oriented conversion.However,one might argue that the application of some constructs(e.g.domains and ranges)also belongs to semantic conversion.4Van Assem,Menken,Schreiber,Wielemaker and WielingaGuideline2:Use XML support for datatyping.Simple built-in XML Schema datatypes such as xsd:date and xsd:integer are useful to supply schemas with information on property ing user-defined XML Schema datatypes is still problematic5;hopefully this problem will be solved in the near future.Guideline3:Preserve original naming as much as possible.Preserving the original naming of entities results in more clear and traceable conversions.Prefix duplicate property names with the name of the source entity to make them unique.The meaning of a class or property can be explicated by adding an rdfs:comment,preferably containing a definition from the orig-inal documentation.If documentation is available online,rdfs:seeAlso or rdfs:isDefinedBy statements can be used to link to the original documen-tation and/or definition.Guideline4:Translate relations of arity three or more into structures with blank nodes.Relations of arity three or more cannot be translated directly into RDF properties.If the relation’s arguments are independent of each other,a structure can be used consisting of a property(with the same name as the original relation)linking the source entity to a blank node (representing the relation),and the relation’s arguments linked to the blank node with an additional property per argument(see examples in Sect.4).Guideline5:Do not translate semantically irrelevant ordering infor-mation.Source representations often contain sequential information,e.g.ordering of a list of terms.These may be irrelevant from a semantic point of view,in which case they can be left out of the conversion.Guideline6:Avoid redundant information.Redundant information creates representations which are less clear and harder to maintain.An example on how to avoid this:if the Unique Identifier(UI)of a resource is recorded in the rdf:ID,then do not include a property that also records the UI. Guideline7:Avoid interpretation.Interpretations of the meaning of infor-mation in the original source(i.e.,meaning that cannot be traced back to the original source or documentation)should be approached with caution,as wrong interpretations result in inconsistent and/or inaccurate conversions.The approach of this method is to postpone interpretation(see Step2b).Instead of developing a new schema(i.e.,thesaurus metamodel),one can also use an existing thesaurus schema,such as the SKOS(see Sect.3.4),which already defines“Concept”,“broader”,etc.This may be a simpler approach than to first develop a new schema and later map this onto the SKOS.However,this is only a valid approach if the metamodel of the source and of SKOS match. 5/2001/sw/WebOnt/webont-issues.html#I4.3-Structured-DatatypesA Method for Converting Thesauri to RDF/OWL5 A drawback is that the naming of the original metamodel is lost(e.g.“BT”instead of“broader”).For thesauri with a(slightly)different metamodel,it is recommended to develop a schema from scratch,so as not to lose the original semantics,and map this schema onto SKOS in Step3.Step1b:explication of syntax.Step1b concerns the explication of informa-tion that is implicit in the source format,but intended by the conceptual model. The same set of RDF(S)constructs is used as in Step1a.For example,the AAT thesaurus[11]uses node labels(called“Guide Terms”in AAT),but in the AAT source data these are only distinguished from normal terms by enclosing the term name in angle brackets(e.g.<KNIVES by Form>).This information can be made explicit by creating a class GuideTerm,which is an rdfs:subClassOf the class AATTerm,and assigning this class to all terms with angle brackets. Other examples are described in Sects.4and5.3.3Step2:Semantic ConversionIn this step the class and property definitions are augmented with additional RDFS and OWL constraints.Its two substeps are aimed at explication(Step 2a)and interpretation(Step2b).After completion of Step2a the thesaurus is ready for publication on the Web as an“as-is”RDF/OWL representation. Step2a:explication of semantics.This step is similar to Step1b,but now more expressive RDFS and OWL constructs may be used.For example,a broaderTerm property can be defined as an owl:TransitiveProperty and a relatedTerm property as an owl:SymmetricProperty.A technique that is used in this step is to define certain properties as special-izations of predefined RDFS properties,e.g.rdfs:label and rdfs:comment.For example,if a property nameOf is clearly intended to denote a human-readable label for a resource,it makes sense to define this property as a subproperty of rdfs:label.RDFS-aware tools will now be able to interpret nameOf in the intended way.Step2b:interpretations.In Step2b specific interpretations are introduced that are strictly speaking not sanctioned by the original model or documen-tation.A common motivation is some application-specific requirement,e.g.an application wants to treat a broaderTerm hierarchy as a class hierarchy.This can be stated as follows:broaderTerm rdfs:subPropertyOf rdfs:subClassOf. Semantic Web applications using thesauri will often want to do this,even if not all hierarchical links satisfy the subclass criteria.This introduces the notion of metamodeling.It is not surprising that the schema of a thesaurus is typically a metamodel:its instances are categories for describing some domain of interest. Guideline8:Consider treating the thesaurus schema as a metamodel.The instances of a thesaurus schema are often general terms or concepts,that oc-cur as classes in other places.RDFS allows one to treat instances as a classes:6Van Assem,Menken,Schreiber,Wielemaker and Wielingasimply add the statement that the class of those instances is a subclass of rdfs:Class.For example,an instance i is of class C;class C is declared to be an rdfs:subClassOf rdfs:Class.Because instance i is now also an instance of rdfs:Class,it can be treated as a class.The above example of treating broader term as a subclass relation is similar in nature.A schema which uses these constructions is outside the scope of OWL DL.Application developers will have to make their own expressivity vs.tractabil-ity trade-offhere.The output of this step should be used in applications as a specific interpretation of the thesaurus,not as a standard conversion.3.4Step3:StandardizationSeveral proposals exist for a standard schema for thesauri.6Such a schema may enable the development of infrastructure that can interpret and interchange thesaurus data.Therefore,it may be useful to map a thesaurus onto a stan-dard schema.This optional step can be made both after Step2a(the result may be published on the web)and Step2b(the result may only be used in an application-specific context).Unfortunately,a standard schema has not yet been agreed upon.As illustration,the schema of the W3C Semantic Web Advanced Development for Europe(SWAD-E)project is mapped to MeSH in Sect.4.7 The so-called“SKOS”schema of SWAD-E is concept-based,with class Concept and relations narrower,broader and related between Concepts.A Concept can have a prefLabel(preferred term)and altLabel s(non-preferred terms).Also provided is a TopConcept class,which can be used to arrange a hierarchy under special concepts(such asfields and facets,see Sect.2).TopCon-cept is a subclass of Concept.Note that because SKOS is concept-based,it may be problematic to map term-based thesauri.4Case one:MeSHThis section describes how the method has been applied to MeSH(version2004 8).The main source consists of two XMLfiles:one containing so-called descrip-tors(228MB),and one containing qualifiers(449Kb).Each has an associated DTD.Afile describing additional information on descriptors was not converted. The conversion program(written in XSLT)plus links to the original source and outputfiles of each step can be found at http://thesauri.cs.vu.nl/.The conversion took two persons approximately three weeks to complete.6/SWAD/thes links.html7/2001/sw/Europe/reports/thes/1.0/guide/8/mesh/filelist.htmlA Method for Converting Thesauri to RDF/OWL74.1Analysis of MeSHThe conceptual model of MeSH is centered around Descriptors,which contain Concepts[9].In turn,Concepts consist of a set of Terms.Exactly one Concept is the preferred Concept of a Descriptor,and exactly one Term is the preferred Term of a Concept.Each Descriptor can have Qualifiers,which are used to indicate aspects of a particular Descriptor,e.g.“ABDOMEN”has the Qualifiers “pathology”and“abnormalities”.Descriptors are related in a polyhierarchy, and are meant to represent broader/narrower document retrieval sets(i.e.,not a subclass relation).Each Descriptor belongs to one(or more)offifteen Categories, such as“Anatomy”and“Diseases”[10].The Concepts contained within one Descriptor are also hierarchically related to each other.This model is inconsistent with the ISO and ANSI standards,for several reasons.Firstly,the model is concept-based.Secondly,Descriptors contain a set of Concepts,while in the standards a Descriptor is simply a preferred term. Thirdly,Qualifiers are not used to disambiguate homonyms.4.2Converting MeSHStep1a:structure-preserving translation.In the XML version of MeSH, Descriptors,Concepts,Terms and Qualifiers each have a Unique Identifier(UI). Each Descriptor also has a TreeNumber(see[1]).This is used to indicate a position in a polyhierarchical structure(a Descriptor can have more than one TreeNumber),but this is implicit only.Relations between XML elements are made by referring to the UI of the relation’s target(e.g.<SeeRelatedDescriptor> contains the UI of another Descriptor).In Step1a,this is converted into in-stances of the property hasRelatedDescriptor.The explication of TreeNum-bers is postponed until Step1b.Most decisions in Step1a concern which XML elements should be translated into classes,and which into properties.The choice to create classes for Descrip-tor,Concept and Term are clear-cut:these are complex,interrelated structures.A so-called<EntryCombination>relates a Descriptor-Qualifier pair to another Descriptor-Qualifier pair.Following Guideline4,two blank nodes are created (each representing one pair)and related to an instance of the class EntryCombi-nation.As already mentioned,relations between elements in XML MeSH are made by referring to the UI of the target.However,each such relation also in-cludes the name of the target.As this is redundant information,the name can be safely disregarded.Guideline9:Give preference to the relation-as-arc approach over the relation-as-node approach.In the relation-as-arc approach,relations are modeled as arcs between entities(RDF uses“properties”to model arcs).In the relation-as-node approach,a node represents the relation,with one arc relating the source entity to the relation node,and one arc relating the relation node to the destination entity[7].The relation-as-arc approach is more natural to the RDF model,and also allows for definition of property semantics(symmetry,inverseness,etc.)in OWL.8Van Assem,Menken,Schreiber,Wielemaker and WielingaIt is not always possible to follow Guideline9, e.g.in the case of MeSH <ConceptRelation>.Although the straightforward choice is to create a prop-erty that relates two Concepts,this is not possible as each ConceptRelation has an additional attribute(see Guideline4).Again,a blank node is used(the relation-as-node approach).Guideline10:Create proxy classes for references to external resources if they are not available in RDF.Each Concept has an associated Seman-ticType,which originates in the UMLS Semantic Network9.This external resource is not available in RDF,but might be converted in the near fu-ture.In MeSH,only the UI and name of the SemanticType is recorded.One could either use a datatype property to relate the UI to a Concept(again, the redundant name is ignored),or create SemanticType instances(empty proxies for the actual types).We have opted for the latter,as this simplifies future integration with UMLS.In this scenario,either new properties can be added to the proxies,or the existing proxies can be declared owl:sameAs to SemanticType instances of a converted UMLS.Guideline11:Only create rdf:ID s based on identifiers in the original source.A practical problem in the syntactical translation is what value to assign the rdf:ID attribute.If the original source does not provide a unique identifier for an entity,one should translate it into blank nodes,as opposed to generating new identifiers.A related point is that if the UI is recorded using rdf:ID,additional properties to record an entity’s UI would introduce redundancy,and therefore shouldn’t be used.Guideline12:Use the simplest solution that preserves the intended se-mantics.In XML MeSH,only one Term linked to a Concept is the preferred term.Some terms are permutations of the term name(indicated with the attribute isPermutedTermYN),but unfortunately have the same UI as the Term from which they are generated.A separate instance cannot be created for this permuted term,as this would introduce a duplicate rdf:ID.Two obvious solutions remain:create a blank node or relate the permuted term with a datatype property permutedTerm to Term.In thefirst solution,one would also need to relate the node to its non-permuted parent,and copy all information present in the parent term to the permuted term node(thus introducing redundancy).The second solution is simpler and preserves the intended semantics.Step1b:explication of syntax.In Step1b,three explications are made. Firstly,the TreeNumbers are used to create a hierarchy of Descriptors with a[Descriptor]subTreeOf[Descriptor]property.Secondly,the TreeNum-ber starts with a capital letter which stands for one offifteen Categories.The class Category and property[Descriptor]inCategory[Category]are intro-duced to relate Descriptors to their Catogories.Thirdly,the ConceptRelations 9/pubs/factsheets/umlssemn.htmlA Method for Converting Thesauri to RDF/OWL9 are translated into three properties,brd,nrw and rel,thus converting from a relation-as-node to a relation-as-arc approach(see Guidelines12and9).This requires two observations:(a)the values NRW,BRD and REL of the attribute relationName correspond to narrower,broader and related Concepts;and(b) the relationAttribute is not used in the actual XML,and can be removed. Without the removal of the relationAttribute,the arity of the relation would have prevented us from using object properties.Some elements are not explicated,although they are clear candidates.These are XML elements which contain text,but also implicit information that can be used to link instances.For example,a Descriptor’s<RelatedRegistryNumber> contains the ID of another Descriptor,but also other textual information.Split-ting this information into two or more properties changes the original semantics, so we have chosen to create a datatype property for this element and copy the text as a literal value.Step2a:explication of semantics.In Step2a,the following statements are added(a selection):–The properties brd and nrw are each other’s inverse,and are both transitive, while rel is symmetric;–A Concept’s scopeNote is an rdfs:subPropertyOf the property rdfs:comment;–All properties describing a resource’s name(e.g.descriptorName)are de-clared an rdfs:subPropertyOf the property rdfs:label;–Each of these name properties is also an owl:InverseFunctionalProperty, as the names are unique in the XMLfile.Note that this may not hold for future versions of MeSH;–All properties recording a date are an owl:FunctionalProperty;–The XML DTD defines that some elements occur either zero or once in the data.The corresponding RDF properties can also be declared functional;–As a Term belongs to exactly one Concept,and a Concept to exactly one De-scriptor,[Concept]hasTerm[Term]as well as[Descriptor]hasConcept [Concept]is an owl:InverseFunctionalProperty;Unfortunately,the relation represented by class EntryCombination cannot be supplied with additional semantics,e.g.that it is an owl:SymmetricProperty (see Guideline9).Step2b:interpretations.In Step2b,the following interpretations are made, following Guideline8.Note that these are examples,as we have no specific application in mind.–brd is an rdfs:subPropertyOf rdfs:subClassOf;–Descriptor and Concept are declared rdfs:subClassOf rdfs:Class.10Van Assem,Menken,Schreiber,Wielemaker and WielingaStep3:standardization.In Step3,a mapping is created between the MeSH schema and the SKOS schema.The following constructs can be mapped(using rdfs:subPropertyOf and rdfs:subClassOf):–mesh:subTreeOf onto skos:broader;–mesh:Descriptor onto skos:Concept;–mesh:hasRelatedDescriptor onto skos:related;–mesh:descriptorName onto skos:prefLabel.There is considerable mismatch between the schemas.Descriptors are the cen-tral concepts between which hierarchical relations exist,but it is unclear how MeSH Concepts and Terms can be dealt with.SKOS defines datatype proper-ties with which terms can be recorded as labels of Concepts,but this cannot be mapped meaningfully onto MeSH’Concept and Term classes.For example, mesh:conceptName cannot be mapped onto skos:prefLabel,as the former’s do-main is mesh:Concept,while the latter’s domain is skos:Concept(skos:Concept is already mapped onto mesh:Descriptor).Furthermore,the mesh:Category can-not be mapped onto skos:TopCategory,because TopCategory is a subclass of skos:Concept,while mesh:Category is not a subclass of mesh:Descriptor.5Case two:WordNetThis section describes how the method has been applied to WordNet release 2.0.The original source consists of18Prologfiles(23MB in total).The con-version programs(written in Prolog)plus links to the original source as well as the outputfiles of each step can be found at http://thesauri.cs.vu.nl/.The conversion took two persons approximately three weeks to complete.Note that Step3for WordNet is not discussed here for reasons of space,but is available at the forementioned website.5.1Analysis of WordNetWordNet[2]is a concept-based thesaurus for the English language.The concepts are called“synsets”which have their own identifier.Each synset is associated with a set of lexical representations,i.e.its set of synonyms.The synset concept is divided into four categories,i.e.nouns,verbs,adverbs and adjectives.Most WordNet relations are defined between synsets.Example relations are hyponymy and meronymy.There have been a number of translations of WordNet to RDF and OWL formats.Dan Brickley10translated the noun/hyponym hierarchy directly into RDFS classes and subclasses.This is different from the method we propose, because it does not preserve the original source structure.Decker and Melnik11 have created a partial RDF representation,which does preserve the original 10/Archives/Public/www-rdf-interest/1999Dec/0002.html11/library/A Method for Converting Thesauri to RDF/OWL11 structure.The conversion of the KID Group at the University of Neuchatel12 constitutes an extension of representation both in scope and in description of semantics(by adding OWL axioms).We follow mainly this latter conversion and relate it to the steps in our conversion method.In the process we changed and extended the WordNet schema slightly(and thus also the resulting conversion).5.2Converting WordNetStep1a:structure-preserving translation.In this step the baseline classes and properties are created to map the source representation as precisely as pos-sible to an RDF representation:–Classes:SynSet,Noun,Verb,Adverb,Adjective(subclasses of SynSet), AdjectiveSatellite(subclass of Adjective);–Properties:wordForm,glossaryEntry,hyponymOf,entails,similarTo, memberMeronymOf,substanceMeronymOf,partMeronymOf,derivation, causedBy,verbGroup,attribute,antonymOf,seeAlso,participleOf, pertainsTo.Note that the original WordNet naming is not very informative(e.g.“s”repre-sents synset).For readability,here we use the rdfs:label s that have been added in the RDF version.All properties except for the last four have a synset as their domain.The range of these properties is also a synset,except for wordForm and glossaryEntry.Some properties have a subclass of SynSet as their domain and/or range,e.g.entails holds between Verbs.The main decision that needs to be taken in this step concerns the following two interrelated representational issues:1.Each synset is associated with a set of synonym“words”.For example,the synset100002560has two associated synonyms,namely nonentity and nothing.Decker and Melnik represent these labels by defining the(multi-valued)property wordForm with a literal value as its range(i.e.as an OWL datatype property).The Neuchatel approach is to define a word as a class in its own right(WordObject).The main disadvantage of this is that one needs to introduce an identifier for each WordObject as it does not exist in the source representation,and words are not unique(homonymy).2.The last four properties in the list above(antonymOf,etc.)do not representrelations between synsets but instead between particular words in a synset.This also provides the rationale for the introduction of the class WordObject in the Neuchatel representation:antonymOf can now simply defined as a property between WordObjects.We prefer to represents words as literal values,thus avoiding the identifier prob-lem(see Guideline11).For handling properties like antonymOf we defined a helper class SynSetWord with properties linking it to a synset and a word.For each subclass of SynSet,an equivalent subclass of SynSetWord is introduced(e.g. SynSetVerb).A sample representation of an antonym looks like this:12http://taurus.unine.ch/GroupHome/knowler/wordnet.html。
高三英语学术研究方法创新思路探讨单选题30题及答案1. In academic research, a hypothesis is a(n) _____.A. proven factB. educated guessC. random thoughtD. unimportant idea答案:B。
在学术研究中,假设是有根据的推测。
选项A,假设不是已被证实的事实;选项C,假设不是随机的想法;选项D,假设在学术研究中很重要,不是不重要的想法。
2. A research question in academic studies should be _____.A. broad and vagueB. narrow and specificC. unanswerableD. unrelated to the topic答案:B。
学术研究中的研究问题应该是具体且有针对性的。
选项A,宽泛和模糊的问题不利于深入研究;选项C,无法回答的问题没有研究价值;选项D,与主题无关的问题不符合要求。
3. In academic research, data collection is often done through _____.A. guessingB. imaginationC. observation and surveysD. making assumptions答案:C。
在学术研究中,数据收集通常通过观察和调查进行。
选项A,猜测不能用于数据收集;选项B,想象不能作为数据收集的方法;选项D,做假设不是数据收集的方法。
4. A literature review in academic research is mainly to _____.A. show off one's knowledgeB. find existing research on the topicC. fill up pagesD. repeat what others have said答案:B。
高一英语科学发现词汇应用高级单选题40题1. In the process of scientific research, a(n) ____ is often put forward first, which is a proposed explanation for a phenomenon.A. experimentB. hypothesisC. observationD. conclusion答案:B。
解析:本题考查科学发现相关词汇的辨析。
“hypothesis”意为假设,在科学研究中通常首先提出一个假设来解释某种现象,这是科学研究的常见步骤。
“experiment”是实验,是用来验证假设的手段,而非首先提出的内容。
“observation”是观察,虽然观察也是研究的一部分,但不是这种对现象提出解释的概念。
“conclusion”是结论,是在经过一系列研究之后得出的结果,不是一开始就提出的。
2. Scientists made a careful ____ of the strange phenomenon before they started their research.A. experimentB. hypothesisC. observationD. discovery答案:C。
解析:这里考查词汇的语境运用。
“observation”表示观察,科学家在开始研究之前会对奇怪的现象进行仔细观察,这是符合逻辑的。
“experiment”是进行实验,在还未开始研究时不会先进行实验。
“hypothesis”是假设,此时还未到提出假设的阶段。
“discovery”是发现,这里强调的是对现象的观察过程,而不是发现这个结果。
3. The ____ they designed was very complicated but it could test their hypothesis effectively.A. experimentB. observationC. conclusionD. theory答案:A。
高三英语学术研究方法创新不断练习题40题答案解析版1. When conducting academic research, if you want to find the most relevant literature in a specific field, which of the following is the most important step in using a literature database?A. Entering random keywordsB. Using broad and general keywordsC. Selecting accurate and specific keywordsD. Ignoring the keyword search function and browsing randomly答案:C。
解析:在学术研究中,当使用文献数据库时,选择准确和具体的关键词是最重要的步骤。
如果输入随机关键词(A选项),可能会得到大量不相关的文献。
使用宽泛和笼统的关键词((B选项)会导致结果过于繁杂且相关性不强。
而完全忽略关键词搜索功能随机浏览((D 选项)效率极低,难以找到特定领域最相关的文献。
2. In academic research, which type of literature database is more likely to provide in - depth and specialized research articles?A. General - purpose databasesB. Social media platformsC. Specialized databases in the research fieldD. Public libraries' general collections答案:C。
解析:在学术研究中,专门针对研究领域的专业数据库更有可能提供深入和专业的研究文章。
高二英语科研项目实施单选题40题(带答案)1.In the scientific research project, we need to collect data _____.A.accuratelyB.exactlyC.preciselyD.correctly答案:A。
“accurately”强调准确地,在科研项目中收集数据需要准确无误。
“exactly”表示确切地、完全地;“precisely”精确地,和“accurately”意思较为接近但在科研收集数据的场景下,“accurately”更常用;“correctly”正确地,通常用于方法等正确,不太符合收集数据的语境。
2.When presenting the research results, we should express our ideas _____.A.clearlyB.obviouslyC.apparentlyD.visibly答案:A。
“clearly”清晰地,在展示研究结果时要表达清晰。
“obviously”明显地;“apparently”显然地;“visibly”看得见地,后三个选项不太符合表达想法的语境。
3.The scientific research project requires ______ teamwork.A.cohesiveB.unitedC.cooperativeD.joined答案:C。
“cooperative”合作的,科研项目需要合作的团队合作。
“cohesive”有结合力的;“united”联合的;“joined”连接的,这三个选项不太符合团队合作的语境。
4.We must analyze the data ______ to draw accurate conclusions.A.thoroughlypletelyC.entirelyD.wholely答案:A。
“thoroughly”彻底地,分析数据需要彻底才能得出准确结论。
高二英语科研项目实施单选题40题1.The scientist who discovered the new element is giving a lecture on our campus. This scientist is highly respected. The underlined part is a(n)________.A.attributive clauseB.appositive clauseC.adverbial clause答案:A。
解析:题干中“who discovered the new element”是一个定语从句,修饰先行词“scientist”,说明是哪个科学家。
B 项同位语从句是对一个名词进行解释说明,这里不是。
C 项状语从句不符合语境。
2.The question________we should carry out this experiment remains unanswered.A.thatB.whetherC.what答案:B。
解析:“whether”在句中引导同位语从句,解释说明“question”的内容,即“我们是否应该进行这个实验”。
“that”引导同位语从句时无实际意义,在该句中不合适。
“what”不能引导同位语从句。
3.In the research project, we need to find a place________we can conduct our experiments safely.A.thatB.where答案:B。
解析:“where”在句中引导定语从句,修饰先行词“place”,在从句中作地点状语。
“that”和“which”在定语从句中通常作主语或宾语,这里不适合。
4.The idea________came up at the meeting was very creative.A.thatB.whatC.which答案:A。
高中英语科技论文翻译单选题40题答案解析版1.The term “artificial intelligence” is often abbreviated as _____.A.AIB.IAC.AIID.IIA答案:A。
“artificial intelligence”的常见缩写是“AI”。
B 选项“IA”并非此词组的缩写。
C 和D 选项都是错误的组合。
2.In a scientific research paper, “data analysis” can be translated as _____.A.数据分析B.数据研究C.数据调查D.数据理论答案:A。
“data analysis”的正确翻译是“数据分析”。
B 选项“数据研究”应为“data research”。
C 选项“数据调查”是“data survey”。
D 选项“数据理论”是“data theory”。
3.The phrase “experimental results” is best translated as _____.A.实验结论B.实验数据C.实验结果D.实验过程答案:C。
“experimental results”的准确翻译是“实验结果”。
A 选项“实验结论”是“experimental conclusion”。
B 选项“实验数据”是“experimental data”。
D 选项“实验过程”是“experimental process”。
4.“Quantum mechanics” is translated as _____.A.量子物理B.量子力学C.量子化学D.量子技术答案:B。
“Quantum mechanics”是“量子力学”。
A 选项“量子物理”是“quantum physics”。
C 选项“量子化学”是“quantum chemistry”。
D 选项“量子技术”是“quantum technology”。
高三英语学术研究方法单选题30题1. In a literature review, which of the following is the most important step?A. Collecting a large number of sourcesB. Selecting relevant and reliable sourcesC. Reading the sources quicklyD. Copying the content of the sources directly答案:B。
本题考查文献综述中最重要的步骤。
选项A 收集大量来源固然重要,但质量更关键;选项C 快速阅读来源可能会忽略重要信息;选项 D 直接复制来源内容是学术不端行为。
选项 B 选择相关可靠的来源是确保文献综述质量的关键步骤。
2. When conducting a literature review, how should you handle contradictory information from different sources?A. Ignore it and focus on the consistent informationB. Choose the information that supports your hypothesisC. Analyze and try to reconcile the differencesD. Just randomly pick one of the pieces of information答案:C。
在进行文献综述时,面对不同来源的矛盾信息,选项A 忽略它只关注一致信息可能会导致研究不全面;选项B 只选择支持假设的信息会使研究有偏差;选项 D 随机挑选信息是不科学的。
选项C 分析并尝试调和差异是正确的处理方式。
3. What is the purpose of citing sources in a literature review?A. To show off your knowledgeB. To increase the word count of your reviewC. To give credit to the original authors and support your argumentsD. To make the review look more complicated答案:C。
高二英语研究方法单选题50题1. In an English research project about students' reading habits, if you want to get a large amount of data quickly, which data collection method is more suitable?A. Case studyB. QuestionnaireC. ObservationD. Experiment答案:B。
解析:问卷法(Questionnaire)是一种能够快速收集大量数据的方法,适用于获取学生阅读习惯等方面的广泛信息。
案例研究(Case study)侧重于深入分析个别案例,无法快速获取大量数据。
观察法(Observation)更多是直接观察行为,但可能难以快速覆盖大量样本。
实验法 Experiment)主要用于验证因果关系,不是快速获取大量数据的首选方法。
2. When you are doing an English research on teachers' teaching methods and you want to have in - depth conversations with them to understand their thoughts, which method should you choose?A. Document analysisB. InterviewC. SurveyD. Test答案:B。
解析:访谈(Interview)可以让研究者与教师进行深入的对话,从而理解他们的想法。
文献分析 Document analysis)侧重于对已有文献的研究,不能直接获取教师的想法。
调查(Survey)比较宽泛,可能无法进行深入交流。
测试(Test)主要用于评估能力等,而不是获取教师的教学想法。
高中英语学术论文研究方法练习题50题含答案解析1.In academic research, which of the following is NOT a common method of data collection?A.InterviewsB.QuestionnairesC.GuessingD.Observations答案解析:C。
Guessing 不是学术研究中常见的数据收集方法。
Interviews( 访谈)、Questionnaires( 问卷调查)和Observations( 观察)都是常见的学术研究资料收集方法。
2.For a research on students' learning habits, which method is likely to provide the most in-depth information?A.SurveysB.Case studiesC.Literature reviewsD.Random selection答案解析:B。
对于关于学生学习习惯的研究,案例研究 Case studies)可能提供最深入的信息。
Surveys(调查)可以获得一定的信息,但不如案例研究深入。
Literature reviews( 文献综述)主要是对已有文献的综合分析,不是直接收集关于学生学习习惯的信息。
Random selection( 随机选择)只是一种选择样本的方法,不是数据收集方法。
3.Which of the following data collection methods is suitable for large-scale research?A.Interviews with a few peopleB.Questionnaires distributed to a large number of peopleC.In-depth case studies of one personD.Observations of a small group答案解析:B。
A Method for Surveying the Long-RNA Landscape of ExosomesJon R. Armstrong1, Jeff Hiken1, Ajay Khanna1, Stephanie Huelga2, Luke Sherlin2 andDoug Amorese21. Cofactor Genomics, St. Louis, MO2. NuGEN, San Carlos, CAIntroductionThe emergence of next-generation sequencing platforms has enabled the discovery and interrogation of RNA from tissues, biofluids, and extracellular vesicles at remarkable levels. Recently, interest has grown in sequencing RNA found in extracellular vesicles called exosomes, because of their role in cell-cell communication and immunoregulatory processes [1]. The contents of exosomes have been shown to consist of multiple RNA types including, micro RNA (miRNA) and messenger RNA (mRNA), as well as other biologically active molecules [2, 3]. Extensive work has focused on assessing the miRNA content of exosomes and it’srole as a putative disease biomarker. However, other types of RNA, such aslong-RNA (> 200 bp; long non-coding RNA [lncRNA], mRNA, and circular RNA [circRNA]), may play an equal or more important role in cellular communication and cause alterations to biological pathways that affect disease development. Long-RNAs are a minor proportion of the RNA material in exosomes and until recently, it was extremely challenging to generate sequencing libraries from the very small amountsof long-RNA extracted from exosomes. However, new methods have emerged, such as NuGEN’s Ovation® Single Cell RNA-Seq System, which enable a single library to be generated from extremely small amounts of long-RNA. Unlike poly(A) library methods, these libraries retain the full complement of long-RNA species (e.g. lncRNA, mRNA, and circRNA). In order to test the Ovation Single Cell RNA-seq System as an applicable method for generating next-generation sequencing libraries, consisting of multiple long-RNA species, and explore the population of long-RNA molecules in exosome samples, we applied the Ovation Single Cell RNA-Seq System to total RNA extracted from exosomes. The libraries were sequenced and the resulting data was analyzed to characterize the landscape of exosome derived long-RNA species. Here, we present the method workflow, number and types of observed long-RNA’s, relative proportions of long-RNA’s across multiple control and renal cancer exosome samples and propose functional categories for genes enriched in exosome samples.Materials and MethodsSample ProcurementNormal control (n=3) and renal cancer (n=3) serum samples from separate individuals were purchased from Bioserve (Beltsville, MD). Normal control and renal cancer serum samples were matched, based on meta-data, as closely as possible prior to ordering. Samples were stored at -80 ˚C before exosome isolation.Exosome and RNA Isolation1 – 3 ml of serum was passed through 0.8 µm syringe filters (Millipore Millex-AA [cat. no. SLAA033SB]) to remove cell-membrane debris. Exosomes were then isolated using the exoRNeasy Serum/Plasma Maxi Kit (Qiagen, Valencia, CA), according to manufacturers directions. Briefly, exosomes were bound to a column matrix and washed. Exosome RNA was eluted from the matrix with Qiazol, and then isolated using RNeasy MinElute spin columns. RNA concentrations were quantitated using an RNA 6000 Pico assay run on the Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA).Library GenerationNGS libraries, for paired-end sequencing on the Illumina sequencing platform, were prepared using the Ovation Single Cell RNA-Seq System (NuGEN Technologies, San Carlos, CA). Illumina-compatible barcoded adaptors were included in the kit. After adaptor ligation, libraries are pre-amplified with 14 PCR cycles (Library Amplification I). For 10 ng exosome RNA input, libraries were further amplified with 10 PCR cycles (Library Amplification II).Sequencing and AnalysisLibraries from each sample were multiplexed and sequenced on the NextSeq 500 instrument (Illumina, San Diego, CA) as paired-end 150 base reads (2×150). The resulting reads in FASTQ format were aligned to GENCODE v24 lncRNA and protein-coding RNA transcript sets. Alignments were performed using Novoalign (version 3.02.05; Novocraft, Malaysia) in “-r all” mode to allow reads to align to multiple transcripts. RPKM values for the alignments were calculated from the BAM output files, using the standard RPKM formula, with a custom Perl script. Reads were also aligned to the hg19 reference genome using STAR (version2.4.1c) with option “–outFilterMultimapNmax 1” to only allow unique alignments [4]. Transcript RPKM values for each sample were loaded into Cofactor Genomics’ ActiveSite Differential Expression Viewer to derive aggregate and unique transcript counts. The software program Known and Novel Isoform Explorer (KNIFE) was used for circRNA detection and run according to the author’s recommendations [5]. The following command line was used: completeRun.sh $PWD/reads complete $PWD thisproject 13sam circReads 67 1 none. RNA-SeQC was used to generate read designations for each sample [6]. Database for Annotation, Visualization, and Integrated Discovery (DAVID) (version 6.7) was used to identify gene functional annotation terms that were significantly enriched in genes shared across all renal cancer samples [7]. A list of gene symbols was generated for the aggregate dataset and was used as input into DAVID.ResultsMolecular Workflow. Serum samples from renal cancer and normal control individuals (n=6) were enriched for exosomes and total RNA was extracted in order to investigate the efficacy of the NuGEN Ovation Single Cell RNA-Seq System for interrogating the composition of long-RNA molecules from exosomes. The NuGEN Ovation Single Cell RNA-Seq System is a random priming library method which will prime and produce cDNA from many types of RNA species in a sample. Duplicate technical replicate libraries were produced from each sample to investigate the variability induced by the Ovation Single Cell RNA-Seq library protocol. 12 total libraries were generated from 6 samples and the libraries were sequenced to ~10 million reads each on the Illumina NextSeq 500 platform using paired-end 150 base reads.Pre- and Post-Library Bioanalyzer Traces. The top two panels represent bioanalyzer traces from two samples following RNA extraction. The bottom two panels represent bioanalyzer traces after final library amplification. Exosomes were isolated from each sample, the RNA extracted and yield measured. Serum samples yielded an average of 22.6 ng/mL (SD = 8.1 ng) of total RNA from exosomes. Sequencing libraries were generated using 10 ng of exosome total RNA input to the NuGEN Ovation Single Cell RNA-Seq System. The bioanalyzer traces for replicate libraries exhibited equivalent topologies (not shown) demonstrating that the Ovation Single Cell RNA-Seq library method was robust at a 10 ng total RNA input amount. The input amount of long-RNA (> 200 bases) was likely underestimated since a large proportion of the measured concentration consisted of small-RNA species, identified by the large spike (< 100 bases) at the beginning of the bioanalyzer traces. The NuGEN Single Cell System employs a magnetic bead cleanup step which removes adapter ligated species smaller than 200 bases. Thus, cDNA inserts larger than approximately 70 bases were retained for sequencing (Illumina adapters add 130 bases to the inserts). miRNA species were removed by the bead cleanup.Read alignment percentages. The table above presents the percentage of read alignments for each library and RNA species (Genome = hg19 alignments; mRNA = GENCODE protein coding database alignments; circRNA = detection using KNIFE software; lncRNA = GENCODE lncRNA database alignments) including averages (Avg.) and standard deviations (SD). Although the percentage of read alignments across samples was not highly correlated, the percentage alignment between replicates was largely comparable. Differences in the RNA content of exosomes is to be expected, however technical replicate libraries should exhibit nearly the same sequencing metrics (as seen here) if the molecular steps in a chosen library method are robust.mRNA and lncRNA Transcript Scatter Plots. Scatter plots were generated, using RPKM, for mRNA and lncRNA species to investigate the correlation and molecular noise between replicate library samples. The slope and coefficient of determination (R2) were calculated for each comparison. The plots in Figure 5A were generated using RPKM signals of protein-coding alignments from Normal (red) and Renal Cancer (blue) samples. The lower plot (green) displays the slope and R2 between Normal Sample 1-1 and Renal Cancer Sample 1-1. Replicate libraries exhibit excellent linear reliability while, as expected, the plot of Normal Sample 1-1 and Renal Cancer Sample 1-1 exhibits less linear reliability (a greater number of differentially expressed coding transcripts). Figure 5B shows the same plots for lncRNA alignments. The above plots exhibit higher levels of noise than might be seen when comparing RNA sequencing data from tissue samples. However, exosomes originate from many different tissues and also exhibit a temporal component. Thus, higher levels of noise may be inherent with these types of samples. In addition, sampling noise could arise during the sequencing process (i.e. low number of reads per sample) or molecular noise from the Ovation Single Cell System at the current input amount (10 ng). These sources of noise could be attenuated by generatingadditional sequencing reads for each sample and/or increasing the amount oflong-RNA input to the molecular method.Average Number of Each RNA Type.The average numbers of each RNA type, from normal and renal cancer samples, are shown in the table above. The average was calculated from the summed number of signals in replicate libraries. A lower RPKM threshold cutoff was employed to count only high confidence signals (mRNA = 100, lncRNA = 500, circRNA = 5).Shared and Unique Protein-Coding and lncRNA Transcripts. The number of aggregate and unique protein-coding and lncRNA transcripts were calculated across normal and renal cancer sample groups using a lower RPKM threshold cutoff of 100 and 500, respectively. Figure 7A represents the number of aggregate and unique mRNA transcripts for normal and renal cancer samples. Figure 7B represents the number of aggregate and unique lncRNA transcripts. Normal samples shared a greater percentage of both protein-coding and lncRNA transcripts than renal cancer samples (normal – protein-coding = 33%, lncRNA = 27%; renal cancer – protein-coding = 30%, lncRNA = 18%). The detection of circRNA species was determined using a separate analysis algorithm (KNIFE), thus the number of shared and unique circRNA species was not calculated.Gene Ontologies for Aggregate Renal Cancer Genes. The chart above shows the top 10 significantly shared gene ontology (GO) terms across the aggregate group of renal cancer genes. The 706 transcripts (including isoforms) shown above in Figure7A – Renal Cancer, represent 191 genes which were used as input to DAVID. Over 75% of the coding transcripts in these exosome samples were involved in translation and elongation functions.Conclusions∙Exosomes contain an abundance of long-RNA species including protein-coding genes (mRNA), lncRNA, and circRNA.∙The Ovation Single Cell RNA-Seq System is an effective and robust method for generating next-generation sequencing libraries consisting of long-RNA from exosomes.∙mRNA, lncRNA, and circRNA can be identified and analyzed from a single library preparation using multiple data analysis pipelines.∙Numerous genes are shared across exosome samples, with the largest proportion making up translation and translational elongation functions.References1.Théry C., Ostrowski M., Segura E. Membrane vesicles as conveyors of immuneresponses. Nature Reviews Immunology. 2009;9(8):581–593.2.Valadi H, Ekstrom K, Bossios A, Sjostrand M, Lee JJ, LotvallJO. Exosome-mediated transfer of mRNAs and microRNAs is a novelmechanism of genetic exchange between cells. Nat Cell Biol. 2007:9(6):654–9.3.Schorey JS, Bhatnagar S. Exosome Function: From Tumor Immunology toPathogen Biology. Traffic (Copenhagen, Denmark). 2008;9(6):871-881.4.Dobin A, Davis CA, Schlesinger F, et al. STAR: ultrafast universal RNA-seqaligner. Bioinformatics.2013;29(1):15-21.5.Szabo L, Morey R, Palpant NJ, Wang PL, Afari N, Jiang C, Parast MM, MurryCE, Laurent LC, Salzman J. Statistically based splicing detection reveals neural enrichment and tissue-specific induction of circular RNA during human fetal development. Genome Biology. 2015, 16:126.6.Deluca DS, Levin JZ, Sivachenko A, Fennell T, Nazaire MD, Williams C,Reich M, Winckler W, Getz G. (2012) RNA-SeQC: RNA-seq metrics forquality control and process optimization. Bioinformatics (2012) 28 (11):1530-1532.7.Glynn Dennis Jr., Brad T. Sherman, Douglas A. Hosack, Jun Yang, Michael W.Baseler, H. Clifford Lane, Richard A. Lempicki. DAVID: Database forAnnotation, Visualization, and Integrated Discovery. Genome Biology 2003 4(5): P3。