ABSTRACT Type-based Inference of Size Relationships for XML Transformations
- Format: pdf
- Size: 201.47 KB
- Pages: 10
What are macro-average and micro-average?
Posted Fri, 05/14/2010 - 14:53 by Fuller
Macro-average and micro-average are metrics for evaluating text classifiers.
According to "Coping with the News: the machine learning way":
When dealing with multiple classes there are two possible ways of averaging these measures (i.e. recall, precision, F1-measure), namely, macro-average and micro-average. The macro-average weights equally all the classes, regardless of how many documents belong to it. The micro-average weights equally all the documents, thus favouring the performance on common classes. Different classifiers will perform different in common and rare categories. Learning algorithms are trained more often on more populated classes thus risking local over-fitting.
Macro-averaged metrics are therefore influenced more strongly by small categories than micro-averaged metrics are. The article "A Fast and Efficient Text Classification Method" (《一种快速高效的文本分类方法》) gives several formulas for evaluating text classification performance.
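For a given category, let a be the number of instances correctly assigned to the category, b the number of instances wrongly assigned to it, and c the number of instances that belong to it but were wrongly assigned to other categories. Precision (p), recall (r), and the F-measure are then defined as:
r = a / (a + c), if a + c > 0; otherwise r = 1
p = a / (a + b), if a + b > 0; otherwise p = 1
F_β = (β² + 1) · p · r / (β² · p + r)
where the parameter β assigns different weights to precision (p) and recall (r); when β = 1, precision and recall are given equal weight.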
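The definitions above can be sketched in a few lines of Python. This is a minimal illustration, assuming the per-class counts (a, b, c) are already known. The function names and the two-class example counts are hypothetical and do not come from the cited articles, and the F-measure uses the standard form F_β = (β² + 1)·p·r / (β²·p + r).

```python
from typing import Dict, Tuple

# Per-class counts (a, b, c):
#   a = instances correctly assigned to the class
#   b = instances wrongly assigned to the class
#   c = instances of the class wrongly assigned elsewhere
Counts = Tuple[int, int, int]


def precision(a: int, b: int) -> float:
    """p = a / (a + b), or 1 when the class was never predicted."""
    return a / (a + b) if a + b > 0 else 1.0


def recall(a: int, c: int) -> float:
    """r = a / (a + c), or 1 when the class has no instances."""
    return a / (a + c) if a + c > 0 else 1.0


def f_measure(p: float, r: float, beta: float = 1.0) -> float:
    """Standard F_beta; returns 0 when both p and r are 0."""
    denom = beta * beta * p + r
    return (beta * beta + 1) * p * r / denom if denom > 0 else 0.0


def macro_average(counts: Dict[str, Counts], beta: float = 1.0):
    """Compute p, r, F per class, then average: every class weighs the same."""
    ps = [precision(a, b) for a, b, _ in counts.values()]
    rs = [recall(a, c) for a, _, c in counts.values()]
    fs = [f_measure(p, r, beta) for p, r in zip(ps, rs)]
    n = len(counts)
    return sum(ps) / n, sum(rs) / n, sum(fs) / n


def micro_average(counts: Dict[str, Counts], beta: float = 1.0):
    """Pool the counts first, then compute p, r, F: every document weighs the same."""
    a = sum(x[0] for x in counts.values())
    b = sum(x[1] for x in counts.values())
    c = sum(x[2] for x in counts.values())
    p, r = precision(a, b), recall(a, c)
    return p, r, f_measure(p, r, beta)


# Hypothetical counts: one populous class handled well, one rare class handled badly.
example = {"common": (90, 9, 10), "rare": (1, 10, 9)}

print("macro:", macro_average(example))
print("micro:", micro_average(example))
```

With these counts the macro-averaged precision and recall are both 0.5, while the micro-averaged values are 91/110 (about 0.83), which illustrates how the rare class pulls the macro-average down much more than the micro-average.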
• Methodology •
Performing Bayesian meta-analysis and meta-regression using the bmeta package in R software
SHI Fenghao 1, MENG Rui 1, RUI Mingjun 1, MA Aixia 1,2
1. School of International Pharmaceutical Business, China Pharmaceutical University, Nanjing 211198, P.R. China
2. Pharmacoeconomic Evaluation Research Center, China Pharmaceutical University, Nanjing 211198, P.R. China
Corresponding author: MA Aixia, Email: *****************
[Abstract] The bmeta package for R implements Bayesian meta-analysis and meta-regression by invoking the JAGS software. It uses the Markov chain Monte Carlo (MCMC) algorithm to pool various effect sizes (OR, MD and IRR) for different types of data (binary, continuous and count). The package needs few function arguments, offers a rich set of models, provides powerful plotting functions, and is easy to understand and master. This paper uses a worked example to demonstrate the complete workflow for Bayesian meta-analysis and meta-regression with the bmeta package.
[Key words] R language; bmeta package; Bayesian meta-analysis; Meta-regression
Meta-analysis, a common statistical method that combines evidence by pooling the effect sizes of individual studies, plays an important role in evidence-based medicine. Bayesian meta-analysis is a meta-analysis method developed from Bayesian statistics. It mainly relies on the Markov chain Monte Carlo (MCMC) method, and it is becoming increasingly popular because it has advantages over frequentist meta-analysis when handling complex random effects, hierarchical structures, or sparse data.
BioControl
Instructions for Authors

Scope
BioControl is the official journal of the International Organization for Biological Control (IOBC). It includes original papers on basic and applied research in all aspects of biological control of invertebrate or vertebrate pests, diseases, and weeds. Subject areas covered in BioControl comprise biology and ecology of organisms for biological control, and various facets of their use including any biological means of control for integrated pest management (IPM) such as plant resistance, pheromones and intercropping. Interdisciplinary papers with a global perspective on the use of biological control in integrated pest management systems are strongly encouraged. Developments in molecular biology and biotechnology that have direct relevance to biological control will also be considered for publication. Organisms covered by BioControl include parasitoids, invertebrate and vertebrate predators of pest animals and plants, mites, plant and insect pathogens, nematodes, and weeds.

Article Types
Original research papers are full length papers describing original research or a novel hypothesis. A concise presentation is encouraged; the paper should not exceed 25 pages of double-spaced typed text (including abstract, tables, figures and references). One double-spaced typed page contains approximately 300-350 words.
Forum papers aim to stimulate discussion and debate, particularly by presenting new ideas and by suggesting alternative interpretations to the more formal research papers published in BioControl and elsewhere.
Reviews are normally by invitation from the Editor-in-Chief, or by an Associate Editor. However, authors are encouraged to submit a tentative title and a table of contents of a proposed review for consideration. These reviews should not exceed 40 pages of double-spaced typed text (including abstract, tables, figures and references).
Letters to the Editor, usually on matters of general concern to biocontrol research, are welcome but should not exceed 4 typed pages. Examples of topics for such letters include solutions to long-standing problems, exposure of significant contradictions, responses to a hypothesis or other letters to the editors published in BioControl. The decision to publish submitted letters resides with the Editor-in-Chief.

Manuscript submission

Language
Manuscripts should be written in clear, concise, and grammatically correct English. British or American English spelling and terminology should be used, but either one should be followed consistently throughout the article. If English is not your native language we strongly urge you to have the text of your paper checked by a native English speaker before submission. Manuscripts that are inadequately prepared will not be considered for publication and will be returned to the authors.

Legal requirements
Submission of a manuscript implies: that the work described has not been published before; that it is not under consideration for publication anywhere else; that its publication has been approved by all co-authors, if any, as well as by the responsible authorities - tacitly or explicitly - at the institute where the work has been carried out. The publisher will not be held legally responsible should there be any claims for compensation.

Permissions
Authors wishing to include figures, tables, or text passages that have already been published elsewhere are required to obtain permission from the copyright owner(s) and to include evidence that such permission has been granted when submitting their papers. Any material received without such evidence will be assumed to originate from the authors.

How to submit
Authors should submit their manuscripts online. Electronic submission substantially reduces the editorial processing and reviewing times and shortens overall publication times.
Please connect directly to the site and upload all of your manuscript files following the instructions given on the screen.
Upon submission, the e-mail addresses of all authors will be requested. At the end of the submission process, the corresponding author will receive an acknowledgement e-mail and all co-authors will be contacted automatically to confirm their affiliation to the submitted work.
/bico

Manuscript preparation
BioControl adheres to a policy of blinded reviewing, in which the identity of the authors is, as much as possible, kept from reviewers. Similarly, reviewers' names are kept confidential. Authors are therefore encouraged to avoid explicit disclosure of their identity in the text of their manuscript, as for example, by use of a header. In some cases the Editor-in-Chief may decide that direct discussion between author(s) and reviewer(s) would be helpful, but names are never disclosed without explicit permission.

Blind title page
A page giving only the title without the authors' names should be provided for use in the (double blind) review process. Do not include author(s) name(s) in the text or page header.

Abstract
Please provide an abstract of 100 to 150 words. The abstract should not contain any undefined abbreviations or unspecified references.

Keywords
Please provide 4 to 6 keywords which can be used for indexing purposes.

Text
Please double-space all material, including notes and references.
All pages should be numbered consecutively, and lines should also be numbered within each page.
Main text should contain (1) an INTRODUCTION summarizing the background and aims and ending with a very brief statement of what has been achieved by the work; (2) a MATERIAL AND METHODS section containing sufficient detail so that all procedures can be repeated (in conjunction with cited references); (3) a RESULTS section presenting results without extended lines of inference, arguments or speculations; (4) a DISCUSSION section interpreting the results and explaining the importance and relevance of the research.

Text formatting
• Use a normal, plain font (e.g., 12 point Times Roman) for text.
• Use the automatic page numbering function to number the pages.
• Do not use field functions.
• Use tab stops or other commands for indents, not the space bar.
• Use the table function, not spreadsheets, to make tables.
• Use the equation editor or MathType for equations.
Note: If you use Word 2007, do not create the equations with the default equation editor but use MathType instead. Save your file in two formats: doc and rtf. Do not submit docx files.

Heading levels
Please use no more than three levels of displayed headings.

Abbreviations
Abbreviations should be defined at first mention and used consistently thereafter. Alternatively, they can be collected in a separate list following the Keywords.

SI units, numbers
Please always use internationally accepted signs and symbols for units, SI units.

Nomenclature
Authors should adhere to the rules governing biological nomenclature, as laid down in the International Code of Botanical Nomenclature, the International Code of Nomenclature of Bacteria, and the International Code of Zoological Nomenclature. All biotica (crops, plants, insects, birds, mammals, etc.)
should be identified by their scientific names including authors (and Order: Family, when appropriate) when the English term is first used in the main text, with the exception of common domestic plants and animals.
Please relate to scientific names as follows:
a) In the TITLE only give the Latin name but NO authority or (Order: Family)
b) In the ABSTRACT all Latin names should be accompanied with the correct authority and if applicable with (Order: Family)
c) In addition, at the FIRST MENTION in the body of the text - and only then - these data should be given
d) The order, family of the most important organisms in the paper (e.g., those referred to in the title), should also go in the KEYWORDS list.
Please give full genus and species names again, anywhere in the text where there is likely to be ambiguity.

Footnotes
Footnotes on the title page are not given reference symbols. Footnotes to the text are numbered consecutively; those to tables should be indicated by superscript lower-case letters (or asterisks for significance values and other statistical data).

References
The list of References should only include works that are cited in the text and that have been published or accepted for publication. Personal communications and unpublished works should only be mentioned in the text. Do not use footnotes or endnotes as a substitute for a reference list. References in English available at the international level should be preferred and authors are encouraged to cite references of works published in previous issues of BioControl.

Citation in text
Cite references in the text by name and year in parentheses. Some examples:
• Negotiation research spans many disciplines (Thompson 1990).
• This result was later contradicted (Becker and Seligman 1996).
• This effect has been widely studied (Abbott 1991; Barakat et al. 1995; Kelso and Smith 1998; Medvec et al.
1993).

List style
Reference list entries should be alphabetized by the last names of the first author of each work.
Journal article
Smith J, Jones M Jr, Houghton L (1999) Future of health insurance. N Engl J Med 965:325–329
Book
South J, Blass B (2001) The future of modern genomics. Blackwell, London
Book chapter
Brown B, Aaron M (2001) The politics of nature. In: Smith J (ed) The rise of modern genomics, 3rd edn. Wiley, New York
Article by DOI
Slifka MK, Whitton JL (2000) Clinical implications of dysregulated cytokine production. J Mol Med. doi:10.1007/s001090000086
Online document
Doe J (1999) Title of subordinate document. In: The dictionary of substances and their effects. Royal Society of Chemistry. Available via DIALOG. /dose/title of subordinate document. Cited 15 Jan 1999
Always use the standard abbreviation of a journal name according to the ISSN List of Title Word Abbreviations, see /2-22661-LTWA-online.php
For authors using EndNote, Springer provides an output style that supports the formatting of in-text citations and reference list.
• Endnote Style

ARTWORK AND ILLUSTRATIONS GUIDELINES

Electronic Figure Submission
• Supply all figures electronically.
• Indicate what graphics program was used to create the artwork.
• For vector graphics, the preferred format is EPS; for halftones, please use TIFF format.
MSOffice files are also acceptable.
• Vector graphics containing fonts must have the fonts embedded in the files.
• Name your figure files with "Fig" and the figure number, e.g., Fig1.eps.

Line Art
• Definition: Black and white graphic with no shading.
• Do not use faint lines and/or lettering and check that all lines and lettering within the figures are legible at final size.
• All lines should be at least 0.1 mm (0.3 pt) wide.
• Scanned line drawings and line drawings in bitmap format should have a minimum resolution of 1200 dpi.
• Vector graphics containing fonts must have the fonts embedded in the files.

Halftone Art
• Definition: Photographs, drawings, or paintings with fine shading, etc.
• If any magnification is used in the photographs, indicate this by using scale bars within the figures themselves.
• Halftones should have a minimum resolution of 300 dpi.

Combination Art
• Definition: a combination of halftone and line art, e.g., halftones containing line drawing, extensive lettering, color diagrams, etc.
• Combination artwork should have a minimum resolution of 600 dpi.

Color Art
• Color art is free of charge for online publication.
• If black and white will be shown in the print version, make sure that the main information will still be visible. Many colors are not distinguishable from one another when converted to black and white.
A simple way to check this is to make a xerographic copy to see if the necessary distinctions between the different colors are still apparent.
• If the figures will be printed in black and white, do not refer to color in the captions.
• Color illustrations should be submitted as RGB (8 bits per channel).

Figure Lettering
• To add lettering, it is best to use Helvetica or Arial (sans serif fonts).
• Keep lettering consistently sized throughout your final-sized artwork, usually about 2–3 mm (8–12 pt).
• Variance of type size within an illustration should be minimal, e.g., do not use 8-pt type on an axis and 20-pt type for the axis label.
• Avoid effects such as shading, outline letters, etc.
• Do not include titles or captions within your illustrations.

Figure Numbering
• All figures are to be numbered using Arabic numerals.
• Figures should always be cited in text in consecutive numerical order.
• Figure parts should be denoted by lowercase letters (a, b, c, etc.).
• If an appendix appears in your article and it contains one or more figures, continue the consecutive numbering of the main text. Do not number the appendix figures, "A1, A2, A3, etc." Figures in online appendices (Electronic Supplementary Material) should, however, be numbered separately.

Figure Captions
• Each figure should have a concise caption describing accurately what the figure depicts. Include the captions in the text file of the manuscript, not in the figure file.
• Figure captions begin with the term Fig.
in bold type, followed by the figure number, also in bold type.
• No punctuation is to be included after the number, nor is any punctuation to be placed at the end of the caption.
• Identify all elements found in the figure in the figure caption; and use boxes, circles, etc., as coordinate points in graphs.
• Identify previously published material by giving the original source in the form of a reference citation at the end of the figure caption.

Figure Placement and Size
• When preparing your figures, size figures to fit in the column width.
• For most journals the figures should be 39 mm, 84 mm, 129 mm, or 174 mm wide and not higher than 234 mm.
• For books and book-sized journals, the figures should be 80 mm or 122 mm wide and not higher than 198 mm.

Permissions
If you include figures that have already been published elsewhere, you must obtain permission from the copyright owner(s) for both the print and online format. Please be aware that some publishers do not grant electronic rights for free and that Springer will not be able to refund any costs that may have occurred to receive these permissions. In such cases, material from other sources should be used.

Accessibility
In order to give people of all abilities and disabilities access to the content of your figures, please make sure that
• All figures have descriptive captions (blind users could then use a text-to-speech software or a text-to-Braille hardware)
• Patterns are used instead of or in addition to colors for conveying information (colorblind users would then be able to distinguish the visual elements)
• Any figure lettering has a contrast ratio of at least 4.5:1

TABLES, FIGURES, ETC.
• Tables should appear each on separate sheets after the list of references.
• Tables should always be cited in text in consecutive numerical order.
• For each table, please supply a table title.
The table title should explain clearly and concisely the components of the table.
• Identify any previously published material by giving the original source in the form of a reference at the end of the table heading.
Footnotes to tables should be indicated by superscript lower-case letters (or asterisks for significance values and other statistical data) and included beneath the table body.

Figures
• Figures should appear each on separate sheets after the list of references, after tables. All captions should be grouped on a separate sheet before figures.
• Figure parts should be denoted by lowercase letters.
• Figures should always be cited in text in consecutive numerical order.
• For each figure, please supply a figure caption.
• Make sure to identify all elements found in the figure in the caption.
• Identify any previously published material by giving the original source in the form of a reference at the end of the caption.
For more information about preparing your illustrations, please follow the hyperlink to the artwork instructions on the right.

Authors biography
To be submitted along with the manuscript (in a separate file) and preferably no longer than 100 words.
Authors are invited to include a brief Authors Biography, which will appear after the References section. This is not mandatory. This provides an opportunity to present brief details of the authors and the overall research projects within which the published work has been carried out. The authors biography is not intended to replace standard acknowledgments, but rather to provide readers with an outline of the structure and objectives of the research teams, or groups, responsible for the work. An example of such a box is:
This research is part of a PhD project of Kim Surther devoted to the analysis of the host specificity of different Trichogramma species against the European corn borer Ostrinia nubilalis. Dr Susan Wardren is studying population genetics (mainly on parasitoid insects). Dr.
Ted Fitzmeiter is involved in developing field experiments for testing the efficiency of potential biological control programmes. Of particular interest is the analysis of inter-specific variation in insect parasitoids for finding efficient biocontrol agents. This work was carried out in the Ecology of Parasitoids group (led by Franck Vernont) at INRA, Sophia Antipolis, France.

Acknowledgments
Acknowledgments of people, grants, funds, etc. should be submitted in a separate file. The names of funding organizations should be written in full.

Statistics
Correct and accurate statistical methods should be used to analyze data presented in the manuscript, and especially for non-Gaussian traits. Generalized Linear Model (GLM) should always be preferred over data transformation or non-parametric procedures (for counts, percentages & time durations, at least). Standard errors (SE) have always to be indicated in the text, tables and figures (including for simple percentages). If needed, authors are strongly invited to take advice from a Statistician. Correspondingly, manuscripts judged to be based on inadequate or inaccurate statistical methods will be rejected without the possibility of resubmission.

ETHICAL RESPONSIBILITIES OF AUTHORS
COMPLIANCE WITH ETHICAL STANDARDS
DISCLOSURE OF POTENTIAL CONFLICTS OF INTEREST

After acceptance
Upon acceptance of your article you will receive a link to the special Springer web page with questions related to:

Open Choice
In addition to the normal publication process (whereby an article is submitted to the journal and access to that article is granted to customers who have purchased a subscription), Springer now provides an alternative publishing option: Springer Open Choice. A Springer Open Choice article receives all the benefits of a regular subscription-based article, but in addition is made available publicly through Springer's online platform SpringerLink.
We regret that Springer Open Choice cannot be ordered for published articles.
Springer Open Choice
DOES SPRINGER PROVIDE ENGLISH LANGUAGE SUPPORT?

Copyright transfer
Authors will be asked to transfer copyright of the article to the IOBC (or grant the IOBC exclusive publication and dissemination rights). This will ensure the widest possible protection and dissemination of information under copyright laws.
Open Choice articles do not require transfer of copyright as the copyright remains with the author. In opting for open access, they agree to the Springer Open Choice Licence.

Offprints/Reprints
Free and/or additional offprints can be ordered by the corresponding author. Seventy-five offprints of each contribution are supplied free of charge to the corresponding author.

Color in print
Online publication of color illustrations is free of charge. For color in the print version, authors will be expected to make a contribution towards the extra costs.

Online first
The article will be published online after receipt of the corrected proofs. This is the official first publication citable with the DOI. After release of the printed version, the paper can also be cited by issue and page numbers.

Proof reading
The purpose of the proof is to check for typesetting errors and the completeness and accuracy of the text, tables and figures. Substantial changes in content, e.g., new results, corrected values, title and authorship, are not allowed without the approval of the Editor.
After online publication, further changes can only be made in the form of an Erratum, which will be hyperlinked to the article.
Intention analysis algorithm based on dynamic Bayesian networks
FAN Zhenhua; SHI Benhui; CHEN Jinyong; DUAN Tongle
[Journal] Radio Engineering (《无线电工程》)
[Year (volume), issue] 2017(047)011
[Total pages] 5 (P41-44, 78)
[Authors' affiliation] The 54th Research Institute of China Electronics Technology Group Corporation, Shijiazhuang, Hebei 050081 (all four authors)
[Language of text] Chinese
[CLC number] TP391
Abstract: Traditional Intention Analysis (IA) methods are confronted with two problems: most of them only focus on the static analysis of a single target, and exact inference imposes an excessive computational burden. For this reason, a novel Dynamic Bayesian Network (DBN) based IA algorithm is proposed. In the proposed algorithm, a DBN is first constructed from various factors, i.e., our intention, firefight, relative strength and relative velocity, for the IA of group targets. Then, fast approximate inference is implemented according to the Markov property. Finally, the analysis result of the intention is obtained by fusion. Simulation results show that the proposed algorithm can reliably and dynamically evaluate the intention of group targets in a complex battlefield environment, in support of operational decision-making.
Key words: intention analysis; dynamic Bayesian network; approximate reasoning
With the continuous development of advanced technology, today's regional conflicts involve increasingly diverse targets and increasingly complex environments. As the volume of observation data rises sharply, continued reliance on manual processing cannot meet practical requirements for either timeliness or consistency [1].
Abstract
I. Sentence patterns for stating the topic of the paper directly in the abstract
1. In this paper, we present a… approach to…
2. In this paper, we describe improved… models for…
3. We propose a new… model and… algorithm that enables us to…
4. We present a… model that enables…
5. This paper demonstrates the ability of… to perform robust and accurate…
6. In this paper we report results of a… approach to…
7. This paper demonstrates that… can effectively… with very high accuracy.
8. The purpose/goal/intention/objective/object/emphasis/aim of this paper is…
9. The primary/chief/overall/main object of this study is to survey…
10. The chief aim of this paper/research/study/experiment/the present work is…
11. The emphasis of this study lies in…
12. The work presented in this paper focuses on…
13. Our goal has been to provide…
14. The main objective of our investigation has been to obtain some knowledge of…
Type-based Inference of Size Relationships for XML Transformations
Zhendong Su <su@>
Gary Wassermann <wassermg@>
Department of Computer Science, University of California, Davis
Davis, CA 95616-8562 USA

ABSTRACT
XML transformation languages (e.g., XSLT) take an XML document as input and produce another XML document as output. It is useful to know statically that such transformations always produce valid documents, for static debugging of the transformation program or for eliminating dynamic checks on the output documents. Type- and automata-theoretic techniques that exploit XML’s tree structure have been proposed to address this problem. However, existing approaches are not capable of reasoning about size information of produced XML documents, such as that two locations in the output documents always have the same number of elements, which occurs when data is repeated. This paper presents a type-based inference system to discover size relationships in output documents from XML transformation programs through refined type checking. For example, our system can identify program fragments producing the same number of elements for all input documents. Programs that use or produce parallel or repeated data will benefit from this analysis. The novel aspects of our system are techniques to deal with the rich tree structure of XML types (i.e., schemas), whereas array analyses (e.g., bounds checking) for languages such as C deal with flat arrays.

1. INTRODUCTION
Since XML [9] became a W3C recommendation in 1998, XML has been increasingly accepted as the standard format for electronic data exchange. Two parties who wish to exchange data generally organize their data differently.
Thus, one or both of the parties must transform their data so that it is suitable for the other to use. In the context of XML, “schemas” (e.g., XML Schema) [21] are used to specify data organization. When data is exchanged using XML, the recipient specifies a schema to which all received XML documents must conform. The sender must write a transformation program to convert data from his own schema to the recipient’s schema. If the sender can determine that his program performs the transformation correctly, no run-time checks are necessary. This gives rise to the XML type checking problem.

Let T be the set of all XML documents (T is a mnemonic for “trees”; all XML documents have a tree structure). An XML type is a subset of all documents: τ ⊆ T, often called a schema. The XML type checking problem asks, for source and target types τs and τt respectively, and transformation program P, is it true that ∀x ∈ τs. P(x) ∈ τt [25]? One common approach to answer this question is based on type inference: an output type τ′ is conservatively inferred based on the program and the source type: P(τs) ⊆ τ′. If the inferred type is a subtype of the target type, τ′ ⊆ τt, then the program successfully type checks.

We introduce the notion of sizes in XML documents and types: a size denotes the number of XML elements and/or scalars in a consecutive sequence under a common parent. For a particular XML document, sizes are always known constants. However, sizes may not remain constant across all documents conforming to a single type. In this case, the sizes of the type are represented by
variables, which may be constrained to allow only values valid for some document within the type. When some sizes of a type are constrained in terms of other sizes (currently not supported in XML Schema), we call those size relations. Because of the common use of Kleene stars in types, it is generally impossible to discover the actual values of sizes. Rather, we aim at discovering relationships among sizes in output documents.

Some practical settings require size information. For example, in a document with parallel lists of movie titles and the years those movies were made, the length of those lists can vary provided that they are equal to each other. Alternatively, consider a specification manual that must include the same information in multiple languages. The number of headings in one linguistic section can vary provided that it equals the number of headings in every other linguistic section. Size relationships arise in settings that include parallel or repeated data. The ability to infer size relationships may find application in ensuring the correct composition of Web services [7].

[Figure 1: A target type with size relations. Conservatively inferred types using existing techniques cause correct programs to fail to type check. Target type τt: doc with children titles (title*n) and isbns (isbn*n). Inferred type τ′: doc with children titles (title*) and isbns (isbn*). τ′ ⊄ τt.]

No previous technique for XML type checking can accurately type check a program when the target type has size relations and the program output is not confined to a regular subtype of the output type. The decidability of XML type checking has been established using k-pebble tree transducers when no size relations are present [20]. It is unclear how well these automata-based techniques would work in practice because of their high computational complexity, and more fundamentally, how to incorporate size information into these formalisms to retain decidability of type checking. Existing type-based approaches [8] may provide more practical, if less
precise, solutions. However, currently these approaches are unable to infer types with size relations. The main contribution of our paper is a type-based inference system to discover size relations for XML transformation programs. To the best of our knowledge, ours is the first system capable of reasoning about size relations for XML transformations.

Several languages have been proposed for XML transformations, including XSLT [6], XQuery [8], XDuce [14], CDuce [2], HaXml [27], and Relaxer [11]. The XML transformation language in this paper used to explain our technique has much of the expressive power of these languages. It includes iterations over subtrees, pattern matching based on tag or type, conditional expressions, etc. Section 2 introduces our language.

For illustration purposes, we consider first the following XQuery program:

    <doc>
      <titles>
        for $a in document("cat.xml")//catalog/book/title return $a
      </titles>
      <isbns>
        for $b in document("cat.xml")//catalog/book/isbn return $b
      </isbns>
    </doc>

The program takes an XML catalog of books and creates an output document with lists of titles and ISBNs, perhaps for easy ISBN lookup by title in a printed listing. Because each book has exactly one title and ISBN, the lists rooted at titles and isbns must have the same number of elements; that is, they must have the same size. The programmer would like to confirm that this size relationship holds.
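To make the size relationship concrete, the following Python sketch mimics the XQuery program above on one sample catalog and compares the two list sizes. It is only an illustration under stated assumptions: the sample catalog, the function names transform and sizes, and the use of Python's xml.etree.ElementTree are not from the paper. It also checks a single document at run time, whereas the goal of the paper is to establish the relationship statically for all input documents.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical sample input; each book has exactly one title and one isbn.
CATALOG = """<catalog>
  <book><title>A</title><isbn>111</isbn></book>
  <book><title>B</title><isbn>222</isbn></book>
  <book><title>C</title><isbn>333</isbn></book>
</catalog>"""


def transform(catalog: ET.Element) -> ET.Element:
    """Build <doc><titles>...</titles><isbns>...</isbns></doc> from a catalog root."""
    doc = ET.Element("doc")
    titles = ET.SubElement(doc, "titles")
    isbns = ET.SubElement(doc, "isbns")
    # Counterpart of: for $a in //catalog/book/title return $a
    for title in catalog.findall("book/title"):
        titles.append(copy.deepcopy(title))
    # Counterpart of: for $b in //catalog/book/isbn return $b
    for isbn in catalog.findall("book/isbn"):
        isbns.append(copy.deepcopy(isbn))
    return doc


def sizes(doc: ET.Element):
    """Return the number of children under titles and under isbns."""
    return len(doc.find("titles")), len(doc.find("isbns"))


output = transform(ET.fromstring(CATALOG))
print(sizes(output))  # (3, 3)
```

A run-time check like this says nothing about other inputs, which is why a static guarantee over every document of the source type is the more useful result.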
Figure 1 gives the output type τt (omitting the scalar children of title and isbn) as the programmer intends it. We portray types as trees to provide a graphical view of the tree structure of XML types. A vertical or diagonal line means that the type at the lower end of the line is a child of the type at the upper end. In the type τ′ in Figure 1, titles, isbns is a sequence type; the sequence constructor is implicit because titles and isbns are children of the same parent and are next to each other. Using existing type inference methods, the type τ′ would be inferred. Because τ′ is not a subtype of τt, this correct program fails to type check. Repeated data arises in many settings, and each time it does, this shortcoming of existing techniques limits the amount of automated checking available to programmers.

    (a)  τs = root[ lev1*[ lev2* ] ]

    (b)  1 for w in children(root) do
         2   for x in children(w) do
         3     S,
         4 for y in children(root) do
         5   for z in children(y) do
         6     T

Figure 2: A source type with nested repetitions.

    (a)  root[ lev1*[ lev2* ] ]
    (b)  root^1[ lev1*^m[ lev2*^{n_i} ] ], i ∈ [1..m]
    (c)  root^1[ lev1*^m[ lev2*^n ] ]

Figure 3: Two ways to annotate a source type τs.

1.1 Difficulties with Size Relation Inference

At first consideration, it may seem as though the use of integer constraints, which enable array analyses in languages such as C, would be sufficient for inferring size relationships. Surprisingly, it is not that simple. The main problem is that because XML transformations operate on trees, a very rich data structure, size relationship inference must interrelate tree sub-structures. Standard array analyses, however, need only reason about how size information for arrays, a flat data structure, flows in C-like programs.

Consider the source type τs and program shown in Figure 2. Lines 1–3 and 4–6 of the program in Figure 2 have the same semantics as /root/~/~ (where ~ is a wildcard that matches any tag), except line 3 substitutes an S for whatever the output would have been, and equivalently with T on line 6. In general, the semantics of paths can be achieved through nested for and case expressions. For example, in
Figure 10, lines 1–10 are equivalent to /catalog/book/title.

Clearly this program produces the same number of S's as T's. However, standard type systems perform a modular analysis, using only the types of subexpressions and some global type environments for type checking and inference. Therefore, in discovering a relation, such as size equality between two expressions, the type system is restricted to using information it has available at the time of typing both expressions: the input type. It must discover size relations between the input type and the type of each expression in order to relate the sizes of the expressions' types to each other. Suppose that in hoping to discover the size relationships precisely, we annotate τs as in Figure 3b, where 1, m, and n_i are size annotations for the corresponding types. We argue that the precision aimed at cannot be achieved.

The for expression has the form (for x in e1 do e2). The expression e1 evaluates to a list, and for each element a in that list, x gets bound to a and e2 gets executed. There are two approaches to typing the for expression. We first look at the approach used in XQuery's type system [8]: first, x is bound to the union of the unit types in e1's type, τ1, and e2 is typed once with x having that binding. Then type constructors are added to the inferred type based on their occurrences in τ1.

    τ1  = lev1*^m[ lev2*^{n_i} ], i ∈ [1..m]
    τu  = lev1^1[ lev2*^{n_1 | ... | n_m} ]
    τ1b = lev2*^{n_1 | ... | n_m}  ⇒  τ1b = lev2*^r, {min(n_i) ≤ r ≤ max(n_i)}
    τub = lev2^1
    τ2b = S^1
    τ′  = S*^p, {min(m × n_i) ≤ p ≤ max(m × n_i)}

Figure 4: Some inferred types in our first attempt to infer all size relationships precisely.

    τu1 = lev1^1[ lev2*^{n_1} ]
    τu2 = lev1^1[ lev2*^{n_2} ]
    τu3 = lev1^1[ lev2*^{n_3} ]
    ···

Figure 5: Some inferred types in our next attempt to infer all size relationships precisely.

In inferring a type for the program in Figure 2b, if τs is annotated as shown in Figure 3b, the type of expression
children(root) on line 1 is τ1, as shown in Figure 4. Figure 4 also shows the rest of the types we refer to in this paragraph. Suppose we find the type of w, τu, by taking the union of the unit types in τ1, as done in XQuery's type system. The type of children(w) on line 2 is then τ1b. However, it is not clear what the size annotation on τ1b means. To add clarity, we rewrite the size annotation as shown in Figure 4. We can now find the unit type to which x gets assigned as τub, and so the body of the inner for expression on line 3 is typed as τ2b. We then go back up and compose the type τ2b with *^r. When going up again to find the type of the for expression on line 1, we compose the type of the nested for expression with *^m. The result is τ′ (for clarity, r has been simplified out of the constraints).

Unfortunately, all we know about p's value is that it is confined to a given range. When the for expression on line 4 is typed, the result will also be a starred type whose size is confined to the same range. Knowing that two numbers are in the same range is usually not sufficient to relate the two numbers concretely (e.g., describing one as a function of the other). Thus, this approach is not suitable for discovering interesting size relations.

The second approach to typing the for expression is the one taken by Fernandez et al. [10]: find a type for the body of a for expression for every named unit type in τ1 and combine them based on the type structure of τ1. Given the annotation for τs in Figure 3b, the type of children(root) is again τ1, as shown in Figure 4. Because the number of children (n_i) may be different for each of the m lev1's, the first unique unit type here is τu1, as shown in Figure 5. The second is τu2, the third is τu3, etc. Trying to infer a type for the for expression by inferring a type for e2 based on τu1, τu2, τu3, ..., is cumbersome and requires complicated symbolic reasoning about summations such as $\sum_{i=1}^{m} n_i$.

Why can we not get precise size information through precise source type annotations? When a type element in a tree type
has a Kleene star (e.g., lev1* in Figure 2), its children in the type tree (e.g., lev2*) represent uniformly all lists of child elements of the starred element in an actual document. Adding precise size annotations to Kleene starred elements (e.g., lev2*^{n_i}) of the input type distinguishes within the type tree the concrete lists that the Kleene starred element represents. After distinctions have been added to the input type, either the type system “factors out” the distinctions, resulting in a loss of precision (as shown in Figure 4), or in addressing the distinctions directly, the type system faces increased complexity (as shown in Figure 5). In this paper, we present an approach to overcome these problems.

1.2 Our Approach

The key insight of our approach is that the elements of a list are usually treated uniformly, both in the type and in the program, so the only information we need is the total number of elements in a concrete tree represented by an element (or more precisely, by a path) in the type tree. In some XML transformation languages, there is no mechanism to access the elements in a list non-uniformly. For example, it is impossible to remove the first element from a given list. In other languages (e.g., CDuce), constructs that handle list-elements non-uniformly can be typed conservatively. We take advantage of this in our analysis. We annotate the source type as shown in Figure 3c. The annotation n on lev2 in Figure 3c denotes the total number of lev2 elements in the input document that have as parent a lev1 element and as grandparent a root element. Note the difference between this annotation and a size: the elements may not all have a common parent. For an alternating sequence of for and case expressions that match the semantics of /root/lev1/lev2,¹ the body of the innermost case will be executed n times. Consequently our type system multiplies the size of the innermost case expression by n to find the size of the outermost for expression.

Because our annotations do not introduce distinctions into τs, we
avoid the trouble shown in Figure 5. We therefore leverage the more powerful second approach to typing the for expression. The union operation used in the first approach loses information whenever an element type has more than one child type and so cannot achieve the precision necessary to infer size relationships.

Conditional expressions are often used to select certain elements of a list and pass over others, so they, too, influence the sizes of output types. A solution that leads to sizes confined to ranges has the same problems as discussed in Section 1.1, but unless all parts of the boolean expression are static, we cannot determine statically which branch of the conditional will be executed. To address this we use pair types. Like conditional types [1], pair types preserve the relationship between the types of the branches of conditional expressions and true and false evaluations of the boolean expression. We relate the size of a pair type to the sizes of the conditional expression's true and false branches as well as to the identity of the boolean condition. If, in a pre-processing phase, two boolean conditions can be found to be equivalent, then it becomes possible to relate the sizes of the corresponding conditional expressions.

¹ The first and second halves of the program in Figure 2 would match this semantics if the appropriate case expressions were inserted.

2. THE SOURCE LANGUAGE

    tag         a
    variable    x
    constant    n  ::= c_int | c_str | c_bool
    operator    op ::= + | - | and | or | = | <
    expression  e  ::= n | x | a[e] | e, e | () | e op e
                     | let x = e do e | for x in e do e | children(e)
                     | case e of x : p => e | x => e end
                     | if e then e else e | f(e; ...; e)
    pattern     p  ::= a | ~ | s
    data        d  ::= n | a[d] | d, d | ()

Figure 6: XML transformation language.

    tag             a
    type name       x
    size type       r ::= c | n
    scalar type     s ::= String | Boolean | Integer
    unit type       u ::= a[τ] | ~[τ] | s
    type            z ::= x | τ, τ | τ* | () | <τ, τ> | τ|τ | ∅
    annotated type  τ ::= u^1 | z^r

Figure 7: Type language.

Figure 6 gives the syntax of our XML transformation language. Most of the constructs are standard. The expression a[e] constructs XML elements. Paths
can be expressed through for and case expressions. We omit the expression to select the parent of an element, which can be typed conservatively, as is done in other XML transformation languages with type systems, such as XQuery. Beyond that, our language does not include, for example, sorting, explicit type casts, and modules. We do not expect much difficulty in extending our technique to cover these language constructs.

Note that the case expression matches a value against a p, defined by the “pattern” derivation (which parallels the “unit type” derivation in Figure 7). Also, as shown by the “data” derivation, we denote XML elements as tag[...] rather than <tag>...</tag> to simplify notation. The dynamic semantics for this language is standard, but the companion technical report gives a complete presentation [24].

3. OUR TYPE LANGUAGE

Figure 7 gives our type language. The “size type” shows that either a constant or a variable can be used as a size annotation on a type. The size annotation denotes the number of unit values that may be matched to the annotated type. The grammar allows nonsense types to be written (e.g., ()^2), but our type system only infers meaningful types and programs only type check if all types are meaningful. The wildcard unit type, ~[τ], is defined such that a[τ] is a subtype of ~[τ] for all tags a, following Fernandez et al. [10].
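To make the shape of annotated types concrete, the following Python sketch encodes part of the grammar in Figure 7 as data classes. It is an illustration under stated assumptions, not the paper's implementation: the class names, the use of the string "~" for the wildcard tag, and the helper unit_subtype (which compares element contents by plain equality rather than by a full subtyping check) are all hypothetical, and several constructors of z are omitted for brevity.

```python
from dataclasses import dataclass
from typing import Union

# size type r ::= c | n  (an int is a constant, a str names a size variable)
Size = Union[int, str]


@dataclass(frozen=True)
class Scalar:
    """scalar type s ::= String | Boolean | Integer"""
    name: str


@dataclass(frozen=True)
class Elem:
    """unit type a[τ]; the tag "~" encodes the wildcard ~[τ] (assumption)"""
    tag: str
    content: "Annotated"


@dataclass(frozen=True)
class Seq:
    """sequence type τ, τ"""
    left: "Annotated"
    right: "Annotated"


@dataclass(frozen=True)
class Star:
    """repetition type τ*"""
    body: "Annotated"


@dataclass(frozen=True)
class Annotated:
    """annotated type τ ::= u^1 | z^r"""
    body: object
    size: Size

    def __post_init__(self):
        # Unit types (scalars and elements) always carry the annotation 1.
        if isinstance(self.body, (Scalar, Elem)) and self.size != 1:
            raise ValueError("a unit type is always annotated with size 1")


def unit_subtype(sub: Elem, sup: Elem) -> bool:
    """a[τ] is a subtype of ~[τ] for every tag a (contents compared by equality)."""
    return sub.content == sup.content and sup.tag in ("~", sub.tag)


string = Annotated(Scalar("String"), 1)
title = Elem("title", string)
wildcard = Elem("~", string)
titles = Annotated(Star(Annotated(title, 1)), "n")  # title*^n

print(unit_subtype(title, wildcard), unit_subtype(wildcard, title))
```

Keeping the annotation in a separate wrapper mirrors the split in Figure 7 between unannotated types z and u and annotated types τ.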
Among the types z, “τ, τ” is the type of two values in sequence. The union type is “τ|τ”; a value whose type is either of the choices matches it. We introduce the pair type, “<τ, τ>,” for typing conditional expressions. Like the choice type, a value of either of the two types may match it. Unlike the choice type, the order of the two types is preserved, i.e., (τ1|τ2) = (τ2|τ1), but <τ1, τ2> ≠ <τ2, τ1> when τ1 ≠ τ2. Because the order is preserved, it is possible to reason about the types of different conditional expressions in relation to each other. The “∅” is an identity for choice types, and is needed for typing repetition expressions.

    expression  e ::= c | n | (Γ_π(π)) | (e) | e + e | e × e
    path        π ::= π/a | ε
    constraint  C ::= C ∪ {n = e} | C ∪ {π} | ∅

Figure 8: Constraint language.

We also have a constraint language to capture size relations. Figure 8 shows our constraint language. There are two kinds of constraints. The first kind consists of equality constraints between a size variable and an arithmetic expression. The function Γ_π maps paths to size variables/constants. The second kind of constraint is a path, π, which is not explicitly related to anything else in the constraints or types. Paths remain in the constraint set only temporarily during the typing of for expressions that match the semantics of paths. Paths, the function Γ_π, and the connection between them are explained in Section 4.3.

4. TYPE RULES

We use a constraint-based formulation of our type system. The type judgment Γ ⊢ e : τ, C is read: in environment Γ, expression e has type τ, where the size variables in Γ and τ are subject to the constraints C. Type environments are defined by the following grammar:

$$\Gamma ::= \emptyset \mid \Gamma\,\{x : \tau\} \mid \Gamma\,\{\text{for } x : \tau\}$$

where {for x : τ} is used in typing the for expression. A type environment, Γ, maps variables and “for” variables to types according to the following rules:

$$(\Gamma\,\{x : \tau\})(x') = \begin{cases} \tau & \text{if } x = x' \\ \Gamma(x') & \text{otherwise} \end{cases}$$

$$(\Gamma\,\{\text{for } x : \tau\})(x') = \begin{cases} \tau & \text{if } x = x' \\ \Gamma(x') & \text{otherwise} \end{cases}$$

Due to space constraints we explain here the typing of the three most interesting expressions to size types in increasing order of difficulty. The complete
list of type rules can be found in the companion technical report [24]. In the type rules that we discuss next, z, u, and τ are as in Figure 7: z is a type without a size annotation, u is a unit type without an annotation, and τ is a type with an annotation.

In Section 4.1, we explain the type rule for sequence expressions, in which two subexpressions are put in sequence. In Section 4.2, we explain the type rule for conditionals. In Section 4.3, we explain the typing of the for expression, which is the most involved because it is the main language construct used to produce subtrees of unknown size. We also discuss the typing of recursive functions in Section 4.3.2.

4.1 Sequence Expressions

The type rule for sequence expressions is as follows:

$$\frac{\tau_1 = z_1^{m_1} \quad \tau_2 = z_2^{m_2} \quad \Gamma \vdash e_1 : \tau_1, C_1 \quad \Gamma \vdash e_2 : \tau_2, C_2 \quad n \text{ is fresh}}{\Gamma \vdash e_1, e_2 : (\tau_1, \tau_2)^{n},\; C_1 \cup C_2 \cup \{n = m_1 + m_2\}}$$

This rule is straightforward: the number of XML elements produced by the sequence expression as a whole is the sum of the numbers of XML elements produced by its subexpressions. The rule adds the constraint that n, the size of the sequence expression, equals m1 + m2, the sum of the sizes of the subexpressions.

4.2 Conditional Expressions

Our type system allows the true and false branches of a conditional expression to have different types. The type of the conditional expression in most type systems is a choice type composed of the types of the branches: (τ1|τ2). The loss of precision from this approach poses a problem: we can determine that the value of the size variable for the conditional expression is within the range of the sizes of its branches, but we can no longer conclude that two sizes are equal. We address this by means of a pair type, introduced in Section 3, plus related constraints.

The main idea is this: if two different if expressions have the same boolean condition with equivalently bound variables and get executed the same number of times in one run of the program, then their true and false branches get executed the same number of times respectively. We can use unification to
conservatively determine which variables have the same binding. A straightforward analysis can conservatively determine which boolean expressions are equivalent and are executed the same number of times. All if expressions are given labels, and two if expressions will have the same label if and only if their boolean expressions were found to be equivalent.

Our type rule for conditional expressions is as follows:

$$\frac{\Gamma \vdash e_b : \mathit{Boolean}^{1}, C_b \quad \Gamma \vdash e_1 : \tau_1, C_1 \quad \Gamma \vdash e_2 : \tau_2, C_2 \quad \tau_1 = z_1^{n_1} \quad \tau_2 = z_2^{n_2} \quad \mathit{label} = p \quad n \text{ is fresh}}{\Gamma \vdash \text{if } e_b \text{ then } e_1 \text{ else } e_2 : \langle \tau_1, \tau_2 \rangle^{n},\; C_b \cup C_1 \cup C_2 \cup \{n = p \times n_1 + \mathit{notp} \times n_2\}}$$

The hypothesis “label = p” extracts the label, and uses it as a fresh size variable. The rule also uses the label to create a fresh size variable, notp, which represents the unknown number of times the boolean expression evaluates to false. The constraint uses the label to relate the sizes of conditional expressions with equivalent booleans.

4.3 Repetition Expressions

The for expression is the principal mechanism for producing lists of unspecified length, and is the most involved to handle in our type system. Consider the input type and the program fragment shown in Figure 9. By inspection of the input type, we conclude that the expression “children(book0)” has type:

    τ = title[String], author[String]+^n

In other words, all data that children(book0) produces are included in the set defined by τ. Although the “n” in τ does not tell us how many elements each member of τ has, it does say that the number of elements produced is the same as the number of the input elements. The body of the for expression operates on each element individually from the value of the “children” expression. Each execution of the for expression produces data included in the type:

    τ′ = author[String]+^n

    (a)  book[ title[String], author+^n[String] ]

    (b)  let book0 : Book
         1 for x in children(book0)
         2 do case x of
         3      x1 : author => x1
         4      x2 => ()
         5    end

Figure 9: Simple type tree and expression.

Two main ideas allow us to infer such types. The first idea is the use of auxiliary rules for typing the for expression. Auxiliary rules allow us to infer a
type for the for expression which has the same regular structure as the type it iterates over. The second idea is that the input type and the program contain implicit size-related information, which we make explicit. We determine which information to make explicit based on the argument in Section 1.1. We then equip the for rules to use this information.

4.3.1 Auxiliary Rules

Auxiliary rules for the for expression were introduced by Fernandez, Siméon, and Wadler [10], and we review them here. The for expression has the form: for x in e1 do e2. Suppose that for the program fragment in Figure 9b, we have determined that “children(book0)” has type:

    τ1 = title[String], author[String]+

We show the sequence constructor here to make it explicit that this is a sequence type. Fernandez et al.'s type rule (which does not reason about sizes) for the for expression looks roughly like:

$$\frac{\Gamma \vdash e_1 : \tau_1 \quad \Gamma\,\{\text{for } x : \tau_1\} \vdash e_2 : \tau_2}{\Gamma \vdash \text{for } x \text{ in } e_1 \text{ do } e_2 : \tau_2}$$

We have already found τ1, and at the top level, τ1 is a sequence type. To find τ2 for the second hypothesis, we first use the following auxiliary rule:

$$\frac{\Gamma\,\{\text{for } x : \tau_1\} \vdash e_2 : \tau'_1 \quad \Gamma\,\{\text{for } x : \tau_2\} \vdash e_2 : \tau'_2}{\Gamma\,\{\text{for } x : \tau_1, \tau_2\} \vdash e_2 : \tau'_1, \tau'_2}$$

This rule types the original program fragment by
• finding the for expression's type as though e1 had type title[..],
• finding the for expression's type as though e1 had type author[..]+,
• and putting those two types in sequence.

    τs = catalog[ book*[ author+[Str], title[Str], subtitle?[Str], ISBN[Int] ] ]
      →
    τs = catalog^1[ book*^n[ author+^m[Str^m], title^n[Str^n], subtitle?^r[Str^r], ISBN^n[Int^n] ] ]

Figure 11: The input type tree annotated.

       let cat0 : Catalog
     1 titles[ for w in children(cat0) do
     2   case w of
     3     w1 : book =>
     4       for x in children(w1) do
     5         case x of
     6           x1 : title => x1
     7           x2 => ()
     8         end
     9     w2 => ()
    10   end ],
    11 isbns[ for y in children(cat0) do
    12   case y of
    13     y1 : book =>
    14       for z in children(y1) do
    15         case z of
    16           z1 : isbn => z1
    17           z2 => ()
    18         end
    19     y2 => ()
    20   end ]

Figure 10: A program that makes a list of all titles followed by all ISBNs from a catalog.

Other auxiliary rules handle other type
constructors, such as repetitions (“+”). If e1's type is a unit type (e.g., title[..]), then the for expression can be typed like a let expression. The purpose of these auxiliary rules is to decompose the input type, infer types for the for expression over unit types, and compose the inferred types according to the structure of the input type. The resulting type of the for expression in Figure 9 is:

    τ2 = (), author[String]+ = author[String]+

4.3.2 Making Size-related Information Explicit

In Sections 1.1 and 1.2, we argued that a natural, type-based analysis can reason about size information at, but not below, the top level of repetition. We define top-level repetition for a given type to be types which are repeated (*'d) and which have no repeated ancestors. We get the information that we need for reasoning about sizes by answering:

1. Which for expressions produce top-level repetition in the output type?
2. Which parts of the input type do those for expressions iterate over?
3. How many elements are in those parts of the input type?

Aside from conditional expressions, which Section 4.2 addressed, the for expression is the only expression which can both produce output of unspecified length and have other expressions nested in it. To answer the first question conservatively, we use the syntactic structure of the program. A for expression may not produce top-level repetition if it is nested in an expression e, where e is not a for or case expression and e is nested in the body of a for expression. Otherwise, the for expression will produce top-level repetition.

We answer the second question similarly. Consider the nested for and case expressions on lines 1–10 of the program in Figure 10, for which τs in Figure 11 is the input type. Collectively, these expressions iterate over the title element, or more specifically the path /catalog/book/title, in τs. We infer this path by checking for certain syntactic structures (e.g., e1 is a children expression in all for expressions, and each expression uses a variable
bound by the expression above it), and piecing together the patterns from the case expressions. If the relevant syntactic structures do not appear, we do not infer a path. We make paths explicit for each group of for and case expressions that produce top-level repetition by labeling the first expression with start and the last expression with the path it iterates over.

To answer the third question, we first note that the number of elements represented by a path in the input type is the same each time an expression iterates over that path. We therefore give names to the numbers of elements along each path of the input type. Figure 11 shows an example of this naming. Starting from the root, sizes are given names. If the number of child elements per parent element is constant (e.g., 1 title per book), the child's size is proportional to the parent's. These names get used as size variables when an expression outputs the corresponding portion of the input tree. The precise details on how size names are put on the input type tree and how the transformation program is annotated can be found in the companion technical report (see Section 3.2.3 [24]).

Recursion, both in the input type and the program, is another source of statically unknown output. In the case of recursive types, we give a name to the total number of elements represented by each tag at the first “back-edge.” In a recursive type, there is at least one pair of type elements or type constructors where each can be reached from the other. We say that a back-edge goes to the type element/constructor that can be reached from the root of the type tree first. A type element/constructor with an incoming first back-edge has no ancestors with incoming back-edges. Figure 12 shows a recursive type first textually [10] then graphically. The annotation at the choice type-constructor means, for