Scalable Query Reformulation Using Views in the Presence of Functional Dependencies

格式：pdf
大小：83.80 KB
文档页数：12

下载文档原格式

第四章全局查询处理和优化

§4.4 查询优化的基础
2、查询树在查询树中，叶子表示关系，中间节点表示运算，前序遍历关系表示运算次序。定义： ROOT：=T T：=R/（T） /TbT/UT U：=σF/ПA b： =∞/X/∪/∩//∝
§4.4 查询优化的基础
3、举例例4.2.1 设有一供应关系数据库，有供应者和供应两关系，如下：供应者：SUPPLIER{SNO，SNAME，AREA} 供应者编号供应者姓名供应者所属地域供应：SUPPLY{SNO，PNO，QTY} 供应者编号零件号质量查询要求：找出地域在″北方″供应100号零件的供应商的信息。 SQL查询语句：SELECT SNO，SNAME FROM SUPPLIER，SUPPLY WHERE AREA=″北方″AND PNO=100 AND SUPPLIER.SNO=SUPPLY.SNO
§4.4 查询优化的基础
（2）等价变换重复律：UR ≡ UUR 交换律：U1U2R ≡ U2U1R 分配律：U（RbS）≡（UR）b（US）结合律：Rb1（Sb2T）≡ （Rb1S）b2T 提取律：（UR）b（US） ≡ U（RbS）其中：R、S、T为关系，U1、U2、U为一元运算符，b1、b2、 b为二元运算符。
§4.4 查询优化的基础
3、举例等价的关系表达式： Q1：ПSNO，SNAMEσAREA=″北方″σPNO=100 （SUPPLIER∞SUPPLY）查询树：
§4.2 Overview of Query Processing
通常用SQL语言操纵语言来表达全局查询。之后，由系统将其转换成内部表示。实际上，在查询执行过程时，最终涉及的是具体场地上的物理关系的查询。影响查询处理效率的因素有：网络传输代价（数据量和延迟等）、局部I/O代价及CPU 使用情况代价等，但主要由网络通信代价和局部 I/O代价来衡量。不同的分布式数据库系统可能对评估查询处理的传输代价和I/O代价的侧重不同，同时，为提高查询的效率，在查询处理过程中还要进行优化处理，查询优化就是确定出一种执行代价最小的查询执行策略或寻找相对较优的操作执行步骤。一般可采用多级优化。本章介绍全局查询的处理与优化。

lambdaquerychain分组聚类操作

lambdaquerychain分组聚类操作LambdaQueryChain 是一个Spring Data JPA 的工具类，它提供了一种链式编程的方式来构建查询。

如果你想在LambdaQueryChain 中进行分组和聚合操作，你可以使用`groupBy` 和`aggregate` 方法。

以下是一个示例，说明如何使用LambdaQueryChain 进行分组和聚合操作：```javaimport org.springframework.data.jpa.domain.Specification;import org.springframework.data.jpa.repository.JpaRepository;importorg.springframework.data.jpa.repository.support.SimpleJpaRepository;import org.springframework.data.jpa.query.QueryUtils;importmbdaQueryChain;importmbda.PredicateFunction;import javax.persistence.EntityManager;import javax.persistence.TypedQuery;import javax.persistence.criteria.*;import java.util.*;public class MyRepositoryImpl extends SimpleJpaRepository<MyEntity, Long> implements MyRepositoryCustom {private final EntityManager entityManager;public MyRepositoryImpl(EntityManager entityManager) {super(MyEntity.class, entityManager);this.entityManager = entityManager;}@Overridepublic List<MyEntity> customGroupedAggregation() {// 创建查询条件PredicateFunction<MyEntity> predicateFunction = (root, query, criteriaBuilder) ->criteriaBuilder.equal(root.get("someField"), "someValue");// 使用LambdaQueryChain 进行分组和聚合操作LambdaQueryChain<MyEntity> queryChain = query().where(predicateFunction);queryChain = queryChain.groupBy("groupField"); // 分组依据的字段queryChain = queryChain.aggregate(result -> result.<Integer>sum("anotherField")); // 聚合操作，这里是对anotherField 进行求和操作TypedQuery<MyEntity> typedQuery = queryChain.getQuery(); // 获取TypedQuery 对象，可以进一步定制查询条件或执行查询等操作// 执行查询并返回结果return typedQuery.getResultList();}}```在上面的示例中，我们首先创建了一个查询条件`PredicateFunction`，然后使用`LambdaQueryChain` 进行分组和聚合操作。

lambdaquery()的用法

lambdaquery()的用法LambdaQuery()是一个函数库，它是JavaScript库的一部分，为了提供更方便的列表查询方式。

LambdaQuery不仅具有较高的灵活性，还可以实现强大的查询功能。

下面，我们将分步骤讲解LambdaQuery()的用法。

第一步：引入LambdaQuery在使用前，需要先引入LambdaQuery的js文件。

可以通过在HTML页面中引入该js文件来使用LambdaQuery。

第二步：初始化LambdaQuery在HTML页面引入LambdaQuery后，接下来需要初始化LambdaQuery。

初始化LambdaQuery的方法如下所示。

```var lq = new LambdaQuery();```第三步：构造查询条件在对列表数据进行查询之前，需要先构造查询条件。

下面是构造查询条件的常用方法。

1. Lambda表达式Lambda表达式是在C#语言中广泛使用的一种表达式，它表示一个委托或Lambda表达式树。

通过Lambda表达式，我们可以简便地表达一些简单或复杂的逻辑。

以下是Lambda表达式的示例。

```lq.where(p => === 'John');```2.静态方法LambdaQuery还提供了一些静态方法来构造查询条件。

以下是一些常用的静态方法。

- where：用于构造where子句。

- orderBy：用于构造ORDER BY子句。

- groupBy：用于构造GROUP BY子句。

- select：用于构造SELECT子句。

第四步：执行查询完成查询条件的构造后，接下来需要执行查询。

执行查询的方法如下所示。

```var result = lq.execute(dataSource);```在上面的代码中，dataSource是一个数组，它是需要查询的数据源。

执行查询后，result将返回查询结果。

第五步：展示查询结果最后一步是将查询结果展示到页面上。

《大数据与云计算》课件——11.Hbase

HBASE数据库简介
数据库的核心目的是实现数据的高效管理，传统关系数据库一度占据商业数据库应用的主流位置
完备的关系理论基础事务管理机制的支持高效的查询优化机制
HBASE数据库简介
随着信息化浪潮和互联网应用的兴起，传统的关系型数据库在一些业务上开始呈现不足：
无法满足海量数据的管理需求无法满足数据高并发的需求无法满足高可扩展性和高可用性的需求
HBASE数据库简介
存储模式：
关系数据库是基于行模式存储的。我们说每一行就是一条记录。 HBase是基于存储的，每个列簇都由几个文件保存，不同列簇的文件是分离的。并且列簇中的列是可以动态增加的，而关系数据库需要一开始就设计好。除此之外，HBase可以自动切分数据，关系型数据库则需要我们人工切分数据。
HBASE数据库简介
数据索引：
关系数据库通常可以针对不同列构建复杂的多个索引，以提高数据访问性能。HBase只有一个索引——行键，通过巧妙的设计，HBase中的所有访问方法，或者通过行键访问，或者通过行键扫描，从而使得整个系统不会慢下来。
HBASE数据库简介
可伸缩性：关系数据库很难实现横向扩展，纵向扩展的空间也比较有限。相反，HBase
智能建造技术专业资源库
大数据与云计算
知识点
HBASE 数据库简介
HBASE数据库简介
引言
存储与管理贯穿大数据处理过程的始终。
HBASE数据库简介
传统的关系型数据库难以应对大数据挑战。
HBASE数据库简介
分布式数据库
我们知道一台普通PC机的硬盘大概可以存储 1Tb的数据，那么10Tb，100Tb，1000Tb怎么办？再比如现在我们大多数同学都有云存储空间，而且还不小有50GB的空间，那么10个、100个、10000个同学呢？我们说1万个同学就有1万个 50GB大小的空间,也就是500TB，这500TB的信息显然不可能在一台计算机上存储。那又该如何存储，如何查询呢？

sql lambda表达式

sql lambda表达式【1.SQL Lambda表达式简介】SQL Lambda表达式，简称Lambda表达式，是一种在SQL查询中使用的表达式，主要用于对数据进行筛选、排序等操作。

Lambda表达式的出现，使得SQL查询更加简洁、高效，同时也提高了可读性。

【mbda表达式在SQL中的应用场景】Lambda表达式在SQL中的应用场景主要包括以下几种：1.替代传统的IF条件语句：在使用Lambda表达式进行查询时，可以避免使用复杂的IF条件语句，使查询语句更加简洁。

2.排序依据：Lambda表达式可以作为ORDER BY子句的依据，对查询结果进行排序。

3.聚合函数的参数：Lambda表达式可以作为聚合函数（如SUM、AVG、MAX等）的参数，对数据进行汇总计算。

4.分组与聚合：Lambda表达式可以用于GROUP BY子句，实现对数据的分组与聚合操作。

【3.使用Lambda表达式的注意事项】mbda表达式的语法要求：Lambda表达式的基本语法为“λ参数名1: 参数类型1，参数名2: 参数类型2，… -> 返回类型”。

2.参数类型需与对应列的数据类型匹配：在使用Lambda表达式时，需要注意参数类型与查询语句中对应列的数据类型相匹配，以避免出现类型不匹配的错误。

3.避免使用过于复杂的Lambda表达式：虽然Lambda表达式可以使查询语句更加简洁，但过于复杂的表达式反而会影响可读性。

在实际应用中，应尽量保持Lambda表达式的简单明了。

4.测试与调试：在使用Lambda表达式时，务必进行充分的测试与调试，确保查询结果正确无误。

【4.总结】SQL Lambda表达式为SQL查询提供了更加简洁、高效的方式，使得我们对数据进行筛选、排序等操作变得更加简单。

asqueryable的用法

asqueryable的用法"asqueryable"的用法指的是在LINQ（Language-Integrated Query）查询中的一种方法。

LINQ是一种在.NET框架中用于查询数据的技术，它允许开发人员使用类似于SQL的语法来查询各种数据源，包括集合、数据库和XML。

在LINQ中，有一个名为"asqueryable"的方法，它用于将泛型集合转换为实现IQueryable接口的查询对象。

通过调用"asqueryable"方法，可以将普通的集合转换为可查询的集合，从而可以在查询中应用更多的操作和条件。

例如，假设有一个名为"students"的List<Student>集合，我们可以使用"asqueryable"方法将其转换为可查询的集合，然后在查询中应用各种条件和操作符，如Where、OrderBy、Select等。

下面是一个简单的示例：csharp.List<Student> students = GetStudents(); // 从某个数据源获取学生列表。

// 将集合转换为可查询的集合。

IQueryable<Student> queryableStudents =students.AsQueryable();// 在查询中应用条件。

var result = queryableStudents.Where(s => s.Age > 18).OrderBy(s => stName).Select(s => new { s.FirstName, stName });// 执行查询。

foreach (var student in result)。

{。

Console.WriteLine($"{student.FirstName}{stName}");}。

query dsl语法

query dsl语法Query DSL（Domain-Specific Language）是Elasticsearch提供的一种查询语法，它是基于JSON的结构化查询语言，用于构建和执行复杂的查询操作。

Query DSL提供了丰富的查询语法和功能，可以帮助开发人员灵活地组合和执行各种查询操作，从而实现高效的搜索和过滤操作。

在使用Query DSL时，首先需要了解一些基本的语法规则。

Query DSL主要由两个部分组成，即查询和过滤。

查询部分用于指定搜索条件和匹配规则，而过滤部分用于对查询结果进行进一步的筛选和过滤。

在查询部分，可以使用不同的查询类型来指定不同的匹配规则。

例如，可以使用match查询来进行全文检索，使用term查询来进行精确匹配，使用range查询来进行范围匹配等等。

每个查询类型都有相应的字段和参数，可以根据具体需求进行调整和配置。

在过滤部分，可以使用诸如term、range、exists等查询类型来对查询结果进行筛选和过滤。

过滤查询不会影响得分计算，只是对结果进行一个简单的二选一操作，而查询查询则会根据相关性进行排名和得分计算。

除了基本的查询和过滤功能，Query DSL还提供了一些高级功能，如bool查询、match_phrase查询、aggregation聚合等等。

bool查询可以将多个查询条件进行逻辑组合，实现AND、OR、NOT等复杂的查询逻辑。

match_phrase查询可以实现短语匹配，保留相对顺序的全文匹配功能。

aggregation聚合可以对查询结果进行分组、统计和计算等操作，生成丰富的分析报告和可视化图表。

为了更好地理解和使用Query DSL，可以通过实例来学习和实践。

下面是一个示例，展示了如何使用Query DSL进行基本的查询和过滤操作：```GET /index_name/_search{"query": {"bool": {"must": [{ "match": { "title": "query" }},{ "term": { "category": "dsl" }}],"filter": {"range": { "date": { "gte": "2022-01-01" }}}}}}```在上述示例中，index_name表示要进行查询的索引名称，title和category表示查询条件的字段名，query和dsl表示相应的匹配关键词，date表示要进行过滤的字段名，2022-01-01表示过滤的起始日期。

lambdaquery().in的用法

lambdaquery().in的用法1.简介在编程中，我们经常需要对一组数据进行筛选或者处理。

而l a mb da qu er y().in是一种常用的方法，它可以根据给定的条件，从一个数据集合中过滤出所需的数据。

2.语法l a mb da qu er y().in主要有以下的语法结构：l a mb da qu er y().in(列表)其中，l am bd aq ue ry代表要进行查询的对象，列表是用来存储待查询的数据。

3.示例下面通过几个具体的示例来演示l am bd aq u er y().in的用法。

3.1示例1假设我们有一个学生名单的数据集合，如下所示：s t ud en ts=["张三","李四","王五","赵六"]我们想要从中筛选出名字为"张三"和"李四"的学生。

使用la mb da qu er y().in的方法，我们可以这样做：r e su lt=l am bd aq uer y().i n(st ud en ts).fi lt er(l am bd ax:x=="张三"or x=="李四")这样，r es ul t中会包含筛选出的名字为"张三"和"李四"的学生。

3.2示例2假设我们有一个商品列表的数据集合，如下所示：p r od uc ts=["苹果","香蕉","橘子","梨子"]我们想要从中筛选出所有以字母"蕉"结尾的商品名称。

使用la mb da qu er y().in的方法，我们可以这样做：r e su lt=l am bd aq uer y().i n(pr od uc ts).fi lt er(l am bd ax:x.e nd s w i th("蕉"))这样，r es ul t中会包含筛选出的所有以字母"蕉"结尾的商品名称。

基于改进概念格的无冗余关联规则提取

定义４设Ｃ（，和（ｙ是格中的２ … ｌ，ｘ）ｘ，）２个概念，其中偏序关系 “ ≤”定义为Ｃ ≤Ｃ兮Ｘ ∈Ｘ或者 ∈ｙ，此时ｉ２，２
称Ｃ是ｃ的子概念（ｂｃｎｅｔ，Ｃ是Ｃ的超概念（ｐｒ，ｓ —ｏｃｐ）２ｕｌｓｅ－ｕｃｎｅｔ。如果不存在概念ｃｘ，）ｏｃｐ）（ｙ使得Ｃ ≤Ｃ≤ 成立，则Ｉ称ｃ与Ｃ互为直接子概念和直接超概念。ｌ，量化概念格是将概念格的外延量化，忽略概念格外廷的
中啊分类号ｔＴ３１６Ｐ０．０
基于改进概念格的无冗余关联规则提取
刘霜霜，饶天贵，孙建华
（．南大学计算机与通信学院，长沙４０８；２Ｉ湖１０２．株洲南车时代电气技术中心，株洲４２０）１０１
擅
耍：在介绍概念格相关理论的基础上，提出改进概念格构造算法一
［ｙｗｒｓｏｃｐｔｃ；ｓｏｉｔｎｒｌｓｅｔｃｉｎａｇｒｈＫｅｏｄｌｃｎｅｔａｉａｓｃａｏｅ；ｘｒｔｌｏｉｍｌｔｅｉｕａｏｔ
１概述
概念是人类进行知识表达的一种手段。数据库中知识发
ｙ＝Ｘ，Ｘ和Ｙ分别称为概念的外延（ｘｅｔＥｔ）ｎ和内涵（ｔｔＩｅ）ｎｎ，分别用Ｅｔｎ（）Ｉｔｎ（）ｘｅｔ和ｎｅｔ表示。ＣＣ
［ｂｔｃｌＡｔｔｄｃｇｔａｃｔｏｙｏｎｅｔａｉ，ｈａｅｂｎｓｒ￣ｎｉｒｖｄｃｎｔｃｏｌｒｈａｄＧｄｎＡｓａｔｆｒｎｏｕｉｅｂｓｅｒｆｏｃｐｌｔｅｔｓｐｒｒｇｆｗｄａｏｅｏｓｕｔｎａｏｉｍｎｍｅｏｉｒｅｉｒｎｈｉｈｃｔｃｉｐｉｏｍｐｒｉｇｔ

数据库中不等式查询语句的resilience计算

Journal o f C om puter A p p lica tio n s计算机应用，2〇l8, 38(7): l893 -l897,1915ISSN 1001-9081CODEN JYIIDU2018-07-10文章编号：1001-9081 (2018)07-1893-05D O I：10.11772/j. issn. 1001-9081.2018010078数据库中不等式查询语句的r e silie n c e计算林杰，覃飙\覃雄派(中国人民大学信息学院，北京100872)(*通信作者电子邮箱q in b ia o@ ru c. edu. cn)摘要:针对数据库中不等式连接查询的因果关系问题，引入并实现了re s ilie n c e计算，并且为了降低其在路径类型不等式连接查询中计算的时间复杂度，提出了求解re s ilie n c e的动态规划（D P R e s i)算法。

首先，根据路径类型不等式连接查询的特点及最大流最小割原理，实现了多项式时间复杂度的M in-C u t算法；然后通过将带有不等式布尔连接查询语句的溯源表达式编辑为溯源图，进而将re s ilie n c e求解问题转换为溯源图中最短距离的计算问题，并结合溯源图的包含关系与最优子结构性质，运用动态规划的思想实现了线性时间复杂度的D P R e s i算法。

在T P C-H数据集上进行了大量实验，实验结果表明，与M in-C u t算法相比，D P R e s i算法极大地提高了re s ilie n c e计算的效率，并具有较好的扩展性。

关键词：因果关系；re s ilie n c e;不等式查询语句；湖源表达式；溯源图中图分类号：TP311 文献标志码:AResilience computation for queries with inequalities in databasesLINJie, QINBiao% QIN Xiongpai(School o f Information, Renmin University o f China, Beijing100872, China)Abstract：Focusing on the cau sality problem o f co n ju n ctive queries w ith in e q u a litie s in databases, the re silie nce com putation was intro d u ce d and im plem ented. In order to reduce the com putational tim e co m p le xity in path co n ju n ctive queries w ith in e q u a litie s, a D ynam ic Program m ing fo r R esilience ( D P R e si) algo rithm was proposed. F irs tly, according to the properties o f path co n ju n ctive queries w ith in e q u a litie s and the m a x-flow M in-C u t theorem, the M in-C u t algo rithm w ith p o lyn o m ia l tim e com plexity was im plem ented. Then by co m p ilin g the lineage expression o f a Boolean co n ju n ctive query w ith in e q u a lity in to a lineage graph, the problem o f re silie nce was transform ed in to the com putation o f the shortest distance in lineage graph. C om b ining in c lu s io n property w ith o p tim a l substructure o f the lineage graph, D PResi a lgo rithm w ith lin e a r tim e co m p le xity was im ple m en ted by a p p lyin g the idea o f dynam ic program m ing. E xtensive experim ents were carried out on the T P C-H datasets. The exp erim e ntal results show th a t D PResi a lgo rithm greatly im proves the e ffic ie n c y o f re silie nce com putation and has good s c a la b ility com pared w ith M in-C u t algo rithm.Key words：cau sality; re silie n ce; co n ju n ctive query w ith in e q u a lity; lineage expression; lineage graph〇引言近几十年来，科技得到飞速发展，数据呈爆炸式增长，信息革命推动了各个学科的共同发展，许多方向的发展由单一化到多元化。

1、下载文档前请自行甄别文档内容的完整性，平台不提供额外的编辑、内容补充、找答案等附加服务。
2、"仅部分预览"的文档,不可在线预览部分如存在完整性等问题,可反馈申请退款(可完整预览的文档不适用该条件!)。
3、如文档侵犯您的权益，请联系客服反馈,我们会尽快为您处理(人工客服工作时间：9:00-18:30)。

Scalable Query Reformulation Using Views in thePresence of Functional DependenciesQingyuan Bai, Jun Hong, and Michael F. McTearSchool of Computing and Mathematics, University of Ulster at JordanstownNewtownabbey, Co. Antrim,BT37 0QB, UK{q.bai,j.hong,mf.mctear}@Abstract. The problem of answering queries using views in data integration hasrecently received considerable attention. A number of algorithms, such as thebucket algorithm, the SVB algorithm, the MiniCon algorithm, and the inverserules algorithm, have been proposed. However, integrity constraints, such asfunctional dependencies, have not been considered in these algorithms. Someefforts have been made in some inverse rule-based algorithms in the presence offunctional dependencies. In this paper, we extend the bucket-based algorithmsto handle query rewritings using views in the presence of functionaldependencies. We build relationships between views containing no subgoal of agiven query and the query itself. We present an algorithm which is scalablecompared to the inverse rule-based algorithms. The problem of missing queryrewritings in the presence of functional dependencies that occurs in the previousbucket-based algorithms is avoided. We prove that the query rewritingsgenerated by our algorithm are maximally-contained rewritings relative tofunctional dependencies.1IntroductionIn data integration, user queries over a mediated schema need to be reformulated into queries over the schemas of data sources using source descriptions.There are two main approaches to query reformulation in data integration, i.e., Global As View (GAV for short) and Local As View (LAV for short). In the LAV approach, data sources are described by views that are defined over a mediated schema, and user queries are also posed in terms of the mediated schema. The LAV approach to query reformulation is closely related to the problem of answering queries using views, which has recently received considerable attention because of its relevance to a wide variety of data management problems: query optimization, maintenance of physical data independence, data integration, and data warehouse and web-site design [11]. Informally speaking, the problem is as follows. Suppose we are given a query Q overa database schema, and a set of view definitions V1,...,Vmover the same schema. Is itpossible to answer query Q using only the answers to V1,...,Vm, and if so, how?In data integration, views describe a set of distributed, autonomous, and heterogeneous data sources. Thus, in this context we can usually find only maximally-contained rewritings that provide the best possible answers for a given query. Many algorithms including the bucket algorithm [12],[13], the inverse rules algorithm [16],[3], the Shared Variables Bucket (SVB for short) algorithm [14], and the MiniCon algorithm [15], have been proposed. These algorithms can generate all the maximally-contained query rewritings in the absence of functional dependencies. However, they have not considered the problem of answering query using views in the presence of functional dependencies in a mediated schema. As a result, they might sometimes miss query rewritings in this case.Example 1 [4].Suppose that there are three relations in a mediated schema: Conference(P, C), Year(P, Y), and Location(C, Y, L),where, attributes P, C, Y, and L refer to Paper, Conference, Year, and Location respectively. A paper is only presented at a conference and published in a year. Also, in a given year a conference is held at a specific location. Therefore, we have the following functional dependencies:Conference: P o C; Year: P o Y; Location: C, Y o L.Suppose that there are two data sources (views):V1(P, C, Y):- Conference(P, C), Year(P, Y).V2(P, L):- Conference(P, C), Year(P, Y), Location(C, Y, L).Assume that a query asks where PODS’89 was held:Q(L):-Location (“PODS”, “1989”, L).For the sake of simplicity, we change the query with constants into the following form:Q(L):-Location (C, Y, L), C= “PODS”, Y= “1989”.All the bucket-based algorithms would try to find the mappings between the subgoalsof the query and those in views. Thus, V1would not appear in any query rewritinggenerated by these algorithms, because it does not contain the subgoal Location. Therefore all the previous bucket-based algorithms would fail to generate the following correct rewriting in the presence of functional dependencies:Q’(L):- V1(P, C, Y),V2(P, L), C= “PODS”, Y= “1989”.In [4], an algorithm based on use of inverse rules is proposed to solve this problem by using binary relations to describe functional dependencies.The papers [6],[7],[9] also consider query rewriting using views in the presence of functional dependencies. However, no bucket-based algorithm has so far been developed to address this issue. In this paper, we present a bucket-based algorithm to solve this problem. Our main contribution is the extension of the MiniCon algorithm in the presence of functional dependencies.In Section 2, we have a brief look at the previous algorithms. In Section 3, the preliminaries are given. In Section 4, we first show how our algorithm works using an example and then present our algorithm to handle query rewritings using views in the presence of functional dependencies in a mediated schema. In Section 5, we prove the correctness and computational complexity of our algorithm. At the end of the paper, we conclude and discuss our future work.2Related WorkWe divide all the algorithms for query rewriting into two categories. The algorithms of the first category are based on use of buckets while the ones of the second category are based on use of inverse rules.2.1The Algorithms Based on Use of BucketsThe bucket algorithm [12],[13] proceeds in two stages. Initially, a bucket is created for each subgoal of a query. A view is put in the bucket if it can be unified with a subgoal in the query. Next, candidate query plans are generated by picking one view from each bucket. These plans are then verified using containment tests.A non-distinguished variable that appears in more than one subgoal of a query is called a shared variable. In the SVB algorithm [14], given a query Q, two types of buckets are created. The first type of buckets, the single-subgoal buckets are built in the same way as the bucket algorithm. The second type of buckets, the shared-variable buckets are created by checking the containment mapping from a set of subgoals in Q containing a shared variable to some subgoals in a view. Once all the buckets are created, the algorithm constructs rewritings by combining views from buckets which contain disjoint sets of subgoals of Q.The MiniCon algorithm [15] proceeds in principle in the same way as the SVB algorithm. Both algorithms focus on the roles of distinguished variables and shared variables in a query. In the first phase of the MiniCon algorithm, a MiniCon Description (MCD for short) for a query Q over a view V is formed to contain a set of subgoals in Q and the mapping information. The MCDs and the minimum MCDs in the MiniCon algorithm correspond to the single-subgoal buckets and the shared-variable buckets in the SVB algorithm respectively. In the second phase, the MiniCon algorithm combines the MCDs to generate query rewritings. In [15], it is shown that the MiniCon algorithm significantly outperforms the bucket and the SVB algorithms and scales up to hundreds of views.2.2 The Algorithm Based on Use of Inverse rulesThe key idea underlying the inverse rules algorithm [16],[3] is to first construct a set of rules called inverse rules that invert the view definitions, and then replace existential variables in the view definitions with Skolem functions in the heads of the inverse rules.The rewriting of a query Q using the set of views V is simply the composition of Q and the inverse rules for V by the transformation method in [3]or the unification-join method in [16].A key advantage of the inverse rules algorithm is its conceptual simplicity. The inverse rules can be constructed in advance in polynomial time, independent of a particular query. However, even if the computational cost of constructing rewriting is polynomial, the rewriting contains rules that may cause accessing irrelevant views. The problem of eliminating irrelevant rules has exponential time complexity. Another drawback is that evaluating the inverse rules over the source extension may invert some useful computation done to produce the views. Hence, much of thecomputational advantage of exploiting the materialized view is lost due to recomputing the extensions of the database relations.3PreliminariesQueries and views.We consider the problem of answering conjunctive queries using views. A conjunctive query has the form:BBBN N ;5;5;4 where BBN N ;5;5are the subgoals referred to database relations. B;4is thehead of the query. The tuples B BBN ;;;contain either variables or constants. Werequire that the query be safe, i.e., BBBN ;;; . The variables in B;are the distinguished variables, and others are existential variables. We use Vars(Q), Q(D) to refer to all variables in Q and the evaluating result of Q over the database D respectively.A view is a named query. If the query results are stored, we refer to them as a materialized view, and we refer to the result set as the extension of the view.Query containment and equivalence.The concepts of query containment and equivalence enable us to compare between queries and rewritings. We say that a query Q 1is contained in the Q 2, denoted by Q 1 Q 2, if the answers to Q 1are a subset of the answers to Q 2for any database instance. Containment mappings provide a necessary and sufficient condition for testing query containment. A mapping M from Vars(Q 2) to Vars(Q 1) is a containment mapping if(1)M maps every subgoal in the body of Q 2to a subgoal in the body of Q 1, and (2)M maps the head of Q 2to the head of Q 1.The query Q 2contains Q 1if and only if there is a containment mapping from Q 2to Q 1.The query Q 1is equivalent to Q 2if and only if Q 1 Q 2and Q 2 Q 1.Answering queries using views. Given a query Q and a set of view definitions 9=V 1,...,V m , a rewriting of Q using the views is a query expression Q’whose body predicates are only from V 1,...,V m .Note that the views are not assumed to contain all the tuples in their definitions since the data sources are managed autonomously. Moreover, we cannot always find an equivalent rewriting of the query using the views because data sources may not contain all of the answers to the query. Instead, we consider the problem of finding maximally-contained rewritings as follows.Definition 1 (Maximally-contained rewriting [15])Q’is a maximally-contained rewriting of a query Q using the views 9=V 1,...,V m with respect to a query language /if1.for any database D, and extensions v 1,...,v m of the views such that v i V i (D), for1d i d m, then Q’(v 1,...,v m ) Q(D) for all i,2.there is no other query Q 1in the language /, such for every database D and extensions v 1,...,v m as above (1) Q’(v 1,...,v m ) Q 1(v 1,...,v m ) and (2) Q 1(v 1,...,v m ) Q(D), and there exists at least one database for which (1) is a strict subset.The MiniCon Algorithm [15].Since our aim in this paper is to extend the MiniCon algorithm in the presence of functional dependencies, let us here have a detailed look at it.In the MiniCon algorithm, a MCD plays an important role like a bucket in the bucket algorithm. It contains information about the unification and containment of subgoals between a given query and a view.Definition 2 (MiniCon Descriptions)A MCD C for a query Q over a view V is a tuple of the form B&&&&*<9K M where:x h C is a head homomorphism on V,x &<9 Bis the result of applying h C to V, i.e., BB$K <& , where B$are the headvariables of V,x M C is a partial mapping from Vars(Q) to h C (Vars(V)),xG C is a set of subgoals in Q which are covered by some subgoal in h C (V) and M C .A head homomorphism h on a view V is a mapping h from Vars(V) to Vars(V) that is the identity on the existential variables, but may equate distinguished variables.The MiniCon algorithm proceeds in the following two steps.Step 1: Forming the minimum MCDs. For each subgoal R of Q and each subgoal R’of view V, find a least restrictive head homomorphism h on V such that there exists a mapping M such that M (R )=h(R’). If h and M exist, then extend the domain of M to variables in a minimum set G of subgoals of Q such that,(a) G is mapped to a subgoal of h (V) by M ;(b)each head variable in G is mapped to a head variable in h (V);(c)if an existential variable x in G is mapped to an existential variable of h (V), then(i) each subgoal of Q involving x is in G; (ii) all variables in the comparisons B of Q that involve x are in the domain of M and comparisons of Q and h (V) are consistent.Step 2: Generating rewritings [17]. If there are minimum MCDs (h i ,v i (Y i ),M i ,G i ),i=1,...,k, such that i z j, G i G j = , and G 1 G 2 ... G k =subgoals(Q),then(1)If M i maps two or more variables x 1,...,x s in G i to the same argument, then chooseone of the variable as a representative variable, denote the representative variable of x j by EC i (x j ). For each x in Q, if EC i (x)z EC j (x) (1d i,j d k), then let EC(x) be one of them but consistently across all y for which EC i (y)=EC i (x).(2)For each y Y i , if exists x such that M i (x)=y, then let \i (y)=x, otherwise let \(y)be a distinct new variable;(3)Create the conjunctive rewriting: Q’(EC(X)):- v 1(EC(\1(Y 1))),..., v k (EC(\k (Y k ))).Functional Dependencies.An instance of a relation R satisfies the functionaldependency A1,..., Ano B if for every two tuples t and u in R with t.Ai=u.Aifori=1,...,n, also t.B=u.B. If A o B and B o C hold, then A o C holds.When the relations satisfy a set of functional dependencies 6, we define our notion of query containment as query containment relative to 6.Query Q1is contained inquery Q2relative to 6, denoted446 , if for each database D satisfying thefunctional dependencies in 6, Q1(D) Q2(D). Maximal query containment relative to6can be defined accordingly.4Query Rewriting in the Presence of Functional Dependencies Now we consider query rewriting in the presence of functional dependencies. In the context of databases, any database schema has some integrity constraints,such as functional dependencies,and/or inclusion dependencies.Thus, this issue has practical significance.We note that the MiniCon algorithm fails to form a MCD for a given query Q overa view V if(1) V contains no subgoal in Q, or(2) there is a violation on containment mapping from Vars(Q) to Vars(V), for example, a distinguished variable of Q is mapped into an existential variable of V, or (3) there is any inconsistency between comparison predicates in Q and ones in V.In this paper, we address the above problem (1) by making use of functional dependencies. We first use an example to give a detail description about the idea underlying our algorithm, and we then present our algorithm.Continue with Example 1. The following query rewriting is correct in the presence of functional dependencies:Q’(L):- V1(P,C,Y),V2(P,L), C= “PODS”,Y= “1989”.Informally, this query rewriting is generated as follows. It first finds some paperpresented at PODS’89 using V1, and then finds the location of the conference at whichthe paper was presented using V2. This plan is correct only because every paper ispresented at one conference and in one year. Note that V1is needed in the queryrewriting even though V1does not contain any subgoal in Q. In the previousalgorithms, only views that contain subgoals appearing in the query are considered. In this example, the MiniCon algorithm would not generate any query rewritingcontaining V1. As a result, it would fail to find the above rewriting. In other words, allbucket-based algorithm encounters the problem of missing query rewritings in the presence of functional dependencies in a mediated schema.In V1, attributes P, C, and Y are distinguished variables and we have in Conference, P o C, in Year, P o Y. There are two constants to appear in the form of C= “PODS”, Y= “1989”in the query Q. According to the definition of functional dependencies, there are some values (maybe single value) of P in Conference for C= “PODS”, andsome values of P in Year for Y= “1989”. This indicates that V1containing Conferenceand Year might be useful for answering the query because it can provide informationabout attribute P, which is also a distinguished variable in other views given C=“PODS”,Y= “1989”.We try to find out this kind of relationships between V 1and Q in the presence of functional dependencies. We first identify views that contain no subgoal of the query and then check whether these views can provide any useful information for the query.Among the subgoals in an identified view there may exist a set of functional dependencies that have transitive relationships with the ones among the subgoals in the query. For example, V 1contains no subgoal of Q, and there is a set of functional dependencies {P o C, P o Y} on its subgoals. In Q, there is a functional dependency {C,Y o L}. We find that there exists a relationship between V 1and Q, i.e., implicit functional dependency.In V 2, attributes P and L are distinguished variables, but attributes C and Y are shared variables. The MiniCon algorithm can unify the subgoal Location in Q with one in V 2and create a MCD for Location over V 2as follows (here, o m means the mapping through this paper):V(Y)hIGV 2(P,L)P o m P, L o m LP o m P, C o m C, Y o m YLocationHowever, attributes C and Y in V 2are not distinguished variables. Therefore, V 2can not answer Q alone given C=“PODS”,Y=“1989”. Note that attributes P and L are distinguished variables in V 2and functional dependency {P o L} holds, which means if some values of P can be passed from V 1to V 2, then the value of L can be determined using V 2. Thus, we can combine V 1and V 2to form a query rewriting of Q as follows:Q’(L):- V 1(P,C,Y),V 2(P,L), C= “PODS”,Y= “1989”.Assume that a mediated schema consists of a set of relations R 1, R 2,..., R m . The data sources are described by views V 1,...,V n that are defined in the mediated schema.Without loss of generality, we assume that there exists at most one functional dependency,A o B in a relation R , where B is a single attribute in R, A may be a set of attributes in R. We denote a set of functional dependencies in a mediated schema as 6={A i o B i , A i , B i R i , 1d i d m}.Given a conjunctive query Q over a mediated schema:BBBBN N ;5;5;5;4 Assume that there are k 1(1d k 1d k) functional dependencies among N 55, i.e.,6Q ={A i o B i , A i , B i R i , 1d i d k 1}.Given views V 1,...,V n , we divide them into twogroups: 91={V j 1,j=1,2,…,n 1} and 92= {V j 2,j=1,2,…,n 2}, where each view in 91contains at least one subgoal of Q while each view in 92contains no subgoal of Q.Wecheck each view M 9(1d j d n 2) in 92to see whether it can make any contribution to answering Q in the presence of functional dependencies.Assume that there is a set of functional dependencies: {C j o D j , C j , D j subgoal R in the body of M 9, 1d j d n 2}. If(1)there exists {R i : A i o B i } 6Q such that D j is identical to A i , and A i appears in 6Q ,(2)C j is a distinguished variable in M 9, and (3) A i appears as a constant in Q,then, M 9might be used for answering Q. M 9is put into a bucket as follows:V(Y)FD SQ(subgoal of Q)V 2jC j o A jR iFor each view L 9in 91, 1d i d n 1, the MCDs are formed using the MiniConalgorithm. The MCDs of the form B&&&&*<9K M are divided into two sets as&1={C 1j ,1d j d m 1} and &2={C 2j ,1d j d m 2}. A MCD is put in &1if its component B<9contains the distinguished variables C j and B j , where C j and B j are distinguishedvariables in some view, say V 2j , in 92and in Q respectively, and satisfy:(4) C j o D j holds in the body of M 9,(5) A j o B j holds in the body of Q, and(6) D j is identical to A j .Thus, such a view, say V 1i , can participate in join operations on C j with V 2j to get thevalues of B j . The rest of the MCDs is put in &2.Now we have a set of buckets for M 9(j=1,2,...,), two sets of the MCDs, &1and &2.The MCDs in &2can be used to generate query rewritings alone. The buckets forM 9(j=1,2,...,) are combined with the MCDs in &1to generate other query rewritings.Algorithm Bucket-Based Query Rewriting in the Presence of Functional DependenciesInput:A set of the relations in a mediated schema,R 1, R 2,..., R m and a set of functional dependencies 6={A i o B i , A i , B i R i , 1d i d m}; A set of the views V={V 1,V 2,...,V n }; A conjunctive query in the form of BBBBN N ;5;5;5;4 ; A set of functional dependencies in the body of Q:6Q ={A i o B i , A i , B i R i , 1d i d k}.Output:Q’s, the maximally-contained rewritings of Q relative to functional dependency 6.Method:We divide all views into two groups 91and 92as described above.Step 1:Check whether each view in 92satisfies the above conditions (1), (2),and (3).If so, the view is put into a bucket defined above.Step 2:Form a MCD for each view in 91using the MiniCon algorithm [15]. The MCDs are divided into two sets &1and &2.Step 3:Generate all possible query rewritings Q’s. Some query rewritings are generated in the same way as the MiniCon algorithm using the MCDs in &2. The others are generated by combining the MCDs in &1with the obtained buckets for views in 92as follows.x For every subset {&1j ,1d j d m1} of &1such that:G11 G12... G1m1=subgoals(Q) and for every i z j, G1iG1j= .x For every subset of buckets for views in 92such that the union of a set of functional dependencies in these buckets and a set of functional dependencies in the MCDs in &1can imply the functional dependencies of Q.x Combine the views in &1j ,1d j d m1with the views in buckets in the same way as inthe MiniCon algorithm to generate query rewritings.We end this section by the following example which shows the complete procedure of our algorithm.Example 2. Suppose that there are three data sources:V1(P,C,Y):-Conference(P,C),Year(P,Y).V2(P,L):-Conference(P,C),Year(P,Y),Location(C,Y,L).V3(C,Y,L):-Location(C,Y,L).Conference, Year, and Location as well as functional dependencies are the same as in Example 1. We also have the same query:Q(L):-Location (C,Y,L),C= “PODS”,Y= “1989”.We divide these views into 91={V2,V3} and 92={V1}. By checking the aboveconditions (1)-(3), a bucket is built for V1.Using the MiniCon algorithm, we get the following MCDs over the views in 91:9 < K I*9 3 / 3o P3 /o P/3o P3 &o P& <o P</RFDWLRQ 9 & < / &o P& <o P< /o P/&o P& <o P< /o P//RFDWLRQThe first MCD belongs to &1and the second is in &2, because in V2of the first MCD, P is a distinguished variable in a view in92and L is a distinguished variable in Q, and functional dependency {P o L} holds.Hence, we get a query rewriting of Q from &2alone.Q’(L):- V3(C,Y,L), C= “PODS”,Y= “1989”.Another rewriting is generated by combining V1with V2.Q’’(L):- V1(P,C,Y),V2(P,L),C= “PODS”,Y= “1989”.5Computational Complexity and Correctness of Our AlgorithmIn this section, we illustrate that our algorithm is a scalable extension of the MiniCon algorithm by taking into account functional dependencies in query reformulation and prove the correctness of our algorithm.5.1 Computational Complexity of the AlgorithmIn the MiniCon algorithm, any views in either 91or 92are considered for forming theMCDs. In Step 1 of our algorithm, the algorithm checks for each view in 92by testing above three conditions (1)-(3). In the worst case the complexity of this process is n*_6Q _*_6VQ _, where n is the number of the views in the second group, _6Q _and _6VQ _arethe numbers of functional dependencies in the query and in each view in 92respectively. In Step 2 of our algorithm, the computation is the same as in theMiniCon algorithm when forming the MCDs for each view in 91. The computation is increased when dividing the MCDs into two groups. In the worst case the complexity of this process is n*_6Q _*_6VQ _. In Step 3, the complexity of our algorithm is the sameas the MiniCon algorithm when generating the query rewritings from the MCDs in &2.Due to generating additional query rewritings by combining the MCDs in &1and thebuckets for views in 92, the computation of this step is exponential in terms of _92_*_&1_. However, in the worst case, the computation of the MiniCon algorithm is exponential in terms of the number of the views. In [15], it is shown that the MiniCon algorithm can scale up to hundreds of views. Our algorithm has the same scale-up.5.2 Correctness of the AlgorithmTheorem 1.Given a conjunctive query Q, conjunctive views 9, and a set of functional dependencies 6in a mediated schema, our algorithm produces the union of conjunctive queries that is the maximally-contained rewriting of Q using 9relative to 6.Proof:In our algorithm, some query rewritings are generated from the MCDs in &2along in the same way as the MiniCon algorithm. Therefore, these rewritings are maximally-contained rewritings. The others are generated by combining the MCDs in &1with the buckets of views in 92.We show that these rewritings are also maximally-contained rewritings.As shown above, the MCDs of &1are needed to answer the query if the other information is required. We claim that (1) the conditions (1)-(3) show the buckets ofviews in 92have useful information for answering the query, and (2) the conditions(4)-(6) guarantee the combination of the MCDs of &1with the buckets of 92in Step 3of our algorithm can exactly provide the answers to the query. The claim (1) is valid for the following reasons: The conditions (1) and (3) mean that for evaluating attribute A i in a subgoal R i of a query, the value of another attribute C j in a subgoal R j of viewV is needed even though V does not contain any subgoal of the query. The condition(2) guarantees that the required value of C j can be transferred to the query because C j is a distinguished variable. The claim (2) is valid because for the MCDs in &1, their components B <9contain the distinguished variables C j , B j , and the conditions (4)-(6)guarantee that the transitive property of functional dependencies holds among thequery, the views in &1and the views in 92. Therefore, the combination of the MCDsof &1with the buckets of 92in the same way as the MiniCon algorithm can provide the answers to the query in the presence of functional dependencies.According to the proofs given in [15], if there exist some query rewritings, then these rewritings are the maximally-contained rewritings.6Conclusion and Future WorkIn this paper, we have considered the query rewriting problem in data integration by using the approach of answering query using views. The algorithms for this problem can be divided into two groups, bucket-based algorithms and inverse rules-based algorithms. The algorithms based on inverse rules have considered the problem of query rewriting in the presence of functional dependencies and inclusion dependencies in a mediated schema. However, there has been no bucket-based algorithm for this problem. We consider query rewriting in the presence of functional dependencies in a mediated schema. We give three conditions to check whether a view which does not contain any subgoal of a query is needed for answering the query when there are a set of functional dependencies in a mediated schema. We divide all the MCDs in the MiniCon algorithm into two groups. The MCDs in one of groups can not answer the query alone. We combine these MCDs with the buckets created for views identified by the above three conditions, and then generate the query rewritings which are missed by the previous bucket-based algorithms. The problem of missing query rewritings in the presence of functional dependencies that occurs in the previous bucket-based algorithms is avoided. Our algorithm is scalable compared to the inverse rule-based algorithms for query rewriting in the presence of functional dependencies.In the future, we will study the problem of query rewriting in the presence of functional dependencies in the case of join lossless decomposition. Inclusion dependencies are another important type of integrity constraints in databases. There has been no bucket-based algorithm for query rewriting in the presence of inclusion dependencies. This would be another issue to be addressed in the future.References1. S.Abiteboul, R.Hull, and V.Vianu. Foundations of Databases, Addison-Wesley Publishing Company, 1995.2. A.V.Aho, Y.Sagiv, and J.D.Ullman. Equivalences among relational expressions,SIAM Journal on Computing, 8(3): 218-246, May 1979.。