Parallel Algorithms for Computing Temporal Aggregates

Jose Alvin G. Gendrano, Bruce C. Huang, Jim M. Rodrigue, Bongki Moon, Richard T. Snodgrass

Dept. of Computer Science, University of Arizona, Tucson, AZ 85721 (jag, bkmoon, rts@)
IBM Storage Systems Division, 9000 S. Rita Road, Tucson, AZ 85744 (brucelee@)
Raytheon Missile Systems Co., 1151 East Hermans Road, Tucson, AZ 85706 (jmrodrigue@)

Abstract

The ability to model the temporal dimension is essential to many applications. Furthermore, the rate of increase in database size and response time requirements has outpaced advancements in processor and mass storage technology, leading to the need for parallel temporal database management systems. In this paper, we introduce a variety of parallel temporal aggregation algorithms for a shared-nothing architecture based on the sequential Aggregation Tree algorithm. Via an empirical study, we found that the number of processing nodes, the partitioning of the data, the placement of results, and the degree of data reduction effected by the aggregation impacted the performance of the algorithms. For distributed results placement, we discovered that Time Division Merge was the obvious choice. For centralized results and high data reduction, Pairwise Merge was preferred regardless of the number of processing nodes, but for low data reduction, it only performed well up to 32 nodes. This led us to a centralized variant of Time Division Merge which was best for larger configurations having low data reduction.

1. Introduction

Aggregate functions are an essential component of data query languages, and are heavily used in many applications such as data warehousing. Unfortunately, aggregate computation is traditionally expensive, especially in a temporal database where the problem is complicated by having to compute the intervals of time for which the aggregate value holds. For example, finding the (time-varying) maximum salary of professors in the Computer Science Department

This work was sponsored in part by National Science Foundation grants
CDA-9500991 and IRI-9632569, and National Science Foundation Research Infrastructure program EIA-9500991. The authors assume all responsibility for the contents of the paper.

Finding that time-varying maximum involves computing the temporal extent of each maximum value, which requires determining the tuples that overlap each temporal instant.

In this paper, we present several new parallel algorithms for the computation of temporal aggregates on a shared-nothing architecture [8]. Specifically, we focus on the Aggregation Tree algorithm [7] and propose several approaches to parallelize it. The performance of the parallel algorithms relative to various data set and operational characteristics is our main interest.

The rest of this paper is organized as follows. Section 2 gives a review of related work and presents the sequential algorithm on which we base our parallel algorithms. Our proposed algorithms for computing parallel temporal aggregates are then described in Section 3. Section 4 presents empirical results obtained from the experiments performed on a shared-nothing Pentium cluster. Finally, Section 5 concludes the paper and gives an outlook on future work.

2. Background and Related Work

Simple algorithms for evaluating scalar aggregates and aggregate functions were discussed by Epstein [5]. A different approach, employing program transformation methods to systematically generate efficient iterative programs for aggregate queries, has also been suggested [6]. Tuma extended Epstein's algorithms to handle temporal aggregates [9]; these were further extended by Kline [7]. While the resulting algorithms were quite effective in a uniprocessor environment, all suffer from poor scale-up performance, which identifies the need to develop parallel algorithms for computing temporal aggregates.

Early research on developing parallel algorithms focused on the framework of general-purpose multiprocessor machines. Bitton et al. proposed two parallel algorithms for processing (conventional) aggregate functions [1]. The Subqueries with a
Parallel Merge algorithm computes partial aggregates on each partition and combines the partial results in a parallel merge stage to obtain a final result. Another algorithm, Project by list, exploits the ability of the parallel system architecture to broadcast tuples to multiple processors. The Gamma database machine project [4] implemented similar scalar aggregates and aggregate functions on a shared-nothing architecture. More recently, parallel algorithms for handling temporal aggregates were presented [11], but for a shared-memory architecture.

The parallel temporal aggregation algorithms proposed in this paper are based on the (sequential) Aggregation Tree algorithm (SEQ) designed by Kline [7]. The aggregation tree is a binary tree that tracks the number of tuples whose timestamp periods contain an indicated time span. Each node of the tree contains a start time, an end time, and a count. When an aggregation tree is initialized, it begins with a single node covering the entire timeline (see the initial tree in Figure 1).

In the following example [7], there are four tuples to be inserted into an empty aggregation tree (see Table 1(a)).

  Name     Salary  Begin  End
  Richard  40K     1      8
  Karen    45K     8      20
  Nathan   35K     7      12
  Nathan   37K     18     21
  (a) Data Tuples

  Count  Begin  End
  1      7      8
  2      8      12
  1      12     18
  3      18     20
  2      20     21
  1      21     ∞
  (b) Result

  Table 1. Sample Database and Its Temporal Aggregation

The start time value of the first entry to be inserted splits the initial tree, resulting in the updated aggregation tree shown in Figure 1. Because the original node and the new node share the same end date, a count of 1 is assigned to the new leaf node. The aggregation tree after inserting the rest of the records in Table 1(a) is shown in Figure 1.

To compute the number of tuples for a period in this example, we simply take the count from the corresponding leaf node and add its parents' count values. Starting from the root, we sum the counts of the parents on the path to the leaf; adding the leaf count gives the total. The temporal aggregate results are given in Table 1(b). Though SEQ correctly computes temporal
aggregates, it is still a sequential algorithm, bounded by the resources of a single-processor machine. This makes a parallel method for computing temporal aggregates desirable.

Figure 1. Example run of the Sequential (SEQ) Aggregation Tree Algorithm (final tree shown after adding [18, ∞))

3. Parallel Processing of Temporal Aggregates

In this section, we propose five parallel algorithms for the computation of temporal aggregates. We start with two simple parallel extensions to the SEQ algorithm, the Single Aggregation Tree (abbreviated SAT) and Single Merge (SM) algorithms. We then go on to introduce the Time Division Merge with Centralizing step (TDM+C) and Pairwise Merge (PM) algorithms, which both require more coordination, but are expected to scale better. Finally, we present the Time Division Merge (TDM) algorithm, a variant of TDM+C, which distributes the resulting relation, as differentiated from the centralized results produced by the other algorithms.

3.1. Single Aggregation Tree (SAT)

The first algorithm, SAT, extends the Aggregation Tree algorithm by parallelizing disk I/O. Each worker node reads its data partition in parallel, constructs the valid-time periods for each tuple, and sends these periods up to the coordinator. The central coordinator receives the periods from all the worker nodes, builds the complete aggregation tree, and returns the final result to the client.

3.2. Single Merge (SM)

The second parallel algorithm, SM, is more complex than SAT, in that it includes computational parallelism along with I/O parallelism. Each worker node builds a local aggregation tree, in parallel, and sends its leaf nodes to the coordinator. Unlike the SAT coordinator, which inserts periods into an aggregation tree, the SM coordinator merges each of the leaves it receives using a variant of merge-sort. The use of this efficient merging algorithm is possible since the worker nodes send their leaves in temporally sorted order. Finally, after all the worker nodes finish sending their leaves, the coordinator returns the final result to the
client.

3.3. Time Division Merge with Coordinator (TDM+C)

Like SM, the third parallel algorithm also extends the aggregation tree method by employing both computational and I/O parallelism (see Figure 2). The main steps for this algorithm are outlined in Figure 3.

Figure 2. Time Division Merge with Centralizing Step (TDM+C) Algorithm (local trees at the worker nodes)

  Step 1. Client request
  Step 2. Build local aggregation trees
  Step 3. Calculate local partition sets
  Step 4. Calculate global partition set
  Step 5. Exchange data and merge
  Step 6. Merge local results
  Step 7. Return results to client

  Figure 3. Major Steps for the TDM+C Algorithm

3.3.1. Overall Algorithm

TDM+C starts when the coordinator receives a temporal aggregate request from a client. Each worker node is instructed to build a local aggregation tree using its data partition, knowing the number of worker nodes participating in the query.

After each worker node constructs its local aggregation tree, the tree is augmented in the following manner. The node traverses its aggregation tree in DFS order, propagating the count values to the leaf nodes. The leaf nodes now contain the full local count for the periods they represent, and any parent nodes are discarded. After all worker nodes complete their aggregation trees, they exchange minimum (earliest) start time and maximum (latest) end time values to ascertain the overall timeline of the query.

Figure 4. Timeline divided into partitions, forming a global partition set (each node covers one portion of the timeline)

The leaves of a local aggregation tree are evenly split into local partitions, each consisting of a period and a tuple count. Because each partition is split to have the same (or nearly the same) number of tuples, local partitions can have different durations. The local partition set from each processing node is then sent to the coordinator.

The coordinator takes all the local partition sets (with n worker nodes producing k local partitions apiece, a total of n·k local partitions are created) and computes the global partitions (how this is done is discussed in the next section). After computing the global time partition set, the coordinator then naively assigns the period of the i-th global partition to the i-th worker node, and broadcasts the global partition set and respective assignments to all the nodes. The worker nodes then use this information to decide which local aggregation tree leaves to send, and which worker nodes to send them to. Note that periods which span more than one global partition period are split, and each part is assigned accordingly (split periods do not affect the correctness of the result).

Each worker node merges the leaves it receives with the leaves it already has to compute the temporal aggregate for its assigned global partitions. When all the worker nodes finish merging, the coordinator polls them for their results in sequential order. The coordinator concatenates the results and sends the final result to the client.

Figure 5. Local Partition Sets from Three Worker Nodes (timeline boundaries 0, 5, 9, 10, 30, 350, 800, 1000, 1500, 5000, 10000; node 1 holds three partitions of 50 leaves each, node 2 three of 15 each, and node 3 three of 30 each)

3.3.2. Calculating the Global Partition Set

We examine in more detail the computation of the global partition set by the coordinator. Recall that the coordinator receives from each worker node a local partition set, consisting of contiguous partitions. The goal is to temporally distribute the computation of the final result, with each node processing roughly the same number of leaf nodes.

As an example, Figure 5 presents local partitions from three worker nodes. The number between each pair of hash marks segmenting a local timeline represents the number of leaf nodes within that local partition. The total number of leaf nodes from the three nodes is 285, so the best plan has 95 leaf nodes processed by each node. Figure 4 illustrates the computation of the global partition set.

We modified the SEQ algorithm to compute the global partition set, using the local partition information sent by the worker nodes. We treat the worker node local partition sets as periods, inserting them into the modified aggregation tree. From Figure 5, the first period to be
inserted is [5,9)(50), the fourth is [0,30)(15), the seventh is [0,10)(30), and the ninth (last) is [1000,5000)(30). This use of the Aggregation Tree is entirely separate from the use of this same structure in computing the aggregate. Here we are concerned only with determining a division of the timeline into contiguous periods, each with approximately the same number of leaves.

There are three main differences between our Modified Aggregation Tree algorithm used in this portion of TDM+C and the original Aggregation Tree [7], used in step 2 of Figure 3. First, the "count" field of an aggregation tree node is incremented by the count value of the local partition being inserted, rather than by 1. Second, a parent node must have a count value of 0: when a leaf node is split and becomes a parent node, its count is split proportionally between the two new leaf nodes based on the durations of their respective time periods, and the new parent count becomes 0. Third, during an insertion traversal for a record, if the search traversal diverges to both subtrees, the record count is split proportionally between the 2 sub-trees.

Figure 6. Intermediate Aggregation Tree: (a) after the first 3 local partitions (inserted records [5,9)(50), [9,800)(50), and [800,1500)(50)); (b) after partition 4 is added

As an example, suppose we inserted the first three local partitions, and now we are inserting the fourth one, [0,30)(15). The current modified aggregation tree, before inserting the fourth local partition, is shown in Figure 6a. Notice that for leaf node [5,9)(50), the count value is set to 50 instead of 1 (the first difference). The second and third differences are exemplified when the fourth local partition is added. At the root node, we see that the period for this fourth partition overlaps the periods of the left sub-tree and the right sub-tree. In the original aggregation tree, we would simply add 1 to a node's count in the left sub-tree and the right sub-tree at the appropriate places.
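The proportional count division used by the second and third differences can be sketched in a few lines. This is our own floor-based reading of the paper's arithmetic (the function name is ours, not the paper's): each sub-period gets the floor of its duration-weighted share, and the last sub-period absorbs the remainder so no count is lost.

```python
def split_count(count, durations):
    """Divide an inserted partition's count across sub-periods in
    proportion to their durations. Shares are floored; the last
    sub-period absorbs the remainder, so shares always sum to `count`."""
    total = sum(durations)
    # Floor-proportional share for all but the last sub-period.
    shares = [count * d // total for d in durations[:-1]]
    # The last sub-period takes whatever remains.
    shares.append(count - sum(shares))
    return shares
```

For the fourth partition [0,30)(15) of the worked example, splitting over the sub-periods [0,5) and [5,30) yields shares of 2 and 13, matching the division derived in the example.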
Here, we see the third difference. We split this partition count of 15 in proportion to the durations of the left and right sub-trees. The root's left sub-tree contains the period [0,5), for a duration of 5 time units. The fourth local partition period is [0,30), or 30 time units. We compute the left sub-tree's share of this local time partition's count as 2, while the right sub-tree's share is 13. In this case, the left sub-tree leaf node [0,5) now has a count of 2 (see Figure 6b). We now pass down the root's right sub-tree, increasing its leaf node count from [5,9)(50) to [5,9)(52) as its share of the newly added partition's count, 2, is added, using the same proportional calculation method. At leaf node [9,800)(50), the inserted partition's count is now down to 11, since 2 was taken by node [5,9)(52).

Now, the second difference comes into play. Two new leaf nodes are created by splitting [9,800)(50). The new leaves are [9,30) and [30,800). Leaf [9,30) receives all of the remaining inserted partition's count of 11. The count of 50 from [9,800)(50) is then divvied up amongst the two new leaf nodes: the left leaf node receives 1 of the 50, while the right leaf node receives 49. So the new left leaf node is now [9,30)(12), where 12 comes from 11+1, and the new right leaf node shows as [30,800)(49). Again, see Figure 6b for the result.

Table 2 shows the leaf node values once all local time partitions from Figure 5 are inserted.

  Count  Begin  End
  17     0      5
  64     5      9
  3      9      10
  12     10     30
  44     30     350
  43     350    800
  21     800    1000
  40     1000   1500
  32     1500   5000
  9      5000   10000

  Table 2. All leaf node values in tabular format once all 9 partitions from Figure 5 are inserted

Now that the coordinator has the global span leaf counts and the optimal number of leaf nodes to be processed by each node, it can figure out the global partition set. For each node (except the last one), we continue adding the span leaf counts until the sum matches or surpasses the optimal number of leaf nodes. When the sum is more than the optimal number, we break up the leaf node that causes the sum to exceed the optimal number, such that the leaf node count
division is done in proportion to the period duration.

As an example, refer to Table 2. We know that the optimal number of periods per global partition is 95. We add the leaf node counts from the top until we reach node [10,30)(12). The sum at this point is 96, or 1 more than optimal. We break up [10,30)(12) into two leaf nodes such that the first leaf node period contains a count of 11, and the newly created leaf node contains 1. Using the same idea of proportional count division, we can see that [10,28)(11) and [28,30)(1) are the two new leaf nodes. So the first global time partition has the period [0,28) and a count of 95.

The computation for the second global time partition starts at [28,30)(1). Continuing on, the global time partitions for this example are [0,28), [28,866), and [866,10000).

The reader should be aware that this global time partition resolution algorithm is not perfect. The actual number of local aggregation tree leaves assigned to each worker node may not be identical. The reason is that the algorithm uses the local partition sets, which are just guides for the global partitioning. When a local partition has 50 leaf nodes in period [9,800), the global partitioning scheme assumes a uniform distribution, while the actual leaf node distribution may be heavily skewed.

3.3.3. Expected Performance

We expect better scalability for TDM+C as compared to the SAT and SM algorithms because of the data redistribution and its load-balancing effect. However, there are two global synchronization steps that may limit the performance obtained. First, all of the local partition sets must be completed before the global time set partitioning can begin. Second, all of the worker nodes must complete their merges and send their results to the coordinator before the client can receive the final result.

  Step 1. Client request
  Step 2. Build local aggregation trees
  Step 3. While not final aggregation tree: merge between 2 nodes
  Step 4. Return results to client

  Figure 7. Major Steps for the PM Algorithm

The next algorithm, PM, will attempt
to obtain better performance by replacing the two global synchronization steps with localized synchronization steps.

3.4. Pairwise Merge (PM)

The fourth parallel algorithm, PM (see Figure 7), differs from TDM+C in two ways. First, the coordinator is more involved than in TDM+C. Second, instead of all the worker nodes merging simultaneously, as in TDM+C, pairs of worker nodes merge when the opportunity presents itself. Which two worker nodes are paired is determined dynamically by the query coordinator.

A worker node is available for merging when its local aggregation tree has been built. The worker node informs the query coordinator that it has completed its aggregation tree. The query coordinator then arbitrarily picks another worker node that had previously completed its aggregation tree, thereby allowing the two worker nodes to merge their leaves. Then, the query coordinator instructs the worker node with the least number of leaf nodes to send its leaves to the other node, the "buddy worker node", which does the merging of leaves. Once a worker node finishes transmitting leaves to its buddy worker node, it is no longer a participant in the query.
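The buddy merge combines two temporally sorted leaf lists into a single list, summing counts where their periods overlap. The paper gives no code; the following Python sketch (representation and function name are our own) illustrates the operation over half-open [start, end) periods. For clarity it rescans each input list at every boundary, which is quadratic, whereas the merge-sort variant described in the paper advances two cursors and is near-linear.

```python
def merge_leaves(a, b):
    """Merge two temporally sorted, non-overlapping leaf lists.

    Each leaf is a (start, end, count) triple over the half-open
    period [start, end). The result covers every instant covered by
    either input, with counts summed where the inputs overlap."""
    # Collect every boundary at which the combined count can change.
    bounds = sorted({t for s, e, _ in a + b for t in (s, e)})

    def count_at(leaves, t):
        # Count contributed by `leaves` at instant t (0 if uncovered).
        for s, e, c in leaves:
            if s <= t < e:
                return c
        return 0

    merged = []
    for s, e in zip(bounds, bounds[1:]):
        c = count_at(a, s) + count_at(b, s)
        if c == 0:
            continue  # neither input covers [s, e)
        # Coalesce with the previous leaf when periods abut with equal count.
        if merged and merged[-1][1] == s and merged[-1][2] == c:
            merged[-1] = (merged[-1][0], e, c)
        else:
            merged.append((s, e, c))
    return merged
```

Merging the single leaf [1,8) count 1 with leaves [7,12) and [18,21) of count 1 each yields [1,7) count 1, [7,8) count 2, [8,12) count 1, and [18,21) count 1, with the uncovered gap [12,18) omitted.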
This buddying-up continues until the query coordinator ascertains that only one worker node is left, which contains the completed aggregation tree. The query coordinator then directs the sole remaining worker node to submit the results directly to the client. Figure 8 provides a conceptual picture of this "buddy" system.

A portion of a PM aggregation tree may be merged multiple times with other aggregation trees. The merge algorithm is a merge-sort variant operating on two sorted lists as input (the local list and the received list). This merge is near-linear in the number of leaf nodes to be merged.

Figure 8. Pairwise Merge (PM) Algorithm (the sole remaining worker node holds the final tree)

3.5. Time Division Merge (TDM)

The fifth parallel algorithm, TDM, is identical to TDM+C, except that it has distributed result placement rather than centralized result placement. This algorithm simply eliminates the final coordinator results-collection phase and completes with each worker node having a distinct piece of the final aggregation tree. A distributed result is useful when the temporal aggregate operation is a subquery in a much larger distributed query. This allows further localized processing on the individual nodes' aggregation sub-results in a distributed and possibly more efficient manner.

4. Empirical Evaluation

For the purposes of our evaluation, we chose the temporal aggregate operation COUNT, since it does not require that the attribute itself be sent. This simplifies the data structures maintained while still exhibiting the characteristics of a temporal aggregate computation. Based on this temporal aggregate operation, we performed a variety of performance evaluations on the five parallel algorithms presented. The matrix in Table 3 summarizes the experiments we have done.

  Exp  Algorithms Covered        NumProcessors
  1    SAT, PM, SM, TDM, TDM+C   2, 4, 8, 16, 32, 64
  2    SAT, PM, SM, TDM, TDM+C   2, 4, 8, 16, 32, 64
  3    SAT, PM, SM, TDM, TDM+C   2, 4, 8, 16, 32, 64
  4    PM, SM, TDM, TDM+C        16

  Table 3. Experimental Case Matrix Summary

4.1. Experimental Environment

The experiments were
conducted on a 64-node shared-nothing cluster of 200MHz Pentium machines,each with 128MB of main memory and a 2GB hard disk.The ma-chines were physically mounted on two racks of 32ma-chines.Connecting the machines was a 100Mbps switched Ethernet network,having a point-to-point bandwidth of 100Mbps and an aggregate bandwidth of 2.4Gbps in all-to-all communication.Each machine was booted with version 2.0.30of the Linux kernel.For message passing between the Pen-tium nodes,we used the LAM implementation of the MPI communication standard [2].With the LAM implemen-tation,we observed an average communication latency of 790microseconds and an average transfer rate of about 5Mbytes/second.4.2.Experimental ParametersTo help precisely define the parameters for each set of tests,we established an experiment classification scheme.Table 4lists the different parameters,and the set of param-eter values for each experiment.Synthetic datasets were generated to model relations which store time-varying information for each employee in a database.Each tuple has three attributes,an SSN attribute which is filled with random digits,a StartDate attribute,and an EndDate attribute.The SSN attribute refers to an en-try in a hypothetic employee relation.On the other hand,the StartDate and EndDate attributes are temporal instants which together construct a valid-time period.The data gen-eration method varies from one experiment to another and is described later.NumProcessors depends on the type of performance measurement.Scale-up experiments used 2,4,8,16,32,and 64processing nodes,while the variable reduction ex-periment used a fixed set of 16nodes.To see the effects of data partitioning on the perfor-mance of the temporal algorithms,the synthetic tables were partitioned horizontally either by SSN or by StartDate.The SSN and StartDate partitioning schemes were attempts to model range partitioning based on temporal and non-temporal attributes [3].The tuple size was fixed at 41bytes/tuple.The 
tuple size was intentionally kept small and unpadded so that the generated datasets could have more tuples before their size made them difficult to work with.

  Parameter             Exp 4.3          Exp 4.4          Exp 4.5          Exp 4.6
  NumProcessors         2,4,8,16,32,64   2,4,8,16,32,64   2,4,8,16,32,64   16
  Partitioning          by SSN           by SSN           by StartDate     by StartDate
  Tuple size            41 bytes         41 bytes         41 bytes         41 bytes
  Partition size        65536 tuples     65536 tuples     65536 tuples     65536 tuples
  Total database size   N * 65536        N * 65536        N * 65536        16 * 65536
  Data reduction        0%               100%             0%               0/20/40/60/80/100%

Table 4. Experiment Parameters (N denotes the number of processing nodes; the parameter names in the first column are reconstructed from the descriptions in the surrounding text)

All experiments except the single speed-up test used a fixed database partition size of 65,536 tuples. This was done to facilitate cross-referencing of results between different tests. Because of this, the 16-node results of the scale-up experiments are directly comparable to the results of the 16-node data reduction experiment.

The total database size reflects the total number of tuples across all the nodes participating in a particular experiment run. For scale-up tests, the total database size increased with the number of processing nodes. Finally, the amount of data reduction is 100 minus the ratio (in percent) between the number of resulting leaves in the final aggregation tree and the original number of tuples in the dataset. A reduction of 100 percent means that a 100-tuple dataset produces 1 leaf in the final aggregation tree because all the tuples have identical StartDates and EndDates.

4.3. Baseline Scale-Up Performance: No Reduction and SSN Partitioning

We set up our first experiment to compare the scale-up properties of the proposed algorithms on a dataset with no reduction. We will also use the measurements taken from this experiment as a baseline for later comparisons with subsequent experiments. The second column of Table 4 gives the parameters for this particular experiment.

For this experiment, a synthetic dataset containing 4M tuples was generated. Each tuple had a randomized SSN attribute and was associated with a distinct period of unit length. The dataset was then sorted by SSN and distributed to the 64 processing nodes.

To measure the scale-up performance of the
proposed algorithms, a series of 6 runs having 2, 4, 8, 16, 32, and 64 nodes, respectively, were carried out. Note that since we fixed the size of the dataset on each node, increasing the number of processors meant increasing the total database size. Timing results from this experiment are plotted in Figure 9 and lead us to the following conclusions.

Footnote 2: The total database size for the scale-up experiment at 64 processing nodes was 64 partitions x 65,536 tuples x 41 bytes = 171,966,464 bytes.

Footnote 3: Since the SSN fields are generated randomly, this has the effect of randomizing the tuples in terms of the StartDate and EndDate fields.

[Figure 9. Scale-Up Results (4M tuple Dataset with No Reduction and SSN Partitioning); the plot shows time in seconds against the number of worker nodes for SAT, SM, PM, TDM, and TDM+C]

SM performs better than SAT. Intuitively, since the dataset exhibits no reduction, both SM and SAT send all periods from the worker nodes to the coordinator. The reason behind SM's performance advantage comes from the computational parallelism provided by building local aggregation trees on each worker node. Aside from potentially reducing the number of leaves passed on to the coordinator, this process of building local trees sorts the periods in temporal order. This sorting makes compiling the results more efficient than SAT's strategy of having to insert each valid-time period into the final aggregation tree. (The SM coordinator uses a merge-sort variant in compiling and constructing the final results.)

SAT exhibits the worst scale-up performance. This result is not surprising, since the only advantage SAT has over the original sequential algorithm comes from parallelized I/O. This single advantage does not make up for the additional communication overhead and the coordinator bottleneck: in SAT, all the periods are sent to the coordinator, which builds a single, but large, aggregation tree.

The performance difference between TDM and TDM+C increases with the number of nodes. For this observation, it is important to remember that TDM+C is simply TDM plus an additional result-collection phase that sends all final leaves to the coordinator, one worker node at a time. The performance difference increases with the number of nodes because of the non-reducible nature of the dataset and the fact that scale-up experiments work with more data as the number of nodes increases.

Among the algorithms that provide monolithic results, PM has the best scale-up performance up to 32 nodes. This is attributed to the multiple merge levels needed by PM. A PM computation needs at least ceil(log2 P) merge levels, where P is the number of processing nodes. On the other hand, the TDM+C algorithm only merges local trees once but has three synchronization steps, as described in Section 3. With this analysis in mind, we expected PM to perform better than or as well as TDM+C for 2, 4, and 8 nodes, which have 1, 2, and 3 merge levels, respectively. We then expected TDM+C to outperform PM as more nodes were added, but we were surprised to realize that PM was still performing better than TDM+C up to perhaps 50 nodes.

To find out what was going on behind the scenes, we used the LAM XMPI package [2] to visually track the progression of messages within the various TDM+C and PM runs. This led us to the reason why TDM+C performed worse than PM for 2 to 32 nodes: TDM+C was slowed more by increased waiting time due to load imbalance (computation skew) as compared to PM.

4.4. Scale-Up Performance: 100% Reduction and SSN Partitioning

This experiment is designed to measure the effect of a significant
amount of reduction (100% in this case) on the scale-up properties of the proposed algorithms. Table 4 gives the parameters for this experiment.

This experiment is modeled after the first one but with a synthetic dataset having 100% reduction. This dataset was generated by creating 4M tuples associated with the same period and having randomized SSN attributes. The synthetic dataset was then rearranged randomly and split into 64 partitions, each having 65,536 tuples.

Footnote 6: The aggregation tree algorithm performs at its worst case when the dataset is sorted by time [7].

This experiment, like the first one, is a scale-up experiment. Hence, it was conducted in much the same way. Timing results from this experiment are plotted in Figure 10 and lead us to the following observations.

All algorithms benefit from the 100% data reduction. Comparing results from the baseline experiment with results from the current experiment leads us to this observation. Because of the high degree of data reduction, the aggregation trees do not grow as large as in the first experiment. With smaller trees, insertions of new periods take less time because there are fewer branches to traverse before reaching the insertion points. Because all of the presented algorithms use aggregation trees, they all experience increased performance.

[Figure 10. Scale-Up Results (4M tuple Dataset with 100% Reduction and SSN Partitioning); the plot shows time in seconds against the number of worker nodes for SAT, SM, PM, TDM, and TDM+C]

With 100% reduction, PM and TDM+C catch up to TDM. Aside from constructing smaller aggregation trees, a high degree of data reduction decreases the number of aggregation tree leaves exchanged between nodes. TDM does not send its leaves to a central node for result collection, so it does not transfer as many leaves as its peers. Because of this, TDM is not impacted by the amount of data reduction as much as either PM or TDM+C, which end up performing as well as TDM.

4.5. Scale-Up Performance: No Reduction and Time Partitioning

This experiment is designed to
measure the effect of time partitioning on the scale-up properties of the proposed algorithms. The settings for this experiment are summarized in Table 4.

The dataset for this experiment was generated in a manner similar to the first one, but with StartDate rather than SSN partitioning. This was done by sorting the whole dataset by the StartDate attribute and then splitting it into 64 partitions of 64K tuples each.

Time partitioning did not significantly help any of the algorithms. We originally expected TDM and TDM+C to benefit from the time partitioning, but we also realized that for this to happen, the partitioning must closely match the way the global time divisions are calculated. Because we randomly assigned partitions to the nodes, TDM did not benefit from the time partitioning. In fact, it even performed a little poorer in all but the 16-node run. We attribute the small performance gaps to differences in how the partitioning strategies interacted with the number of nodes, making TDM redistribute mildly varying numbers of leaves across the runs. As for SM and PM, they exhibited no conclusive improvement because they were simple enough to work without considering how tuples were distributed across the various partitions.
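The central data structure in all of these experiments is the aggregation tree, whose sorted leaves the SM, PM, and TDM variants exchange and merge. The sketch below is an illustration of that merge step for the COUNT aggregate, not the authors' implementation: it assumes each tree's leaves are kept as a sorted list of (start, end, count) triples over half-open periods, and the function name `merge_leaves` is invented here. For clarity it re-sorts the combined period boundaries rather than exploiting the fact that both inputs are already sorted, as the paper's near-linear merge does.

```python
def merge_leaves(local, received):
    """Merge two sorted leaf lists of temporal COUNT aggregation trees.

    Each leaf is a (start, end, count) triple: `count` tuples are valid
    over the half-open period [start, end). Returns the leaf list of the
    combined tree, again sorted by start time.
    """
    # Convert every leaf into +count / -count events at its period
    # boundaries, then sweep the combined boundary set in time order.
    events = {}
    for leaves in (local, received):
        for start, end, count in leaves:
            events[start] = events.get(start, 0) + count
            events[end] = events.get(end, 0) - count
    boundaries = sorted(events)
    merged, running = [], 0
    for t, nxt in zip(boundaries, boundaries[1:]):
        running += events[t]
        if running:  # emit only periods during which tuples are valid
            merged.append((t, nxt, running))
    return merged
```

Note how 100% reduction falls out of this representation: merging leaves that cover the identical period, e.g. `merge_leaves([(0, 5, 3)], [(0, 5, 2)])`, yields the single leaf `(0, 5, 5)`, whereas non-overlapping periods grow the leaf list.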
The Vagabond Parallel Temporal Object-Oriented Database System: Versatile Support for Future Applications

Kjetil Nørvåg
Department of Computer and Information Science
Norwegian University of Science and Technology
7491 Trondheim, Norway
email: noervaag@idi.ntnu.no

Abstract. In this paper, we discuss features that future database systems should support to deliver the required functionality and performance to future applications. The most important features are efficient support for: 1) large objects, 2) isochronous delivery of data, 3) queries on large data sets, 4) full-text indexing, 5) multidimensional data, 6) sparse data, and 7) temporal data and versioning. To efficiently support these features in one integrated system, a new database architecture is needed. We describe an architecture suitable for this purpose, the Vagabond Parallel Temporal Object-Oriented Database system. We also describe techniques we have developed to avoid some possible bottlenecks in a system based on this new architecture.

1 Introduction

The recent years have brought computers into almost every office, and this availability of powerful computers, connected in global networks, has made it possible to utilize powerful data management systems in new application areas. The increasing performance and storage capacity, combined with decreasing prices, has made it possible to realize applications that previously were too heavy for current computer hardware.

High performance and storage capacity are not necessarily enough. We need support software, e.g. database systems, operating systems, and compilers, able to benefit from current and future hardware. This often means rethinking previous solutions, similar to what was done in the hardware world with the introduction of the RISC concept.

In this paper, we will concentrate on database systems, quite likely to be the bottleneck in many future information systems if not adequately designed. The first step in the process of rethinking old solutions has already been done, with the advent of object-oriented
database systems (OODBs). While relational database systems (RDBs) have good performance for many of the previous, traditional, application areas, new applications demand more than RDBs can deliver. The increased modeling power and the removal of the language mismatch in OODBs have made integration between application programs easier, and in many cases helped to increase application performance.

Previously, data has lived in an artificial, modeled world after it had been inserted into the database. This created a mismatch in many ways similar to the language mismatch. What we would like is for database systems to support a world more similar to our own, which includes time and space. This is not at all a new observation; in particular, the aspects of temporal database management have been an active research area for many years. However, current database architectures, adequate for yesterday's applications, will have problems coping with tomorrow's applications. In this paper, we describe a new architecture, more suitable for tomorrow's applications, the Vagabond Parallel Temporal Object-Oriented Database System. We give an overview of Vagabond, and describe some of the new techniques we have developed to make Vagabond able to deliver the high performance and scalability needed for future applications.

The organization of the rest of the paper is as follows. In Sect. 2 we give an overview of related work. In Sect. 3, we describe some application areas that have only limited support in existing database systems. Based on this discussion, we summarize the features future database systems should support, and describe assumptions and features that motivate the design of the Vagabond system. In Sect. 4 we discuss some techniques that can increase the performance of a temporal OODB. Finally, in Sect. 5, we conclude the paper.

2 Related Work

The area of temporal object-oriented databases (TOODBs) is still immature, as is evident from the amount of research in this area, summarized in the Temporal Database
Bibliography (last published in 1998 [14]). Most of the work in the area of temporal object-oriented databases has been done on data modeling; less has been done on implementation issues. Even though temporal databases have a long history, few full-scale systems have been implemented [1]. Common to most of these is that they have only been tested on small amounts of data, which makes the scalability of the systems questionable. In most of the application areas where temporal database systems are needed, scalability is an important issue, as the amount of data will be large. In the area of temporal object-oriented database systems, we are only aware of one prototype, the POST/C++ temporal object store [13].

A description of a preliminary design of the Vagabond storage manager and of object-oriented database systems based on log-structured techniques was given in [8, 9]. Log-structured file systems (LFS), on whose philosophy the log-only approach of Vagabond is based, were introduced by Rosenblum and Ousterhout [11]. LFS has been used as the basis for two other object managers: the Texas persistent store [12], and a part of the Grasshopper operating system [3]. Both object stores are page based, i.e., operations in the database are done at page granularity. To our knowledge, there have been no publications on other LFS-based log-only OODBs.

3 The Need for a New Architecture

When designing new database systems, it is important to study the current as well as possible future applications of the system. We can categorize application areas into existing application areas and emerging application areas. Existing application areas include the traditional database areas, like typical transaction processing, well suited for RDBs, and application areas where application-specific database systems or file systems have been used earlier, because current general-purpose database systems cannot handle the performance constraints. Emerging application areas include new application areas that are emerging as a
response to the increased computer performance in general, as well as application areas that are a response to other technologies, e.g., the World Wide Web.

Examples of existing applications, where database systems until recently have been a potential performance bottleneck, include geographical information systems, scientific and statistical database systems, and multimedia systems. Examples of applications where increased database support will be needed to deliver the desired performance include temporal database systems and semistructured data management. Based on the characteristics of these application areas, we have identified some features that we believe future systems should support:

- Efficient support for large objects.
- Isochronous delivery of data.
- Queries on large data sets. In applications where low update rates appear, this should be exploited to increase performance.
- Support for full-text indexing.
- Support for multidimensional data.
- Support for sparse data, for example by the use of data compression.
- Dynamic clustering and dynamic tuning of system options and parameters.
- Temporal data support/version management.

Until now, no single system has supported all these features. For some of the features listed, ad-hoc solutions exist, but these are often not scalable, or will not work well together with support for the other features. We think that future systems should support these features in one integrated system. This is the goal of the Vagabond project.
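As a toy illustration of the last feature, temporal data support/version management, the sketch below keeps every update as a new timestamped version instead of overwriting the stored value, and answers "as of" queries against the version chain. This is hypothetical code written for this discussion, not Vagabond's interface; the class and method names are invented, and logical counter timestamps stand in for real transaction times.

```python
import itertools

class VersionedStore:
    """Append-only object store: an update never overwrites the stored
    value, it appends a new timestamped version (a minimal no-overwrite
    model of temporal data support / version management)."""

    def __init__(self):
        self._clock = itertools.count()   # logical timestamps 0, 1, 2, ...
        self._versions = {}               # oid -> [(timestamp, value), ...]

    def put(self, oid, value):
        """Append a new version of `oid` and return its timestamp."""
        ts = next(self._clock)
        self._versions.setdefault(oid, []).append((ts, value))
        return ts

    def get(self, oid, as_of=None):
        """Return the value of `oid` current at time `as_of`
        (the latest version when `as_of` is None)."""
        chain = self._versions[oid]       # version chain, oldest first
        if as_of is None:
            return chain[-1][1]
        for ts, value in reversed(chain): # newest version not after as_of
            if ts <= as_of:
                return value
        raise KeyError(f"object {oid!r} did not exist at time {as_of}")
```

The same no-overwrite discipline, applied at disk level, is what makes old object versions cheap to retain in an append-only log.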
Vagabond is designed to support the listed features, with a philosophy based on the following assumptions:

1. Although many of the current problems can be handled by future main-memory database systems (MMDBs), there are many problems (and more will appear, as computers become powerful enough to solve them) that are too large to be solved by an MMDB alone. However, the size of main memory increases fast, and it is very important to utilize the available main memory as much as possible to reduce time-consuming secondary memory accesses.

2. The main bottleneck in a database system for large databases is still secondary memory access. In a database system, most accesses to data are read operations. Consequently, database systems have been read optimized. However, as main memory capacity increases, the amount of disk write operations relative to disk read operations will in general increase. This calls for a focus on write-optimized database systems.

3. To provide the necessary computing power and data bandwidth, a parallel architecture is necessary. A shared-everything approach is not truly scalable, so our primary interest is in OODBs based on shared-nothing multicomputers. With the advent of high-performance computers and high-speed networks, we expect multicomputers based on commodity workstations/servers and networks to be cost effective.

4. In many application areas, there is a need for increased data bandwidth, not only increased transaction throughput (although these points are related). This is especially important for emerging application areas that have a need for high data bandwidth. Examples are video on demand and supercomputing applications, which have earlier used file systems because database systems have not supported delivery of large data volumes.

5. Even though set-based queries have been a neglected feature in most OODBs, we expect them to be just as important in the future for OODBs as they have been previously for relational database systems. The popularity of the hybrid
object-relational systems justifies this assumption.

6. Distributed information systems are becoming increasingly common, and they should be supported in a way that facilitates both efficient support for distribution and efficient execution of local queries and operations.

We will now describe the architecture and some interesting aspects of the Vagabond OODB.

3.1 Log-Only Storage

In most current database systems, write-ahead logging (WAL) is employed to increase throughput and reduce response time. WAL defers the non-sequential writing, but sooner or later, the data has to be written to the database. This often results in the writing of lots of small objects, almost always with one disk access for each individual object. Our solution to this problem is to eliminate the current database completely, and use a log-only approach, similar to the log-structured file system approach [11]. The log is written contiguously to the disk, in a no-overwrite way, in large blocks. This is done by writing many objects and index entries, possibly from many transactions, in one write operation. This gives good write performance, but possibly at the expense of read operations. Already written data is never modified; new versions of the objects are just appended to the log.

Logically, the log is an infinite length resource, but the physical disk size is, of course, not infinite. We solve this problem by dividing the disk into large, equal sized, physical segments. When one segment is full, we continue writing in the next available segment. As data is vacuumed, deleted, or migrated to tertiary storage, old segments can be reused. Deleted data will leave behind a lot of partially filled segments; the data in these near empty segments can be collected and moved to a new segment. This process, which is called cleaning, makes the old segments available for reuse. By combining cleaning with reclustering, we can get well clustered segments. In a traditional system with in-place updating, keeping old versions of objects, which is required in a
transaction time temporal database system, usually means that the previous version has to be copied to a new place before the update. This doubles the write cost. In Vagabond, this is not necessary. Keeping old versions comes for free (except for the extra disk space). Thus, our system supports transaction time temporal database systems in an efficient way.

Because each new version of an object is written to a new place, logical object identifiers (OIDs) are needed. When using logical OIDs, an OID index (OIDX) is needed to do the mapping from logical OID to physical location when retrieving an object. The index entries in the OIDX, the object descriptors (ODs), contain the physical address of an object, and in a transaction time TOODB, the timestamp as well. In a traditional non-temporal OODB, the OIDX only needs to be updated when objects are created, not when they are updated. In a log-only OODB, however, the OIDX needs to be updated on every object update. This might seem bad, and can indeed make it difficult to realize an efficient non-temporal OODB based on this technique. However, in the case of a TOODB, the OIDX needs to be updated on every object update also in the case of in-place updating, because either the previous or the new version must be written to a new place. Thus, when supporting temporal data management, the indexing cost is the same in these two approaches. Our storage structure is very well suited as a basis for a temporal database system. We never overwrite data, so keeping old versions comes for free. We maintain the temporal information in the index, which makes retrieval efficient, without an additional index.

Fig. 1. The Vagabond system architecture (left), and the Vagabond server architecture (right).

3.2 Parallelism and Distribution in Vagabond

The Vagabond architecture is a system designed for high performance, and one strategy to achieve this is to base the design on the use of parallel servers. Data is declustered over a set of servers, which we call a server group. It is possible to
add and remove nodes from the configuration. The servers in a server group will cooperate on the same task. In this way, it is possible to get a data bandwidth close to the aggregate bandwidth of the cooperating servers. To benefit from the use of parallel server groups, it is assumed that the servers in one server group are connected by some kind of high speed network.

In many organizations, it is also desirable to have the data in a distributed system, and the demand for support of distributed databases is increasing. To satisfy this, we use a hybrid solution: a distributed system, with server groups (Fig. 1). The connections between the server groups in the distributed system have in general less bandwidth than the connections between the servers in a server group. Objects are clustered on server groups based on locality, as is common in traditional distributed OODBs, but one server group can contain more than one computer (a kind of "super server"). Objects to be stored on a server group are declustered on the servers in the group according to some declustering strategy, e.g., hashing.

3.3 Server Architecture

We use a peer-to-peer server architecture, similar to the Shore project [2]. All application programs (APs) in the system are connected to one server running on the same machine.
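The hash-based declustering mentioned above can be sketched as follows. This is a minimal illustration, not the actual Vagabond code; the function name `decluster` and the use of SHA-256 are our own assumptions, standing in for whatever declustering strategy a server group is configured with:

```python
import hashlib

def decluster(oid: int, group_size: int) -> int:
    """Map a logical OID to a server index within a server group.

    Hypothetical sketch of hash declustering: objects stored on a
    server group are spread over its servers so reads and writes can
    proceed in parallel, approaching the group's aggregate bandwidth.
    """
    # Use a stable hash (not Python's per-process hash()) so every
    # node in the group computes the same placement.
    digest = hashlib.sha256(oid.to_bytes(8, "big")).digest()
    return int.from_bytes(digest[:4], "big") % group_size

# Example: spread 1000 objects over a 4-server group and check balance.
counts = [0] * 4
for oid in range(1000):
    counts[decluster(oid, 4)] += 1
```

With a reasonable hash, each server receives close to 1/group_size of the objects, which is what makes the aggregate-bandwidth argument work.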
This server is the gateway to the database system, including remote servers. The servers do not have to contain any data. If they all did, including those running on office workstations, maintaining the availability of the system would be difficult. Thus, even if a node contains no data volume, a server must be running on that node to make it possible for the application program to access the database system. One advantage of this approach is that it makes it possible for several clients running on the same machine to utilize a common server side cache. On the client, client side caching will usually be employed as well.

Client/Server Communication. A Vagabond server is an object server. The architecture of the server is shown in Fig. 1. A client normally operates against the Vagabond API, a client side stub which provides the mechanisms to communicate with the server. The communication with the server is done via the messenger, for example implemented as a shared memory queue.

Server Side Operation. The server is threaded, and all subservers run as separate threads.
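The messenger-based interaction between a client stub and a subserver thread can be sketched as below. This is a simplified, hypothetical model: ordinary in-process queues stand in for the shared memory queue, and the names `Messenger` and `subserver` are ours, not part of the Vagabond API:

```python
import queue
import threading

class Messenger:
    """Hypothetical stand-in for the messenger: a pair of queues
    carrying commands from client to subserver and replies back."""
    def __init__(self):
        self.requests = queue.Queue()  # client -> subserver thread
        self.replies = queue.Queue()   # subserver thread -> client

def subserver(m: Messenger):
    """A session subserver thread: serve requests until shutdown.
    In the real system, reads would go through the SM API."""
    while True:
        cmd, payload = m.requests.get()
        if cmd == "shutdown":
            break
        if cmd == "read":
            m.replies.put(("ok", f"object-{payload}"))

# A client session: send a read command, receive the object.
m = Messenger()
t = threading.Thread(target=subserver, args=(m,))
t.start()
m.requests.put(("read", 7))
status, obj = m.replies.get()
m.requests.put(("shutdown", None))
t.join()
```

Because the subserver runs in the server address space, only commands and (copies of) objects cross the queue; this is what lets several clients on one machine share a common server side cache.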
There is also one thread for each transaction. Even though the use of threads incurs extra costs such as locking overhead, thread administration, and thread switching overhead, compared to an event driven system, threading is beneficial when we want to exploit multiprocessor computers.

Each client that connects to the server starts the session by connecting to the subserver manager, which allocates either an OODB session subserver thread or an application subserver thread to the client. The allocated thread operates in the server address space on behalf of the client. Commands and data are communicated through the messenger.

Subservers and Server Extensibility. All communication between clients and the server is done via subservers. We have three classes of subservers: 1) subserver managers, 2) OODB session subservers, and 3) application subservers. The subserver manager is only used when a session is started, to allocate the appropriate subserver, as described above. OODB session subservers and application subservers are the ones that access the storage manager (SM) through the SM API. The OODB session subservers are the ones normally used, the "standard" portals to the system. Application subservers are extensions to the system, similar to Informix's DataBlades.

One interesting point here is that the SM API is a superset of the Vagabond API, the client interface stub. This feature makes it easier to implement and test subservers as clients, before they are added to the server. As subservers, they can communicate with clients through a messenger, as illustrated in Fig. 1. The use of application subservers makes it easy to integrate extensions to Vagabond, by adding new subservers to the system. This is a feature also found in other systems, e.g., the Value Added Server concept in Shore. However, for such a concept to be really beneficial, the OODB has to be object shipping rather than page shipping, so that the system is able to filter out objects and do operations on the objects, something which is
impossible or difficult on most page server systems.

Storage Manager. The storage manager is responsible for permanent storage of objects. Its most important operations include transaction management, secondary and tertiary storage management, and indexing. Buffering data in main memory is used to reduce the amount of data that needs to be transferred between main memory and disk. Important buffers include the object buffer, the index node buffer, and the object descriptor buffer (index entry buffer). The sizes of these can be dynamically adjusted, to get optimal performance under changing access patterns.

Permanent Storage. All data in Vagabond is stored in a logical log. The log is stored in one logical data volume. A data volume consists of one or more storage devices. A storage device can be secondary as well as tertiary storage; some typical examples of storage devices are raw disk partitions, fixed size (but extendible) files on the native file system simulating a disk partition, optical disks, and tape. Devices can be dynamically added to or removed from the data volume. Adding a device basically increases the number of available segments in the volume, while removing a device is done by first moving all data residing on the device to be removed, essentially a cleaning operation on the segments residing on that device. Even if disk space is cheap, it will still be necessary to have data on tertiary memory for some applications. This can be done transparently in our system, and both objects and partitions of the OIDX can be migrated to tertiary storage.

3.4 Objects in Vagabond

In our storage system, all objects smaller than a certain threshold, e.g., 64 KB, are written as one contiguous object. They are not segmented into pages as is done in other systems. Objects larger than this threshold are segmented into subobjects, and a large object index is maintained for each large object. There are several reasons for doing it this way: Writing one very large object should not block all other transactions during that time. A segmented
object is useful later, when only parts of the object are to be read or modified. Parts of the object can reside on different physical devices, possibly on different levels in the storage hierarchy.

The value of the threshold can be set independently for different object classes, something which is very useful, because different object classes can have different object retrieval characteristics. Typical examples are a video and a general index. From a video, you want to retrieve one large block at a time, as it is needed to play the video. When searching an index, however, relatively small nodes are often desired. Common to both video and index retrieval is that you only want a small part of the object. In other situations, e.g., retrieval of an image, you want to display the image, and therefore want to retrieve the whole image at once.

Isochronous Retrieval. Some applications, e.g., video servers, do not want all of the object delivered at once. Rather, they want parts of it delivered at an appropriate rate: isochronous retrieval. We plan to support such retrieval in our system. This can partly be solved by two queues in the I/O system: one for "normal" data, and one for high-priority audio/video data.

Special Objects. A large object can be viewed as an array of bytes, and retrieval of a part of the object is done by retrieving a certain byte range of the object. This is not flexible enough for some of the structures that are stored as large objects, e.g., indexes. These structures are stored as large objects, but the subobject index has additional information to support more complex indexes. They can also have different concurrency control and recovery characteristics. These objects, which we call special objects, are handled by special object handlers. Examples of special objects are persistent roots, collections, index structures, spatial data structures, and multidimensional arrays, which are also stored as large objects. This has the advantage of making them an integrated part of the object
system.

Fig. 2. Access cost for different access patterns, with and without OD cache. (Average index access time in seconds versus index buffer size / total index size, for the access patterns 2P7030, 2P9010, and 2P9505, with no OD cache and with an optimal OD cache.)

4 Removing Bottlenecks

During the design of Vagabond, we have used cost modeling to identify potential bottlenecks in the proposed system, and to find techniques to avoid them. As can be expected, OID index management is potentially very costly. To reduce the indexing cost, we have developed a new OID index structure for parallel temporal OODBs [7], as well as several novel techniques that reduce the cost of OID indexing: 1) a "writable" object descriptor cache for temporal OODBs [10], and 2) persistent caching of OID index entries [6].

A second bottleneck is object retrieval. The Vagabond system is write optimized, and as a result, object retrieval and index lookup can become a serious bottleneck. We will use some techniques to reduce the number and size of read operations, which can improve object retrieval performance considerably, with only marginal write penalties: 1) the use of signatures in the OIDX [5], and 2) object compression. Compression will also improve write efficiency, as it reduces the amount of data that needs to be written to disk. We will now give an overview of the techniques. For a more detailed description, we refer to the corresponding papers where these techniques are presented and analyzed [5,6,10].

4.1 Writable Object Descriptor Cache

To reduce disk I/O, the most recently used index pages are kept in an index page buffer. OIDX pages will in general have low locality, and to increase the probability of finding a certain OD needed for a mapping from OID to physical address, it is also possible to keep the most recently used index entries (the ODs) in a separate OD cache in main memory, as is done in the Shore OODB [4]. With low locality on index pages, a separate OD
cache utilizes memory better; space is not wasted on large pages where only small parts of them will be used. In non-temporal OODBs, an OD cache is only useful as a read buffer, because the OIDX is updated append-only, which is very efficient. In a TOODB, on the other hand, individual entries are to be inserted into the OIDX. To make it possible to do this asynchronously, in batch, we also make it possible to store these ODs in the OD cache. Combining optimal OD cache sizes with logging and efficient checkpointing/OIDX installation in the background, this increases performance considerably.

An example of cost reduction is illustrated in Fig. 2, which shows the average index access cost for different access patterns, with different index memory sizes (index memory is the memory used for the OD cache and the OIDX page buffer), both with an optimal OD cache size and without using an OD cache. The average access cost is calculated as C = (1 - P_w) * T_r + P_w * T_w, where T_r is the time needed for a read operation, T_w is the time needed for a write operation, and P_w is the probability of an operation being a write operation. The reason for doing it this way is that read and write operations cannot be studied independently. In the examples in this paper, P_w is kept fixed.

Fig. 3. To the left, an overview of index and PCache. To the right, access cost for different access patterns, with and without PCache. (Cost in seconds versus index memory in MB, for the access patterns 2P8020 and 2P9505, with a PCache (OptPCache) and without (NoPCache).)

4.2 The Persistent Cache

The ODs accessed will be almost uniformly distributed over the index leaf nodes. The OD cache makes read accesses efficient, but in a database with many objects, most of the ODs that are updated during one checkpoint interval will reside in different leaf nodes in the OIDX. This low locality means that many leaf nodes have to be updated. When an index node is to be updated, an installation read of the node has to be done first. With a large index, the access to the nodes
will be random disk accesses, and as a result, the installation read is very costly. A "writable" OD cache reduces the update cost, but the number of updated objects during one checkpoint interval must be smaller than the size of the OD cache.

To reduce average access costs, the persistent cache (PCache) can be used. The PCache contains a subset of the entries in the OIDX; the goal is to have the most frequently used ODs in the PCache. In contrast to the main memory cache and the OD cache, the PCache is persistent, so that we do not have to write its entries back to the OIDX itself during each checkpoint interval. This is actually the main purpose of the PCache: to provide an intermediate storage area for persistent data, in this case ODs. The size of the PCache is in general larger than the size of the main memory, but much smaller than the size of the OIDX. The contents of the PCache are maintained according to an LRU-like mechanism. The result should be high locality on accesses to the PCache nodes, reducing the total number of installation reads, and making checkpoints less costly. Average OIDX lookup costs should also be lower than without a PCache.

The PCache, OD cache, index node buffer, and OIDX are illustrated in Fig. 3. The number of nodes in the PCache should be small enough to make it possible to store pointers to all the nodes in main memory; in the figure, PCache nodes PC1, PC4, and PC6, and 3 OIDX nodes are in the buffer. In this case, at most one disk access will be needed to access a PCache node. To be able to copy the ODs from the PCache to the OIDX efficiently, the nodes in the PCache should be accessed in the same order as the leaf nodes in the OIDX. Therefore, the nodes in the PCache are range partitioned; each node stores a certain interval of OIDs.

Fig. 4. Gain from using signatures in the OIDX in a temporal OODB, versus memory size, for different access patterns (2P8020 and 2P9505).

An example of cost reduction is illustrated in Fig. 3, which
shows the average index access cost for the access patterns 80/20 and 95/05, with different index memory sizes (in this case, index memory is the memory used for the OD cache and the OIDX page and PCache page buffers), both with a PCache (OptPCache) and without a PCache (NoPCache). In this example, a database with 100 million object versions is studied.

4.3 Signatures in the OIDX

Signatures are bit strings, generated by applying some hash function to some or all of the attributes of an object. The signatures of the objects can be stored separately from the objects themselves, and can later be used to filter out candidate objects during a perfect match query. In traditional systems, the signatures are stored in separate signature files, which must be updated every time an object is updated. This can be costly. In our case, the OIDX has to be updated every time an object is updated anyway, and by storing the signature in the OD in the OIDX, the additional cost of maintaining a signature is only marginal. Figure 4 shows the gain from using signatures, with different access patterns.
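The signature filtering idea can be sketched as follows. This is an illustrative model only, assuming a toy 16-bit signature and made-up OIDX entries; the names `signature` and `candidates`, and the choice of SHA-256, are ours, not from the Vagabond papers:

```python
import hashlib

SIG_BITS = 16  # toy signature width, for illustration only

def signature(attrs: tuple) -> int:
    """Derive a bit-string signature by hashing each attribute value
    and setting one bit per attribute."""
    sig = 0
    for a in attrs:
        h = hashlib.sha256(repr(a).encode()).digest()
        sig |= 1 << (h[0] % SIG_BITS)
    return sig

# Toy OIDX entries: OID -> (signature stored in the OD, fake address).
oidx = {
    1: (signature(("Smith", 1972)), 0x1000),
    2: (signature(("Jones", 1980)), 0x2000),
    3: (signature(("Smith", 1980)), 0x3000),
}

def candidates(query_attrs: tuple):
    """Perfect match query: keep only OIDs whose stored signature
    contains every bit of the query signature. Only these candidate
    objects need to be fetched from disk and compared exactly; the
    filter can give false positives but never false negatives."""
    qsig = signature(query_attrs)
    return [oid for oid, (sig, _) in oidx.items() if sig & qsig == qsig]
```

Since an object's own attributes always produce a subset of its stored signature bits, a true match is never filtered out; the signature only prunes disk reads for non-matching objects.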