Marching Cubes Module

Description

This module is longer and more complex than many of the other modules. Most of the modules involve using VTK to visualize data. VTK is a general-purpose toolkit that supports many different data types and many different algorithms. Its generic visualization pipeline structure allows different algorithms to be pieced together easily. The downside of this is that it is not optimized for any specific task, and it is therefore slower and uses more memory than code tuned for one specific visualization algorithm. This module examines many of the memory and low-level programming details that VTK hides by having the student develop a complete program for visualizing isosurfaces.

Problem Statement

The goal of this module is to develop a complete system that extracts an isosurface from a data set and displays it using OpenGL. This project can easily be split into two parts: extracting the isosurface and rendering it in an OpenGL window. If the student has previously worked with OpenGL, the rendering part will be very easy, since all the viewer needs to do is open a window and display triangles, ideally rendered with smooth shading, vertex normals, and lighting. Extracting the isosurface using the marching cubes algorithm is straightforward, but actually writing the code and optimizing it is a fair amount of work.

Introduction

Compared to many visualization algorithms, the implementation of the marching cubes algorithm is fairly straightforward and relatively simple and short. Because of this, it is a good choice for the first algorithm that a student implements from scratch. It can be implemented in a brute-force fashion, or optimizations can be added to limit memory usage and reduce duplicate computations. Using the OpenGL API, the triangles it produces can be rendered for a complete visualization system in about 10-15 pages of C++ code. An optimized version with an advanced viewer that allows viewpoint changes and zooming can be implemented in less than 30 pages of code total. Depending on the number of students working together on the project, their expertise in OpenGL, and the amount of time you have, the instructor may provide some of the code for viewing the resulting isosurface.

Background

The marching cubes algorithm was originally developed by Lorensen and Cline and published in the 1987 Siggraph Proceedings (pp. 163-169). The VTK book also describes the algorithm. A brief description of the algorithm from the course notes is copied here in the next two paragraphs for convenience.

See Figure 6-4, page 159, of the VTK book for a two-dimensional example which generates an isoline for a specified value. The two-dimensional algorithm for extracting an isoline is known as marching squares. See Figure 6-5, page 160, of the VTK book for the 16 possible cases (2 choices for each of 4 vertices, so 2^4) for an isoline passing through a square. Also note that cases 1 and 14 are equivalent (i.e., it does not matter whether three of the vertices have a value above the isovalue and one below, or three below and one above; these are referred to as complementary cases). And if you allow rotations, cases 1, 2, 4, 7, 8, 11, 13, and 14 are all equivalent. Also note that cases 5 and 10 are ambiguous with regard to where the isoline occurs.

The extension of this method to three dimensions is known as marching cubes. In three dimensions there are 256 cases (2^8). Figure 6-6, page 161, of the VTK book shows the 15 topologically unique cases obtained using the same strategy that reduced the number of two-dimensional cases. The three-dimensional case also suffers from the ambiguity problem and requires handling some of the complementary cases (where the vertices above and below the isosurface are swapped) differently. See Figure 6-10, page 164, of the VTK book.

Implementing the marching cubes algorithm is straightforward but tedious and error-prone. There are 256 cases for a voxel, since the value at each of the 8 voxel vertices can be either above or below the isosurface value. The common approach is to generate a 256-entry lookup table in which each entry specifies the triangles for that specific case. Then, as each voxel is processed, its case number is calculated and the triangles are fetched from the lookup table. This table could be produced by hand, but this is one of the areas where errors are likely to be introduced. If the lookup table specifies the wrong triangles for any one case, the resulting isosurface (assuming that case occurs) will be wrong. Trying to track these errors down by looking at the image is practically impossible; the only way to fix them is to recheck each of the 256 entries in the lookup table. Another approach is to enter the 15 topologically unique cases, along with the triangle information for each of them, and then write a program that uses this information to generate the 256-entry lookup table. This is less error-prone, since fewer entries have to be hand-entered, although there is more code to write and debug. This is the approach the module author used, and the code for it is in the instructor's manual. To implement this, consider how many different ways you can place a cube down on a flat surface. You may want to take a box, number the vertices from 0 to 7, and experiment with placing the box down on different faces and rotating it. Once you determine this, you can create a table that specifies which of the 8 positions each of the 8 vertices is in for each configuration. You will also want to number the edges from 0 to 11 and keep track of where each edge is for each configuration, or just keep track of the two vertices each edge connects. Other information required is which vertices are above the isosurface value for each of the 15 unique cases and, of course, the edges that contain the triangle vertices for each case. Given all this information, a program can be written to generate the 256-entry lookup table containing the triangles to generate for each case.
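To make the voxel-by-voxel lookup concrete, the sketch below shows how a case index is typically computed from the 8 corner values and used to walk one table entry. The table layout assumed here (cube edge numbers 0-11 stored in groups of three, terminated by -1) is a common convention but only an illustration; the module's MarchingCubesTable class builds its own equivalent.

    // Compute the 8-bit case index for one voxel: one bit per cube vertex,
    // set when that vertex's data value is above the isovalue.
    int computeCaseIndex(const float corner[8], float isovalue)
    {
        int caseIndex = 0;
        for (int v = 0; v < 8; ++v)
            if (corner[v] > isovalue)
                caseIndex |= (1 << v);
        return caseIndex;                  // 0..255
    }

    // Walk the table entry for this voxel; each triple of edge numbers is
    // one triangle whose vertices lie on those cube edges.
    int countTrianglesForVoxel(const int table[256][16],
                               const float corner[8], float isovalue)
    {
        const int *entry = table[computeCaseIndex(corner, isovalue)];
        int triangles = 0;
        for (int i = 0; entry[i] != -1; i += 3)
            ++triangles;
        return triangles;
    }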
A reasonable size for current data sets is 256 x 256 x 256 with 2 bytes per data value. This requires approximately 33 MB of disk space/memory, so it is not unreasonable to load the entire data set into memory and then also create the extracted triangles. As MRI/CT scanners continue to improve their resolution, data sets of 1024 x 1024 x 1024 are expected. With 2 bytes per data value, this is over 2 GB of data, so reading the entire data set into memory is not realistic. The size of memory will continue to grow, but since the data sets are cubic, the amount of memory needed will increase more quickly. When writing your program, try to minimize the number of slices you store in memory.

The location of each triangle vertex is calculated by interpolating the data values at the two voxel vertices where one value is above and one is below the specified isovalue. As you process each pair of data slices, the "brute force" method of generating the triangles would be to calculate and store the vertices for each triangle; however, each vertex is shared by a number of triangles, so ideally you want to calculate and store the coordinates for each vertex only once and reuse them for each triangle that shares it. One of the module questions asks you to examine the relationship between the number of triangles and vertices.
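The interpolation step described above is a single linear blend along the crossing edge. A minimal sketch, with illustrative names:

    // A small vector type used by the sketches in this module.
    struct Vec3 { float x, y, z; };

    // Linearly interpolate the isosurface crossing along one cube edge.
    // p1/p2 are the edge endpoints and v1/v2 their data values; exactly one
    // of the two values is above the isovalue.
    Vec3 interpolateVertex(Vec3 p1, Vec3 p2, float v1, float v2, float isovalue)
    {
        float t = (isovalue - v1) / (v2 - v1);   // fraction along the edge
        Vec3 p;
        p.x = p1.x + t * (p2.x - p1.x);
        p.y = p1.y + t * (p2.y - p1.y);
        p.z = p1.z + t * (p2.z - p1.z);
        return p;
    }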
The standard computer graphics data structure for storing a polyhedron is known as a polymesh. It includes an array of points and an array of polygons, where each polygon has an array that indexes the points forming that polygon. Thus, only one integer is required to store each polygon vertex (instead of 3 float/double values), plus the overhead of storing each vertex once. For the marching cubes algorithm, all the polygons are triangles, so there is no need to store the number of vertices in each polygon.

Other information that is often stored is a normal for each polygon and a "vertex normal". The vertex normal is the average of the normals of all the polygons that share that vertex. The normal for each triangle can be calculated by taking the cross product of two of its edges and normalizing the result. For each vertex, set the vertex normal to zero, add in the normal of every polygon using that vertex, and normalize the total. For lighting purposes, you will want to be consistent in specifying the vertex order for your triangles in the lookup table, so that the vertices of a triangle are always specified in clockwise or counter-clockwise order. This allows the normals to be calculated so that the normal is always "outward facing" for each triangle.
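A minimal indexed mesh with accumulated vertex normals might look like the sketch below (an illustrative structure reusing the Vec3 type from the earlier sketch; the instructor's code may organize this differently):

    #include <cmath>
    #include <cstddef>
    #include <vector>

    // Indexed triangle mesh ("polymesh"): each shared vertex is stored once
    // and each triangle stores three indices into the point array.
    struct Mesh {
        std::vector<Vec3> points;
        std::vector<int>  triangles;      // 3 indices per triangle
        std::vector<Vec3> vertexNormals;  // one per point
    };

    Vec3 sub(Vec3 a, Vec3 b) { return { a.x - b.x, a.y - b.y, a.z - b.z }; }
    Vec3 cross(Vec3 a, Vec3 b) {
        return { a.y*b.z - a.z*b.y, a.z*b.x - a.x*b.z, a.x*b.y - a.y*b.x };
    }
    void normalize(Vec3 &v) {
        float len = std::sqrt(v.x*v.x + v.y*v.y + v.z*v.z);
        if (len > 0) { v.x /= len; v.y /= len; v.z /= len; }
    }

    void computeVertexNormals(Mesh &m)
    {
        m.vertexNormals.assign(m.points.size(), Vec3{0, 0, 0});
        for (std::size_t i = 0; i + 2 < m.triangles.size(); i += 3) {
            int a = m.triangles[i], b = m.triangles[i+1], c = m.triangles[i+2];
            // Consistent winding in the lookup table makes this face normal
            // outward facing.
            Vec3 n = cross(sub(m.points[b], m.points[a]),
                           sub(m.points[c], m.points[a]));
            normalize(n);
            for (int idx : {a, b, c}) {
                m.vertexNormals[idx].x += n.x;
                m.vertexNormals[idx].y += n.y;
                m.vertexNormals[idx].z += n.z;
            }
        }
        for (Vec3 &vn : m.vertexNormals)
            normalize(vn);   // average direction of the adjacent face normals
    }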
Once all the triangles have been calculated, along with the vertex normals, we need to display the results. Currently, the most common graphics API is OpenGL. The simplest method for creating an interface for an OpenGL window is GLUT (the GL Utility Toolkit). This was originally developed by SGI, and there is now an open-source version known as "freeglut". Your instructor will let you know how to create an OpenGL window on your computer systems.

OpenGL is a finite state machine. The state (the values of various settings) controls how objects are drawn, and the state is changed by calling various OpenGL functions. Below is an example of setting up the viewing parameters for a 400 by 400 OpenGL window:

    glMatrixMode(GL_PROJECTION);
    glLoadIdentity();
    gluPerspective(45.0, 1, 0.1, 100);
    glViewport(0, 0, 400, 400);
    glMatrixMode(GL_MODELVIEW);
    glLoadIdentity();
    gluLookAt(0, 0, 5,    // eye/camera location
              0, 0, 0,    // center of interest
              0, 1, 0);   // up vector that specifies the camera tilt
    glEnable(GL_LIGHTING);
    glEnable(GL_LIGHT0);
    glShadeModel(GL_SMOOTH);
    glEnable(GL_DEPTH_TEST);
    glEnable(GL_NORMALIZE);
    float pos[4] = {10, 20, 30, 0};
    float intensity[4] = {0.8f, 0.4f, 0.4f, 1};
    glLightfv(GL_LIGHT0, GL_POSITION, pos);
    glLightfv(GL_LIGHT0, GL_AMBIENT, intensity);

Once the viewing parameters have been set up, drawing triangles using OpenGL is very easy. The vertices and normals are specified with function calls. Below is an example:

    glBegin(GL_TRIANGLES);
    glNormal3f(n1.x, n1.y, n1.z);   // the vertex normal for point 1
    glVertex3f(p1.x, p1.y, p1.z);   // the coordinates for point 1
    glNormal3f(n2.x, n2.y, n2.z);   // the vertex normal for point 2
    glVertex3f(p2.x, p2.y, p2.z);   // the coordinates for point 2
    glNormal3f(n3.x, n3.y, n3.z);   // the vertex normal for point 3
    glVertex3f(p3.x, p3.y, p3.z);   // the coordinates for point 3
    glEnd();
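For the full isosurface, the same pair of calls is simply issued in a loop over the indexed mesh. A sketch reusing the Mesh structure from the earlier sketch; this would typically live in the GLUT display callback:

    #include <GL/glut.h>   // brings in the OpenGL headers

    // Draw every triangle in the mesh using the shared, indexed vertices.
    void drawMesh(const Mesh &m)
    {
        glBegin(GL_TRIANGLES);
        for (std::size_t i = 0; i + 2 < m.triangles.size(); i += 3) {
            for (int k = 0; k < 3; ++k) {
                int idx = m.triangles[i + k];
                const Vec3 &n = m.vertexNormals[idx];
                const Vec3 &p = m.points[idx];
                glNormal3f(n.x, n.y, n.z);   // shared vertex normal
                glVertex3f(p.x, p.y, p.z);   // shared vertex position
            }
        }
        glEnd();
    }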
When creating your viewer, allow the isovalue to be specified on the command line, so you can easily extract different isovalues without recompiling your program. You may also want the filename to be specified on the command line. Your instructor may provide you with a "TrackBall" class to allow the viewing parameters to be changed using the mouse, similar to the VTK interactors. The trackball class modifies the eye and center of interest and creates a viewing transformation matrix that can be applied in the OpenGL pipeline.

If your program allows different viewing parameters to be specified interactively, you may want to consider using an OpenGL "display list" for all the triangles and vertex normals. Without a display list, the calls to glNormal3f and glVertex3f have to be repeated (placed in your display callback) each time the scene is redrawn, and the overhead of all these function calls can be substantial. With a display list, all the functions are called once and OpenGL stores the information in memory; the display list trades memory requirements for speed. Each time the scene is redrawn, only one OpenGL call, to execute the display list, needs to be performed.
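A display list records the drawing calls once at setup time and replays them on each redraw. A minimal sketch, assuming the mesh and the drawMesh loop from the earlier sketches:

    // At setup time, record the triangle-drawing calls once.
    GLuint meshList = glGenLists(1);
    glNewList(meshList, GL_COMPILE);
    drawMesh(mesh);
    glEndList();

    // In the display callback, one call replays all the stored commands.
    glCallList(meshList);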
Questions

1. How many occurrences of each of the 15 topologically unique cases occur in the 256-entry lookup table?
2. For an isosurface that produces a polyhedron with no holes in it, how many vertices and edges are there if there are n triangles? If none of the vertices were shared, there would be 3n vertices, but that is obviously not the case. On average, by how many triangles is each vertex shared? Hint: Euler may help you answer this question.
3. How many data slices did you keep in memory as you read the data file? What is the minimum number of slices that need to be in memory to implement the algorithm?
4. Compare the performance of your implementation to VTK's marching cubes algorithm. Describe how you did the comparison and list the results.
5. If you were able to implement OpenGL display lists with the TrackBall for interactive viewing parameter changes, compare the rendering speed with and without display lists.

Prerequisites

This module is more challenging than many of the others. It does not require a knowledge of VTK, but it does require significantly more programming expertise. The basic data structures covered in CS1 and CS2 are all that is required (arrays, classes/structs, and dynamic memory); however, because of the difficulty, at least one student in each group probably should have completed an advanced data structures and algorithms course. Additionally, a very basic knowledge of OpenGL and some window framework for it (such as GLUT) is necessary to visualize the extracted isosurface. Going over a simple GLUT/OpenGL example that draws polygons, and possibly smooth-shaded polygons using vertex normals, should provide a student with enough knowledge. The basic concepts of smooth shading are covered in the course notes. Although Python or Java could be used for this assignment, it is more appropriate to use C/C++, so that the student experiences and appreciates the optimizations and memory/speed improvements that a lower-level language provides. If you have a large group of students working on the project, one or two students could develop the interface in Python and then access the C++ code from Python using Boost or SWIG.

Instructor

The instructor's manual includes code for a complete visualization system using the marching cubes algorithm, OpenGL, and GLUT. The MarchingCubesTable class creates a lookup table specifying the triangles for each of the 256 cases. The code is heavily commented. Depending on the ability and experience of your students, you may want to suggest the data structures used in it or let them create their own. The MarchingCubes class uses the MarchingCubesTable to extract triangles from a data set. It is set up to read data sets with the same format as the quarter-resolution VTK head data set used in the Pipeline module. That data set consists of 93 files, where each file contains a 64 by 64 slice with 16 bits per pixel. It can be viewed using:

    ./main /usr/local/vtkdata/headsq/quarter 500

where /usr/local/vtkdata/headsq contains the files quarter.1, quarter.2, ..., quarter.93.

The Viewer class initializes GLUT and OpenGL. It also uses the TrackBall class to handle viewing parameter changes using the mouse (left button rotates, middle button zooms, and right button translates). The TrackBall class uses quaternions to achieve the results and generates a 4x4 transformation that can be inserted in the OpenGL pipeline using the glMultMatrix function. The module author suggests you supply the students with the TrackBall and Viewer classes and have them write the other classes. If your students are experienced with OpenGL, or you have enough students that part of the group could work on the Viewer class while others write the marching cubes classes, you could let them write the Viewer class as well. The TrackBall class is not strictly needed, but without it the viewing parameters cannot be adjusted using the mouse. The code in the MarchingCubes class also includes methods for creating an OpenGL display list, as discussed in the Background section. Also note the code that handles different endianness: the head data set is stored in little-endian format, so if you are reading the raw data file on a big-endian machine, the bytes of each short data value must be swapped.
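For 16-bit values, that swap is a single shift-and-or per value. A minimal sketch:

    // Swap the two bytes of a 16-bit value, e.g. when reading the
    // little-endian head data set on a big-endian machine.
    unsigned short swapBytes(unsigned short v)
    {
        return static_cast<unsigned short>((v << 8) | (v >> 8));
    }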
A review on time series data miningTak-chung FuDepartment of Computing,Hong Kong Polytechnic University,Hunghom,Kowloon,Hong Konga r t i c l e i n f oArticle history:Received19February2008Received in revised form14March2010Accepted4September2010Keywords:Time series data miningRepresentationSimilarity measureSegmentationVisualizationa b s t r a c tTime series is an important class of temporal data objects and it can be easily obtained from scientificandfinancial applications.A time series is a collection of observations made chronologically.The natureof time series data includes:large in data size,high dimensionality and necessary to updatecontinuously.Moreover time series data,which is characterized by its numerical and continuousnature,is always considered as a whole instead of individual numericalfield.The increasing use of timeseries data has initiated a great deal of research and development attempts in thefield of data mining.The abundant research on time series data mining in the last decade could hamper the entry ofinterested researchers,due to its complexity.In this paper,a comprehensive revision on the existingtime series data mining researchis given.They are generally categorized into representation andindexing,similarity measure,segmentation,visualization and mining.Moreover state-of-the-artresearch issues are also highlighted.The primary objective of this paper is to serve as a glossary forinterested researchers to have an overall picture on the current time series data mining developmentand identify their potential research direction to further investigation.&2010Elsevier Ltd.All rights reserved.1.IntroductionRecently,the increasing use of temporal data,in particulartime series data,has initiated various research and developmentattempts in thefield of data mining.Time series is an importantclass of temporal data objects,and it can be easily obtained fromscientific andfinancial applications(e.g.electrocardiogram(ECG),daily temperature,weekly sales totals,and prices of mutual fundsand stocks).A time series is a collection of observations madechronologically.The nature of time series data includes:large indata size,high dimensionality and update continuously.Moreovertime series data,which is characterized by its numerical andcontinuous nature,is always considered as a whole instead ofindividual numericalfield.Therefore,unlike traditional databaseswhere similarity search is exact match based,similarity search intime series data is typically carried out in an approximatemanner.There are various kinds of time series data related research,forexample,finding similar time series(Agrawal et al.,1993a;Berndtand Clifford,1996;Chan and Fu,1999),subsequence searching intime series(Faloutsos et al.,1994),dimensionality reduction(Keogh,1997b;Keogh et al.,2000)and segmentation(Abonyiet al.,2005).Those researches have been studied in considerabledetail by both database and pattern recognition communities fordifferent domains of time series data(Keogh and Kasetty,2002).In the context of time series data mining,the fundamentalproblem is how to represent the time series data.One of thecommon approaches is transforming the time series to anotherdomain for dimensionality reduction followed by an indexingmechanism.Moreover similarity measure between time series ortime series subsequences and segmentation are two core tasksfor various time series mining tasks.Based on the time seriesrepresentation,different mining tasks can be found in theliterature and they can be roughly classified into 
fourfields:pattern discovery and clustering,classification,rule discovery andsummarization.Some of the research concentrates on one of thesefields,while the others may focus on more than one of the aboveprocesses.In this paper,a comprehensive review on the existingtime series data mining research is given.Three state-of-the-arttime series data mining issues,streaming,multi-attribute timeseries data and privacy are also briefly introduced.The remaining part of this paper is organized as follows:Section2contains a discussion of time series representation andindexing.The concept of similarity measure,which includes bothwhole time series and subsequence matching,based on the rawtime series data or the transformed domain will be reviewed inSection3.The research work on time series segmentation andvisualization will be discussed in Sections4and5,respectively.InSection6,vary time series data mining tasks and recent timeseries data mining directions will be reviewed,whereas theconclusion will be made in Section7.2.Time series representation and indexingOne of the major reasons for time series representation is toreduce the dimension(i.e.the number of data point)of theContents lists available at ScienceDirectjournal homepage:/locate/engappaiEngineering Applications of Artificial Intelligence0952-1976/$-see front matter&2010Elsevier Ltd.All rights reserved.doi:10.1016/j.engappai.2010.09.007E-mail addresses:cstcfu@.hk,cstcfu@Engineering Applications of Artificial Intelligence24(2011)164–181original data.The simplest method perhaps is sampling(Astrom, 1969).In this method,a rate of m/n is used,where m is the length of a time series P and n is the dimension after dimensionality reduction(Fig.1).However,the sampling method has the drawback of distorting the shape of sampled/compressed time series,if the sampling rate is too low.An enhanced method is to use the average(mean)value of each segment to represent the corresponding set of data points. Again,with time series P¼ðp1,...,p mÞand n is the dimension after dimensionality reduction,the‘‘compressed’’time series ^P¼ð^p1,...,^p nÞcan be obtained by^p k ¼1k kX e ki¼s kp ið1Þwhere s k and e k denote the starting and ending data points of the k th segment in the time series P,respectively(Fig.2).That is, using the segmented means to represent the time series(Yi and Faloutsos,2000).This method is also called piecewise aggregate approximation(PAA)by Keogh et al.(2000).1Keogh et al.(2001a) propose an extended version called an adaptive piecewise constant approximation(APCA),in which the length of each segment is notfixed,but adaptive to the shape of the series.A signature technique is proposed by Faloutsos et al.(1997)with similar ideas.Besides using the mean to represent each segment, other methods are proposed.For example,Lee et al.(2003) propose to use the segmented sum of variation(SSV)to represent each segment of the time series.Furthermore,a bit level approximation is proposed by Ratanamahatana et al.(2005)and Bagnall et al.(2006),which uses a bit to represent each data point.To reduce the dimension of time series data,another approach is to approximate a time series with straight lines.Two major categories are involved.Thefirst one is linear interpolation.A common method is using piecewise linear representation(PLR)2 (Keogh,1997b;Keogh and Smyth,1997;Smyth and Keogh,1997). 
The approximating line for the subsequence P(p i,y,p j)is simply the line connecting the data points p i and p j.It tends to closely align the endpoint of consecutive segments,giving the piecewise approximation with connected lines.PLR is a bottom-up algo-rithm.It begins with creating afine approximation of the time series,so that m/2segments are used to approximate the m length time series and iteratively merges the lowest cost pair of segments,until it meets the required number of segment.When the pair of adjacent segments S i and S i+1are merged,the cost of merging the new segment with its right neighbor and the cost of merging the S i+1segment with its new larger neighbor is calculated.Ge(1998)extends PLR to hierarchical structure. Furthermore,Keogh and Pazzani enhance PLR by considering weights of the segments(Keogh and Pazzani,1998)and relevance feedback from the user(Keogh and Pazzani,1999).The second approach is linear regression,which represents the subsequences with the bestfitting lines(Shatkay and Zdonik,1996).Furthermore,reducing the dimension by preserving the salient points is a promising method.These points are called as perceptually important points(PIP).The PIP identification process isfirst introduced by Chung et al.(2001)and used for pattern matching of technical(analysis)patterns infinancial applications. With the time series P,there are n data points:P1,P2y,P n.All the data points in P can be reordered by its importance by going through the PIP identification process.Thefirst data point P1and the last data point P n in the time series are thefirst and two PIPs, respectively.The next PIP that is found will be the point in P with maximum distance to thefirst two PIPs.The fourth PIP that is found will be the point in P with maximum vertical distance to the line joining its two adjacent PIPs,either in between thefirst and second PIPs or in between the second and the last PIPs.The PIP location process continues until all the points in P are attached to a reordered list L or the required number of PIPs is reached(i.e. 
reduced to the required dimension).Seven PIPs are identified in from the sample time series in Fig.3.Detailed treatment can be found in Fu et al.(2008c).The idea is similar to a technique proposed about30years ago for reducing the number of points required to represent a line by Douglas and Peucker(1973)(see also Hershberger and Snoeyink, 1992).Perng et al.(2000)use a landmark model to identify the important points in the time series for similarity measure.Man and Wong(2001)propose a lattice structure to represent the identified peaks and troughs(called control points)in the time series.Pratt and Fink(2002)and Fink et al.(2003)define extrema as minima and maxima in a time series and compress thetime Fig.1.Time series dimensionality reduction by sampling.The time series on the left is sampled regularly(denoted by dotted lines)and displayed on the right with a largedistortion.Fig.2.Time series dimensionality reduction by PAA.The horizontal dotted lines show the mean of each segment.1This method is called piecewise constant approximation originally(Keoghand Pazzani,2000a).2It is also called piecewise linear approximation(PLA).Tak-chung Fu/Engineering Applications of Artificial Intelligence24(2011)164–181165series by selecting only certain important extrema and dropping the other points.The idea is to discard minor fluctuations and keep major minima and maxima.The compression is controlled by the compression ratio with parameter R ,which is always greater than one;an increase of R leads to the selection of fewer points.That is,given indices i and j ,where i r x r j ,a point p x of a series P is an important minimum if p x is the minimum among p i ,y ,p j ,and p i /p x Z R and p j /p x Z R .Similarly,p x is an important maximum if p x is the maximum among p i ,y ,p j and p x /p i Z R and p x /p j Z R .This algorithm takes linear time and constant memory.It outputs the values and indices of all important points,as well as the first and last point of the series.This algorithm can also process new points as they arrive,without storing the original series.It identifies important points based on local information of each segment (subsequence)of time series.Recently,a critical point model (CPM)(Bao,2008)and a high-level representation based on a sequence of critical points (Bao and Yang,2008)are proposed for financial data analysis.On the other hand,special points are introduced to restrict the error on PLR (Jia et al.,2008).Key points are suggested to represent time series in (Leng et al.,2009)for an anomaly detection.Another common family of time series representation approaches converts the numeric time series to symbolic form.That is,first discretizing the time series into segments,then converting each segment into a symbol (Yang and Zhao,1998;Yang et al.,1999;Motoyoshi et al.,2002;Aref et al.,2004).Lin et al.(2003;2007)propose a method called symbolic aggregate approximation (SAX)to convert the result from PAA to symbol string.The distribution space (y -axis)is divided into equiprobable regions.Each region is represented by a symbol and each segment can then be mapped into a symbol corresponding to the region inwhich it resides.The transformed time series ^Pusing PAA is finally converted to a symbol string SS (s 1,y ,s W ).In between,two parameters must be specified for the conversion.They are the length of subsequence w and alphabet size A (number of symbols used).Besides using the means of the segments to build the alphabets,another method uses the volatility change to build the 
alphabets.Jonsson and Badal (1997)use the ‘‘Shape Description Alphabet (SDA)’’.Example symbols like highly increasing transi-tion,stable transition,and slightly decreasing transition are adopted.Qu et al.(1998)use gradient alphabets like upward,flat and download as symbols.Huang and Yu (1999)suggest transforming the time series to symbol string,using change ratio between contiguous data points.Megalooikonomou et al.(2004)propose to represent each segment by a codeword from a codebook of key-sequences.This work has extended to multi-resolution consideration (Megalooi-konomou et al.,2005).Morchen and Ultsch (2005)propose an unsupervised discretization process based on quality score and persisting states.Instead of ignoring the temporal order of values like many other methods,the Persist algorithm incorporates temporal information.Furthermore,subsequence clustering is a common method to generate the symbols (Das et al.,1998;Li et al.,2000a;Hugueney and Meunier,2001;Hebrail and Hugueney,2001).A multiple abstraction level mining (MALM)approach is proposed by Li et al.(1998),which is based on the symbolic form of the time series.The symbols in this paper are determined by clustering the features of each segment,such as regression coefficients,mean square error and higher order statistics based on the histogram of the regression residuals.Most of the methods described so far are representing time series in time domain directly.Representing time series in the transformation domain is another large family of approaches.One of the popular transformation techniques in time series data mining is the discrete Fourier transforms (DFT),since first being proposed for use in this context by Agrawal et al.(1993a).Rafiei and Mendelzon (2000)develop similarity-based queries,using DFT.Janacek et al.(2005)propose to use likelihood ratio statistics to test the hypothesis of difference between series instead of an Euclidean distance in the transformed domain.Recent research uses wavelet transform to represent time series (Struzik and Siebes,1998).In between,the discrete wavelet transform (DWT)has been found to be effective in replacing DFT (Chan and Fu,1999)and the Haar transform is always selected (Struzik and Siebes,1999;Wang and Wang,2000).The Haar transform is a series of averaging and differencing operations on a time series (Chan and Fu,1999).The average and difference between every two adjacent data points are computed.For example,given a time series P ¼(1,3,7,5),dimension of 4data points is the full resolution (i.e.original time series);in dimension of two coefficients,the averages are (26)with the coefficients (À11)and in dimension of 1coefficient,the average is 4with coefficient (À2).A multi-level representation of the wavelet transform is proposed by Shahabi et al.(2000).Popivanov and Miller (2002)show that a large class of wavelet transformations can be used for time series representation.Dasha et al.(2007)compare different wavelet feature vectors.On the other hand,comparison between DFT and DWT can be found in Wu et al.(2000b)and Morchen (2003)and a combination use of Fourier and wavelet transforms are presented in Kawagoe and Ueda (2002).An ensemble-index,is proposed by Keogh et al.(2001b)and Vlachos et al.(2006),which ensembles two or more representations for indexing.Principal component analysis (PCA)is a popular multivariate technique used for developing multivariate statistical process monitoring methods (Yang and Shahabi,2005b;Yoon et al.,2005)and it is applied to analyze financial time series 
by Lesch et al.(1999).In most of the related works,PCA is used to eliminate the less significant components or sensors and reduce the data representation only to the most significant ones and to plot the data in two dimensions.The PCA model defines linear hyperplane,it can be considered as the multivariate extension of the PLR.PCA maps the multivariate data into a lower dimensional space,which is useful in the analysis and visualization of correlated high-dimensional data.Singular value decomposition (SVD)(Korn et al.,1997)is another transformation-based approach.Other time series representation methods include modeling time series using hidden markov models (HMMs)(Azzouzi and Nabney,1998)and a compression technique for multiple stream is proposed by Deligiannakis et al.(2004).It is based onbaseFig.3.Time series compression by data point importance.The time series on the left is represented by seven PIPs on the right.Tak-chung Fu /Engineering Applications of Artificial Intelligence 24(2011)164–181166signal,which encodes piecewise linear correlations among the collected data values.In addition,a recent biased dimension reduction technique is proposed by Zhao and Zhang(2006)and Zhao et al.(2006).Moreover many of the representation schemes described above are incorporated with different indexing methods.A common approach is adopted to an existing multidimensional indexing structure(e.g.R-tree proposed by Guttman(1984))for the representation.Agrawal et al.(1993a)propose an F-index, which adopts the R*-tree(Beckmann et al.,1990)to index thefirst few DFT coefficients.An ST-index is further proposed by (Faloutsos et al.(1994),which extends the previous work for subsequence handling.Agrawal et al.(1995a)adopt both the R*-and R+-tree(Sellis et al.,1987)as the indexing structures.A multi-level distance based index structure is proposed(Yang and Shahabi,2005a),which for indexing time series represented by PCA.Vlachos et al.(2005a)propose a Multi-Metric(MM)tree, which is a hybrid indexing structure on Euclidean and periodic spaces.Minimum bounding rectangle(MBR)is also a common technique for time series indexing(Chu and Wong,1999;Vlachos et al.,2003).An MBR is adopted in(Rafiei,1999)which an MT-index is developed based on the Fourier transform and in(Kahveci and Singh,2004)which a multi-resolution index is proposed based on the wavelet transform.Chen et al.(2007a)propose an indexing mechanism for PLR representation.On the other hand, Kim et al.(1996)propose an index structure called TIP-index (TIme series Pattern index)for manipulating time series pattern databases.The TIP-index is developed by improving the extended multidimensional dynamic indexfile(EMDF)(Kim et al.,1994). 
An iSAX(Shieh and Keogh,2009)is proposed to index massive time series,which is developed based on an SAX.A multi-resolution indexing structure is proposed by Li et al.(2004),which can be adapted to different representations.To sum up,for a given index structure,the efficiency of indexing depends only on the precision of the approximation in the reduced dimensionality space.However in choosing a dimensionality reduction technique,we cannot simply choose an arbitrary compression algorithm.It requires a technique that produces an indexable representation.For example,many time series can be efficiently compressed by delta encoding,but this representation does not lend itself to indexing.In contrast,SVD, DFT,DWT and PAA all lend themselves naturally to indexing,with each eigenwave,Fourier coefficient,wavelet coefficient or aggregate segment map onto one dimension of an index tree. Post-processing is then performed by computing the actual distance between sequences in the time domain and discarding any false matches.3.Similarity measureSimilarity measure is of fundamental importance for a variety of time series analysis and data mining tasks.Most of the representation approaches discussed in Section2also propose the similarity measure method on the transformed representation scheme.In traditional databases,similarity measure is exact match based.However in time series data,which is characterized by its numerical and continuous nature,similarity measure is typically carried out in an approximate manner.Consider the stock time series,one may expect having queries like: Query1:find all stocks which behave‘‘similar’’to stock A.Query2:find all‘‘head and shoulders’’patterns last for a month in the closing prices of all high-tech stocks.The query results are expected to provide useful information for different stock analysis activities.Queries like Query2in fact is tightly coupled with the patterns frequently used in technical analysis, e.g.double top/bottom,ascending triangle,flag and rounded top/bottom.In time series domain,devising an appropriate similarity function is by no means trivial.There are essentially two ways the data that might be organized and processed(Agrawal et al., 1993a).In whole sequence matching,the whole length of all time series is considered during the similarity search.It requires comparing the query sequence to each candidate series by evaluating the distance function and keeping track of the sequence with the smallest distance.In subsequence matching, where a query sequence Q and a longer sequence P are given,the task is tofind the subsequences in P,which matches Q. 
Subsequence matching requires that the query sequence Q be placed at every possible offset within the longer sequence P.With respect to Query1and Query2above,they can be considered as a whole sequence matching and a subsequence matching,respec-tively.Gavrilov et al.(2000)study the usefulness of different similarity measures for clustering similar stock time series.3.1.Whole sequence matchingTo measure the similarity/dissimilarity between two time series,the most popular approach is to evaluate the Euclidean distance on the transformed representation like the DFT coeffi-cients(Agrawal et al.,1993a)and the DWT coefficients(Chan and Fu,1999).Although most of these approaches guarantee that a lower bound of the Euclidean distance to the original data, Euclidean distance is not always being the suitable distance function in specified domains(Keogh,1997a;Perng et al.,2000; Megalooikonomou et al.,2005).For example,stock time series has its own characteristics over other time series data(e.g.data from scientific areas like ECG),in which the salient points are important.Besides Euclidean-based distance measures,other distance measures can easily be found in the literature.A constraint-based similarity query is proposed by Goldin and Kanellakis(1995), which extended the work of(Agrawal et al.,1993a).Das et al. (1997)apply computational geometry methods for similarity measure.Bozkaya et al.(1997)use a modified edit distance function for time series matching and retrieval.Chu et al.(1998) propose to measure the distance based on the slopes of the segments for handling amplitude and time scaling problems.A projection algorithm is proposed by Lam and Wong(1998).A pattern recognition method is proposed by Morrill(1998),which is based on the building blocks of the primitives of the time series. 
Ruspini and Zwir (1999) address the automated identification of significant qualitative features of complex objects. They propose the discovery and representation of interesting relations between those features, and the generation of structured indexes and textual annotations describing the features and their relations. The discovery of knowledge through analysis of collections of qualitative descriptions is then achieved. They focus on methods for the succinct description of interesting features lying on an effective frontier, using generalized clustering to extract the features that interest domain experts. Generalized Markov models are adopted for waveform matching in Ge and Smyth (2000). A content-based query-by-example retrieval model called FALCON, which incorporates a feedback mechanism, is proposed by Wu et al. (2000a). Indeed, one of the most popular and field-tested similarity measures is the "time warping" distance measure. Based on the dynamic time warping (DTW) technique, the method proposed in Berndt and Clifford (1994) predefines some patterns to serve as templates for the purpose of pattern detection. To align two time series, P and Q, using DTW, an n-by-m matrix M is first constructed. The (i,j)-th element of the matrix, m_ij, contains the distance d(q_i, p_j) between the two points q_i and p_j; the Euclidean distance is typically used, i.e. d(q_i, p_j) = (q_i - p_j)^2. It corresponds to the alignment between the points q_i and p_j. A warping path, W, is a contiguous set of matrix elements that defines a mapping between Q and P. Its k-th element is defined as w_k = (i_k, j_k), and

W = w_1, w_2, ..., w_k, ..., w_K    (2)

where max(m, n) <= K < m + n - 1. The warping path is typically subject to three constraints: boundary conditions, continuity and monotonicity. The boundary conditions are w_1 = (1, 1) and w_K = (m, n); this requires the warping path to start and finish in diagonally opposite corners. The next constraint is continuity: given w_k = (a, b), then w_{k-1} = (a', b'), where a - a' <= 1 and b - b' <= 1. This restricts the allowable steps in the warping path to adjacent cells, including the diagonally adjacent cell. Finally, the constraints a - a' >= 0 and b - b' >= 0 force the points in W to be monotonically spaced in time. There is an exponential number of warping paths satisfying the above conditions; however, only the path that minimizes the warping cost is of interest. This path can be found efficiently by using dynamic programming (Berndt and Clifford, 1996) to evaluate the recurrence that defines the cumulative distance g(i, j) as the distance d(i, j) in the current cell plus the minimum of the cumulative distances of the adjacent elements, i.e.

g(i, j) = d(q_i, p_j) + min{ g(i-1, j-1), g(i-1, j), g(i, j-1) }    (3)

The warping path W whose "distance" is minimal then defines

DTW(Q, P) = min_W [ sum_{k=1}^{K} d(w_k) ]    (4)

where d(w_k) can be defined as

d(w_k) = d(q_{i_k}, p_{j_k}) = (q_{i_k} - p_{j_k})^2    (5)

A detailed treatment can be found in Kruskall and Liberman (1983).
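The recurrence (3) translates directly into a dynamic-programming routine. The following unoptimized O(nm) R sketch is our illustration of the textbook algorithm, not code from any of the cited papers:

# Naive DTW via the recurrence in Eq. (3); returns the minimal
# cumulative squared-distance warping cost between series q and p.
dtw_dist <- function(q, p) {
  n <- length(q); m <- length(p)
  g <- matrix(Inf, n + 1, m + 1)  # g[i+1, j+1] stores g(i, j)
  g[1, 1] <- 0                    # boundary condition
  for (i in 1:n) {
    for (j in 1:m) {
      d <- (q[i] - p[j])^2        # d(q_i, p_j) as in Eq. (5)
      g[i + 1, j + 1] <- d + min(g[i, j],      # g(i-1, j-1)
                                 g[i, j + 1],  # g(i-1, j)
                                 g[i + 1, j])  # g(i, j-1)
    }
  }
  g[n + 1, m + 1]                 # DTW(Q, P) as in Eq. (4)
}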
As DTW is computationally expensive, different methods have been proposed to speed up the DTW matching process. Different constraint (banding) methods, which control the subset of the matrix that the warping path is allowed to visit, are reviewed in Ratanamahatana and Keogh (2004). Yi et al. (1998) introduce a technique for approximate indexing of DTW that utilizes a FastMap technique, which filters the non-qualifying series. Kim et al. (2001) propose an indexing approach under the DTW similarity measure. Keogh and Pazzani (2000b) introduce a modification of DTW which integrates with PAA and operates on a higher-level abstraction of the time series. An exact indexing approach, based on representing the time series by PAA for the DTW similarity measure, is further proposed by Keogh (2002). An iterative deepening dynamic time warping (IDDTW) is suggested by Chu et al. (2002), which is based on a probabilistic model of the approximation errors for all levels of approximation prior to the query process. Chan et al. (2003) propose a filtering process based on the Haar wavelet transformation from low-resolution approximations of the real time warping distance. Shou et al. (2005) use an APCA approximation to compute lower bounds for the DTW distance. They improve the global bound proposed by Kim et al. (2001), which can be used to index the segments, and propose a multi-step query processing technique. FastDTW is proposed by Salvador and Chan (2004); this method uses a multi-level approach that recursively projects a solution from a coarse resolution and refines the projected solution. Similarly, a fast DTW search method, FTW, is proposed by Sakurai et al. (2005) for efficiently pruning a significant number of search candidates. Ratanamahatana and Keogh (2005) clarify some points about DTW related to lower bounding and speed. Euachongprasit and Ratanamahatana (2008) also focus on this problem. A sequentially indexed structure (SIS) is proposed by Ruengronghirunya et al. (2009) to balance the tradeoff between indexing efficiency and I/O cost during DTW similarity measure; a lower bounding function for groups of time series, LBG, is adopted.
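To give a flavor of the banding constraints mentioned above, the naive routine can be restricted to a Sakoe-Chiba-style band by skipping cells far from the diagonal. This sketch is our simplified illustration (the half-width r is an arbitrary choice, and real implementations combine banding with lower bounding):

# DTW restricted to a band of half-width r around the diagonal;
# cells with |i - j| > r are never filled, pruning the computation
# and constraining the warping path.
dtw_banded <- function(q, p, r = 10) {
  n <- length(q); m <- length(p)
  g <- matrix(Inf, n + 1, m + 1)
  g[1, 1] <- 0
  for (i in 1:n) {
    for (j in max(1, i - r):min(m, i + r)) {
      d <- (q[i] - p[j])^2
      g[i + 1, j + 1] <- d + min(g[i, j], g[i, j + 1], g[i + 1, j])
    }
  }
  g[n + 1, m + 1]  # Inf if no path fits inside the band
}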
On the other hand, Keogh and Pazzani (2001) point out potential problems of DTW: it can lead to unintuitive alignments, where a single point on one time series maps onto a large subsection of another time series, and it may fail to find obvious and natural alignments between two time series because of a single feature (i.e. peak, valley, inflection point, plateau, etc.). One of the causes is a great difference between the lengths of the series being compared. Therefore, besides improving the performance of DTW, methods have also been proposed to improve the accuracy of DTW. Keogh and Pazzani (2001) propose a modification of DTW that considers the higher-level feature of shape for better alignment. Ratanamahatana and Keogh (2004) propose to learn arbitrary constraints on the warping path. Regression time warping (RTW) is proposed by Lei and Govindaraju (2004) to address the challenges of shifting, scaling and robustness. Latecki et al. (2005) propose a method called minimal variance matching (MVM) for elastic matching; it determines the subsequence of a time series that best matches a query series by finding the cheapest path in a directed acyclic graph. A segment-wise time warping distance (STW) is proposed by Zhou and Wong (2005) for time-scaling search. Fu et al. (2008a) propose a scaled and warped matching (SWM) approach for handling both DTW and uniform scaling simultaneously. Different customized DTW techniques have been applied to the field of music research for query by humming (Zhu and Shasha, 2003; Arentz et al., 2005). Addressing problems similar to those of DTW, the Longest Common Subsequence (LCSS) model (Vlachos et al., 2002) has been proposed. LCSS is a variation of the edit distance, and the basic idea is to match two sequences by allowing them to stretch, without rearranging the order of the elements, but allowing some elements to be unmatched. One of the important advantages of LCSS over DTW is its treatment of outliers. Chen et al. (2005a) further introduce a distance function based on the edit distance on real sequence (EDR), which is robust against data imperfection. Morse and Patel (2007) propose a fast time series evaluation (FTSE) method which can be used to evaluate the threshold value of these kinds of techniques in a faster way. Threshold-based distance functions are proposed by Aßfalg et al. (2006). The proposed function compares time series by considering the intervals during which the series exceed a certain threshold, rather than by using the exact time series values. A T-Time application has been developed (Aßfalg et al., 2008) to demonstrate its usage. Fu et al. (2007) further suggest introducing rules to govern the pattern matching process when a priori knowledge exists in the given domain. A parameter-light distance measure method based on Kolmogorov complexity theory is suggested in Keogh et al. (2007b); compression-based dissimilarity measure (CDM), proposed by Keogh et al. (2004) to compare the co-compressibility between data sets, is adopted in that paper. Chen et al. (2005b) present a histogram-based representation for similarity measure. Similarly, a histogram-based similarity measure, bag-of-patterns (BOP), is proposed by Lin and Li (2009), in which the frequency of occurrence of each pattern in the time series is counted.
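To close this subsection, here is a minimal R sketch of the LCSS idea for real-valued series (our illustration; the full definition of Vlachos et al. also bounds the allowed time shift): two points match when they are within eps of each other, and unmatched points are simply skipped, which is what makes the measure robust to outliers.

# LCSS length for real-valued series: points match when within eps;
# unmatched points are skipped rather than penalized.
lcss_len <- function(q, p, eps = 0.5) {
  n <- length(q); m <- length(p)
  L <- matrix(0, n + 1, m + 1)
  for (i in 1:n) {
    for (j in 1:m) {
      if (abs(q[i] - p[j]) < eps) {
        L[i + 1, j + 1] <- L[i, j] + 1
      } else {
        L[i + 1, j + 1] <- max(L[i, j + 1], L[i + 1, j])
      }
    }
  }
  L[n + 1, m + 1]  # a longer common subsequence means higher similarity
}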
Parrots are among the most fascinating and intelligent birds in the world. They are known for their vibrant colors, social behavior, and, most notably, their ability to mimic human speech. Here is a detailed introduction to these remarkable creatures:

Origin and Habitat: Parrots are native to tropical and subtropical regions, with the majority of species found in South America, Central America, Africa, and Southeast Asia. They inhabit a variety of environments, including rainforests, savannas, and mangroves.

Physical Characteristics: Parrots are characterized by their strong curved beaks, which are adapted for cracking seeds and nuts. They have strong legs with zygodactyl feet, meaning two toes point forward and two backward, an adaptation for perching. Their plumage is typically colorful and can vary greatly between species, from the bright green of the budgerigar to the striking blue and yellow of the macaw.

Diet: Parrots are omnivorous, with their diet consisting mainly of fruits, nuts, seeds, and sometimes insects. Some species have even been known to eat small animals and snails.

Social Behavior: Parrots are highly social birds. They often live in flocks and communicate with a variety of calls and sounds. They are known for their playful and curious nature, which makes them popular pets.

Reproduction: Parrots usually mate for life. They build nests in tree cavities or on cliffs, and the female lays a clutch of eggs which both parents take turns incubating. After hatching, the chicks are fed by the parents with a special nutrient-rich food called crop milk.

Cognitive Abilities: Parrots are renowned for their intelligence. They have the ability to learn and mimic sounds, including human speech. Some species, like the African Grey parrot, are known to have a vocabulary of several hundred words and can understand the context in which words are used.

Conservation Status: Unfortunately, many parrot species are threatened by habitat loss, poaching for the pet trade, and the illegal wildlife trade. Conservation efforts are in place to protect these birds and their habitats.

Caring for Parrots as Pets: If you decide to have a parrot as a pet, it's important to provide a large cage with plenty of space to fly, a varied diet, and plenty of mental stimulation. Parrots are social creatures and require interaction to stay happy and healthy.

Cultural Significance: In many cultures, parrots are seen as symbols of freedom, love, and beauty. They are also featured in various mythologies and folklore, often as messengers or guides.

In conclusion, parrots are not just beautiful to look at but are also intelligent and social creatures with unique behaviors and needs. Understanding and respecting these aspects of their nature is crucial for anyone who wishes to share their life with a parrot.
DISTRIBUTED INTERACTIVE SIMULATION FOR GROUP-DISTANCE EXERCISES ON THE WEB

Erik Berglund and Henrik Eriksson
Department of Computer and Information Science
Linköping University
S-581 83 Linköping, Sweden
E-mail: {eribe, her}@ida.liu.se

KEYWORDS: Distributed Interactive Simulation, Distance Education, Network, Internet, Personal Computer

ABSTRACT

In distributed-interactive simulation (DIS), simulators act as elements of a bigger distributed simulation. A group-distance exercise (GDE) based on the DIS approach can therefore enable group training for group members participating from different locations. Our GDE approach, unlike full-scale DIS systems, uses affordable simulators designed for standard hardware available in homes and offices. ERCIS (group distance exERCISe) is a prototype GDE system that we have implemented. It takes advantage of Internet and Java to provide distributed simulation at a fraction of the cost of full-scale DIS systems. ERCIS illustrates that distributed simulation can bring advanced training to office and home computers in the form of GDE systems. The focus of this paper is to discuss the possibilities and the problems of GDE and of web-based distributed simulation as a means to provide GDE.

INTRODUCTION

Simulators can be valuable tools in education. Simulators can reduce the cost of training and can allow training in hazardous situations (Berkum & Jong 1991). Distributed-interactive simulation (DIS) originated in military applications, where simulators from different types of forces were connected to form full battle situations. In DIS, individual simulators act as elements of a bigger distributed simulation (Loper & Seidensticker 1993). Thus, DIS could be used to create a group-distance exercise (GDE), where the participants perform a group exercise from different locations. Even though DIS systems based on complex special-hardware simulators provide impressive training tools, the cost and immobility of these systems prohibit mass training. ERCIS (group distance exERCISe) is a prototype GDE system that uses Internet technologies to provide affordable DIS support. Internet (or Intranet) technologies form a solid platform for GDE systems because they are readily available, and because they provide a high level of support for network communication and for graphical simulation. ERCIS therefore takes advantage of the programming language Java to combine group training, distance education and real-time interaction at a fraction of the cost of full-scale DIS systems. In this paper we discuss the possibilities and the problems of GDE and of web-based distributed simulation as a means to provide GDE. We do this by discussing and drawing conclusions from the ERCIS project.

BACKGROUND

Let us first provide some background on GDE, DIS, distributed objects, ERCIS's military application, and related work.

Group-Distance Exercise (GDE)

The purpose of GDE is to enable group training in distance education through the use of DIS. Unlike full-scale DIS systems, our GDE approach assumes simulators designed for standard hardware available in homes and offices.
This approach calls for software-based simulators, which are less expensive to use, can be multiplied virtually limitlessly, and can enable training with expensive, dangerous and/or non-existing equipment. A thorough background on the GDE concept can be found in Computer-Based Group-Distance Exercise (Berglund 1997).

Distributed Interactive Simulation (DIS)

DIS originated as a means to utilize military simulators in full battle situations by connecting them (Loper & Seidensticker 1993). As a result it becomes possible to combine the use of advanced simulators and group training. The different parts of DIS systems communicate according to predefined data packets (IEEE 1995) that describe all necessary data on the bit level. The implementation of the communication is, therefore, built into DIS systems.

Distributed Objects

Distributed objects (Orfali et al. 1996) can be characterized as network-transient objects: objects that can bridge networks. Two issues must be addressed when dealing with distributed objects: locating them over the network, and transforming them from abstract data to a transportation format and vice versa. The common object request broker architecture (CORBA) is a standard protocol for distributed objects, developed by the Object Management Group (OMG). CORBA is used to cross both networks and programming languages. In CORBA, all objects are distributed via the object request broker (ORB). Objects requesting the service of a CORBA object have no knowledge about the location or the implementation of the CORBA objects (Vinoski 1997). Remote method invocation (RMI) is Java's support for distributed objects among Java programs. Unlike CORBA, RMI only provides the protocol to locate and distribute abstract data. In the ERCIS project we chose RMI because ERCIS is an all-Java application; it also provided us with an opportunity to assess Java's support for distributed objects.

RBS-70 Missile Unit

ERCIS supports training of the RBS-70 missile unit of the Swedish anti-aircraft defense. The RBS-70 missile unit's main purpose is to defend objects, for instance bridges, against enemy aircraft attacks; see Figure 1. The RBS-70 missile unit is composed of sub-units: two intelligence units and nine combat units. The intelligence units use radar to discover and calculate flight data of hostile aircraft. Guided by the intelligence unit, the combat units engage the aircraft with RBS-70 missiles (Personal Communication).

[Figure 1. The RBS-70 missile unit. Its main purpose is to defend ground targets.]

During training, the RBS-70 unit uses simulators, for instance, to simulate radar images. All or part of the RBS-70 unit's actual equipment is still used (Personal Communication).

Related Work

In military applications, there are several examples of group training conducted using distributed simulation, for instance MIND (Jenvald 1996), Janus, Eagle, the Brigade/Battalion Simulation (Loper & Seidensticker 1993), and ABS 2000. C3Fire (Granlund 1997) is an example of a tool for group training, in the area of emergency management, that uses distributed simulation. A common high-level architecture for modeling and simulation, which will address a broader range of simulators than DIS, is being developed (Duncan 1996). There are several educational applets on the web that use simulation; see for instance the Gamelan applet repository. These applets are, however, generally small and there are few, if any, distributed simulations.
ERCIS

ERCIS is a prototype GDE system implemented in Java for the Internet (or Intranets). It supports education of the RBS-70 missile unit by creating a DIS of that group's environment. We have used RMI to implement the distribution of the system. ERCIS has two principal components: the equipment simulators and the simulator server; see Figure 2. A maximum of 11 group members can participate in a single ERCIS session, representing the 11 sub-unit leaders; see Figure 3. The group members join by loading an HTML document with an embedded equipment-simulator applet.

[Figure 2. The principal parts of ERCIS. The equipment simulators are connected to one another through the simulator server.]

[Figure 3. A full ERCIS session. 11 equipment simulators can be active in one ERCIS session at a time. The equipment simulators communicate via the simulator server.]

Simulator Server

The simulator server controls a microworld of the group's environment, including simulated aircraft, the exercise scenario and geographical information. The simulator server also distributes network communication among the equipment simulators. The reason for this client-server type of communication is that Java applets are generally only allowed to connect to the computer they were loaded from (Flanagan 1997). Finally, the simulator server functions as a point of reference by which the distributed parts locate one another. The simulator-server computer is therefore the only computer specified prior to an ERCIS session.

Equipment Simulators

The equipment simulators simulate equipment used by the RBS-70 sub-units and also function as user interfaces for the group members. There are two types of equipment simulators: the intelligence-unit equipment simulator and the combat-unit equipment simulator.

Intelligence Unit Equipment Simulator

The intelligence unit's equipment simulator (see Figure 4) contains a radar simulator and a target-tracking simulator. The radar simulator monitors the air space. The target-tracking simulator performs part of the work of three intelligence-unit personnel to approximate the position, speed and course of three aircraft simultaneously. The intelligence-unit leader's task is to distribute the hostile aircraft among the combat units and to send approximated information to them.

[Figure 4. The user interface of the intelligence unit equipment simulator, running on a Sun Solaris Applet Viewer. Callouts: panel used to send information to the combat unit equipment simulator; target-tracking symbol used to initiate target tracking; simulated radar.]

Combat Unit Equipment Simulator

The combat unit equipment simulator (see Figure 5) contains simulators of the target-data receiver and of the RBS-70 missile launcher and operator. The target-data receiver presents information sent from the intelligence unit. The information is recalculated relative to the combat unit's position and shows information such as the distance and direction to the target. The RBS-70 missile launcher and operator represents the missile launcher and its crew. Based on the intelligence unit's information it can locate, track and fire upon the target. The combat-unit leader's task is to assess the situation and to grant permission to fire if all criteria are met.

[Figure 5.
The user interface of the combat unit equipment simulator, running on a Windows 95 Applet Viewer. Callout: switch used to grant fire permission.]

DISCUSSION

Let us then, with experience from the ERCIS project, discuss the problems and possibilities of GDE through DIS, and of the support Internet and Java provide for GDE.

Pedagogical value

Educators can use GDE to introduce group training at an early stage by, for instance, simplifying equipment and thereby abstracting from technical details. Thus, GDE can focus the training on group performance. GDE could also be used to support knowledge recapitulation for expert practitioners. Reliving the group's tasks and environment can provide a more vivid experience than notes and books. GDE systems can automate the collection and evaluation of performance statistics. It is possible to log sessions and replay them to visualize comments on performance in after-action reviews (Jenvald 1996). To fully assess the value of GDE, however, real-world testing is required.

Security

Access control, software authentication, and communication encryption are examples of security issues that concern GDE and distributed simulators. Java 1.1 provides basic security, which includes fine-grained access control and signed applets. The use of dedicated GDE Intranets would increase security, especially access control. It would, however, reduce the ability to choose the location from which to participate. Security restrictions, motivated or not, limit applications. ERCIS was designed with a client-server type of communication because web browsers enforce security restrictions on applets (Flanagan 1997). Peer-to-peer communication would have been more suitable, from the perspective of both implementation and scalability. We are not saying that web browsers should give applets total freedom; in ERCIS, it would have sufficed if applets were allowed to make remote calls to RMI objects regardless of their location.

Performance characteristics

Our initial concerns about the speed of RMI calls and of data transfer over the Internet proved to be unfounded. The speed of communication is not a limiting factor for ERCIS. For instance, a modem link (28.8 kilobits per second) is sufficient to participate in exercises. Instead, the speed of animation limits ERCIS. To provide smooth animation, ERCIS requires more than the standard hardware of today, for instance a Pentium Pro machine or better.

Scalability

In response to increased network load, ERCIS scales relatively well, because the volume of the data that is transmitted among the distributed parts is very small, for instance 1 Kbyte. Incorporating new and better simulators in ERCIS requires considerable programming effort. In a full-scale GDE system it could be beneficial to modularize the simulators in a plug-and-play fashion, to allow variable simulator complexity.

Download time

The download time for applets the size of ERCIS's equipment simulator can be very long. One way to overcome this problem is to create Java archive files (JAR files). JAR files aggregate many files into one and also compress them, which decreases the download time considerably. Push technology such as Marimba's Castanet could also be used to provide automatic distribution of the equipment-simulator software.

Distributed objects

Distributed objects, such as RMI, provide a high level of abstraction in network communication compared to the DIS protocol. There are several examples of typical distributed applications that do not utilize distributed objects but that would benefit greatly from this approach.
Two examples are the Nuclear Power Plant applet (Eriksson 1997) and NASA's distributed control of the Sojourner.

CONCLUSION

ERCIS is a GDE prototype that can be used in training under teacher supervision or as part of a web site where web pages provide additional information. The system illustrates that the GDE approach can provide equipment-free mass training, which is beneficial especially in military applications, where training can be extremely expensive. Java proved to be a valuable tool for the implementation of ERCIS. Java's level of abstraction is high in the two areas that concern ERCIS: animation and distributed objects. Java's speed of animation is, however, too slow to enable acceptable performance for highly graphics-oriented simulators. Apart from this, Java supplied the support that can be expected from a programming language such as C++. Using RMI to implement the distribution was straightforward. Compared to the DIS protocol, RMI provides a flexible and dynamic communication protocol. In conclusion, ERCIS illustrates that it is possible to use Internet technologies to develop affordable DIS systems. It also shows that distributed simulations can bring advanced training to office and home computers in the form of GDE systems.

Acknowledgments

We would like to thank Major Per Bergström at the Center for Anti-Aircraft Defense in Norrtälje, Sweden for supplying domain knowledge of the RBS-70 missile unit. This work has been supported in part by the Swedish National Board for Industrial and Technical Development (Nutek) grant no. 93-3233, and by the Swedish Research Council for Engineering Science (TFR) grant no. 95-186.

REFERENCES

Berglund E. (1997) Computer-Based Group Distance Exercise, M.Sc. thesis no. 97/36, Department of Computer and Information Science, Linköping University (http://www.ida.liu.se/~eribe/publication/GDE.zip: compressed postscript file).
van Berkum J, de Jong T. (1991) Instructional environments for simulations. Education & Computing vol. 6: 305-358.
Duncan C. (1996) The DoD High Level Architecture and the Next Generation of DIS. Proceedings of the Fourteenth Workshop on Interoperability of Distributed Simulation, Orlando, Florida.
Eriksson H. (1996) Expert Systems as Knowledge Servers. IEEE Expert vol. 11 no. 3: 14-19.
Flanagan D. (1997) Java in a Nutshell, 2nd Edition. O'Reilly, Sebastopol, CA.
Granlund R. (1997) C3Fire: A Microworld Supporting Emergency Management Training. Licentiate thesis no. 598, Department of Computer and Information Science, Linköping University.
IEEE (1995) IEEE Standard for Distributed Interactive Simulation--Application Protocols, IEEE 1278.1-1995 (Standard): IEEE.
Jenvald J. (1996) Simulation and Data Collection in Battle Training. Licentiate thesis no. 567, Department of Computer and Information Science, Linköping University.
Loper M, Seidensticker S. (1994) The DIS Vision: A Map to the Future of Distributed Simulation. Orlando, Florida: Institute for Simulation & Training (/SISO/dis/library/vision.doc).
Orfali R, Harkey D, Edwards J. (1996) The Essential Distributed Objects Survival Guide. John Wiley, New York.
Vinoski S. (1997) CORBA: Integrating Diverse Applications Within Distributed Heterogeneous Environments. IEEE Communications, vol. 14, no.
2.

RESOURCES AT THE WEB

The OMG home page: /CORBA
JavaSoft's Java 1.1 documentation: /products/jdk/1.1/docs/index.html
The Gamelan applet repository: /
Marimba's Castanet home page: http://www.marimba.com/products/castanet.html
The Nuclear Power Plant Applet (Eriksson 1995): http://www.ida.liu.se/~her/npp/demo.html
NASA's Sojourner, the technical details on the control distribution of NASA's Sojourner: /features/1997/july/juicy.wits.details.html

Authors

Erik Berglund is a doctoral student of computer science at Linköping University. His research interests include knowledge acquisition, program understanding, software engineering, and computer-supported education. He received his M.Sc. at Linköping University in 1997.

Henrik Eriksson is an assistant professor of computer science at Linköping University. His research interests include expert systems, knowledge acquisition, reusable problem-solving methods and medical informatics. He received his M.Sc. and Ph.D. at Linköping University in 1987 and 1991. He was a postdoctoral fellow and research scientist at Stanford University between 1991 and 1994. Since 1996 he has been a guest researcher at the Swedish Institute for Computer Science (SICS).
arXiv:0806.1256v1 [physics.soc-ph] 7 Jun 2008

Understanding individual human mobility patterns

Marta C. González,1,2 César A. Hidalgo,1 and Albert-László Barabási 1,2,3

1 Center for Complex Network Research and Department of Physics and Computer Science, University of Notre Dame, Notre Dame IN 46556.
2 Center for Complex Network Research and Department of Physics, Biology and Computer Science, Northeastern University, Boston MA 02115.
3 Center for Cancer Systems Biology, Dana Farber Cancer Institute, Boston, MA 02115.
(Dated: June 7, 2008)

Despite their importance for urban planning [1], traffic forecasting [2], and the spread of biological [3,4,5] and mobile viruses [6], our understanding of the basic laws governing human motion remains limited owing to the lack of tools to monitor the time-resolved location of individuals. Here we study the trajectory of 100,000 anonymized mobile phone users whose position is tracked for a six-month period. We find that, in contrast with the random trajectories predicted by the prevailing Lévy flight and random walk models [7], human trajectories show a high degree of temporal and spatial regularity, each individual being characterized by a time-independent characteristic length scale and a significant probability to return to a few highly frequented locations. After correcting for differences in travel distances and the inherent anisotropy of each trajectory, the individual travel patterns collapse into a single spatial probability distribution, indicating that, despite the diversity of their travel history, humans follow simple reproducible patterns. This inherent similarity in travel patterns could impact all phenomena driven by human mobility, from epidemic prevention to emergency response, urban planning and agent-based modeling.

Given the many unknown factors that influence a population's mobility patterns, ranging from means of transportation to job- and family-imposed restrictions and priorities, human trajectories are often approximated with various random walk or diffusion models [7,8]. Indeed, early measurements on albatrosses, bumblebees, deer and monkeys [9,10] and more recent ones on marine predators [11] suggested that animal trajectories are approximated by a Lévy flight [12,13], a random walk whose step size Δr follows a power-law distribution P(Δr) ~ Δr^(-(1+β)) with β < 2. While the Lévy statistics for some animals require further study [14], Brockmann et al. [7] generalized this finding to humans, documenting that the distribution of distances between consecutive sightings of nearly half a million bank notes is fat-tailed. Given that money is carried by individuals, bank note dispersal is a proxy for human movement, suggesting that human trajectories are best modeled as a continuous-time random walk with fat-tailed displacement and waiting-time distributions [7]. A particle following a Lévy flight has a significant probability to travel very long distances in a single step [12,13], which appears to be consistent with human travel patterns: most of the time we travel only over short distances, between home and work, while occasionally we take longer trips.

Each pair of consecutive sightings of a bank note reflects the composite motion of two or more individuals, who owned the bill between the two reported sightings. Thus it is not clear whether the observed distribution reflects the motion of individual users, or some hitherto unknown convolution between population-based heterogeneities and individual human trajectories. Contrary to bank notes, mobile phones are carried by the same
individual during his/her daily routine, offering the best proxy to capture individual human trajectories [15,16,17,18,19].

We used two data sets to explore the mobility patterns of individuals. The first (D1) consists of the mobility patterns recorded over a six-month period for 100,000 individuals selected randomly from a sample of over 6 million anonymized mobile phone users. Each time a user initiates or receives a call or SMS, the location of the tower routing the communication is recorded, allowing us to reconstruct the user's time-resolved trajectory (Figs. 1a and b). The time between consecutive calls follows a bursty pattern [20] (see Fig. S1 in the SM), indicating that while most consecutive calls are placed soon after a previous call, occasionally there are long periods without any call activity. To make sure that the obtained results are not affected by the irregular call pattern, we also study a data set (D2) that captures the location of 206 mobile phone users, recorded every two hours for an entire week. In both datasets the spatial resolution is determined by the local density of the more than 10^4 mobile towers, registering movement only when the user moves between areas serviced by different towers. The average service area of each tower is approximately 3 km^2, and over 30% of the towers cover an area of 1 km^2 or less.

To explore the statistical properties of the population's mobility patterns we measured the distance between a user's positions at consecutive calls, capturing 16,264,308 displacements for the D1 and 10,407 displacements for the D2 dataset. We find that the distribution of displacements over all users is well approximated by a truncated power law

P(Δr) = (Δr + Δr_0)^(-β) exp(-Δr/κ),    (1)

with β = 1.75 ± 0.15, Δr_0 = 1.5 km and cutoff values κ|D1 = 400 km and κ|D2 = 80 km (Fig. 1c; see the SM for statistical validation). Note that the observed scaling exponent is not far from β_B = 1.59 observed in Ref. [7] for bank note dispersal, suggesting that the two distributions may capture the same fundamental mechanism driving human mobility patterns.

Equation (1) suggests that human motion follows a truncated Lévy flight [7]. Yet the observed shape of P(Δr) could be explained by three distinct hypotheses: A. Each individual follows a Lévy trajectory with jump size distribution given by (1). B. The observed distribution captures a population-based heterogeneity, corresponding to the inherent differences between individuals. C. A population-based heterogeneity coexists with individual Lévy trajectories, hence (1) represents a convolution of hypotheses A and B.

To distinguish between hypotheses A, B and C we calculated the radius of gyration for each user (see Methods), interpreted as the typical distance traveled by user a when observed up to time t (Fig. 1b). Next, we determined the radius of gyration distribution P(r_g) by calculating r_g for all users in samples D1 and D2, finding that it too can be approximated with a truncated power law

P(r_g) = (r_g + r_g^0)^(-β_r) exp(-r_g/κ),    (2)

with r_g^0 = 5.8 km, β_r = 1.65 ± 0.15 and κ = 350 km (Fig. 1d; see SM for statistical validation). Lévy flights are characterized by a high degree of intrinsic heterogeneity, raising the possibility that (2) could emerge from an ensemble of identical agents, each following a Lévy trajectory.
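The ensemble test described next is easy to reproduce in outline. The R sketch below is our simplification (jump sizes are sampled by discretizing Eq. (1) on a grid, directions are isotropic, and the parameter values are the fits quoted above): it simulates truncated-Lévy-flight agents and computes each agent's radius of gyration.

# Ensemble of truncated-Levy-flight agents with jump sizes drawn from
# P(dr) ~ (dr + dr0)^(-beta) * exp(-dr/kappa), Eq. (1).
set.seed(42)
beta <- 1.75; dr0 <- 1.5; kappa <- 400        # fitted values from the text
grid <- seq(0.1, 2000, by = 0.1)              # discretized jump sizes (km)
w <- (grid + dr0)^(-beta) * exp(-grid / kappa)

rgyr <- function(steps) {                     # radius of gyration of one walk
  theta <- runif(length(steps), 0, 2 * pi)    # isotropic jump directions
  x <- cumsum(steps * cos(theta))
  y <- cumsum(steps * sin(theta))
  sqrt(mean((x - mean(x))^2 + (y - mean(y))^2))
}

rg <- replicate(1000, rgyr(sample(grid, 200, replace = TRUE, prob = w)))
hist(log10(rg), breaks = 40)                  # heterogeneity of r_g across agents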
Therefore, we determined P(r_g) for an ensemble of agents following a Random Walk (RW), Lévy Flight (LF) or Truncated Lévy Flight (TLF) (Figure 1d) [8,12,13]. We find that an ensemble of Lévy agents displays a significant degree of heterogeneity in r_g, yet this is not sufficient to explain the truncated power-law distribution P(r_g) exhibited by the mobile phone users. Taken together, Figs. 1c and d suggest that the difference in the range of typical mobility patterns of individuals (r_g) has a strong impact on the truncated Lévy behavior seen in (1), ruling out hypothesis A.

If individual trajectories are described by an LF or TLF, then the radius of gyration should increase in time as r_g(t) ~ t^(3/(2+β)) [21,22], while for an RW r_g(t) ~ t^(1/2). That is, the longer we observe a user, the higher the chances that she/he will travel to areas not visited before. To check the validity of these predictions we measured the time dependence of the radius of gyration for users whose gyration radius would be considered small (r_g(T) <= 3 km), medium (20 < r_g(T) <= 30 km) or large (r_g(T) > 100 km) at the end of our observation period (T = 6 months). The results indicate that the time dependence of the average radius of gyration of mobile phone users is better approximated by a logarithmic increase: not only a manifestly slower dependence than the one predicted by a power law, but one that may appear similar to a saturation process (Fig. 2a and Fig. S4).

In Fig. 2b, we have chosen users with similar asymptotic r_g(T) after T = 6 months, and measured the jump size distribution P(Δr|r_g) for each group. As the inset of Fig. 2b shows, users with small r_g travel mostly over small distances, whereas those with large r_g tend to display a combination of many small and a few larger jump sizes. Once we rescale the distributions with r_g (Fig. 2b, main panel), we find that the data collapse into a single curve, suggesting that a single jump size distribution characterizes all users, independent of their r_g. This indicates that P(Δr|r_g) ~ r_g^(-α) F(Δr/r_g), where α ≈ 1.2 ± 0.1 and F(x) is an r_g-independent function with asymptotic behavior F(x < 1) ~ x^(-α), rapidly decreasing for x >> 1. Therefore the travel patterns of individual users may be approximated by a Lévy flight up to a distance characterized by r_g.
Most important, however, is the fact that individual trajectories are bounded beyond r_g; thus large displacements, which are the source of the distinct and anomalous nature of Lévy flights, are statistically absent. To understand the relationship between the different exponents, we note that the measured probability distributions are related by P(Δr) = ∫_0^∞ P(Δr|r_g) P(r_g) dr_g, which suggests (see SM) that up to leading order we have β = β_r + α - 1; indeed, 1.65 + 1.2 - 1 = 1.85, consistent within error bars with the measured β = 1.75 ± 0.15. This indicates that the observed jump size distribution P(Δr) is in fact the convolution between the statistics of individual trajectories P(Δr|r_g) and the population heterogeneity P(r_g), consistent with hypothesis C.

To uncover the mechanism stabilizing r_g we measured the return probability for each individual, F_pt(t) [22], defined as the probability that a user returns to the position where he or she was first observed after t hours (Fig. 2c). For a two-dimensional random walk F_pt(t) should follow ~ 1/(t ln(t)^2) [22]. In contrast, we find that the return probability is characterized by several peaks at 24 h, 48 h and 72 h, capturing a strong tendency of humans to return to locations they visited before, and describing the recurrence and temporal periodicity inherent to human mobility [23,24].

To explore whether individuals return to the same location over and over, we ranked each location based on the number of times an individual was recorded in its vicinity, such that a location with L = 3 represents the third most visited location for the selected individual. We find that the probability of finding a user at a location with a given rank L is well approximated by P(L) ~ 1/L, independent of the number of locations visited by the user (Fig. 2d). Therefore people devote most of their time to a few locations, while spending their remaining time in 5 to 50 places, visited with diminished regularity. The observed logarithmic saturation of r_g(t) is thus rooted in the high degree of regularity in daily travel patterns, captured by the high return probabilities (Fig. 2b) to a few highly frequented locations (Fig. 2d).
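The visitation-frequency ranking behind Fig. 2d can be reproduced for any sequence of location identifiers; in this R sketch (our illustration, with a synthetic input in place of real tower records) the ranked visit frequencies are plotted against the 1/L reference line:

# Rank one user's locations by visit frequency and compare the
# empirical P(L) with the 1/L form reported in the text.
zipf_visits <- function(visits) {
  f <- sort(table(visits), decreasing = TRUE)  # visit counts per location
  p <- as.numeric(f) / sum(f)                  # empirical P(L)
  L <- seq_along(p)
  plot(L, p, log = "xy", xlab = "rank L", ylab = "P(L)")
  lines(L, (1 / L) / sum(1 / L), lty = 2)      # normalized 1/L reference
}

zipf_visits(sample(1:12, 500, replace = TRUE, prob = 1 / (1:12)))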
An important quantity for modeling human mobility patterns is the probability Φ_a(x,y) to find individual a at a given position (x,y). As is evident from Fig. 1b, individuals live and travel in different regions, yet each user can be assigned to a well-defined area, defined by home and workplace, where she or he can be found most of the time. We can compare the trajectories of different users by diagonalizing each trajectory's inertia tensor, providing the probability of finding a user at a given position (see Fig. 3a) in the user's intrinsic reference frame (see SM for the details). A striking feature of Φ(x,y) is its prominent spatial anisotropy in this intrinsic reference frame (note the different scales in Fig. 3a), and we find that the larger an individual's r_g, the more pronounced this anisotropy. To quantify this effect we defined the anisotropy ratio S ≡ σ_y/σ_x, where σ_x and σ_y represent the standard deviations of the trajectory measured in the user's intrinsic reference frame (see SM). We find that S decreases monotonically with r_g (Fig. 3c), being well approximated by S ~ r_g^(-η) for η ≈ 0.12. Given the small value of the scaling exponent, other functional forms may offer an equally good fit; thus mechanistic models are required to identify whether this represents a true scaling law or only a reasonable approximation to the data.

To compare the trajectories of different users we remove the individual anisotropies, rescaling each user trajectory with its respective σ_x and σ_y. The rescaled Φ̃(x/σ_x, y/σ_y) distribution (Fig. 3b) is similar for groups of users with considerably different r_g; i.e., after the anisotropy and the r_g dependence are removed, all individuals appear to follow the same universal Φ̃(x̃,ỹ) probability distribution. This is particularly evident in Fig. 3d, where we show the cross-section Φ̃(x/σ_x, 0) for the three groups of users, finding that apart from the noise in the data the curves are indistinguishable.

Taken together, our results suggest that the Lévy statistics observed in bank note measurements capture a convolution of the population heterogeneity (2) and the motion of individual users. Individuals display significant regularity, as they return to a few highly frequented locations, like home or work. This regularity does not apply to the bank notes: a bill always follows the trajectory of its current owner, i.e. dollar bills diffuse, but humans do not.

The fact that individual trajectories are characterized by the same r_g-independent two-dimensional probability distribution Φ̃(x/σ_x, y/σ_y) suggests that key statistical characteristics of individual trajectories are largely indistinguishable after rescaling. Therefore, our results establish the basic ingredients of realistic agent-based models, requiring us to place users in numbers proportional to the population density of a given region and assign each user an r_g taken from the observed P(r_g) distribution. Using the predicted anisotropic rescaling, combined with the density function Φ̃(x,y), whose shape is provided as Table 1 in the SM, we can obtain the likelihood of finding a user in any location. Given the known correlations between spatial proximity and social links, our results could help quantify the role of space in network development and evolution [25,26,27,28,29] and improve our understanding of diffusion processes [8,30].

We thank D. Brockmann, T. Geisel, J. Park, S. Redner, Z. Toroczkai and P. Wang for discussions and comments on the manuscript. This work was supported by the James S. McDonnell Foundation 21st Century Initiative in Studying Complex Systems, the National Science Foundation within the DDDAS (CNS-0540348), ITR (DMR-0426737) and IIS-0513650 programs, and the U.S. Office of Naval Research Award N00014-07-C. Data analysis was performed on the Notre Dame Biocomplexity Cluster supported in part by NSF MRI Grant No. DBI-0420980. C.A. Hidalgo acknowledges support from the Kellogg Institute at Notre Dame.

Supplementary Information is linked to the online version of the paper at /nature.

Author Information: Correspondence and requests for materials should be addressed to A.-L.B. (e-mail: alb@)

[1] Horner, M.W. & O'Kelly, M.E.S. Embedding economies of scale concepts for hub network design. Journal of Transportation Geography 9, 255-265 (2001).
[2] Kitamura, R., Chen, C., Pendyala, R.M. & Narayaran, R. Micro-simulation of daily activity-travel patterns for travel demand forecasting. Transportation 27, 25-51 (2000).
[3] Colizza, V., Barrat, A., Barthélémy, M., Valleron, A.-J. & Vespignani, A. Modeling the Worldwide Spread of Pandemic Influenza: Baseline Case and Containment Interventions. PLoS Medicine 4, 095-0110 (2007).
[4] Eubank, S., Guclu, H., Kumar, V.S.A., Marathe, M.V., Srinivasan, A., Toroczkai, Z. & Wang, N. Controlling Epidemics in Realistic Urban Social Networks. Nature 429, 180 (2004).
[5] Hufnagel, L., Brockmann, D. & Geisel, T. Forecast and control of epidemics in a globalized world. Proceedings of the National Academy of Sciences of the United States of America 101, 15124-15129 (2004).
[6] Kleinberg, J. The wireless epidemic. Nature 449, 287-288 (2007).
[7] Brockmann, D., Hufnagel, L. & Geisel, T. The scaling
laws of human travel. Nature 439, 462-465 (2006).
[8] Havlin, S. & ben-Avraham, D. Diffusion in Disordered Media. Advances in Physics 51, 187-292 (2002).
[9] Viswanathan, G.M., Afanasyev, V., Buldyrev, S.V., Murphy, E.J., Prince, P.A. & Stanley, H.E. Lévy Flight Search Patterns of Wandering Albatrosses. Nature 381, 413-415 (1996).
[10] Ramos-Fernandez, G., Mateos, J.L., Miramontes, O., Cocho, G., Larralde, H. & Ayala-Orozco, B. Lévy walk patterns in the foraging movements of spider monkeys (Ateles geoffroyi). Behavioral Ecology and Sociobiology 55, 223-230 (2004).
[11] Sims, D.W. et al. Scaling laws of marine predator search behaviour. Nature 451, 1098-1102 (2008).
[12] Klafter, J., Shlesinger, M.F. & Zumofen, G. Beyond Brownian Motion. Physics Today 49, 33-39 (1996).
[13] Mantegna, R.N. & Stanley, H.E. Stochastic Process with Ultraslow Convergence to a Gaussian: The Truncated Lévy Flight. Physical Review Letters 73, 2946-2949 (1994).
[14] Edwards, A.M., Phillips, R.A., Watkins, N.W., Freeman, M.P., Murphy, E.J., Afanasyev, V., Buldyrev, S.V., da Luz, M.G.E., Raposo, E.P., Stanley, H.E. & Viswanathan, G.M. Revisiting Lévy flight search patterns of wandering albatrosses, bumblebees and deer. Nature 449, 1044-1049 (2007).
[15] Sohn, T., Varshavsky, A., LaMarca, A., Chen, M.Y., Choudhury, T., Smith, I., Consolvo, S., Hightower, J., Griswold, W.G. & de Lara, E. Lecture Notes in Computer Science: Proc. 8th International Conference UbiComp 2006. (Springer, Berlin, 2006).
[16] Onnela, J.-P., Saramäki, J., Hyvönen, J., Szabó, G., Lazer, D., Kaski, K., Kertész, J. & Barabási, A.-L. Structure and tie strengths in mobile communication networks. Proceedings of the National Academy of Sciences of the United States of America 104, 7332-7336 (2007).
[17] González, M.C. & Barabási, A.-L. Complex networks: From data to models. Nature Physics 3, 224-225 (2007).
[18] Palla, G., Barabási, A.-L. & Vicsek, T. Quantifying social group evolution. Nature 446, 664-667 (2007).
[19] Hidalgo, C.A. & Rodriguez-Sickert, C. The dynamics of a mobile phone network. Physica A 387, 3017-3024 (2008).
[20] Barabási, A.-L. The origin of bursts and heavy tails in human dynamics. Nature 435, 207-211 (2005).
[21] Hughes, B.D. Random Walks and Random Environments. (Oxford University Press, USA, 1995).
[22] Redner, S. A Guide to First-Passage Processes. (Cambridge University Press, UK, 2001).
[23] Schlich, R. & Axhausen, K.W. Habitual travel behaviour: Evidence from a six-week travel diary. Transportation 30, 13-36 (2003).
[24] Eagle, N. & Pentland, A. Eigenbehaviours: Identifying Structure in Routine. Submitted to Behavioral Ecology and Sociobiology (2007).
[25] Yook, S.-H., Jeong, H. & Barabási, A.-L. Modeling the Internet's large-scale topology. Proceedings of the National Academy of Sciences 99, 13382-13386 (2002).
[26] Caldarelli, G. Scale-Free Networks: Complex Webs in Nature and Technology. (Oxford University Press, USA, 2007).
[27] Dorogovtsev, S.N. & Mendes, J.F.F. Evolution of Networks: From Biological Nets to the Internet and WWW. (Oxford University Press, USA, 2003).
[28] Song, C.M., Havlin, S. & Makse, H.A. Self-similarity of complex networks. Nature 433, 392-395 (2005).
[29] González, M.C., Lind, P.G. & Herrmann, H.J. A system of mobile agents to model social networks. Physical Review Letters 96, 088702 (2006).
[30] Cecconi, F., Marsili, M., Banavar, J.R. & Maritan, A. Diffusion, peer pressure, and tailed distributions. Physical Review Letters 89, 088102 (2002).

FIG. 1: Basic human mobility patterns. a, Week-long trajectories of 40 mobile phone users indicate that most individuals travel only over short distances, but a few regularly move over hundreds of kilometers.
Panel b displays the detailed trajectory of a single user. The different phone towers are shown as green dots, and the Voronoi lattice in grey marks the approximate reception area of each tower. The dataset studied by us records only the identity of the closest tower to a mobile user; thus we cannot identify the position of a user within a Voronoi cell. The trajectory of the user shown in b is constructed from 186 two-hourly reports, during which the user visited a total of 12 different locations (tower vicinities). Among these, the user is found on 96 and 67 occasions at the two most preferred locations, the frequency of visits for each location being shown as a vertical bar. The circle represents the radius of gyration centered on the trajectory's center of mass. c, Probability density function P(Δr) of travel distances obtained for the two studied datasets D1 and D2. The solid line indicates a truncated power law whose parameters are provided in the text (see Eq. 1). d, The distribution P(r_g) of the radius of gyration measured for the users, where r_g(T) was measured after T = 6 months of observation. The solid line represents a similar truncated power-law fit (see Eq. 2). The dotted, dashed and dot-dashed curves show P(r_g) obtained from the standard null models (RW, LF and TLF), where for the TLF we used the same step size distribution as the one measured for the mobile phone users.

FIG. 2: The bounded nature of human trajectories. a, Radius of gyration <r_g(t)> vs time for mobile phone users separated into three groups according to their final r_g(T), where T = 6 months. The black curves correspond to the analytical predictions for the random walk models, increasing in time as r_g(t)|_{LF,TLF} ~ t^(3/(2+β)) (solid) and r_g(t)|_{RW} ~ t^(0.5) (dotted). The dashed curves correspond to a logarithmic fit of the form A + B ln(t), where A and B depend on r_g. b, Probability density function of individual travel distances P(Δr|r_g) for users with r_g = 4, 10, 40, 100 and 200 km. As the inset shows, each group displays a quite different P(Δr|r_g) distribution. After rescaling the distance and the distribution with r_g (main panel), the different curves collapse. The solid line (power law) is shown as a guide to the eye. c, Return probability distribution F_pt(t). The prominent peaks capture the tendency of humans to return regularly to the locations they visited before, in contrast with the smooth asymptotic behavior ~ 1/(t ln(t)^2) (solid line) predicted for random walks. d, A Zipf plot showing the frequency of visiting different locations. The symbols correspond to users that have been observed to visit n_L = 5, 10, 30 and 50 different locations. Denoting by L the rank of a location listed in order of visit frequency, the data are well approximated by R(L) ~ L^(-1).
The inset is the same plot in linear scale, illustrating that 40% of the time individuals are found at their first two preferred locations.

FIG. 3: The shape of human trajectories. a, The probability density function Φ(x,y) of finding a mobile phone user at a location (x,y) in the user's intrinsic reference frame (see SM for details). The three plots, from left to right, were generated for 10,000 users with r_g <= 3, 20 < r_g <= 30 and r_g > 100 km. The trajectories become more anisotropic as r_g increases. b, After scaling each position with σ_x and σ_y, the resulting Φ̃(x/σ_x, y/σ_y) has approximately the same shape for each group. c, The change in the shape of Φ(x,y) can be quantified by calculating the isotropy ratio S ≡ σ_y/σ_x as a function of r_g, which decreases as S ~ r_g^(-0.12) (solid line). Error bars represent the standard error. d, Φ̃(x/σ_x, 0), representing the x-axis cross-section of the rescaled distribution Φ̃(x/σ_x, y/σ_y) shown in b.
Example analysis of biodiversity survey data with R package gradientForest

C. Roland Pitcher, Nick Ellis, Stephen J. Smith

March 21, 2011

Contents
1 Introduction
2 Gradient Forest basics
  2.1 Data
  2.2 gradientForest analysis
  2.3 gradientForest plots
3 Gradient Forest predictions
  3.1 Transforming predictors
  3.2 Biplot of the biological space
  3.3 Mapping in geographic space
  3.4 A clustered version
4 Session information
References

1 Introduction

R package gradientForest [Ellis et al., 2010] develops flexible non-parametric functions to quantify multi-species compositional turnover along environmental gradients. The flexibility comes from the method's origins in Random Forests [Breiman, 2001]; specifically, R package randomForest [Liaw and Wiener, 2002]. This document provides an example to demonstrate the use of gradientForest for ecological analysis of biodiversity survey data. A multi-regional application is provided by [Pitcher et al., 2010]. The document assumes some familiarity both with R and with community analysis. The example has some analogues with constrained ordination methods and with Generalized Dissimilarity Modelling [Ferrier et al., 2007], which are both complementary. Package gradientForest includes functions for plotting non-linear responses in compositional turnover along environmental gradients, and for using these responses to transform environmental data layers to biological scales. The transformed multi-dimensional biological space can be represented as a biplot and can be mapped in geographic space. This example demonstrates typical scripts for a gradient forest analysis; the package provides a sites-by-species (rows-by-columns) matrix and a matching sites-by-environment (rows-by-columns) dataframe. The number of rows and their order must match between these two data objects, and the data should not include NAs. It is assumed that users will be familiar with the data-processing steps necessary to produce such data objects.

2 Gradient Forest basics

2.1 Data

The example data provided in package gradientForest are real species data from a cross-shelf survey in the far northern Great Barrier Reef of northeast Australia [Poiner et al., 1998, Burridge et al., 2006]. Of >1,000 species observed, a subset of 110 are included from 197 of the sites sampled. The environmental data include 28 predictors, either measured at each site or attributed to each site by interpolation [Pitcher et al., 2002].

> require(gradientForest)
> load("GZ.sps.mat.Rdata")
> dim(Sp_mat)
[1] 197 110
> load("GZ.phys.site.Rdata")
> dim(Phys_site)
[1] 197 28
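Because gradientForest assumes the two objects are already aligned, a quick sanity check along the following lines (our addition, not part of the vignette) can catch row mismatches or missing values before the analysis:

> stopifnot(nrow(Sp_mat) == nrow(Phys_site),
+           !anyNA(Sp_mat), !anyNA(Phys_site))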
2.2 gradientForest analysis

The function gradientForest is a wrapper function that calls extendedForest, a modified version of randomForest, and collates its output across all the species in the data matrix. The key modification in extendedForest extracts the numerous tree split values along each predictor gradient and their associated fit improvements, for each predictor in each tree, and returns that information to gradientForest.

Like randomForest, extendedForest assesses the importance of each variable for prediction accuracy; information that is further collated and processed by gradientForest. Often, however, predictor variables are correlated. The standard approach in random forests assesses the marginal importance of a predictor by randomly permuting each predictor in turn, across all sites in the dataset, and calculating the degradation in prediction performance of each tree. Package extendedForest can account for correlated predictors by implementing conditional permutation [Ellis et al., 2010], following the strategy outlined by Strobl et al. [2008]. In conditional permutation, the predictor to be assessed is permuted only within blocks of the dataset defined by splits in the given tree on any other predictors correlated above a certain threshold (e.g. r = 0.5) and up to a maximum number of splits set by the maxLevel option (if required).

> nSites <- dim(Sp_mat)[1]
> nSpecs <- dim(Sp_mat)[2]
> lev <- floor(log2(nSites * 0.368/2))
> lev
[1] 5

The gradientForest call may take several minutes to run. Other options that can be set include the number of trees (typically 500), whether the splits should be compacted into bins (advisable to prevent memory problems for large datasets) and the number of bins, and the correlation threshold for conditional permutation. The summary shows the number of species with positive R2, i.e. those species that could be predicted to any extent by the available predictors. The returned object is a list containing the data, predictor importances, species R2's and other information described in the html help pages under Value.

> gf <- gradientForest(cbind(Phys_site, Sp_mat),
+     predictor.vars = colnames(Phys_site), response.vars = colnames(Sp_mat),
+     ntree = 500, transform = NULL, compact = T,
+     nbin = 201, maxLevel = lev, corr.threshold = 0.5)
> gf

A forest of 500 regression trees for each of 90 species

Call:
gradientForest(data = cbind(Phys_site, Sp_mat), predictor.vars = colnames(Phys_site),
    response.vars = colnames(Sp_mat), ntree = 500, transform = NULL,
    maxLevel = lev, corr.threshold = 0.5, compact = T, nbin = 201)

Important variables:
[1] BSTRESS MUD S_AV Si_AV CHLA_AV

> names(gf)
[1] "X"               "Y"               "result"
[4] "overall.imp"     "overall.imp2"    "ntree"
[7] "imp.rsq"         "species.pos.rsq" "ranForest.type"
[10] "res"            "res.u"           "dens"
[13] "call"

2.3 gradientForest plots

Several types of plots are available for the gradientForest object. The first is the predictor overall importance plot. This shows the mean accuracy importance and the mean importance weighted by species R2; in this example, both are conditional importance. Seabed stress and sediment mud fraction are clearly the most important variables across these 89 species.

> plot(gf, plot.type = "O")

[Figure: overall importance of each predictor, shown as mean accuracy importance and mean R2-weighted importance; BSTRESS and MUD rank highest on both measures.]

The predictor gradient plots are best presented in order of importance; in this example the top 25 predictors are presented in 5 by 5 panels.

> most_important <- names(importance(gf))[1:25]
> par(mgp = c(2, 0.75, 0))

The second plot is the splits density plot (plot.type = "S"), which
The second plot is the splits density plot (plot.type = "S"), which shows binned split importance and location on each gradient (spikes), kernel density of splits (black lines), of observations (red lines), and of splits standardised by observation density (blue lines). Each distribution integrates to predictor importance. These show where important changes in the abundance of multiple species are occurring along the gradient; they indicate a composition change rate. Many of the usual plot options can be set in the call.

> plot(gf, plot.type = "S", imp.vars = most_important,
+     leg.posn = "topright", cex.legend = 0.4, cex.axis = 0.6,
+     cex.lab = 0.7, line.ylab = 0.9, par.args = list(mgp = c(1.5,
+         0.5, 0), mar = c(3.1, 1.5, 0.1, 1)))

[Figure: splits density plots for the top 25 predictors (BSTRESS, MUD, S_AV, Si_AV, CHLA_AV, SAND, K490_AV, CRBNT, T_AV, NO3_AV, SST_AV, GRAVEL, T_SR, BATHY, NO3_SR, SST_SR, K490_SR, BIR_SR, BIR_AV, S_SR, Si_SR, PO4_AV, CHLA_SR, O2_AV, SLOPE); y-axis: density.]

The third plot is the species cumulative plot (plot.type = "C", show.overall = F), which for each species shows cumulative importance distributions of splits improvement scaled by R2-weighted importance, and standardised by density of observations. These show cumulative change in abundance of individual species, where changes occur on the gradient, and the species changing most on each gradient. Again many of the usual plot options can be set in the call; in this example the legend identifies the top 5 most responsive species for each predictor.

> plot(gf, plot.type = "C", imp.vars = most_important,
+     show.overall = F, legend = T, leg.posn = "topleft",
+     leg.nspecies = 5, cex.lab = 0.7, cex.legend = 0.4,
+     cex.axis = 0.6, line.ylab = 0.9, par.args = list(mgp = c(1.5,
+         0.5, 0), mar = c(2.5, 1, 0.1, 0.5), omi = c(0,
+         0.3, 0, 0)))

[Figure: species cumulative importance plots for the top 25 predictors; y-axis: cumulative importance.]

The fourth plot is the predictor cumulative plot (plot.type = "C", show.species = F), which for each predictor shows cumulative importance distributions of splits improvement scaled by R2-weighted importance, standardised by density of observations, and averaged over all species. These show cumulative change in overall composition of the community and where changes occur on the gradient. Again many of the usual plot options can be set in the call; in this example common.scale = T
ensures that plots for all predictors have the same y-scale as the most important predictor.

> plot(gf, plot.type = "C", imp.vars = most_important,
+     show.species = F, common.scale = T, cex.axis = 0.6,
+     cex.lab = 0.7, line.ylab = 0.9, par.args = list(mgp = c(1.5,
+         0.5, 0), mar = c(2.5, 1, 0.1, 0.5), omi = c(0,
+         0.3, 0, 0)))

[Figure: predictor cumulative importance plots for the top 25 predictors, all drawn to a common y-scale.]

The fifth plot shows the R2 measure of the fit of the random forest model for each species, ordered in various ways.

> plot(gf, plot.type = "P", show.names = F, horizontal = F,
+     cex.axis = 1, cex.labels = 0.7, line = 2.5)

[Figure: overall performance of random forests over species; x-axis: species performance rank (1 to 90), y-axis: R2 (0 to 0.6).]

Several other alternative formats of the R2 fit performance plot are available, e.g.:

> plot(gf, plot.type = "P", show.names = T, horizontal = F,
+     cex.axis = 1, cex.labels = 0.7, line = 2.5)
> plot(gf, plot.type = "P", show.names = F, horizontal = T,
+     cex.axis = 1, cex.labels = 0.6, line = 2.5)
> plot(gf, plot.type = "P", show.names = T, horizontal = T,
+     cex.axis = 1, cex.labels = 0.6, line = 2.5)

3 Gradient Forest predictions

In addition to examining compositional change along environmental gradients, the predictor cumulative functions can also be used to transform grid data layers of environmental variables to a common biological importance scale. This transformation is from the multi-dimensional environment space to a biological space in which coordinate position represents composition associated with the predictors. These inferred compositional patterns can be mapped in biological space and geographic space in a manner analogous to ordination, but one that takes into account the non-linear and sometimes threshold changes that occur along gradients.

3.1 Transforming predictors

The example provided includes gridded environmental variables for a roughly 10,000 km2 area of the far northern Great Barrier Reef where the biological surveys were conducted. The data include North and East coordinates plus 28 predictors at 8,682 grid cells. The grid data must include the same predictors, with the same names, as the sites included in the gradientForest call.

> load("GZ.phys.grid.Rdata")
> dim(Phys_grid)
[1] 8682   30
> names(Phys_grid)
 [1] "NORTH"   "EAST"    "BATHY"   "SLOPE"   "ASPECT"
 [6] "BSTRESS" "CRBNT"   "GRAVEL"  "SAND"    "MUD"
[11] "NO3_AV"  "NO3_SR"  "PO4_AV"  "PO4_SR"  "O2_AV"
[16] "O2_SR"   "S_AV"    "S_SR"    "T_AV"    "T_SR"
[21] "Si_AV"   "Si_SR"   "CHLA_AV" "CHLA_SR" "K490_AV"
[26] "K490_SR" "SST_AV"  "SST_SR"  "BIR_AV"  "BIR_SR"

The grid variables are transformed using the gradientForest predict function.
> imp.vars <- names(importance(gf))
> Trns_grid <- cbind(Phys_grid[, c("EAST", "NORTH")],
+     predict(gf, Phys_grid[, imp.vars]))

It is useful to also transform the site environmental predictors, which are available from gf$X.

> Trns_site <- predict(gf)

3.2 Biplot of the biological space

The multi-dimensional biological space can most effectively be represented by taking the principal components of the transformed grid and presenting the first two dimensions in a biplot. It must be acknowledged that while most variation in patterns is captured by the first dimensions, additional compositional pattern contained in the higher dimensions is not shown. A user-defined RGB colour palette is set up based on the first 3 dimensions.

> PCs <- prcomp(Trns_grid[, imp.vars])
> a1 <- PCs$x[, 1]
> a2 <- PCs$x[, 2]
> a3 <- PCs$x[, 3]
> r <- a1 + a2
> g <- -a2
> b <- a3 + a2 - a1
> r <- (r - min(r))/(max(r) - min(r)) * 255
> g <- (g - min(g))/(max(g) - min(g)) * 255
> b <- (b - min(b))/(max(b) - min(b)) * 255

The environmental variables may be shown as vectors, perhaps limited to the most important predictors; in this example, the variables to show as vectors are selected.

> nvs <- dim(PCs$rotation)[1]
> vec <- c("BSTRESS", "MUD", "SST_AV", "T_AV", "CHLA_AV",
+     "SAND", "CRBNT", "GRAVEL")
> lv <- length(vec)
> vind <- rownames(PCs$rotation) %in% vec
> scal <- 40
> xrng <- range(PCs$x[, 1], PCs$rotation[, 1]/scal) * 1.1
> yrng <- range(PCs$x[, 2], PCs$rotation[, 2]/scal) * 1.1
> plot((PCs$x[, 1:2]), xlim = xrng, ylim = yrng,
+     pch = ".", cex = 4, col = rgb(r, g, b, max = 255),
+     asp = 1)
> points(PCs$rotation[!vind, 1:2]/scal, pch = "+")
> arrows(rep(0, lv), rep(0, lv), PCs$rotation[vec, 1]/scal,
+     PCs$rotation[vec, 2]/scal, length = 0.0625)
> jit <- 0.0015
> text(PCs$rotation[vec, 1]/scal + jit * sign(PCs$rotation[vec, 1]),
+     PCs$rotation[vec, 2]/scal + jit * sign(PCs$rotation[vec, 2]),
+     labels = vec)

Different coordinate positions in the biplot represent differing compositions, as associated with the predictors. Further information may be added to the biplot, including the location of sites in biological space and the weighted mean location of species; selected species may also be identified interactively.

[Figure: biplot of the first two principal components (PC1 vs PC2) of the transformed grid, coloured by the RGB palette, with predictor vectors overlaid.]

> PCsites <- predict(PCs, Trns_site[, imp.vars])
> points(PCsites[, 1:2])
> SpsWtd <- sweep(gf$Y, 2, apply(gf$Y, 2, min), "-")
> SpsWtdPCs <- (t(SpsWtd) %*% (PCsites[, 1:2]))/colSums(SpsWtd)
> points(SpsWtdPCs, col = "red", pch = "+")

If required, the abundance of any given species may be plotted on the biplot. For example, the first species from gf$Y is A1010102, an alga from the family Caulerpaceae that appears to prefer carbonate gravelly sand areas with moderate bedstress and lower temperature.

> sp <- colnames(SpsWtd)[1]
> points(PCsites[, 1:2], col = "blue", cex = SpsWtd[, sp]/2)

Alternatively, specifically named examples could be plotted: e.g. E4030373, a Fungiid coral; M2020101, a Strombid mollusc; or S1010671, a Perophorid ascidian, to name a few.

3.3 Mapping in geographic space

The biplot and the colour palette in the previous section can be used as a key to visualise compositional patterns mapped in geographic space. The following map plots predicted PC scores in geographic coordinates, using the same colour palette as above, and represents continuous changes in inferred compositional patterns associated with the predictors.

> plot(Trns_grid[, c("EAST", "NORTH")], pch = ".",
+     cex = 3, asp = 1, col = rgb(r, g, b, max = 255))

[Figure: geographic map (EAST vs NORTH) of the transformed grid, coloured by the biplot palette.]

3.4 A clustered version

Some applications may require a hard clustered output, representing inferred assemblages, rather than a continuous representation of biodiversity composition. The following example uses clara to make 8 clusters. This is a fast clustering
algorithm suitable for large data sets. The medoids are labelled and the colour key takes the value for each medoid. Other clustering methods may be used as alternatives (for example, pam would take several minutes), and their various cluster diagnostics may provide a guide to the appropriate number of clusters.

> require(cluster)
> ncl <- 8
> clPCs <- clara(PCs$x, ncl, sampsize = 1000)
> medcolR <- r[clPCs$i.med]
> medcolG <- g[clPCs$i.med]
> medcolB <- b[clPCs$i.med]
> plot((PCs$x[, 1:2]), xlim = xrng, ylim = yrng,
+     pch = ".", cex = 4, col = rgb(medcolR[clPCs$clustering],
+         medcolG[clPCs$clustering], medcolB[clPCs$clustering],
+         max = 255), asp = 1)
> points(PCs$rotation[!vind, 1:2]/scal, pch = "+")
> arrows(rep(0, lv), rep(0, lv), PCs$rotation[vec, 1]/scal,
+     PCs$rotation[vec, 2]/scal, length = 0.0625)
> text(PCs$rotation[vec, 1]/scal + jit * sign(PCs$rotation[vec, 1]),
+     PCs$rotation[vec, 2]/scal + jit * sign(PCs$rotation[vec, 2]),
+     labels = vec)
> text(clPCs$medoids[, 1:2], labels = seq(1, ncl))
> legend("bottomleft", as.character(seq(1, ncl)),
+     pch = 15, cex = 1, col = rgb(medcolR, medcolG, medcolB, max = 255))

[Figure: clustered biplot (PC1 vs PC2), coloured by cluster, with medoids labelled 1 to 8 and a colour key.]

> plot(Trns_grid[, c("EAST", "NORTH")], pch = ".",
+     cex = 3, asp = 1, col = rgb(medcolR[clPCs$clustering],
+         medcolG[clPCs$clustering], medcolB[clPCs$clustering],
+         max = 255))
> points(Trns_grid[clPCs$i.med, c("EAST", "NORTH")],
+     pch = as.character(seq(1, ncl)))
> legend("bottomleft", as.character(seq(1, ncl)),
+     pch = 15, cex = 1, col = rgb(medcolR, medcolG, medcolB, max = 255))

[Figure: geographic map (EAST vs NORTH) of the 8 clusters, with medoid locations labelled.]

4 Session information

The simulation and output in this document were generated in the following computing environment.

• R version 2.12.2 Patched (2011-03-18 r54866), x86_64-unknown-linux-gnu
• Locale: LC_CTYPE=en_US.UTF-8, LC_NUMERIC=C, LC_TIME=en_US.UTF-8, LC_COLLATE=C, LC_MONETARY=C, LC_MESSAGES=en_US.UTF-8, LC_PAPER=en_US.UTF-8, LC_NAME=C, LC_ADDRESS=C, LC_TELEPHONE=C, LC_MEASUREMENT=en_US.UTF-8, LC_IDENTIFICATION=C
• Base packages: base, datasets, grDevices, graphics, methods, stats, utils
• Other packages: cluster 1.13.3, extendedForest 1.4, gradientForest 0.1-11, lattice 0.19-17
• Loaded via a namespace (and not attached): grid 2.12.2, tools 2.12.2

References

L. Breiman. Random Forests. Machine Learning, 45(1):5-32, 2001.

C.Y. Burridge, C.R. Pitcher, B.J. Hill, T.J. Wassenberg, and I.R. Poiner. A comparison of demersal communities in an area closed to trawling with those in adjacent areas open to trawling: a study in the Great Barrier Reef Marine Park, Australia. Fisheries Research, 79:64-74, 2006.

N. Ellis, S.J. Smith, and C.R. Pitcher. Gradient forests: calculating importance gradients on physical predictors. Submitted manuscript, 2010.

S. Ferrier, G. Manion, J. Elith, and K. Richardson. Using generalized dissimilarity modelling to analyse and predict patterns of beta diversity in regional biodiversity assessment. Diversity and Distributions, 13(3):252-264, 2007.

Andy Liaw and Matthew Wiener. Classification and regression by randomForest. R News, 2(3):18-22, 2002. URL /doc/Rnews/.

C.R. Pitcher, W. Venables, N. Ellis, I. McLeod, M. Cappo, F. Pantus, M. Austin, P. Doherty, and N. Gribble. GBR seabed biodiversity mapping project: Phase 1 report to CRC-Reef. Technical report, CSIRO/AIMS/QDPI Report, 2002. URL .au/resprogram/programC/seabed/Seabedphase1rpt.htm.

C.R. Pitcher, P. Lawton, N. Ellis, S.J. Smith, L.S. Incze, C-L. Wei, M.E. Greenlaw, N.H. Wolff, J. Sameoto, and P.V.R. Snelgrove. The role of physical environmental variables in shaping patterns of biodiversity composition in seabed assemblages. Submitted manuscript, 2010.

I.R. Poiner, J. Glaister, C.R. Pitcher, C. Burridge, T. Wassenberg, N. Gribble, B. Hill, S.J.M. Blaber, D.A. Milton, D. Brewer, et al. The environmental
effects of prawn trawling in the far northern section of the Great Barrier Reef Marine Park: 1991-1996. Final Report to GBRMPA and FRDC, 1998. URL http://www.publish.csiro.au/books/bookpage.cfm?PID=2419.

C. Strobl, A.L. Boulesteix, T. Kneib, T. Augustin, and A. Zeileis. Conditional variable importance for random forests. BMC Bioinformatics, 9(1):307, 2008.
Walking, marching and galloping patterns for memory tests
Term paper - ELEC 7250
Submitted by: Arvind Raghuraman

ABSTRACT

Testing semiconductor memories is increasingly important today because of the high density of current memory chips. In this paper we investigate the various functional fault models for today's memory technology and discuss the ways and means to detect these faults. Emphasis is laid on the classical Walking, Galloping and March pattern tests, which are widely used today for testing chip-level, array-level and board-level functional memory defects. Test strategies and pseudocode for the implementation of these techniques are also discussed.

INTRODUCTION

This paper discusses walking, marching and galloping pattern tests for RAM. Random access memory circuits are among the most dense VLSI circuits being fabricated today. Because the transistor lines are very close to each other, RAM circuits suffer from a very high average number of physical defects per unit chip area compared with other circuits. This fact has motivated researchers to develop efficient RAM test sequences that provide good fault coverage.

For testing today's high-density memories, traditional algorithms take too much test time. For instance, GALPAT and WALKING I/O [5][6] require test times of order n^2 and n^(3/2) (where n is the number of bits in the chip). At that rate, assuming a cycle time of 100 ns, testing a 16 Mbit chip would require 500 hours for an order-n^2 test and 860 seconds for an order-n^(3/2) test. Other older tests, such as Zero-One and Checkerboard, are of order n, but they have poor fault coverage. Table 1 shows memory testing time as a function of memory size.

[Table 1: Test time as a function of memory size [1].]
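To make this scaling concrete, here is a small C sketch, not from the paper, that reproduces the kind of arithmetic behind Table 1: it computes raw test time for algorithms of order n, n^(3/2) and n^2, assuming a 100 ns cycle time and some example chip sizes. The constant factors of real tests (e.g. 4n^2 for GALPAT) scale these figures further.

  #include <stdio.h>
  #include <math.h>

  int main(void)
  {
      const double cycle = 100e-9;            /* assumed cycle time: 100 ns per operation */
      const double sizes[] = { 1048576.0,     /* 1 Mbit  (example size) */
                               16777216.0,    /* 16 Mbit (example size) */
                               67108864.0 };  /* 64 Mbit (example size) */
      for (int i = 0; i < 3; i++) {
          double n = sizes[i];
          /* test time = number of operations x cycle time */
          printf("n = %9.0f bits: O(n) %7.2f s, O(n^1.5) %10.1f s, O(n^2) %14.0f s\n",
                 n, n * cycle, pow(n, 1.5) * cycle, n * n * cycle);
      }
      return 0;
  }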
This paper introduces the different fault models used for today's RAM technologies. It then describes the different memory testing approaches, namely walking, marching and galloping pattern tests, and analyses the effectiveness and implementation of each.

HISTORICAL NOTE & TODAY'S TECHNOLOGY

Through the years, different approaches have been investigated and proposed for testing memories. The most traditional approach is to simply apply a sequence of test patterns to the I/O pins and test the functionality of the memory; this approach is investigated in detail in this paper. Checkerboard, Zero-One (MSCAN) [Breuer & Friedman, 1976], Walking [2], Galloping [3], the MARCH tests (Marching 1/0 [Breuer & Friedman, 1976][5], MATS [Nair, Thatte & Abraham, 1979][8], MATS+ [Abadir & Reghbati, 1983], MATS++ [Goor, 1991], MARCH X [unpublished], MARCH C [Marinescu, 1982][10], MARCH C- [Goor, 1991], MARCH A [Suk & Reddy, 1981], MARCH Y [unpublished], MARCH B [Suk & Reddy, 1981][9]), Butterfly, Moving Inversion (MOVI) [De Jonge & Smeulders, 1976] and Surround Disturb are some traditional pattern tests [2][3]. A more recent approach is to redesign and augment the peripheral circuitry surrounding the RAM to improve testability, popularly referred to as design for testability (DFT). Other approaches propose adding still more hardware to the memory circuitry to realize built-in self-test (BIST). The advantages of these design modifications are, however, often offset by the overheads they introduce; since the extra hardware grows with memory size, it is mainly justifiable for large memories. The main goal behind all these approaches is to reduce the memory testing time, which rises steeply with memory size.

MEMORY FAILURE MODES:

Classical fault models are not sufficient to represent all important failure modes in a RAM; functional fault models should be employed. Memory fault models can be classified under the categories shown below; brief descriptions of the models are given as follows.

[Figure 1.0: Memory cell faults]

Memory cell faults:
1. Stuck-at fault (SAF): cell or line stuck-at-0 or stuck-at-1 [1].
2. Stuck-open fault (SOF): open cell or broken line.
3. Transition fault (TF): cell fails to make a transition [1].
4. Data retention fault (DRF): cell fails to retain its logic value after some specified time due to, e.g., leakage, resistor opens, or feedback path opens [2].
5. Coupling fault (CF): coupling faults are of three types [1]:
   - Inversion coupling fault (CFin): a transition in one cell (the aggressor) inverts the content of another cell (the victim) [1,3].
   - Idempotent coupling fault (CFid): a transition in one cell forces a fixed logic value into another cell [1,3].
   - State coupling fault (CFst): a cell/line is forced to a fixed state only if the coupling cell/line is in a given state, also known as a pattern sensitivity fault (PSF) [1,3].
6. Bridging fault (BF): short between cells; can be AND type or OR type [1].
7. Neighborhood pattern sensitive fault (NPSF) [1].
8. Active (dynamic) NPSF [1].
9. Passive NPSF [1].
10. Static NPSF [1].

Address decoder faults (AFs):
1. No cell is accessed by a certain address [1,3].
2. Multiple cells are accessed by a certain address [1,3].
3. A certain cell is not accessed by any address [1,3].
4. A certain cell is accessed by multiple addresses [1].

Dynamic faults:
1. Recovery faults: some part of the memory cannot recover fast enough from a previous state [2].
   - Sense amplifier recovery: sense amplifier saturation after reading/writing a long string of 0s or 1s.
   - Write recovery: a write followed by a read or write at a different location results in reading or writing at the same location, due to a slow address decoder.
2. Disturb faults: a victim cell is forced to 0 or 1 when an aggressor cell (possibly the same cell) is read or written [2].
3. Data retention faults: the memory loses its content spontaneously, without a read or write [2].
   - DRAM refresh fault: refresh-line stuck-at fault.
   - DRAM leakage fault: "sleeping sickness", i.e. data lost in less than the specified hold time (typically hundreds of microseconds to tens of milliseconds); caused by charge leakage or environmental sensitivity; usually affects a row or a column.
   - Static data losses: a defective pull-up device induces excessive leakage currents that can change the state of a cell; a checkerboard pattern triggers maximum leakage.

ALGORITHMS AND ANALYSIS:

MARCH tests [2,3]:

A MARCH test consists of a finite sequence of march elements, while a march element is a finite sequence of operations applied to every cell in the memory array before proceeding to the next cell. An operation can consist of writing a 0 into a cell (w0), writing a 1 into a cell (w1), reading an expected 0 from a cell (r0), or reading an expected 1 from a cell (r1).

MARCH test notations:

Some of the most popular notations for MARCH tests, which will be used throughout this paper, are shown below [1].

  r0 : read operation, reading a 0 from a cell
  r1 : read operation, reading a 1 from a cell
  w0 : write operation, writing a 0 to a cell
  w1 : write operation, writing a 1 to a cell
  ⇑  : address sequence changes in ascending order
  ⇓  : address sequence changes in descending order
  ⇕  : address sequence can change in either order
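To make this definition concrete, here is a minimal, hypothetical C sketch (all names are invented for this example and the memory is simulated as an array) that encodes each march element as data, i.e. an address order plus a string of per-cell read/write operations, and applies the elements in turn. The MATS++ sequence described below is used as the example element table.

  #include <stdio.h>

  #define N 64                     /* simulated memory size in cells */
  static int mem[N];               /* one bit per cell */

  typedef struct {
      int up;                      /* 1 = ascending address order, 0 = descending */
      const char *ops;             /* per-cell operations as pairs, e.g. "r0w1" */
  } MarchElement;

  /* Apply one march element to every cell; return -1 on pass,
     or the failing address on a read mismatch. */
  static int apply_element(const MarchElement *e)
  {
      int a = e->up ? 0 : N - 1;
      int step = e->up ? 1 : -1;
      for (int k = 0; k < N; k++, a += step)
          for (const char *p = e->ops; *p; p += 2) {
              int v = p[1] - '0';
              if (p[0] == 'w')        mem[a] = v;   /* write the given value */
              else if (mem[a] != v)   return a;     /* read: compare expected value */
          }
      return -1;
  }

  /* MATS++ = { any(w0); up(r0,w1); down(r1,w0,r0) }, encoded as data */
  static const MarchElement mats_pp[] = {
      { 1, "w0" }, { 1, "r0w1" }, { 0, "r1w0r0" }
  };

  int main(void)
  {
      for (unsigned i = 0; i < sizeof mats_pp / sizeof mats_pp[0]; i++) {
          int a = apply_element(&mats_pp[i]);
          if (a >= 0) {
              printf("march element %u failed at cell %d\n", i, a);
              return 1;
          }
      }
      printf("MATS++ passed on the fault-free simulated memory\n");
      return 0;
  }

Encoding the elements as data keeps the cell-visiting loop in one place, which is essentially how the march notation itself factors these tests.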
MARCHING 1/0 test [Breuer & Friedman, 1976][5]:

The MARCHING 1/0 test has 14n complexity. It is a complete test for AFs, SAFs and TFs, but detects only some CFs [2]. The test sequence is:

  MARCHING 1/0: { ⇑(w0); ⇑(r0,w1,r1); ⇓(r1,w0,r0); ⇑(w1); ⇑(r1,w0,r0); ⇓(r0,w1,r1) }

Pseudocode for implementation:

  write background 0;
  for (i = 0; i <= n-1; i++) {          /* ⇑(r0,w1,r1) */
      if (m[i] != 0) return fail;
      m[i] = 1;
      if (m[i] != 1) return fail;
  }
  for (i = n-1; i >= 0; i--) {          /* ⇓(r1,w0,r0) */
      if (m[i] != 1) return fail;
      m[i] = 0;
      if (m[i] != 0) return fail;
  }
  repeat the same sequence with the data reversed (background 1);
  return pass;

MATS test [Nair, Thatte & Abraham, 1979][8]:

MATS stands for Modified Algorithmic Test Sequence. MATS is the shortest MARCH test for unlinked SAFs in the memory cell array and read/write logic circuitry [3]. The algorithm detects all such faults for OR-type technology, since the result of reading multiple cells is considered an OR function of the contents of those cells. It can also be applied to AND-type technology using the MATS-AND test sequence given below [2]. The MATS algorithm has a complexity of 4n, with better fault coverage than the equivalent Zero-One and Checkerboard tests [2].

  MATS (OR type):  { ⇕(w0); ⇕(r0,w1); ⇕(r1) }
  MATS (AND type): { ⇕(w1); ⇕(r1,w0); ⇕(r0) }

Pseudocode for implementation (OR type):

  for (i = 0; i <= n-1; i++) m[i] = 0;  /* ⇕(w0) */
  for (i = 0; i <= n-1; i++) {          /* ⇕(r0,w1) */
      if (m[i] != 0) return fail;
      m[i] = 1;
  }
  for (i = 0; i <= n-1; i++)            /* ⇕(r1) */
      if (m[i] != 1) return fail;
  return pass;

MATS+ test [Abadir & Reghbati, 1983]:

The MATS+ test sequence detects all SAFs and AFs; it is often used instead of MATS when the technology under test is unknown. The MATS+ algorithm has a test complexity of 5n.

  MATS+: { ⇕(w0); ⇑(r0,w1); ⇓(r1,w0) }

Pseudocode for implementation:

  write background 0;
  for (i = 0; i <= n-1; i++) {          /* ⇑(r0,w1) */
      if (m[i] != 0) return fail;
      m[i] = 1;
  }
  for (i = n-1; i >= 0; i--) {          /* ⇓(r1,w0) */
      if (m[i] != 1) return fail;
      m[i] = 0;
  }
  return pass;

MATS++ [Goor, 1991]:

The MATS++ test sequence is a complete, irredundant and optimized test sequence. It is similar to MATS+ but adds fault coverage for TFs. It is the recommended test of 6n complexity for unlinked SAFs and TFs.

  MATS++: { ⇕(w0); ⇑(r0,w1); ⇓(r1,w0,r0) }

Pseudocode for implementation:

  write background 0;
  for (i = 0; i <= n-1; i++) {          /* ⇑(r0,w1) */
      if (m[i] != 0) return fail;
      m[i] = 1;
  }
  for (i = n-1; i >= 0; i--) {          /* ⇓(r1,w0,r0) */
      if (m[i] != 1) return fail;
      m[i] = 0;
      if (m[i] != 0) return fail;
  }
  return pass;
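To illustrate how a march test exposes an address decoder fault, here is a self-contained, hypothetical C sketch (the aliasing fault, the memory size and all names are invented for this example) in which address 5 aliases to physical cell 3; the ascending march element of MATS+ then reads back a 1 through the aliased address where a 0 was expected.

  #include <stdio.h>

  #define N 16
  static int cell[N];                       /* physical cells, zero-initialised */

  /* Faulty address decoder: address 5 aliases to physical cell 3. */
  static int  map(int a)               { return (a == 5) ? 3 : a; }
  static void write_cell(int a, int v) { cell[map(a)] = v; }
  static int  read_cell(int a)         { return cell[map(a)]; }

  /* MATS+ = { any(w0); up(r0,w1); down(r1,w0) } */
  static int mats_plus(void)
  {
      int i;
      for (i = 0; i < N; i++) write_cell(i, 0);     /* any(w0) */
      for (i = 0; i < N; i++) {                     /* up(r0,w1) */
          if (read_cell(i) != 0) { printf("fault detected at address %d\n", i); return 0; }
          write_cell(i, 1);
      }
      for (i = N - 1; i >= 0; i--) {                /* down(r1,w0) */
          if (read_cell(i) != 1) { printf("fault detected at address %d\n", i); return 0; }
          write_cell(i, 0);
      }
      return 1;
  }

  int main(void)
  {
      printf(mats_plus() ? "MATS+ passed\n" : "MATS+ failed, as expected here\n");
      return 0;
  }

Running this reports a fault at address 5: the 1 written through address 3 is seen through address 5 before address 5 has itself been written.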
MARCH X [unpublished]:

The MARCH X test is so called because it has been used without being published [3]. It detects unlinked SAFs, AFs, TFs and CFins, and is a test of 6n complexity.

  MARCH X: { ⇕(w0); ⇑(r0,w1); ⇓(r1,w0); ⇕(r0) }

Pseudocode for implementation:

  write background 0;
  for (i = 0; i <= n-1; i++) { if (m[i] != 0) return fail; m[i] = 1; }  /* ⇑(r0,w1) */
  for (i = n-1; i >= 0; i--) { if (m[i] != 1) return fail; m[i] = 0; }  /* ⇓(r1,w0) */
  for (i = 0; i <= n-1; i++) { if (m[i] != 0) return fail; }            /* ⇕(r0) */
  return pass;

MARCH C [Marinescu, 1982][10]:

The MARCH C test is suited for AFs, SAFs, TFs and all CFs [3]. It is a test of 11n complexity.

  MARCH C: { ⇕(w0); ⇑(r0,w1); ⇑(r1,w0); ⇕(r0); ⇓(r0,w1); ⇓(r1,w0); ⇕(r0) }

Pseudocode for implementation:

  write background 0;
  for (i = 0; i <= n-1; i++) { if (m[i] != 0) return fail; m[i] = 1; }  /* ⇑(r0,w1) */
  for (i = 0; i <= n-1; i++) { if (m[i] != 1) return fail; m[i] = 0; }  /* ⇑(r1,w0) */
  for (i = 0; i <= n-1; i++) { if (m[i] != 0) return fail; }            /* ⇕(r0) */
  for (i = n-1; i >= 0; i--) { if (m[i] != 0) return fail; m[i] = 1; }  /* ⇓(r0,w1) */
  for (i = n-1; i >= 0; i--) { if (m[i] != 1) return fail; m[i] = 0; }  /* ⇓(r1,w0) */
  for (i = 0; i <= n-1; i++) { if (m[i] != 0) return fail; }            /* ⇕(r0) */
  return pass;

MARCH C- [Goor, 1991]:

This test sequence is a modification of MARCH C that removes the redundancy present in it. It detects unlinked AFs, SAFs, TFs and all CFs. This test is of complexity 10n.

  MARCH C-: { ⇕(w0); ⇑(r0,w1); ⇑(r1,w0); ⇓(r0,w1); ⇓(r1,w0); ⇕(r0) }

Pseudocode for implementation:

  write background 0;
  for (i = 0; i <= n-1; i++) { if (m[i] != 0) return fail; m[i] = 1; }  /* ⇑(r0,w1) */
  for (i = 0; i <= n-1; i++) { if (m[i] != 1) return fail; m[i] = 0; }  /* ⇑(r1,w0) */
  for (i = n-1; i >= 0; i--) { if (m[i] != 0) return fail; m[i] = 1; }  /* ⇓(r0,w1) */
  for (i = n-1; i >= 0; i--) { if (m[i] != 1) return fail; m[i] = 0; }  /* ⇓(r1,w0) */
  for (i = 0; i <= n-1; i++) { if (m[i] != 0) return fail; }            /* ⇕(r0) */
  return pass;

MARCH A [Suk & Reddy, 1981]:

The MARCH A test is the shortest test for AFs, SAFs, linked CFids, TFs not linked with CFids, and certain CFins linked with CFids [2]. It is a complete and irredundant test of complexity 15n.

  MARCH A: { ⇕(w0); ⇑(r0,w1,w0,w1); ⇑(r1,w0,w1); ⇓(r1,w0,w1,w0); ⇓(r0,w1,w0) }

Pseudocode for implementation:

  write background 0;
  for (i = 0; i <= n-1; i++) { if (m[i] != 0) return fail; m[i] = 1; m[i] = 0; m[i] = 1; }  /* ⇑(r0,w1,w0,w1) */
  for (i = 0; i <= n-1; i++) { if (m[i] != 1) return fail; m[i] = 0; m[i] = 1; }            /* ⇑(r1,w0,w1) */
  for (i = n-1; i >= 0; i--) { if (m[i] != 1) return fail; m[i] = 0; m[i] = 1; m[i] = 0; }  /* ⇓(r1,w0,w1,w0) */
  for (i = n-1; i >= 0; i--) { if (m[i] != 0) return fail; m[i] = 1; m[i] = 0; }            /* ⇓(r0,w1,w0) */
  return pass;

MARCH Y [unpublished]:

The MARCH Y test is an extension of MARCH X. This test is of complexity 8n and can detect all faults detectable by MARCH X.

  MARCH Y: { ⇕(w0); ⇑(r0,w1,r1); ⇓(r1,w0,r0); ⇕(r0) }

Pseudocode for implementation:

  write background 0;
  for (i = 0; i <= n-1; i++) { if (m[i] != 0) return fail; m[i] = 1; if (m[i] != 1) return fail; }  /* ⇑(r0,w1,r1) */
  for (i = n-1; i >= 0; i--) { if (m[i] != 1) return fail; m[i] = 0; if (m[i] != 0) return fail; }  /* ⇓(r1,w0,r0) */
  for (i = 0; i <= n-1; i++) { if (m[i] != 0) return fail; }                                        /* ⇕(r0) */
  return pass;

MARCH B [Suk & Reddy, 1981][9]:

The MARCH B test is an extension of the MARCH A test. It is a complete and irredundant test capable of detecting AFs, SAFs, and linked CFids or CFins.
This test is of complexity 17n.

  MARCH B: { ⇕(w0); ⇑(r0,w1,r1,w0,r0,w1); ⇑(r1,w0,w1); ⇓(r1,w0,w1,w0); ⇓(r0,w1,w0) }

Pseudocode for implementation:

  write background 0;
  for (i = 0; i <= n-1; i++) {          /* ⇑(r0,w1,r1,w0,r0,w1) */
      if (m[i] != 0) return fail; m[i] = 1;
      if (m[i] != 1) return fail; m[i] = 0;
      if (m[i] != 0) return fail; m[i] = 1;
  }
  for (i = 0; i <= n-1; i++) {          /* ⇑(r1,w0,w1) */
      if (m[i] != 1) return fail; m[i] = 0; m[i] = 1;
  }
  for (i = n-1; i >= 0; i--) {          /* ⇓(r1,w0,w1,w0) */
      if (m[i] != 1) return fail; m[i] = 0; m[i] = 1; m[i] = 0;
  }
  for (i = n-1; i >= 0; i--) {          /* ⇓(r0,w1,w0) */
      if (m[i] != 0) return fail; m[i] = 1; m[i] = 0;
  }
  return pass;

GALPAT:

GALPAT, the galloping patterns test, is a test of 4n^2 complexity. The galloping patterns test is a strong test for most faults; it is a complete test to detect and locate all SAFs, TFs, AFs and CFs. The algorithm involves writing a base cell with its complement, then reading and verifying every other cell for data integrity (reading the base cell back after each such read), and repeating the whole procedure with the data inverted.

Pseudocode for implementation:

  write background 0;
  for (BC = 0; BC <= n-1; BC++) {       /* BC = base cell */
      m[BC] = ~m[BC];                   /* write base cell with its complement */
      for (OC = 0; OC <= n-1; OC++) {   /* OC = every other cell */
          read and verify OC;
          read and verify BC;
      }
      m[BC] = ~m[BC];                   /* restore base cell */
  }
  write background 1; repeat the same procedure;

Sliding Galloping Row/Column/Diagonal:

This test sequence is based on the galloping patterns test, but instead of shifting a single 1 through the memory, a diagonal of 1s is shifted and the whole memory is read after each shift. It has the same fault coverage as GALPAT except for some CFs that may be missed. This test is of complexity 4n^(3/2).

Pseudocode for implementation:

  write background 0;
  for each diagonal position {
      complement the cells on the diagonal;
      read and verify the diagonal cells;
      read and verify all other cells;
      complement the diagonal cells back;
  }
  write background 1; repeat the same procedure;

WALPAT:

WALPAT (the walking patterns test) is a classical memory testing technique. The walking patterns test is similar to the galloping patterns test except that the base cell is read only once, after all the other cells have been read.
The WALPAT test is of complexity 2n^2.

Pseudocode for implementation:

  write background 0;
  for (BC = 0; BC <= n-1; BC++) {       /* BC = base cell */
      m[BC] = ~m[BC];                   /* write base cell with its complement */
      for (OC = 0; OC <= n-1; OC++)     /* OC = every other cell */
          read and verify OC;
      read and verify BC;               /* base cell read once */
      m[BC] = ~m[BC];                   /* restore base cell */
  }
  write background 1; repeat the same procedure;

ANALYSIS & RESULTS:

A complete summary of the different MARCH tests, their fault detection capabilities and test complexity is given in Table 2.

Table 2.0: MARCH test detection and complexity [2]

  Test           AF   SAF  TF   CF   Other              TC
  MARCHING 1/0   DS   D    N    N    -                  14n
  MATS           D    D    N    N    -                  4n
  MATS+          D    D    D    N    -                  5n
  MATS++         D    D    D    N    -                  6n
  MARCH X        D    D    D    D    Ul-CFin            6n
  MARCH C        D    D    D    D    Ul-CFin            11n
  MARCH C-       D    D    D    D    Ul-CF              10n
  MARCH A        D    D    D    D    l-TF               15n
  MARCH Y        D    D    D    D    l-CF               8n
  MARCH B        D    D    D    D    Read access time   7n

  N = 'no', L = 'locate', D = 'detect', LS = 'locate some', DS = 'detect some'.

A complete summary of the other pattern-based memory tests, their fault detection capabilities and test complexity is given in Table 3.

Table 3.0: WALPAT and GALPAT summary [2]

  Test                 AF   SAF  TF   CF   Other                TC
  WALPAT               L    L    L    L    refresh              2n^2
  GALPAT               L    L    L    L    sense amp recovery   4n^2
  Galloping Diagonal   LS   L    L    N    write recovery       4n^(3/2)

  N = 'no', L = 'locate', D = 'detect', LS = 'locate some', DS = 'detect some'.

[Table: fault coverage for different pattern-based tests [4].]

Fault coverage for MARCH tests [3]:

  Fault   MATS++  MARCH X  MARCH Y  MARCH C-
  SAFs    100%    100%     100%     100%
  TFs     100%    100%     100%     100%
  SOFs    100%    0.2%     100%     0.2%
  AFs     100%    100%     100%     100%
  CFins   75.0%   100%     100%     100%
  CFids   37.5%   50.0%    50.0%    100%
  CFsts   50.0%   62.5%    62.5%    100%

CONCLUSION:

MARCH tests are extensively used today for functional testing of SRAM and DRAM technologies. They are more efficient than the older classical pattern-based tests and have better fault coverage. With the increasing density of semiconductor memories, research continues into better pattern sequences and alternative strategies such as DFT and BIST.

References:

1. Michael L. Bushnell and Vishwani D. Agrawal, Essentials of Electronic Testing.
2. Cheng-Wen Wu, Memory Testing, Lab for Reliable Computing (LaRC), EE, NTHU.
3. Cheng-Wen Wu, RAM Fault Models and Memory Testing, Lab for Reliable Computing (LaRC), NTHU.
4. A.J. van de Goor, "Using March tests to test SRAMs," IEEE Design & Test of Computers, Vol. 10, Issue 1, March 1993, pp. 8-14.
5. M.A. Breuer and A.D. Friedman, Diagnosis and Reliable Design of Digital Systems, Computer Science Press, Woodland Hills, Calif., 1976.
6. A.J. van de Goor, Testing Semiconductor Memories, Theory and Practice, John Wiley & Sons, Chichester, UK, 1991.
7. S.M. Thatte and J.A. Abraham, "Testing of Semiconductor Random Access Memories," Proc. Fault-Tolerant Computing Symp., IEEE Computer Society Press, Los Alamitos, Calif., June 1977, pp. 81-87.
8. R. Nair et al., "Efficient Algorithms for Testing Semiconductor Random Access Memories," IEEE Trans. Computers, Vol. C-28, No. 3, 1978, pp. 572-576.
9. D.S. Suk and S.M. Reddy, "A March Test for Functional Faults in Semiconductor Random-Access Memories," IEEE Trans. Computers, Vol. C-30, No. 12, 1981, pp. 982-985.
10. M. Marinescu, "Simple and Efficient Algorithms for Functional RAM Testing," Proc. IEEE Int'l Test Conf., IEEE Computer Society Press, 1982, pp. 236-239.
11. Kamran Zarrineh, Shambhu J. Upadhyaya, and Sreejit Chakravarty, "Automatic Generation and Compaction of March Tests for Memory Arrays."