From Point Cloud to Grid DEM: A Scalable Approach
Point cloud transform methods

In computer vision and 3D reconstruction, a point cloud is a data structure used to describe the shape and structure of a three-dimensional object. A point cloud consists of a set of points, each with a position coordinate and possibly other attributes (such as color or normals). Point clouds are usually captured by sensors such as LiDAR or depth cameras, and can be analyzed and transformed in later processing stages.

A point cloud transform maps a point cloud from one coordinate system to another. Common operations include translation, rotation, and scaling. The usual goal is to align a point cloud to a reference coordinate system, or to align it with another point cloud or model; such transforms are a key step in applications like 3D reconstruction and point cloud registration.
Among the many transform methods, one of the most common is the rigid transform, which maps a point cloud into another coordinate system using only translation and rotation. Its main advantage is that it preserves the shape and structure of the point cloud, changing only its position and orientation. A rigid transform is usually estimated by finding correspondences between two point clouds and then solving for the translation and rotation parameters by least squares; in practice, point cloud registration algorithms are used to compute these parameters automatically.
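The least-squares estimation just described can be sketched in a few lines of NumPy, assuming exact point correspondences are already known (the function name and toy data below are illustrative, not from any particular library); this is the classic SVD-based (Kabsch) solution:

```python
import numpy as np

def estimate_rigid_transform(src, dst):
    """Least-squares (Kabsch/SVD) estimate of rotation R and translation t
    such that dst ~ src @ R.T + t, given corresponding Nx3 point arrays."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)          # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))       # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t

# Recover a known transform from exact correspondences.
rng = np.random.default_rng(0)
src = rng.random((50, 3))
angle = 0.3
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle),  np.cos(angle), 0.0],
                   [0.0,            0.0,           1.0]])
t_true = np.array([1.0, -2.0, 0.5])
dst = src @ R_true.T + t_true
R_est, t_est = estimate_rigid_transform(src, dst)
```

With noisy data or unknown correspondences, a registration algorithm such as ICP would wrap this estimator in an iterative matching loop.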
Besides rigid transforms, there are non-rigid transforms, which change the shape of the point cloud: in addition to translation and rotation they allow bending, stretching, and other deformations. Non-rigid transforms usually require more complex models and algorithms, such as local deformation models or elastic body models, and are used to simulate object deformation and for morphological analysis.
Other transform-related operations serve specific applications. Surface reconstruction converts discrete point cloud data into a continuous surface model, usually via interpolation or fitting algorithms. Point cloud fusion merges several point clouds into a single larger one, typically using registration and filtering algorithms.
Point cloud rasterization in Python

Contents:
1. The concept and significance of point cloud rasterization
2. Python's role in point cloud rasterization
3. Concrete steps for rasterizing a point cloud in Python
4. Summary

Point cloud rasterization converts three-dimensional point cloud data into two-dimensional raster data, so that it can be processed and analyzed on a 2D plane.
Among programming languages, Python stands out for its rich libraries and approachable syntax, making it a popular tool for rasterization work. First, recall what a point cloud is: a data set of many 3D coordinate points that describes the three-dimensional structure of an object or scene. Raw point clouds are inconvenient to inspect and analyze directly, which is why they are converted to raster data: a pixel-based 2D data structure that lends itself to image processing and analysis.
Python offers several libraries and tools for rasterizing point clouds. Two commonly mentioned ones are CloudCompare and PyPIA: CloudCompare is an open-source point cloud processing application that supports editing, processing, and analyzing point clouds, while PyPIA is described as a Python-based point cloud processing library with a rich set of point cloud operations.
The concrete steps for rasterizing a point cloud in Python are as follows:

1. Install Python and the required libraries, such as CloudCompare or PyPIA, and import them.
2. Read the point cloud data with the chosen library and convert it into a Python data structure.
3. Rasterize the point cloud, choosing a method appropriate to the task (such as surface reconstruction or voxelization) and calling the corresponding library functions.
4. Save the rasterized data in a common raster format such as GeoTIFF or PNG.
5. Analyze and process the raster data in Python, for example computing surface area or volume.

Rasterization is an important stage of point cloud processing, and Python's rich libraries and approachable syntax make it an effective tool for the job.
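As a concrete illustration of step 3, here is a minimal rasterization sketch that uses only NumPy (no particular point cloud library assumed): points are binned into square cells, and each cell stores the mean elevation of the points falling inside it. The function name and parameters are illustrative:

```python
import numpy as np

def rasterize_mean_z(points, cell_size):
    """Bin an Nx3 point cloud onto a 2D grid; each cell holds the mean Z
    of the points falling inside it (NaN where a cell is empty)."""
    xy_min = points[:, :2].min(axis=0)
    ij = np.floor((points[:, :2] - xy_min) / cell_size).astype(int)
    ncols, nrows = ij.max(axis=0) + 1          # grid extent in cells
    flat = ij[:, 1] * ncols + ij[:, 0]         # row-major flat cell index
    z_sum = np.bincount(flat, weights=points[:, 2], minlength=nrows * ncols)
    n_pts = np.bincount(flat, minlength=nrows * ncols)
    with np.errstate(invalid="ignore"):        # empty cells give 0/0 -> NaN
        grid = z_sum / n_pts
    return grid.reshape(nrows, ncols)

pts = np.array([[0.2, 0.3, 1.0], [0.7, 0.4, 3.0], [1.5, 0.1, 5.0]])
grid = rasterize_mean_z(pts, cell_size=1.0)    # 1x2 grid of mean elevations
```

The resulting array can then be written out as GeoTIFF or PNG with a raster I/O library, as in step 4.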
Using the point cloud toolbox in MATLAB

MATLAB's point cloud toolbox makes it convenient to process and analyze point cloud data. Point cloud data consists of a large number of discrete points carrying 3D spatial information, and is widely used in computer vision, robotics, 3D reconstruction, and virtual reality. The following introduces several important functions.
To load point cloud data in MATLAB, use the `pcread` function. It reads point cloud files in formats such as PLY, PCD, and XYZ, and returns a point cloud object. For example, to load a PLY file:

```matlab
ptCloud = pcread('pointcloud.ply');
```

The toolbox also provides functions for preprocessing and analysis. One important function is `pcshow`, which displays a 3D visualization of a point cloud object:

```matlab
pcshow(ptCloud);
```

A range of processing functions is available as well, covering filtering, downsampling, registration, and reconstruction. For instance, `pcdenoise` removes noise from a point cloud:

```matlab
ptCloudDenoised = pcdenoise(ptCloud);
```

Another important function, `pcdownsample`, reduces the number of points, and with it the data volume and computational cost:

```matlab
ptCloudDownsampled = pcdownsample(ptCloud, 'gridAverage', 0.01);
```

Finally, functions such as `pcmerge` and `pcfitplane` support registration and reconstruction tasks: merging multiple point clouds into one, or fitting planes and other surfaces to point cloud data.
Computing distances from PointCloud2 data

In a PointCloud2 message, every point carries its position in 3D space along with other attribute information. To compute distances between points:

1. Extract the point data from the PointCloud2 message. This can be done with a point cloud library such as PCL or with custom code. The point data is stored in the message's `data` field in binary form.
2. Parse the point data into coordinates. The data is stored row by row, and other attribute fields may sit between the coordinates of consecutive points. Parse out the coordinate fields according to the point type (XYZ, XYZRGB, and so on).
3. Compute distances from the parsed coordinates, using the Euclidean distance or another distance metric. For two points P1(x1, y1, z1) and P2(x2, y2, z2) in 3D space, the Euclidean distance is:

   distance = sqrt((x2 - x1)^2 + (y2 - y1)^2 + (z2 - z1)^2)

   where sqrt denotes the square root.
4. Repeat steps 2 and 3 to compute the distances between all points of interest.

Note that a PointCloud2 message may also carry other attributes such as point color or normals. If those attributes should enter the distance computation, parse them out as well and apply the appropriate metric.
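The parse-then-measure steps above can be sketched without ROS at all: the snippet below lays out a tiny PointCloud2-style binary buffer by hand (float32 x/y/z plus 4 padding bytes per point, a layout chosen purely for illustration) and then applies the Euclidean formula:

```python
import math
import struct

def unpack_xyz(data, point_step, count, xyz_offset=0):
    """Read (x, y, z) float32 triples out of a PointCloud2-style binary
    buffer in which each point record occupies `point_step` bytes."""
    return [struct.unpack_from("<fff", data, i * point_step + xyz_offset)
            for i in range(count)]

def euclidean(p, q):
    """Euclidean distance between two 3D points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Two points, 16 bytes each: 12 bytes of XYZ followed by 4 bytes of padding.
buf = struct.pack("<fff4x", 0.0, 0.0, 0.0) + struct.pack("<fff4x", 3.0, 4.0, 0.0)
pts = unpack_xyz(buf, point_step=16, count=2)
d = euclidean(pts[0], pts[1])  # → 5.0
```

In a real message, `point_step` and the field offsets come from the message's `fields` description rather than being hard-coded.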
Generating TIN and GRID DEM from irregular discrete point sets

There are many ways to produce a digital elevation model (DEM) today: tracing contour lines and DEMs from photogrammetric stereo pairs; producing data from the regular point sets recorded by airborne laser rangefinders; or scanning traditional topographic maps, tracing the contour lines, recording a series of discrete points, processing them with various algorithms, and finally generating a triangulated irregular network (TIN) and a regular grid (GRID) DEM. This section describes a pipeline that takes contour lines (see Figure 1) and discrete point sets as the data source and produces a TIN and a GRID DEM.

The steps are as follows:

1. Trace the contour lines to generate a discrete point set, recorded in a text file (see Figures 2 and 3).
2. Read the data from the text file and preprocess it. The main work is finding the minimum and maximum values along the X and Y axes and compressing the data range, so that the range is neither too large nor too small (which would make the point distribution too dense or too sparse):

```cpp
while (!_demfile.eof()) {
    _demfile >> point3dXYZ[i][0] >> point3dXYZ[i][1] >> point3dXYZ[i][2];
    // The XY axes are rescaled later, so scale the Z values accordingly.
    point3dXYZ[i][2] = point3dXYZ[i][2] / 2;
    // Track the minimum and maximum of each coordinate over the whole range.
    if (xMin > point3dXYZ[i][0]) xMin = point3dXYZ[i][0];
    if (xMax < point3dXYZ[i][0]) xMax = point3dXYZ[i][0];
    if (yMin > point3dXYZ[i][1]) yMin = point3dXYZ[i][1];
    if (yMax < point3dXYZ[i][1]) yMax = point3dXYZ[i][1];
    if (zMin > point3dXYZ[i][2]) zMin = point3dXYZ[i][2];
    if (zMax < point3dXYZ[i][2]) zMax = point3dXYZ[i][2];
    i++;
}
_demfile.close();  // done reading; close the file stream
_demfile.clear();  // clear the stream state
```

3. Define the points, edges, triangles, and mesh of the TIN.
Generating bounding boxes for point cloud objects

Generating bounding boxes for point cloud objects is a crucial task in computer vision and autonomous driving. It involves detecting and localizing objects within a three-dimensional space, which is challenging because point cloud data is irregular and sparse.

One approach is to use deep learning techniques such as convolutional neural networks (CNNs) or PointNet, which have shown promising results in object detection tasks. Fed with point cloud data, these models learn to extract features and predict bounding boxes for the objects in the scene.
Extracting boundary points of planar point clouds with multiple thresholds

With the rapid development of 3D scanning, LiDAR, and related technologies, acquiring large volumes of 3D point cloud data has become routine, and processing and analyzing that data is an important part of 3D data handling. Extracting boundary points from a point cloud is a particularly important problem, because boundary points carry key information about an object's shape. This section introduces a method for extracting the boundary points of planar point clouds using multiple thresholds.

1. Background

Point cloud data is discrete 3D data made up of independent, irregularly distributed points. It is widely used in modeling, autonomous driving, security, and other fields, but its large point counts, noise, and uneven distribution make it difficult to process. Edge extraction is one of the important steps of point cloud processing: it finds the points that represent an object's surface boundary, the so-called edge points. An "edge" may be the geometric outline of an object, or the dividing line between regions of different color or material. Here we describe how to use one or more layers of thresholds on planar point cloud data to extract its boundary points.
2. The multi-threshold method

The method selects boundary points by thresholding point attributes, which gives it both generality and limitations. The basic idea is that boundary points and noise points differ in attributes such as curvature, color, and distance to the centroid; the point cloud is filtered against several thresholds on these attributes to pick out the boundary points.

1. Step 1: compute point curvatures. Curvature measures how strongly the surface bends at each point. It can be computed with existing algorithms, for example methods based on the chi-square distribution, on spherical fitting, or on least squares.
2. Step 2: compute point attributes (color, centroid distance). The commonly used attributes are color, meaning each point's RGB value, and centroid distance, meaning the distance from each point to the centroid of the current surface.
3. Step 3: set the thresholds. Several thresholds are needed, typically a curvature threshold, a color threshold, and a centroid-distance threshold.
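Step 3 amounts to intersecting several attribute masks. Here is a minimal sketch; the attribute arrays and threshold values are placeholders, standing in for the outputs of the curvature and attribute computations of steps 1 and 2:

```python
import numpy as np

def select_boundary_points(curvature, color_dev, centroid_dist, thresholds):
    """Keep the indices of points that exceed all three attribute
    thresholds: (curvature, color deviation, centroid distance)."""
    t_curv, t_color, t_dist = thresholds
    mask = ((curvature > t_curv)
            & (color_dev > t_color)
            & (centroid_dist > t_dist))
    return np.flatnonzero(mask)

# Toy attribute values for four points; only point 0 passes every threshold.
curvature     = np.array([0.9, 0.2, 0.8, 0.7])
color_dev     = np.array([0.5, 0.6, 0.1, 0.6])
centroid_dist = np.array([2.0, 2.5, 2.2, 0.3])
idx = select_boundary_points(curvature, color_dev, centroid_dist,
                             thresholds=(0.5, 0.3, 1.0))
```

Requiring all thresholds at once is one design choice; a variant could score points by how many thresholds they pass, trading precision for recall.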
A parameterized edge-curve extraction method for point cloud objects

To be honest, when I started on parameterized edge-curve extraction for point cloud objects, I was fumbling in the dark. I knew a point cloud was just a big pile of points, and finding an object's edge curves in it felt like hunting for the end of a thread in a tangle; I had no idea where to begin.

I first tried distinguishing points by density, figuring that edge points might have a different density than interior points. That turned out to be unreliable: depending on how the cloud was captured and on the object's shape, density often cannot separate out the edge at all.

Next I tried distances, measuring something like each point's average distance to its neighbors, on the theory that points with large distances are edge points. That had its own pile of problems; for oddly shaped objects, or ones with many small protrusions, everything fell apart. I really had no good answer at the time.

Then I switched methods and turned to geometry, trying to fit the point cloud to simple shapes, like pressing irregular lumps of clay into molds: for roughly spherical or cubic objects, fit the approximate shape first and then look for edges on the fitted shape. But this is far too limited by the object's shape; for very irregular objects the fit just is not accurate.

Later I thought of first computing the convex hull of the whole cloud. The convex hull is like a skin stretched tightly around the points. Each point can then be checked against the hull: points lying on the hull are candidate edge points. There are traps here too, though: some points lie on the hull only because of acquisition error or other accidents, and are not true edge points.

What I find most reliable now is to do a spatial partition first, cutting the cloud into regions the way you would cut a cake, and then apply the convex hull idea within each small region. This reduces the interference from those accidental outliers. It is still not perfect: the partitioning itself needs care, because a wrong split poisons everything downstream. And if the cloud is of poor quality with many noise points, it pays to denoise first, like rinsing rice to wash out the grit, before moving on to the next step.

That is what I have worked out so far; I hope some of it is useful to you.
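The convex-hull candidate step described above can be sketched in a few lines. This version works on the 2D projection of a cloud and implements Andrew's monotone chain directly rather than relying on any particular library; it is only a sketch of the candidate-selection step, with made-up test points:

```python
import numpy as np

def convex_hull_indices(pts):
    """Indices of the convex hull of a 2D point set (Andrew's monotone
    chain). Hull points are the edge-point candidates described above."""
    order = np.lexsort((pts[:, 1], pts[:, 0]))  # sort by x, then y

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); > 0 means a left turn.
        return ((pts[a, 0] - pts[o, 0]) * (pts[b, 1] - pts[o, 1])
                - (pts[a, 1] - pts[o, 1]) * (pts[b, 0] - pts[o, 0]))

    def half(idx):
        chain = []
        for i in idx:
            while len(chain) >= 2 and cross(chain[-2], chain[-1], i) <= 0:
                chain.pop()
            chain.append(i)
        return chain[:-1]  # last point repeats in the other half

    return half(order) + half(order[::-1])

square = np.array([[0, 0], [2, 0], [2, 2], [0, 2], [1, 1]], dtype=float)
hull = sorted(convex_hull_indices(square))  # interior point (1, 1) excluded
```

In the partitioned variant, this would be run per region after the spatial split, so a single stray outlier can only distort the hull of its own region.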
Point cloud cropping algorithms

Point cloud cropping is an important step in 3D point cloud processing: by removing unneeded parts of the data, it reduces the complexity of later processing and improves efficiency. Cropping algorithms are used throughout 3D vision and 3D reconstruction, and are a key technique.

A point cloud is a 3D data set made up of many points, recording the position, shape, and color of objects in space. In practice, point cloud data usually contains many irrelevant points, such as background or objects that need no processing, so cropping is indispensable. The goal of cropping is to select, from the full data set, the part related to the object or region of interest and extract it, simplifying the data and speeding up processing. In essence, a cropping algorithm filters the point cloud, keeping only the part we care about while discarding redundant information.
Cropping algorithms can be classified by principle and method into several types, such as spatial cropping, shape-based cropping, and color-based cropping. Spatial cropping is the most common: it restricts the point cloud to a given region of space by specifying a bounding box or a clipping plane. Spatial cropping generally depends on preset parameters such as the position, size, and shape of the crop region. Depending on the need, different methods can be used, such as cube cropping, plane cropping, or sphere cropping, chosen and tuned for the specific scene.
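A bounding-box crop, the simplest of the spatial methods just listed, is a single NumPy comparison; this is a sketch with made-up coordinates:

```python
import numpy as np

def crop_box(points, min_corner, max_corner):
    """Axis-aligned bounding-box crop: keep only the points that lie
    inside [min_corner, max_corner] on every axis."""
    inside = np.all((points >= min_corner) & (points <= max_corner), axis=1)
    return points[inside]

cloud = np.array([[0.5, 0.5, 0.5],    # inside the unit box
                  [2.0, 0.1, 0.3],    # outside in x
                  [0.9, 0.9, 1.5]])   # outside in z
kept = crop_box(cloud, np.zeros(3), np.ones(3))
```

Plane or sphere cropping follows the same pattern with a different `inside` predicate (a signed distance to the plane, or a distance to the sphere center).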
Beyond spatial cropping, point clouds can also be cropped by shape and by color. Shape-based cropping uses shape matching or geometric computation to discard points that do not fit the expected shape, extracting the needed information more precisely and reducing error and interference. Color-based cropping identifies and matches colors, extracting the points with similar color characteristics. It is typically used on point clouds that carry color information, such as RGB-D data, where it preserves the object's surface texture and color well.
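Color-based selection follows the same masking pattern as the spatial case. In the sketch below each row is assumed to be [x, y, z, r, g, b] (a layout chosen purely for illustration), and points within a per-channel tolerance of a target color are kept:

```python
import numpy as np

def crop_by_color(points, target_rgb, tol):
    """Keep points whose RGB (columns 3-5) is within `tol` of target_rgb
    on every channel; columns 0-2 are assumed to be XYZ."""
    mask = np.all(np.abs(points[:, 3:6] - target_rgb) <= tol, axis=1)
    return points[mask]

cloud = np.array([[0.0, 0.0, 0.0, 200, 30, 30],    # reddish -> kept
                  [1.0, 0.0, 0.0,  40, 200, 40],   # green   -> dropped
                  [0.0, 1.0, 0.0, 210, 25, 35]])   # reddish -> kept
red_points = crop_by_color(cloud, np.array([205, 28, 32]), tol=10)
```

A per-channel tolerance is the crudest possible color match; a real pipeline might compare in a perceptual color space instead.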
In practice, cropping is usually combined with other algorithms such as point cloud registration and reconstruction: by limiting processing to the region of interest, it improves the accuracy and efficiency of the downstream steps.
From Point Cloud to Grid DEM: A Scalable Approach

Pankaj K. Agarwal (1), Lars Arge (1, 2), and Andrew Danner (1)
(1) Department of Computer Science, Duke University, Durham, NC 27708, USA, {pankaj,large,adanner}@
(2) Department of Computer Science, University of Aarhus, Aarhus, Denmark, large@daimi.au.dk

Summary. Given a set S of points in R^3 sampled from an elevation function H: R^2 -> R, we present a scalable algorithm for constructing a grid digital elevation model (DEM). Our algorithm consists of three stages: First, we construct a quad tree on S to partition the point set into a set of non-overlapping segments. Next, for each segment q, we compute the set of points in q and all segments neighboring q. Finally, we interpolate each segment independently using points within the segment and its neighboring segments.

Data sets acquired by LIDAR and other modern mapping technologies consist of hundreds of millions of points and are too large to fit in main memory. When processing such massive data sets, the transfer of data between disk and main memory (also called I/O), rather than the CPU time, becomes the performance bottleneck. We therefore present an I/O-efficient algorithm for constructing a grid DEM. Our experiments show that the algorithm scales to data sets much larger than the size of main memory, while existing algorithms do not scale. For example, using a machine with 1GB RAM, we were able to construct a grid DEM containing 1.3 billion cells (occupying 1.2GB) from a LIDAR data set of over 390 million points (occupying 20GB) in about 53 hours. Neither ArcGIS nor GRASS, two popular GIS products, were able to process this data set.

1 Introduction

One of the basic tasks of a geographic information system (GIS) is to store a representation of various physical properties of a terrain such as elevation, temperature, precipitation, or water depth, each of which can be viewed as a real-valued bivariate function. Because of simplicity and efficacy, one of the widely used representations is the so-called grid representation, in which a functional
value is stored in each cell of a two-dimensional uniform grid. However, many modern mapping technologies do not acquire data on a uniform grid. Hence the raw data is a set S of N (arbitrary) points in R^3, sampled from a function H: R^2 -> R. An important task in GIS is thus to interpolate S on a uniform grid of a prescribed resolution. In this paper, we present a scalable algorithm for this interpolation problem. Although our technique is general, we focus on constructing a grid digital elevation model (DEM) from a set S of N points in R^3 acquired by modern mapping techniques such as LIDAR. These techniques generate huge amounts of high-resolution data. For example, LIDAR acquires highly accurate elevation data at a resolution of one point per square meter or better and routinely generates hundreds of millions of points. It is not possible to store these massive data sets in the internal memory of even high-end machines, and the data must therefore reside on larger but considerably slower disks. When processing such huge data sets, the transfer of data between disk and main memory (also called I/O), rather than computation, becomes the performance bottleneck. An I/O-efficient algorithm that minimizes the number of disk accesses leads to tremendous runtime improvements in these cases. In this paper we develop an I/O-efficient algorithm for constructing a grid DEM of unprecedented size from massive LIDAR data sets.

(Supported by NSF under grants CCR-00-86013, EIA-01-31905, CCR-02-04118, and DEB-04-25465, by ARO grants W911NF-04-1-0278 and DAAD19-03-1-0352, and by a grant from the U.S.-Israel Binational Science Foundation. Supported in part by the US Army Research Office through grant W911NF-04-1-0278 and by an Ole Rømer Scholarship from the Danish National Science Research Council. Supported by the US Army Research Office through grant W911NF-04-1-0278.)

Related work. A variety of methods for interpolating a surface from a set of points have been proposed, including inverse
distance weighting (IDW), kriging, spline interpolation, and minimum curvature surfaces. Refer to [12] and the references therein for a survey of the different methods. However, the computational complexity of these methods often makes it infeasible to use them directly on even moderately large point sets. Therefore, many practical algorithms use a segmentation scheme that decomposes the plane (or rather the area of the plane containing the input points) into a set of non-overlapping areas (or segments), each containing a small number of input points. One then interpolates the points in each segment independently. Numerous segmentation schemes have been proposed, including simple regular decompositions and decompositions based on Voronoi diagrams [18] or quad trees [14, 11]. A few schemes using overlapping segments have also been proposed [19, 17].

(In this paper, we consider LIDAR data sets that represent the actual terrain and have been pre-processed by the data providers to remove spikes and errors due to noise.)

As mentioned above, since I/O is typically the bottleneck when processing large data sets, I/O-efficient algorithms are designed to explicitly take advantage of the large main memory and disk block size [2]. These algorithms are designed in a model in which the computer consists of an internal (or main) memory of size M and an infinite external memory. Computation is considered free but can only occur on elements in main memory; in one I/O-operation, or simply I/O, B consecutive elements can be transferred between internal and external memory. The goal of an I/O-efficient algorithm is to minimize the number of I/Os. Many Θ(N)-time algorithms that do not explicitly consider I/O use Θ(N) I/Os when used in the I/O-model. However, the "linear" bound, the number of I/Os needed to read N elements, is only Θ(scan(N)) = Θ(N/B) in the I/O model. The number of I/Os needed to sort N elements is Θ(sort(N)) = Θ((N/B) log_{M/B}(N/B)) [2]. In practice, B is on the order of 10^3–10^5, so
scan(N) and sort(N) are typically much smaller than N. Therefore tremendous speedups can often be obtained by developing algorithms that use O(scan(N)) or O(sort(N)) I/Os rather than Ω(N) I/Os. Numerous I/O-efficient algorithms and data structures have been developed in recent years, including several for fundamental GIS problems (refer to [4] and the references therein for a survey). Agarwal et al. [1] presented a general top-down layered framework for constructing a certain class of spatial data structures, including quad trees, I/O-efficiently. Hjaltason and Samet [9] also presented an I/O-efficient quad-tree construction algorithm. This optimal O(sort(N)) I/O algorithm is based on assigning a Morton block index to each point in S, encoding its location along a Morton-order (Z-order) space-filling curve, sorting the points by this index, and then constructing the structure in a bottom-up manner.

Our approach. In this paper we describe an I/O-efficient algorithm for constructing a grid DEM from LIDAR points based on quad-tree segmentation. Most of the segmentation-based algorithms for this problem can be considered as consisting of three separate phases: the segmentation phase, where the decomposition is computed based on S; the neighbor finding phase, where for each segment in the decomposition the points in the segment and the relevant neighboring segments are computed; and the interpolation phase, where a surface is interpolated in each segment and the interpolated values of the grid cells in the segment are computed. In this paper, we are more interested in the segmentation and neighbor finding phases than the particular interpolation method used in the interpolation phase. We will focus on the quad-tree based segmentation scheme because of its relative simplicity and because it has been used with several interpolation methods such as thin plate splines [14] and B-splines [11]. We believe that our techniques will apply to other segmentation schemes as well. Our algorithm
implements all three phases I/O-efficiently, while allowing the use of any given interpolation method in the interpolation phase. Given a set S of N points, a desired output grid specified by a bounding box and a cell resolution, as well as a threshold parameter k_max, the algorithm uses O((N/B) · h/log(M/B) + sort(T)) I/Os, where h is the height of a quad tree on S with at most k_max points in each leaf, and T is the number of cells in the desired grid DEM. Note that this is O(sort(N) + sort(T)) I/Os if h = O(log N), that is, if the points in S are distributed such that the quad tree is roughly balanced.

The three phases of our algorithm are described in Section 2, Section 3, and Section 4. In Section 2 we describe how to construct a quad tree on S with at most k_max points in each leaf using O((N/B) · h/log(M/B)) I/Os. The algorithm is based on the framework of Agarwal et al. [1]. Although not as efficient as the algorithm by Hjaltason and Samet [9] in the worst case, we believe that it is simpler and potentially more practical; for example, it does not require computation of Morton block indices or sorting of the input points. Also, in most practical cases where S is relatively nicely distributed, for example when working with LIDAR data, the two algorithms both use O(sort(N)) I/Os. In Section 3 we describe how to find the points in all neighbor leaves of each quad-tree leaf using O((N/B) · h/log(M/B)) I/Os. The algorithm is simple and very similar to our quad-tree construction algorithm; it takes advantage of how the quad tree is naturally stored on disk during the segmentation phase. Note that while Hjaltason and Samet [9] do not describe a neighbor finding algorithm based on their Morton block approach, it seems possible to use their approach and an I/O-efficient priority queue [5] to obtain an O(sort(N)) I/O algorithm for the problem. However, this algorithm would be quite complex and therefore probably not of practical interest. Finally, in Section 4 we describe how to apply an
interpolation scheme to the points collected for each quad-tree leaf, evaluate the computed function at the relevant grid cells within the segment corresponding to each leaf, and construct the final grid using O(scan(N)) + O(sort(T)) I/Os. As mentioned earlier, we can use any given interpolation method within each segment.

To investigate the practical efficiency of our algorithm we implemented it and experimentally compared it to other interpolation algorithms using LIDAR data. To summarize the results of our experiments, we show that, unlike existing algorithms, our algorithm scales to data sets much larger than the main memory. For example, using a 1GB machine we were able to construct a grid DEM containing 1.3 billion cells (occupying 1.2GB) from a LIDAR data set of over 390 million points (occupying 20GB) in just 53 hours. This data set is an order of magnitude larger than what could be handled by two popular GIS products, ArcGIS and GRASS. In addition to supporting large input point sets, we were also able to construct very large high resolution grids; in one experiment we constructed a one meter resolution grid DEM containing more than 53 billion cells; storing just a single bit for each grid cell in this DEM requires 6GB.

In Section 5 we describe the details of the implementation of our theoretically I/O-efficient algorithm that uses a regularized spline with tension interpolation method [13]. We also describe the details of an existing algorithm implemented in GRASS using the same interpolation method; this algorithm is similar to ours but it is not I/O-efficient. In Section 6 we describe the results of the experimental comparison of our algorithm to other existing implementations. As part of this comparison, we present a detailed comparison of the quality of the grid DEMs produced by our algorithm and the similar algorithm in GRASS that shows the results are in good agreement.

2 Segmentation Phase: Quad-Tree Construction

Given a set S of N points contained in a bounding box [x1, x
2] × [y1, y2] in the plane, and a threshold k_max, we wish to construct a quad tree T [8] on S such that each quad-tree leaf contains at most k_max points. Note that the leaves of T partition the bounding box [x1, x2] × [y1, y2] into a set of disjoint areas, which we call segments.

Incremental construction. T can be constructed incrementally simply by inserting the points of S one at a time into an initially empty tree. For each point p, we traverse a root-leaf path in T to find the leaf v containing p. If v contains less than k_max points, we simply insert p in v. Otherwise, we split v into four new leaves, each representing a quadrant of v, and re-distribute p and the points in v to the new leaves. If h is the height of T, this algorithm uses O(Nh) time. If the input points in S are relatively evenly distributed we have h = O(log N), and the algorithm uses O(N log N) time.

If S is so large that T must reside on disk, traversing a path of length h may require as many as h I/Os, leading to an I/O cost of O(Nh) in the I/O-model. By storing (or blocking) the nodes of T on disk intelligently, we may be able to access a subtree of depth log B (size B) in a single I/O and thus reduce the cost to O(N · h/log B) I/Os. Caching the top-most levels of the tree in internal memory may also reduce the number of I/Os needed. However, since not all the levels fit in internal memory, it is hard to avoid spending an I/O to access a leaf during each insertion, or Ω(N) I/Os in total. Since sort(N) ≪ N in almost all cases, the incremental approach is very inefficient when the input points do not fit in internal memory.

Level-by-level construction. A simple I/O-efficient alternative to the incremental construction algorithm is to construct T level-by-level: We first construct the first level of T, the root v, by scanning through S and, if N > k_max, distributing each point p to one of four leaf lists on disk corresponding to the child of v containing p. Once we
have scanned S and constructed one level, we construct the next level by loading each leaf list in turn and constructing leaf lists for the next level of T. While processing one list we keep a buffer of size B in memory for each of the four new leaf lists (children of the constructed node) and write buffers to the leaf lists on disk as they run full. Since we in total scan S on each level of T, the algorithm uses O(Nh) time, the same as the incremental algorithm, but only O(Nh/B) I/Os. However, even in the case of h = log_4 N, this approach is still a factor of log_{M/B} N / log_4 N from the optimal O((N/B) log_{M/B}(N/B)) I/O bound.

Hybrid construction. Using the framework of Agarwal et al. [1], we design a hybrid algorithm that combines the incremental and level-by-level approaches. Instead of constructing a single level at a time, we can construct a layer of log_4(M/B) levels. Because 4^{log_4(M/B)} = M/B < M, we construct the layer entirely in internal memory using the incremental approach: We scan through S, inserting points one at a time while splitting leaves and constructing new nodes, except if the path from the root of the layer to a leaf of the layer is of height log_4(M/B). In this case, we write all points contained in such a leaf v to a list L_v on disk. After all points have been processed and the layer constructed, we write the layer to disk sequentially and recursively construct layers for each leaf list L_i. Refer to Figure 1.

Fig. 1. Construction of a quad-tree layer of depth three with k_max = 2. Once a leaf at depth three is created, no further splitting is done; instead additional points in the leaf are stored in leaf lists shown below shaded nodes. After processing all points, the shaded leaves with more than two points are processed recursively.

Since a layer has at most M/B nodes, we can keep an internal memory buffer of size B for each leaf list and only write points to disk when a buffer runs full (for leaves that contain less than B points in total, we write the points in all such
leaves to a single list after constructing the layer). In this way we can construct a layer on N points in O(N/B) = scan(N) I/Os. Since a tree of height h has h/log_4(M/B) layers, the total construction cost is O((N/B) · h/log(M/B)) I/Os. This is sort(N) = O((N/B) log_{M/B} N) I/Os when h = O(log N).

3 Neighbor Finding Phase

Let T be a quad tree on S. We say that two leaves are neighbors if their associated segments share part of an edge or a corner. Refer to Figure 2 for an example. If L is the set of segments associated with the leaves of T, we want to find for each q ∈ L the set S_q of points contained in q and the neighbor leaves of q. As for the construction algorithm, we first describe an incremental algorithm and then improve its efficiency using a layered approach.

Fig. 2. The segment q associated with a leaf of a quad tree and its six shaded neighboring segments.

Incremental approach. For each segment q ∈ L, we can find the points in the neighbors of q using a simple recursive procedure: Starting at the root v of T, we compare q to the segments associated with the four children of v. If the bounding box of a child u shares a point or part of an edge with q, then q is a neighbor of at
neighbor segments over all segments is at most 16times the num-ber of leaves of T .Because the total number of leaves is at most 4N/k max ,and since the above algorithm traverses a path of height h for each neighbor,it visits O (Nh )nodes in total.Furthermore,as each leaf contains at most k max points,the algorithm reports O (Nk max )points in total.Thus the total running time of the algorithm is O ((h +k max )N )=O (hN )).This is also the worst case I/O cost.Layered approach.To find the points in the neighboring segments of each seg-ment in L using a layered approach similar to the one used to construct T ,we first load the top log 4M B levels of T into memory.We then associate with each leaf u in the layer,a buffer B u of size B in internal memory and a list L u in external mem-ory.For each segment q ∈L ,we use the incremental algorithm described above to find the leaves of the layer with an associated segment that completely contains q or share part of a boundary with q .Suppose u is such a layer leaf.If u is also a leaf of the entire tree T ,we add the pair (q,S u )to a global list Λ,where S u is the set of points stored at u .Otherwise,we add q to the buffer B u associated with u ,which is written to L u on disk when B u runs full.After processing all segments in L ,we recursively process the layers rooted at each leaf node u and its corresponding list L u .Finally,after processing all layers,we sort the global list of neighbor points Λby the first element q in the pairs (q,S u )stored in Λ.After this,the set S q of points in the neighboring segments of q are in consecutive pairs of Λ,so we can construct all S q sets in a simple scan of Λ.Since we access nodes in T during the above algorithm in the same order they were produced in the construction of T ,we can process each layer of log 4M B levels of T in scan(N )I/Os.Furthermore,since q |S q |=O (N ),the total number of I/Osused to sort and scan Λis O (sort(N )).Thus the algorithm uses O (N B h M B )I/Os 
in total, which is O(sort(N)) when h = O(log N).

4 Interpolation Phase

Given the set S_q of points in each segment q (quad tree leaf area) and the neighboring segments of q, we can perform the interpolation phase for each segment q in turn simply by using any interpolation method we like on the points in S_q, and evaluating the computed function to interpolate each of the grid cells in q. Since Σ_q |S_q| = O(N), and assuming that each S_q fits in memory (otherwise we maintain an internal memory priority queue to keep the n_max < M points in S_q that are closest to the center of q, and interpolate on this subset), we can read each S_q into main memory and perform the interpolation in O(scan(N)) I/Os in total. However, we cannot simply write the interpolated grid cells to an output grid DEM as they are computed, since this could result in an I/O per cell (or per segment q). Instead we write each interpolated grid cell to a list along with its position (i, j) in the grid; we buffer B cells at a time in memory and write the buffer to disk when it runs full. After processing each set S_q, we sort the list of interpolated grid cells by position to obtain the output grid. If the output grid has size T, computing the T interpolated cells and writing them to the list takes O(T/B) I/Os. Sorting the cells takes O(sort(T)) I/Os. Thus the interpolation phase is performed in O(scan(N) + sort(T)) I/Os in total.

5 Implementation

We implemented our methods in C++ using TPIE [7, 6], a library that eases the implementation of I/O-efficient algorithms and data structures by providing a set of primitives for processing large data sets. Our algorithm takes as input a set S of points, a grid size, and a parameter k_max that specifies the maximum number of points per quad tree segment, and computes the interpolated surface for the grid using our segmentation algorithm and a regularized spline with tension interpolation method [13]. We chose this interpolation method because it is used in the open source
GIS GRASS module s.surf.rst [14], the only GRASS surface interpolation method that uses segmentation to handle larger input sizes, and it provides a means to compare our I/O-efficient approach to an existing segmentation method. Below we discuss two implementation details of our approach: thinning the input point set, and supporting a bit mask. Additionally, we highlight the main differences between our implementation and s.surf.rst.

Thinning point sets. Because LIDAR point sets can be very dense, there are often several cells in the output grid that contain multiple input points, especially when the grid cell size is large. Since it is not necessary to interpolate at sub-pixel resolutions, computational efficiency improves if one only includes points that are sufficiently far from other points in a quad-tree segment. Our implementation only includes points in a segment that are at least a user-specified distance ε from all other points within the segment. By default, ε is half the size of a grid cell. We implement this feature with no additional I/O cost simply by checking the distance between a new point p and all other points within the quad-tree leaf containing p, and discarding p if it is within a distance ε of another point.

Bit mask. A common GIS feature is the ability to specify a bit mask that skips computation on certain grid cells. The bit mask is a grid of the same size as the output grid, where each cell has a zero or one bit value. We only interpolate grid cell values when the bit mask for the cell has the value one. Bit masks are particularly useful when the input data set consists of an irregularly shaped region where the input points are clustered and large areas of the grid are far from the input points. Skipping the interpolation of the surface in these places reduces computation time, especially when many of the bit mask values are zero. For high resolution grids, the number of grid cells can be very large, and the bit mask may be
larger than internal memory and must reside on disk. Randomly querying the bit mask for each output grid cell would be very expensive in terms of I/Os. Using the same filtering idea described in Section 2 and Section 3, we filter the bit mask bits through the quad tree layer by layer such that each quad-tree segment gets a copy of the bit mask bits it needs during interpolation. The algorithm uses O((T/B) · h/log(M/B)) I/Os in total, where T is the number of cells in the output grid, which is O(sort(T)) when h = O(log N). The bits for a given segment can be accessed sequentially as we interpolate each quad-tree segment.

GRASS Implementation. The GRASS module s.surf.rst uses a quad-tree segmentation, but is not I/O-efficient in several key areas, which we briefly discuss: constructing the quad tree, supporting a bit mask, finding neighbors, and evaluating grid cells. All data structures in the GRASS implementation, with the exception of the output grid, are stored in memory and must use considerably slower swap space on disk if internal memory is exhausted. During construction, points are simply inserted into an internal memory quad tree using the incremental construction approach of Section 2. Thinning of points using the parameter ε during construction is implemented exactly as in our implementation. The bit mask in s.surf.rst is stored as a regular grid entirely in memory and is accessed randomly during interpolation of segments, instead of sequentially as in our approach.

Points from neighboring quad-tree segments are not found in advance as in our algorithm, but are found when interpolating a given quad-tree segment q; the algorithm creates a window w by expanding q in all directions by a width δ and querying the quad tree to find all points within w. The width δ is adjusted by binary search until the number of points within w is between a user-specified range [n_min, n_max]. Once an appropriate number of points is found for a quad-tree segment q, the grid cells in q are interpolated and written
directly to the proper location in the output grid by randomly seeking to the appropriate file offset and writing the interpolated results. When each segment has a small number of cells, writing the values of the T output grid cells uses O(T) ≫ sort(T) I/Os. Our approach constructs the output grid using the significantly better sort(T) I/Os.

6 Experiments

We ran a set of experiments using our I/O-efficient implementation of our algorithm and compared our results to existing GIS tools. We begin by describing the data sets on which we ran the experiments, then compare the efficiency and accuracy of our algorithm with other methods. We show that our algorithm is scalable to over 395
heavily pre-processed to remove buildings and vegetation.The average point spacing is roughly 3ft.Scalability results.We ran our algorithm on both the Neuse river and Outer Banks data sets at varying grid cell resolutions.Because we used the default value ofε(half the grid cell size)increasing the size of grid cells decreased the number of points in the quad tree and the number of points used for interpolation.Results。