Ray Tracing Complex Scenes on Graphics Hardware (Student: Pei-Lun Lee)
Geometric Modeling

Geometric modeling is a fundamental aspect of computer graphics and computer-aided design (CAD), serving as the foundation for creating and manipulating virtual representations of physical objects. From architectural designs to video game environments, geometric modeling plays a crucial role in various industries and applications. This essay delves into the multifaceted nature of geometric modeling, exploring its significance, methodologies, challenges, and future directions.

At its core, geometric modeling involves the representation of objects and their properties in a digital environment. This representation enables designers, engineers, and artists to visualize and analyze complex structures, facilitating the design process and decision-making. One of the primary methodologies in geometric modeling is the use of mathematical equations and algorithms to describe the geometry of objects. Through techniques such as parametric modeling, spline modeling, and solid modeling, practitioners can create detailed and precise digital models that accurately reflect real-world objects. Parametric modeling, for instance, allows designers to define objects using parameters such as dimensions, angles, and curves. By adjusting these parameters, they can modify the shape and size of the object dynamically, facilitating rapid prototyping and iterative design processes. Similarly, spline modeling utilizes mathematical curves to create smooth and continuous surfaces, ideal for modeling organic shapes like characters and vehicles. Solid modeling, on the other hand, focuses on representing objects as a collection of interconnected geometric primitives, enabling the simulation of physical properties such as volume and mass.

Despite its versatility and utility, geometric modeling poses several challenges, ranging from computational complexity to data interoperability issues. Complex geometric operations, such as boolean operations and surface intersection calculations, can be computationally intensive, requiring efficient algorithms and computational resources. Moreover, ensuring the accuracy and consistency of geometric models across different software platforms and file formats remains a persistent challenge in the field. Interoperability standards such as the Initial Graphics Exchange Specification (IGES) and the Standard for the Exchange of Product model data (STEP) aim to address these challenges by providing a common framework for exchanging geometric data between CAD systems.

Beyond its practical applications, geometric modeling also intersects with artistic expression, enabling artists and animators to create visually stunning digital artworks and animations. In the realm of computer graphics, techniques such as procedural modeling and ray tracing empower artists to generate intricate scenes and realistic lighting effects with unparalleled fidelity. Procedural modeling involves the use of algorithms to generate complex geometry and textures automatically, allowing artists to create vast landscapes, intricate patterns, and detailed structures with minimal manual intervention. Ray tracing, on the other hand, simulates the behavior of light in a virtual environment, enabling the rendering of photorealistic images with accurate reflections, refractions, and shadows. Looking ahead, the future of geometric modeling holds tremendous promise, driven by advancements in areas such as artificial intelligence, virtual reality, and additive manufacturing.

Machine learning techniques, for instance, are revolutionizing geometric modeling by automating tedious tasks such as mesh segmentation, shape synthesis, and texture generation. By training neural networks on vast datasets of 3D models, researchers can develop algorithms that learn to understand and manipulate geometric shapes with human-like proficiency. Virtual reality technologies, meanwhile, are democratizing the creation of 3D content by providing intuitive tools for sculpting, painting, and animating virtual objects in immersive environments. Moreover, additive manufacturing, also known as 3D printing, is expanding the possibilities of geometric modeling by enabling the fabrication of complex and customizable objects with unprecedented speed and precision. From personalized medical implants to custom-designed aerospace components, additive manufacturing is revolutionizing traditional manufacturing processes and opening new frontiers in product design and customization. By leveraging geometric modeling techniques, designers can optimize the geometries of 3D-printed objects for specific performance criteria such as strength, weight, and flexibility, unlocking new possibilities for innovation across industries.

In conclusion, geometric modeling is a foundational discipline that underpins various fields ranging from engineering and architecture to entertainment and manufacturing. By leveraging mathematical principles and computational algorithms, practitioners can create, manipulate, and visualize complex geometric shapes with remarkable precision and efficiency. Despite its challenges, geometric modeling continues to evolve rapidly, driven by advancements in technology and interdisciplinary collaboration. As we look to the future, the continued integration of geometric modeling with emerging technologies promises to revolutionize how we design, create, and interact with the world around us.
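As a small illustration of the parametric and spline modeling mentioned above, the following sketch evaluates a cubic Bezier curve, one of the most common spline primitives, from four control points using the standard Bernstein form. The control points are arbitrary example values, not taken from any particular model.

```cpp
// Evaluate a cubic Bezier curve B(t) = (1-t)^3 P0 + 3(1-t)^2 t P1 + 3(1-t) t^2 P2 + t^3 P3.
// Changing the control points reshapes the curve, which is the essence of parametric modeling.
#include <cstdio>

struct Point2 { double x, y; };

Point2 cubicBezier(const Point2 p[4], double t) {
    double u = 1.0 - t;
    double b0 = u * u * u, b1 = 3 * u * u * t, b2 = 3 * u * t * t, b3 = t * t * t;
    return { b0 * p[0].x + b1 * p[1].x + b2 * p[2].x + b3 * p[3].x,
             b0 * p[0].y + b1 * p[1].y + b2 * p[2].y + b3 * p[3].y };
}

int main() {
    // Example control points: moving p1 or p2 bends the curve without moving its endpoints.
    Point2 ctrl[4] = {{0.0, 0.0}, {1.0, 2.0}, {3.0, 2.0}, {4.0, 0.0}};
    for (int i = 0; i <= 4; ++i) {
        Point2 q = cubicBezier(ctrl, i / 4.0);
        std::printf("t=%.2f -> (%.3f, %.3f)\n", i / 4.0, q.x, q.y);
    }
}
```

Sampling t densely and connecting the points yields the smooth curve; surfaces are modeled the same way with two parameters.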
Graphics Card

A graphics card, also known as a video card or a GPU (Graphics Processing Unit), is a component of a computer that is responsible for rendering images, videos, and other graphical elements onto a display. It is an essential part of a modern computer system, especially for tasks that require high-quality visuals such as gaming, video editing, and rendering.

One of the key functions of a graphics card is to process and manipulate data related to graphics. It takes data from the CPU (Central Processing Unit) and transforms it into images that can be displayed on a monitor. The GPU consists of hundreds or even thousands of smaller processing units, each of which is capable of performing complex calculations simultaneously. This parallel processing capability allows for more efficient and faster rendering of graphics.

Graphics cards are equipped with their own dedicated memory known as VRAM (Video Random Access Memory). This memory is used to store data related to graphics, such as textures, shaders, and geometry. The amount of VRAM in a graphics card determines its ability to handle high-resolution textures and complex visual effects. Higher-end graphics cards typically have more VRAM, which enables them to handle demanding tasks and provide a smoother and more immersive gaming experience.

In addition to processing and rendering graphics, graphics cards also play a crucial role in accelerating certain computations. Tasks such as machine learning, scientific simulations, and cryptocurrency mining can benefit greatly from the parallel processing capabilities of GPUs. These tasks often involve performing complex calculations on large datasets, and the massively parallel architecture of graphics cards allows for significant improvements in performance compared to traditional CPUs.

Over the years, graphics cards have evolved significantly in terms of performance and capabilities. Advancements in technology have enabled the production of more powerful GPUs that can handle ever-increasing demands for realistic graphics. Features such as ray tracing, which simulates the behavior of light in a scene, have become possible with the introduction of dedicated hardware in modern graphics cards.

When choosing a graphics card, factors such as performance, compatibility with other hardware, and budget need to be considered. High-end graphics cards are typically more expensive but offer superior performance and can handle the latest games and applications. Mid-range graphics cards provide a balance between performance and affordability, making them suitable for most users. Entry-level graphics cards are sufficient for basic tasks such as web browsing and office applications but may struggle with demanding games and applications.

In conclusion, a graphics card is an essential component of a computer system that enables the rendering of high-quality graphics. Its processing power, dedicated memory, and parallel processing capabilities make it indispensable for tasks such as gaming, video editing, and scientific simulations. The continuous advancements in technology have led to the production of more powerful graphics cards that allow for more realistic and immersive visual experiences.
Arnold features

Memory-efficient, scalable ray-tracing rendering software helps artists render complex scenes quickly and easily.
◆ See what's new (video: 2:31 min.)
◆ Get feature details in the Arnold for Maya, Houdini, Cinema 4D, 3ds Max, or Katana user guides

Subsurface scatter: High-performance ray-traced subsurface scattering eliminates the need to tune point clouds.
Hair and fur: Memory-efficient ray-traced curve primitives help you create complex fur and hair renders.
Motion blur: 3D motion blur interacts with shadows, volumes, indirect lighting, reflection, or refraction. Deformation motion blur and rotational motion are also supported.
Volumes: The volumetric rendering system in Arnold can render effects such as smoke, clouds, fog, pyroclastic flow, and fire.
Instances: Arnold can more efficiently ray trace instances of many scene objects with transformation and material overrides.
Subdivision and displacement: Arnold supports Catmull-Clark subdivision surfaces.
OSL support: Arnold now features support for Open Shading Language (OSL), an advanced shading language for global illumination renderers.
Light Path Expressions: LPEs give you power and flexibility to create Arbitrary Output Variables to help meet the needs of production.
NEW | Adaptive sampling: Adaptive sampling gives users another means of tuning images, allowing them to reduce render times without jeopardizing final image quality.
NEW | Toon shader: An advanced Toon shader is part of a non-photorealistic solution provided in combination with the Contour Filter.
NEW | Denoising: Two denoising solutions in Arnold offer flexibility by allowing users to use much lower-quality sampling settings.
NEW | Material assignments and overrides: Operators make it possible to override any part of a scene at render time and enable support for open standard frameworks such as MaterialX.
NEW | Alembic procedural: A native Alembic procedural allows users to render Alembic files directly without any translation.
NEW | Profiling API and structured statistics: An extensive set of tools allows users to more easily identify performance issues and optimize rendering processes.
Standard Surface shader: This energy-conserving, physically based uber shader helps produce a wide range of materials and looks.
Standard Hair shader: This physically based shader is built to render hair and fur, based on the d'Eon and Zinke models for specular and diffuse shading.
Flexible and extensible API: Integrate Arnold in external applications and create custom shaders, cameras, light filters, and output drivers.
Stand-alone command-line renderer: Arnold has a native scene description format stored in human-readable text files. Easily edit, read, and write these files via the C/Python API.
◆ See Arnold 5.1 release notes

Integrate Arnold into your pipeline
• Free plug-ins provide a bridge to the Arnold renderer from within many popular 3D applications.
• Arnold has supported plug-ins available for Maya, Houdini, Cinema 4D, 3ds Max, and Katana.
• Arnold is fully customizable, with a powerful API to create custom rendering solutions.
◆ See Arnold plug-ins
Computer Applications and Software, Vol. 28, No. 10, October 2011
Kd-tree Construction for Ray-Tracing Rendering Based on Complex Scene Graphs
Chen Lihua, Wang Yigang (Institute of Computer Graphics, Hangzhou Dianzi University, Hangzhou 310018, Zhejiang, China). Received: September 1, 2010.
Chen Lihua is a master's student whose main research area is virtual reality.
Abstract (translated): In global illumination rendering based on ray tracing and related methods, improving the spatial partitioning structure has long been one of the important acceleration strategies.
This paper studies common construction methods for spatial structures and proposes a fast partition-construction method aimed at complex indoor scenes.
First, instead of subdividing the whole space directly, the algorithm uses a grouping strategy: guided by bounding-box tests, scene entities that are spatially related are merged into a limited number of groups; an optimized Kd-tree is then built as the subdivision structure within each group, with a reasonable termination condition (a small illustrative sketch of this grouping step follows the introduction below).
Compared with previous methods, the acceleration structure built this way is better suited to complex indoor environments organized as scene graphs and provides an effective means of quickly generating realistic images.
Keywords: global illumination, ray tracing, Kd-tree, scene graph, offline rendering. CLC number: TP37. Document code: A.

KD-TREE CONSTRUCTION OF RAY-TRACING RENDERING BASED ON COMPLEX SCENE-GRAPH
Chen Lihua, Wang Yigang (Institute of Computer Graphics, Hangzhou Dianzi University, Hangzhou 310018, Zhejiang, China)

Abstract (original English version): In global illumination rendering based on ray tracing or related methods, the improvement of the spatial partition structure has always been one of the important methods among the various acceleration policies. This paper focuses on the study of common construction approaches for spatial structures and proposes a fast partition construction method aimed at complex indoor scenes. Firstly, the authors use a grouping strategy instead of dividing the whole space directly, combining scene entities that are spatially associated to a certain extent into a number of render groups according to judgements made with the bounding boxes. They then use an optimised Kd-tree to construct sub-structures for each group, and put forward a reasonable termination condition. Compared with previous methods, the acceleration structure constructed by this algorithm is better adapted to complex indoor scenes built on a scene graph. It provides an effective means for the fast generation of realistic graphics.
Keywords: Global illumination, Ray tracing, Kd-tree, Scene graph, Offline rendering

0 Introduction. In the field of high-quality photorealistic rendering, ray tracing has long been one of the principal algorithms for generating realistic images of a scene. Its principle is simple, it is easy to implement, and it can produce a wide range of convincing visual effects, so it has been widely applied in CAD and throughout computer graphics.
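As a rough illustration of the grouping step described in the abstract, the following C++ sketch merges spatially related scene entities into render groups using their bounding boxes, before a per-group Kd-tree would be built. The types and the greedy overlap test (`SceneEntity`, `RenderGroup`, `mergeIntoGroups`) are illustrative assumptions, not the authors' implementation, which also applies an optimized termination condition during Kd-tree construction.

```cpp
// Illustrative sketch only: group scene-graph entities by bounding-box proximity,
// then build one Kd-tree per group (per-group construction not shown).
#include <algorithm>
#include <cfloat>
#include <vector>

struct AABB {
    float min[3] = { FLT_MAX,  FLT_MAX,  FLT_MAX};
    float max[3] = {-FLT_MAX, -FLT_MAX, -FLT_MAX};
    void expand(const AABB& b) {
        for (int i = 0; i < 3; ++i) {
            min[i] = std::min(min[i], b.min[i]);
            max[i] = std::max(max[i], b.max[i]);
        }
    }
    bool overlaps(const AABB& b) const {
        for (int i = 0; i < 3; ++i)
            if (min[i] > b.max[i] || max[i] < b.min[i]) return false;
        return true;
    }
};

struct SceneEntity { AABB bounds; /* geometry omitted */ };
struct RenderGroup { AABB bounds; std::vector<const SceneEntity*> members; };

// Greedy grouping: an entity joins the first group whose bounds it overlaps;
// otherwise it starts a new group. A per-group Kd-tree would be built afterwards.
std::vector<RenderGroup> mergeIntoGroups(const std::vector<SceneEntity>& entities) {
    std::vector<RenderGroup> groups;
    for (const SceneEntity& e : entities) {
        RenderGroup* target = nullptr;
        for (RenderGroup& g : groups)
            if (g.bounds.overlaps(e.bounds)) { target = &g; break; }
        if (!target) { groups.emplace_back(); target = &groups.back(); }
        target->bounds.expand(e.bounds);
        target->members.push_back(&e);
    }
    return groups;
}
```

In a real system the grouping criterion and the target number of groups would be tuned to the scene graph, as the paper discusses.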
Large Scale Visualization Ian Williams & Steve Nash, PSG Applied EngineeringAgenda•Intro –Quadro Solutions•High Resolution & HDR Displays and Implications •Stereo•HDR•Implications of Multiple display channels •Addressing Multiple GPUs•SLI Mosaic mode•Combining T echnologiesQuadro Visual Computing Platform NVIDIAQuadro Plex VCSNVIDIA SLI NVDIA G-Sync NVIDIA HD SDI SceniX Scene Graph C CUDA OpenCL 30-bit Color mental ray reality server PhysX CompleXMulti-GPU OptiX InteractiveRay TracingSLI Mosaic Mode SLI Multi OS NVIDIA CUDAQuadro FX FamilyProduct SegmentTargetAudienceKey AdditionalFeaturesQuadroSolutionEstimatedStreet PriceUltra High-End 4D Seismic Analysis4D Medical Imaging+ 4GB GPU Memory+ 240 CUDA Parallel CoresQuadroFX 5800$ 3,299High-End Digital Special EffectsProduct Styling+ G-Sync+ SLI Frame RenderingQuadroFX 4800$ 1,799High-End High End MCADDigital EffectsBroadcast+ SDI+ Stereo+ SLI Multi-OSQuadroFX 3800$ 899Mid-Range Midrange CADMidrange DCC+25% better Perf than FX 580QuadroFX 1800$ 599Entry Volume CADVolume DCC+30% better performancethan FX 380+ 30-bit ColorQuadroFX 580$ 149EntryVolume CADVolume DCCProductivity Apps+50% Better Performancethan FX 370QuadroFX 380$ 99Quadro SystemsProduct SegmentTargetAudienceKey AdditionalFeaturesQuadroSolutionEstimatedStreet Price1U Rackmount Offline & RemoteRenderingFour GPUs16 GB total GPUMemoryTeslaS1070$ 9,000Desksideor3U RackableSeismic AnalysisProduct StylingScalable Graphics2 GPUsSLI Mosaic Mode-Easy 4KCompleXOptiXQuadroPlex2200 D2$ 10,750AXE –Engine Relationships CgFX API Open SceneGraph AXEReachAXE Flexibility OptiX ray tracing engine CompleX scene scaling engine QBStereo API 30-bit & SDI APICustom Applications AXE Center SceniXscenemanagement engineNon-GraphicApplicationsApplication Acceleration Engines -Overview•SceniX–scene management engine–High performance OpenGL scene graph builtaround CgFX for maximum interactive quality–Provides ready access to new GPU capabilities & engines•CompleX–scene scaling engine–Distributed GPU rendering for keeping complex scenes interactive as they exceed frame buffer limits–Direct support for SceniX, OpenSceneGraph, and soon more•OptiX–ray tracing engine–Programmable GPU ray tracing pipelinethat greatly accelerates general ray tracing tasks –Supports programmable surfaces and custom ray data 15GB Visible Human model from N.I.H.Autodesk Showcase customer example OptiX shader exampleWhy use Large Scale Visualization?•Quality•Detail•Pixel real estate•Stereo•Immersive experience•Industry specific needs•…….Display Technologies•Panels•Industry focused –e.g. 
medical, video •Projectors•Multiple Panels•Multiple ProjectorsImages courtesy of HP, Sony, Barco, Mechdyne,Large Scale VisualizationBeyond 8 DVI Dual Link Requires Clustered PCs with Quadro G-Sync to synchronize displays and Multi GPU aware software.1-2 DVI 2-4 DVI4-8 DVI> 8 DVIApplications written to run on a single display just work across larger display formats.GPUs Displays Linear Performance increase with Quadro Plex Quadro FX GraphicsQuadro G-Sync Card 124148Any Application Runs (Does not need to bemulti GPU aware)•Performance•Stereo•“Mechanics” of >8bit per component •Multiple display channels•OS impact•Synchronization•ClusteringImplications of High Resolution and HDR•GPU memory•3840x2160 desktop at 16x FSAA ~400MB of framebuffer .•Performance•Fill-rate•Window system implications•T exture size & depth•16 bit per componentPerformance Implications of High resolutions & HDR•Consumer Stereo Drivers (3DVision)•Stereo separation from single stream•OpenGL Quad Buffered Stereo•Application has explicit control of the stereo image•Active •Passive Stereo L, R, L, R, L, R, ……L, L, L, L, L, L, ……R, R, R, R, R, R, ……“Mechanics” of >8bit per component•Possible using both DVI or Display Port •Display Port much easier•T extures etc. need to be >8bit per component •FP16, I16 (G8x GPUs and beyond)•RGBA, LA, L•Full screen only•Desktop, GUI, etc will not be correctly displayed •Format specific to display device•Outline:•Configure double-wide desktop•Significantly easier if exported by the EDID•Create full-screen window•Render to off-screen context• E.g. OpenGL FBO•Draw a textured quad•Use fragment program to pack pixels -display device specific-cont16 bit per componentR G BOff-screen buffer8 bits 2 bits8 bit per componentR G B R G BFull-Screen Window-cont16 bit per componentR G BOff-screen buffer8 bits 2 bits8 bit per componentR G B R G BFull-Screen Window02048HDR and Display Port•Requires native Display Port GPU•Desktop will be display correctly (in 8bit)•Outline:•Open 10bit per component Pixel Format/Visual •RenderMultiple Display ChannelsWhy multiple display channels?•Resolutions becoming larger than channel bandwidths•Sony, JVC 4K projectors•Barco and Mitsubishi panels•…….First a couple of questions:•Which OS -Windows or Linux?•Level of application transparency:•Driver does everything?•Application willing to do some work?Implications of Multiple Display Channels•Attach Multiple Monitors using Display Properties •Extend the Desktop to each GPU•Ensure ordering is correct for desired layout•Adjust Resolutions and Refresh Rates•Displays using Refresh Rates <48Hz can be problematic •Synchronizing displays requires G-sync cardThings you don’t intend are also possibleThings to note:•Windows can be opened anywhere on (and off) the complete desktop •Windows can span display boundaries•However maximizing will lock to one display•Where the window centroid is located•Likewise full screen windows•WGL Desktop size is considered outer rectangle spanning all displays •Driver will typically send data to all GPUs (in case window is moved, etc.)•GPU Affinity OpenGL extension solves thisDISPLAY_DEVICE lDispDev;DEVMODE lDevMode;lDispDev.cb = sizeof(DISPLAY_DEVICE);if (EnumDisplayDevices(NULL, 0, &lDispDev, NULL)) {EnumDisplaySettings(lDispDev.DeviceName, ENUM_CURRENT_SETTINGS, &lDevMode);}g_hWnd1 = createWindow(hInstance, lDevMode.dmPosition.x, lDevMode.dmPosition.y, X0, Y0);if (!g_hWnd1) { MessageBox(NULL, "Unable to create first window(s).", "Error", MB_OK); return E_FAIL;}if (EnumDisplayDevices(NULL, 1, &lDispDev, 
NULL)) {EnumDisplaySettings(lDispDev.DeviceName, ENUM_CURRENT_SETTINGS, &lDevMode);}g_hWnd2 = createWindow(hInstance, lDevMode.dmPosition.x, lDevMode.dmPosition.y, X1, y1);if (!g_hWnd2) {MessageBox(NULL, "Unable to create second window(s).", "Error", MB_OK); return E_FAIL;}Verify first display exists and get display settingsCreate Window on first display Verify second display exists and get display settings Create Window on second display•WGL extension (WGL_NV_gpu_affinity), core OpenGL not touched •GLX definition in the works•Application creates affinity-DC•HDC wglCreateAffinityDCNV(const HGPUNV *phGpuList);•Special DC that contain list of valid GPUs -> affinity mask•Affinity mask is immutable•Application creates affinity context from affinity-DC•As usual with RC = wglCreateContext(affinityDC);•Context inherits affinity-mask from affinity-DC•Application makes affinity context current•As usual using wglMakeCurrent()•Context will allow rendering only to GPU(s) in its affinity-maskWindowsGPU Affinity•Affinity context can be made current to:•Affinity DC•Affinity mask in DC and context have to be the same•There is no window associated with affinity-DC. Therefore:•Render to pBuffer•Render to FBO•DC obtained from window (regular DC)•Rendering only happens to the sub-rectangle(s) of the window that overlap the parts of the desktop that are displayed by the GPU(s) in the affinity mask of the context.•Sharing OpenGL objects across affinity contexts only allowed if affinity mask is the same•Otherwise wglShareLists will failWindows cont.GPU Affinity•Enumerate all GPUs in a system•BOOL wglEnumGpusNV(int iGpuIndex, HGPUNV *phGpu);•Loop until function returns false•Enumerate all display devices attached to a GPU •BOOL wglEnumGpuDevicesNV(HGPUNV hGpu, int iDeviceIndex, PGPU_DEVICE lpGpuDevice);•Returns information like location in virtual screen space•Loop until function returns false•Query list of GPUs in an affinity-mask •BOOL wglEnumGpusFromAffinityDCNV(HDC hAffinityDC,int iGpuIndex, HGPUNV *hGpu);•Loop until function returns false•Delete an affinity-DC•BOOL wglDeleteDCNV(HDC hdc);Windows cont.GPU Affinity#define MAX_GPU 4int gpuIndex = 0;HGPUNV hGPU[MAX_GPU];HGPUNV GpuMask[MAX_GPU];HDC affDC;HGLRC affRC;while ((gpuIndex < MAX_GPU) && wglEnumGpusNV(gpuIndex, &hGPU[gpuIndex])) {gpuIndex++;}GpuMask[0] = hGPU[0];GpuMask[1] = NULL;affDC = wglCreateAffinityDCNV(GpuMask);<Set pixelformat on affDC>affRC = wglCreateContext(affDC);wglMakeCurrent(affDC, affRC);<Create a FBO>glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, b);<now render>GPU Affinity –Render to Off-screen FBOCreate list of the first MAX_GPUs in the systemCreate an affinity-DC associated with first GPU Make the FBO current to render into itMultiple Displays -Linux•T wo traditional approaches depending on desired level of application transparency or behavior:•Separate X screens•3D Windows can’t span X screen boundaries•Location of context on GPU allows driver to send datato only that GPU•Xinerama•One large virtual desktop•3D Windows can span X screen boundaries•Will typically result in driver sending all data to allGPUs (in case window moves)Multiple Displays -Linux•Use nvidia-xconfig to create customized xorg.conf•nvidia-settings provides full featured control panel for Linux•Drivers can capture EDID•Useful when display device hidden behind KVM or optical cable •Synchronizing multiple displays requires G-sync cardSynchronizing Multiple Displays•Requires G-sync•Synchronize vertical retrace•Synchronize stereo field•Enables swap barrier•OpenGL 
Extensions•Windows: WGL_NV_Swap_Group•Linux:GLX_NV_Swap_GroupNameNV_swap_groupDependenciesWGL_EXT_swap_control affects the definition of this extension.WGL_EXT_swap_frame_lock affects the definition of this extension. OverviewThis extension provides the capability to synchronize the buffer swapsof a group of OpenGL windows. A swap group is created, and windows are added as members to the swap group. Buffer swaps to members of the swap group will then take place concurrently.This extension also provides the capability to sychronize the bufferswaps of different swap groups, which may reside on distributed systems on a network. For this purpose swap groups can be bound to a swap barrier.This extension extends the set of conditions that must be met beforea buffer swap can take place. BOOL wglJoinSwapGroupNV(HDC hDC,GLuint group);BOOL wglBindSwapBarrierNV(GLuint group,GLuint barrier);BOOL wglQuerySwapGroupNV(HDC hDC,GLuint *group);GLuint *barrier);BOOL wglQueryMaxSwapGroupsNV(HDC hDC,GLuint *maxGroups,GLuint *maxBarriers);BOOL wglQueryFrameCountNV(HDC hDC,GLuint *count);BOOL wglResetFrameCountNV(HDC hDC);NameNV_swap_groupOverviewThis extension provides the capability to synchronize the buffer swapsof a group of OpenGL windows. A swap group is created, and windows are added as members to the swap group. Buffer swaps to members of the swap group will then take place concurrently.This extension also provides the capability to sychronize the bufferswaps of different swap groups, which may reside on distributed systems on a network. For this purpose swap groups can be bound to a swap barrier.This extension extends the set of conditions that must be met beforea buffer swap can take place. Bool glxJoinSwapGroupNV(Display *dpy,GLXDrawable drawable, GLuint group); Bool glxBindSwapBarrierNV(Display *dpy,GLuint group,GLuint barrier);Bool glxQuerySwapGroupNV(Display *dpy,GLXDrawable drawable, GLuint *group);GLuint *barrier);Bool glxQueryMaxSwapGroupsNV(Display *dpy,GLuint screen, GLuint *maxGroups,GLuint *maxBarriers);Bool glxQueryFrameCountNV(Display *dpy,GLuint *count);Bool glxResetFrameCountNV(Display *dpy);Using G-syncRecommendations:•Control Panel will cause regular CPU contention •Polls hardware status•Use additional synchronization mechanisms in addition to swapbarrier–Broadcast frame countMultiple Displays made easy!•Enables transparent use of multiple GPUs on multiple displays •Enables a Quadro Plex(multiple GPUs) to be seen as one logical GPUby the operating system•Applications ‘just work’ across multi GPUs and multi displays•Works with OGL, DX, GDI etc•Zero or minimal performance impact for 2D and 3D applications compared with a single GPU per single display •Doesn’t support multiple View FrustumsDetails•Quadro Plex only•Operating System support•Windows XP, Linux, 32bit and 64bit •Vista/Win 7 soon•Maximum desktop size = 8k X 8k •FSAA may exacerbate desktop size •Compatible with G-sync•Clustering tiled displays •Supports StereoConfigurations0.10.20.30.40.50.60.70.80.91 1 screen4 screens 8 screensPerformance Hit for Multiple DisplaysViewperf 10.0SLI Mosaic Performance AdvantageViewperf 10.00.20.40.60.811.21 screen4 screens, Mosaic8 screens, MosaicProgrammatically controlling Mosaic Mode•NvAPI provides direct access to NVIDIA GPUs anddrivers on Windows platforms•Nvidia Control Panel GUI shows tested configurations •More advanced configuration possible through NvAPIProgramatically controlling Mosaic Mode (cont’d)NV_MOSAIC_TOPOLOGY topo; // struct defines rowcount, colcount& 
gpuLayoutNV_MOSAIC_SUPPORTED_TOPOLOGIES supportedTopoInfo; // list of topologies// Get List of Supported Topologies and display resolutionsnvStatus= NvAPI_Mosaic_GetSupportedTopoInfo(&supportedTopoInfo, type);// Set Mosaic Mode for a given topologynvStatus= NvAPI_SetCurrentMosaicTopology(&topo);// To Disable Mosaic ModenvStatus= NvAPI_EnableCurrentMosaicTopology(0);Beyond SLI Mosaic Mode•Can combine Mosaic for partial set of all GPUs •Use CUDA or GPU Affinity for non-display GPUs •Requires “Manual” Configuration•Combine Mosaic with CompleX Application Acceleration EngineSummary•Demand for Large Scale Viz& HDR technologies are being driven by economics• E.g. Digital Prototypes significantly less expensive than physicalprototypes however demand high quality and realism•Very large resolutions are de-facto standard for collaborative and large venue installations•Pixel bandwidth requirements still require multiple channels, even with Display Port•Some large venue displays are HDR capableSummary –cont.•Be aware of performance implications when using multiple GPUs–Use affinity/Separate Xscreens•Solutions like SLI Mosaic Mode extends the reach of Large Scale Visualization•Combining solutions enables unprecedented realism and interactivity–CompleX+ Mosaic = interactive massive datasets–OptiX+ Mosaic = unprecedented realism on a large scale–Compute + viz cluster = flexible utilization with massive compute powerThank You!•Feedback & Questions.。
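To make the swap-group extension quoted above concrete, here is a hedged C++ sketch that joins a window's OpenGL context to swap group 1 and binds that group to swap barrier 1, using only the WGL entry points listed in the extension text (`wglJoinSwapGroupNV`, `wglBindSwapBarrierNV`, `wglQueryMaxSwapGroupsNV`). Loading the entry points through `wglGetProcAddress` and the surrounding error handling are assumptions about the host application rather than part of the slides; a G-Sync card and a current OpenGL context are required.

```cpp
// Hedged sketch: synchronize buffer swaps of one window with other systems
// via NV_swap_group (requires a G-Sync card and a current OpenGL context).
#include <windows.h>
#include <GL/gl.h>

typedef BOOL (WINAPI *PFNWGLJOINSWAPGROUPNVPROC)(HDC hDC, GLuint group);
typedef BOOL (WINAPI *PFNWGLBINDSWAPBARRIERNVPROC)(GLuint group, GLuint barrier);
typedef BOOL (WINAPI *PFNWGLQUERYMAXSWAPGROUPSNVPROC)(HDC hDC, GLuint* maxGroups,
                                                      GLuint* maxBarriers);

bool joinSwapGroupAndBarrier(HDC hDC)
{
    // Entry points are exposed through wglGetProcAddress once a context is current.
    auto wglJoinSwapGroupNV = (PFNWGLJOINSWAPGROUPNVPROC)
        wglGetProcAddress("wglJoinSwapGroupNV");
    auto wglBindSwapBarrierNV = (PFNWGLBINDSWAPBARRIERNVPROC)
        wglGetProcAddress("wglBindSwapBarrierNV");
    auto wglQueryMaxSwapGroupsNV = (PFNWGLQUERYMAXSWAPGROUPSNVPROC)
        wglGetProcAddress("wglQueryMaxSwapGroupsNV");
    if (!wglJoinSwapGroupNV || !wglBindSwapBarrierNV || !wglQueryMaxSwapGroupsNV)
        return false;                       // extension not available

    GLuint maxGroups = 0, maxBarriers = 0;
    wglQueryMaxSwapGroupsNV(hDC, &maxGroups, &maxBarriers);
    if (maxGroups < 1 || maxBarriers < 1)
        return false;                       // no swap-group capable hardware

    // All windows joining group 1 swap together; binding the group to
    // barrier 1 also synchronizes it with groups on other machines.
    if (!wglJoinSwapGroupNV(hDC, 1)) return false;
    if (!wglBindSwapBarrierNV(1, 1)) return false;
    return true;
}
```

As the slides recommend, an application would combine this with its own frame-count broadcast rather than relying on the swap barrier alone.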
How Computer Graphics Works

Computer graphics (CG) is the discipline that studies how to use computers to process, generate, and display images. It is one of the applied branches of computer science. Computer graphics covers topics such as 3D geometric modeling, ray tracing, and the imaging of points, lines, and surfaces. At the core of how computer graphics works are image construction and image rendering techniques. The following sections explain the working principles of computer graphics from three angles: image construction, ray tracing, and rendering techniques.
I. Image construction. A computer has to convert scattered data into a visual image, which involves the following steps:
1. Consider the format and type of the input data. Input data may come in many formats and types, including images, audio, video, and CAD data. Different formats and types have to be handled differently, so different processing techniques must be chosen.
2. Data processing and modeling. For a large amount of data, a modeling scheme has to be designed from scratch. Following the algorithms and principles of computer graphics, the processing program splits the data into small triangle meshes or other kinds of geometric elements.
3. Data bandwidth testing and analysis. Once the model data has been built, the data bandwidth must be tested and analyzed against the specific memory and CPU requirements; if the bandwidth demand is too high, the program will consume more resources.
4. Optimize algorithms to improve visualization speed. Once the geometric model is in place, an important part of computer graphics is optimizing algorithms and related techniques so that the model can be visualized faster.
II. Ray tracing. Ray tracing is a key technique in computer graphics that generates images by repeatedly tracing rays backwards from the viewer. Its basic principle is to compute the intersections between rays shot from the viewpoint and the objects in the scene, then shade those points using the objects' materials, textures, and other attributes to produce the image. The concrete ray tracing process is as follows:
1. Compute the rays leaving the camera from its position and orientation.
2. For each ray, find the objects in the scene that the ray intersects.
3. Compute the surface color, reflection, and transmission at the intersection point.
4. Recursively trace the reflected and refracted rays until a light source is reached or the maximum recursion depth is hit.
5. Process and output the ray tracing results.
Ray tracing can produce high-quality images, but it usually takes a long time to compute and needs substantial computing resources.
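The five steps above map naturally onto a small recursive tracer. The following self-contained C++ sketch is only an illustration: it traces one camera ray against a single hard-coded sphere with one directional light and follows the reflected ray recursively; refraction, shadow rays, and a real scene structure are omitted.

```cpp
// Minimal, self-contained sketch of the five steps: one sphere, one light,
// Whitted-style recursion for the reflected ray (refraction omitted for brevity).
#include <cmath>
#include <cstdio>

struct Vec3 {
    float x, y, z;
    Vec3 operator+(Vec3 b) const { return {x + b.x, y + b.y, z + b.z}; }
    Vec3 operator-(Vec3 b) const { return {x - b.x, y - b.y, z - b.z}; }
    Vec3 operator*(float s) const { return {x * s, y * s, z * s}; }
    float dot(Vec3 b) const { return x * b.x + y * b.y + z * b.z; }
    Vec3 normalized() const { float l = std::sqrt(dot(*this)); return {x / l, y / l, z / l}; }
};

struct Ray { Vec3 o, d; };
struct Sphere { Vec3 center; float radius; Vec3 color; float reflectivity; };

// Step 2: ray-sphere intersection; returns parametric distance t, or -1 on a miss.
float intersect(const Sphere& s, const Ray& r) {
    Vec3 oc = r.o - s.center;
    float b = oc.dot(r.d), c = oc.dot(oc) - s.radius * s.radius;
    float disc = b * b - c;
    if (disc < 0) return -1.0f;
    float t = -b - std::sqrt(disc);
    return t > 1e-4f ? t : -1.0f;
}

Vec3 trace(const Sphere& scene, Vec3 lightDir, const Ray& ray, int depth) {
    if (depth > 3) return {0, 0, 0};                 // step 4: recursion limit
    float t = intersect(scene, ray);                 // step 2: nearest hit
    if (t < 0) return {0.2f, 0.2f, 0.3f};            // miss: background color
    Vec3 p = ray.o + ray.d * t;
    Vec3 n = (p - scene.center).normalized();
    float diffuse = std::fmax(0.0f, n.dot(lightDir));
    Vec3 color = scene.color * diffuse;              // step 3: local shading
    Vec3 rdir = ray.d - n * (2.0f * n.dot(ray.d));   // step 4: reflected ray
    Ray reflected{p, rdir.normalized()};
    return color + trace(scene, lightDir, reflected, depth + 1) * scene.reflectivity;
}

int main() {                                         // step 1: camera ray; step 5: output
    Sphere s{{0, 0, -3}, 1.0f, {0.8f, 0.3f, 0.3f}, 0.25f};
    Ray camRay{{0, 0, 0}, Vec3{0.1f, 0.1f, -1.0f}.normalized()};
    Vec3 c = trace(s, Vec3{1, 1, 1}.normalized(), camRay, 0);
    std::printf("pixel color: %.3f %.3f %.3f\n", c.x, c.y, c.z);
}
```

A full renderer runs this once per pixel (step 1 generates a camera ray per pixel) and writes the resulting colors to an image (step 5).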
III. Rendering techniques. Rendering is another important technique in computer graphics; it generates images by simulating light sources, materials, and related information.
Computer Graphics Course Design Survey
Name: ***  Student ID:  Major: Information Software 10-1
Contents: Applications of Computer Graphics in Games; I. Definition of Computer Graphics; II. History of Video Games; III. Applications of Graphics in Games (1. Geometry, 2. Animation, 3. Rendering); IV. Summary

Applications of Computer Graphics in Games

Computer graphics (CG) is the science of using mathematical algorithms to convert two- or three-dimensional graphics into the raster form shown on a computer display. Its research divides into two parts: one studies geometric construction, including 2D line drawing and 3D solid modeling; the other studies surface rendering, covering surface attributes such as shading, lighting, shadows, and texture. Today, applications of computer graphics reach deep into photorealistic rendering, scientific visualization, virtual environments, multimedia, computer animation, computer-aided engineering drawing, and other fields. Surveying its development, computer graphics has advanced rapidly and is still moving forward quickly; it has become an independent discipline with broad prospects.
I. Definition of Computer Graphics

Computer graphics (CG) is the science of using mathematical algorithms to convert two- or three-dimensional graphics into the raster form shown on a computer display. Its main research content is how to represent graphics in a computer, along with the principles and algorithms for computing, processing, and displaying graphics with a computer. A graphic is usually composed of geometric elements such as points, lines, surfaces, and solids, together with non-geometric attributes such as gray level, color, line style, and line width. In terms of processing techniques, graphics fall into two main categories: one is based on line information, such as engineering drawings, contour maps, and wireframe views of curved surfaces; the other is shaded images, that is, what is usually called photorealistic graphics. A major goal of computer graphics is to use the computer to produce pleasing, realistic images. To do this, one must build a geometric representation of the scene being depicted and then apply an illumination model to compute the lighting under assumed light sources, textures, and material properties.
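As a small, concrete example of applying an illumination model under assumed light sources and materials, the sketch below evaluates a classic Phong-style local shading model at a single surface point. The material and light values are made-up example numbers; a real renderer would pull them from the scene description.

```cpp
// Illustrative Phong local illumination at one shading point:
// color = ambient + diffuse * max(N.L, 0) + specular * max(R.V, 0)^shininess.
#include <cmath>
#include <cstdio>

struct Vec3 {
    float x, y, z;
    Vec3 operator+(Vec3 b) const { return {x + b.x, y + b.y, z + b.z}; }
    Vec3 operator-(Vec3 b) const { return {x - b.x, y - b.y, z - b.z}; }
    Vec3 operator*(float s) const { return {x * s, y * s, z * s}; }
    float dot(Vec3 b) const { return x * b.x + y * b.y + z * b.z; }
    Vec3 normalized() const { float l = std::sqrt(dot(*this)); return {x / l, y / l, z / l}; }
};

Vec3 phong(Vec3 n, Vec3 toLight, Vec3 toEye,
           Vec3 ambient, Vec3 diffuse, Vec3 specular, float shininess)
{
    float nl = std::fmax(0.0f, n.dot(toLight));
    Vec3 r = (n * (2.0f * n.dot(toLight)) - toLight).normalized();  // reflected light direction
    float rv = std::fmax(0.0f, r.dot(toEye));
    return ambient + diffuse * nl + specular * std::pow(rv, shininess);
}

int main() {
    Vec3 n = Vec3{0, 1, 0};                         // surface normal
    Vec3 toLight = Vec3{1, 1, 0}.normalized();      // direction to the light
    Vec3 toEye = Vec3{0, 1, 1}.normalized();        // direction to the camera
    Vec3 c = phong(n, toLight, toEye,
                   {0.05f, 0.05f, 0.05f},           // example ambient term
                   {0.6f, 0.4f, 0.3f},              // example diffuse albedo
                   {0.5f, 0.5f, 0.5f}, 32.0f);      // example specular, shininess
    std::printf("shaded color: %.3f %.3f %.3f\n", c.x, c.y, c.z);
}
```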
At the same time, because the result of photorealistic rendering is delivered as a digital image, computer graphics is closely related to image processing. The research scope of computer graphics is very broad, covering graphics hardware, graphics standards, interactive techniques, raster graphics generation algorithms, curve and surface modeling, solid modeling, photorealistic rendering and display algorithms, and non-photorealistic rendering, as well as scientific visualization, computer animation, natural scene simulation, and virtual reality.
Ray Tracing and Monte Carlo Methods

Ray tracing and Monte Carlo methods are two popular techniques used in computer graphics to create realistic images. Ray tracing, closely related to ray casting, is a rendering technique that simulates the way rays of light travel in the real world, allowing for the creation of highly realistic images. Monte Carlo methods, on the other hand, rely on random sampling to solve problems that may be deterministic in principle. Ray tracing works by tracing the path of light as it interacts with objects in a scene and simulating the effects of that interaction. This involves calculating the rays of light as they travel from the camera through the scene and interact with objects, surfaces, and materials.
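To make the contrast concrete, the sketch below uses Monte Carlo sampling for the simplest possible deterministic problem, estimating pi from random points. It is a generic illustration of random sampling applied to a deterministic problem, not a rendering algorithm, though the same averaging idea underlies Monte Carlo integration of lighting in a renderer.

```cpp
// Monte Carlo in its simplest form: estimate pi by sampling random points in the
// unit square and counting how many fall inside the quarter circle. The same idea
// (average many random samples of an integrand) underlies Monte Carlo rendering.
#include <cstdio>
#include <random>

int main() {
    std::mt19937 rng(42);                                  // fixed seed for repeatability
    std::uniform_real_distribution<double> uniform(0.0, 1.0);

    const int numSamples = 1000000;
    int inside = 0;
    for (int i = 0; i < numSamples; ++i) {
        double x = uniform(rng), y = uniform(rng);
        if (x * x + y * y <= 1.0)
            ++inside;
    }
    // Area of quarter circle / area of square = pi/4, so pi ~= 4 * (inside fraction).
    double piEstimate = 4.0 * inside / numSamples;
    std::printf("Monte Carlo estimate of pi with %d samples: %.5f\n",
                numSamples, piEstimate);
    // The error of such an estimator shrinks roughly as 1/sqrt(numSamples),
    // which is why Monte Carlo renderers need many samples per pixel.
}
```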
Principles of OpenCL Rendering

OpenCL (Open Computing Language) is an open-standard framework for parallel computing that is widely used on heterogeneous devices such as GPUs, FPGAs, and multi-core CPUs to accelerate compute-intensive tasks. OpenCL rendering is built on this framework: it uses parallel computation to accelerate graphics rendering, improving both rendering speed and quality.

The principle of OpenCL rendering can be summarized in the following steps:
1. Device discovery and selection. OpenCL first detects the devices in the system that support OpenCL, such as GPUs, FPGAs, or multi-core CPUs. These devices are then evaluated and selected to decide which of them will be used for rendering.
2. Data transfer and memory management. During rendering, the data the renderer needs must be transferred from host memory to device memory for computation. OpenCL provides functions and data structures to manage data transfer and memory allocation so that data moves efficiently between host and device.
3. Kernel programming. OpenCL executes the rendering work in kernel programs. Kernels are written in OpenCL C and run in parallel on the device; a kernel must be written as a function that can process many data items, so that it can execute across the device's many processing units at once.
4. Parallel computation and task scheduling. OpenCL splits the rendering task into many sub-tasks and executes them in parallel on the device. A scheduler manages and dispatches the sub-tasks so that the device's processing units work together and speed up rendering.
5. Result read-back and post-processing. When the rendering task finishes, the results are transferred from device memory back to host memory for subsequent post-processing and display. OpenCL provides the corresponding functions and data structures to manage this step and guarantee that the data is transferred correctly.

Overall, OpenCL rendering exploits the parallel processing power of the compute device: the rendering process is decomposed into many parallel sub-tasks that execute simultaneously on the device's processing units, which accelerates rendering. With sensible device selection, efficient data transfer and memory management, and parallel computation with task scheduling, OpenCL can deliver a high-performance and efficient rendering experience.
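Steps 1 through 5 above correspond to a short sequence of OpenCL host API calls. The following sketch is a minimal illustration using standard OpenCL 1.x entry points (`clGetPlatformIDs`, `clCreateBuffer`, `clCreateProgramWithSource`, `clEnqueueNDRangeKernel`, `clEnqueueReadBuffer`); the toy kernel simply scales a pixel buffer and stands in for a real rendering kernel, and error checking is omitted for brevity.

```cpp
// Minimal OpenCL host sketch mirroring steps 1-5 (error checks omitted for brevity).
#include <CL/cl.h>
#include <cstdio>
#include <vector>

// Toy kernel standing in for a real rendering kernel: scales every "pixel".
static const char* kSource =
    "__kernel void shade(__global float* pixels, float gain) {\n"
    "    size_t i = get_global_id(0);\n"
    "    pixels[i] = pixels[i] * gain;\n"
    "}\n";

int main() {
    // Step 1: device discovery and selection (first platform, first GPU device).
    cl_platform_id platform; clGetPlatformIDs(1, &platform, nullptr);
    cl_device_id device;     clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, nullptr);
    cl_context context = clCreateContext(nullptr, 1, &device, nullptr, nullptr, nullptr);
    cl_command_queue queue = clCreateCommandQueue(context, device, 0, nullptr);

    // Step 2: data transfer and memory management (host buffer -> device buffer).
    std::vector<float> pixels(1024, 0.5f);
    cl_mem buf = clCreateBuffer(context, CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR,
                                pixels.size() * sizeof(float), pixels.data(), nullptr);

    // Step 3: kernel program written in OpenCL C, built for the chosen device.
    cl_program program = clCreateProgramWithSource(context, 1, &kSource, nullptr, nullptr);
    clBuildProgram(program, 1, &device, nullptr, nullptr, nullptr);
    cl_kernel kernel = clCreateKernel(program, "shade", nullptr);

    // Step 4: parallel execution; the runtime schedules work-items across the device.
    float gain = 2.0f;
    clSetKernelArg(kernel, 0, sizeof(cl_mem), &buf);
    clSetKernelArg(kernel, 1, sizeof(float), &gain);
    size_t globalSize = pixels.size();
    clEnqueueNDRangeKernel(queue, kernel, 1, nullptr, &globalSize, nullptr, 0, nullptr, nullptr);

    // Step 5: read the result back to host memory for post-processing / display.
    clEnqueueReadBuffer(queue, buf, CL_TRUE, 0, pixels.size() * sizeof(float),
                        pixels.data(), 0, nullptr, nullptr);
    std::printf("first pixel after kernel: %.2f\n", pixels[0]);

    clReleaseMemObject(buf); clReleaseKernel(kernel); clReleaseProgram(program);
    clReleaseCommandQueue(queue); clReleaseContext(context);
}
```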
3D Model Rendering Techniques in Graphics

In the digital era, 3D models are used ever more widely. Reconstructing a real object or scene in virtual space inevitably requires 3D modeling, but the model that comes out of that process is still a lifeless object. How can it be presented more convincingly to the eye, producing an experience that bridges the real and the virtual? That question reaches deep into the field of graphics, and 3D model rendering is one of its important branches. Taking 3D model rendering as its starting point, this article looks at the background, classification, algorithms, and optimization of rendering.

I. Background. 3D rendering is among the most challenging and active research areas in computer graphics, with particularly broad applications in virtual reality and games. The basic task of rendering is to present the objects of a 3D scene in 2D form. Since the late 1990s, computing performance, graphics hardware, and graphics algorithms have all advanced greatly, allowing developers to use fast modern machines to process more complex 3D scenes and to build more interactive and engaging games and virtual reality applications. A great deal of research has gone into 3D rendering, and many excellent rendering algorithms and engines have been developed and widely applied in games, film, animation, and other fields.

II. Classification. 3D model rendering techniques fall into two basic categories: offline rendering and real-time rendering.
- Offline rendering. To obtain more realistic images, offline rendering is usually used. The aim is to break free of the constraints of real-time hardware: compute-intensive algorithms are applied and, once the scene data has been assembled, large amounts of machine time are spent producing the best possible result. The 3D scene must first be modeled and stored in the computer; light positions, lighting conditions, and other environment parameters are then defined so that the rendering engine can carry out the correct rendering pipeline.
- Real-time rendering. Unlike offline rendering, real-time rendering cannot assume unlimited computing time; it uses special tricks and algorithms to render interesting 3D scenes and objects with almost no perceptible delay. The core devices running real-time rendering are usually PCs, or portable devices such as phones and smart TVs. Real-time rendering can achieve complex texture effects, computed lighting, genuine physical simulation, and support for extruded models; at the same time, shader-based code must be able to make use of existing external graphics libraries.
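As a concrete example of shader-based code that leans on an existing graphics library, the sketch below compiles and links a trivial GLSL vertex and fragment shader pair with standard OpenGL calls (`glCreateShader`, `glShaderSource`, `glCompileShader`, `glAttachShader`, `glLinkProgram`). It assumes an OpenGL context has already been created by the application and that a loader such as GLEW provides the function pointers; error checking is reduced to a single status query.

```cpp
// Minimal sketch: build a GLSL program with the standard OpenGL shader-object API.
// Assumes a current OpenGL context and a loader (e.g. GLEW) already initialized.
#include <GL/glew.h>
#include <cstdio>

static const char* kVertexSrc =
    "#version 120\n"
    "void main() { gl_Position = gl_ModelViewProjectionMatrix * gl_Vertex; }\n";

static const char* kFragmentSrc =
    "#version 120\n"
    "void main() { gl_FragColor = vec4(1.0, 0.5, 0.2, 1.0); }\n";

GLuint compileShader(GLenum type, const char* source) {
    GLuint shader = glCreateShader(type);
    glShaderSource(shader, 1, &source, nullptr);
    glCompileShader(shader);
    GLint ok = GL_FALSE;
    glGetShaderiv(shader, GL_COMPILE_STATUS, &ok);   // minimal error check
    if (ok != GL_TRUE) std::fprintf(stderr, "shader compilation failed\n");
    return shader;
}

GLuint buildProgram() {
    GLuint vs = compileShader(GL_VERTEX_SHADER, kVertexSrc);
    GLuint fs = compileShader(GL_FRAGMENT_SHADER, kFragmentSrc);
    GLuint program = glCreateProgram();
    glAttachShader(program, vs);
    glAttachShader(program, fs);
    glLinkProgram(program);
    // The program can now be bound with glUseProgram(program) before drawing.
    return program;
}
```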
Master ThesisRay Tracing Complex Scenes onGraphics HardwareStudent:(Pei-Lun Lee)Advisors:(Yung-Yu Chuang,PhD.)(Ming Ouhyoung,PhD.) Department of Computer Science and Information Engineering National Taiwan University,Taipei,TaiwanAbstractRecently,due to the advancement of graphics hardware,graphics processing unit (GPU)is intensely used for general computation other than graphics display that it is originally designed for.Among them,ray tracing on GPU is one of the promising applications owing to its inherent property of parallel computation and shared data access.However,most of the existing research on GPU ray tracing focus on the acceleration of ray tracing itself while assuming the scene not to be too complex. In this thesis,we present a two level hierarchies of data structure to render complex scenes using a CPU/GPU collaborated system.We compare its performance with a CPU only implementation and conclude that this method is efficient and easy to implement.Contents1Introduction21.1Motivation (2)1.2Problem statement (2)1.3Thesis organization (3)2Related work42.1Rendering complex scenes (4)2.2General purpose computation on graphics hardware (5)2.3Ray tracing on graphics hardware (5)3Algorithm73.1Ray-scene intersection (8)3.2Complex scenes (9)3.3The two-level acceleration structure (12)3.4Traversal of two-level structure (13)3.5Rendering (16)4Results194.1The scenes (20)4.2Experiments and results (25)4.2.1Dragons (25)4.2.2Dragonball (27)4.2.3Plants (29)4.2.4Powerplant (31)4.3Conclusions (32)5Conclusions and future work345.1Conclusinos (34)5.2Future work (34)List of Figures3.1Interface of ray-scene intersection on GPU (9)3.2Details of ray-scene intersection on GPU (9)3.3Multi-texture approach (10)3.4The two-level algorithm (11)3.5Partition of a level1structure using KD-tree of the bunny model..134.1The order and position of each object of the dragon scene (21)4.2The function generated camera path used in the dragon scene (22)4.3(a)Plants scene looked from top.(b)The leaf nodes of level1KD-tree of the plants scene (24)4.4Path used to walkthough the powerplant scene (25)4.5(a)Rendering time of different implementations of ray tracer on thedragon scenes.The two vertical lines are the two limit on texturesize of the single texture GPU ray tracer and the2-level structure.(b)The texture size distribution of the dragon scenes (26)4.6(a)Per frame rendering time of the10-dragon scene along the cam-era path with different acceleration structures.(b)Number of timesthe underlying GPU ray tracer is called per frame (27)4.7Dragonball:(a)Texture size and number of nodes under differentpartition(b)The rendering time of the2-level structures of differentnode size (28)4.8(a)Rendering time of dragonball scene partitioning into3nodes(b)Rendering time of2level structure only (28)4.9Dragonball:number of times GPU BVH called per frame (29)4.10Plants:rendering time per frame(a)CPU KDT and2-level(b)2-level only (29)4.11Plants:(a)number of times GPU BVH is accessed per frame(b)number of rays generated per frame (30)4.12Plants:(a)percentage of active texture size(b)percentage of incre-mental texture size (30)4.13Per frame rendering time of powerplant(a)CPU KDT and2-level(b)2-level only (31)4.14Powerplant:(a)number of times GPU BVH is called per frame(b)total number of rays generated per frame (32)4.15Powerplant:(a)percentage of active textures(b)percentage of in-cremental textures (32)Chapter1Introduction1.1MotivationRay tracing is a classic algorithm in computar graphics for generating photo-realistic images by 
emulating the natural phenomenon of human visual perception.The most time consuming part of the ray tracing algorithm is the computation of ray-scene in-tersection.In ray tracing,the computation of each ray is independent to other,so it is suitable for parallel computing.Recently graphics processing unit(GPU)is getting more computation power than CPU and continues growing.Its programmable ability makes it a great plat-form for computing.Ray tracing is also one of the target of interest.While more and more ray tracing algorithms are implemented on GPU,all of them assume that the scene datafits into the on-board memory of GPU.However,in many practi-cal case such as production rendering,the scene contains more complex geometry than what GPU memory can hold.In this thesis,we present a GPU assisted ray tracer system that take complex scene as input,which makes GPU ray tracer more practical to use.1.2Problem statementGiven a scene data,whose memory consumption is larger than the capacity of GPU memory,we propose a data structure for data management and generate images using a GPU assisted ray tracing algorithm.1.3Thesis organizationIn the following chapters,we introduce the related work on GPU ray tracing and complex scenes rendering,then we describe the proposed method and show the experiments and results,and lastly we conclude the work and discuss some of the future direction of research.Chapter2Related workIn this chapter,we survey the related work on different topics including rendering complex scenes,general purpose computation on graphics hardware,and ray tracing on graphics hardware.We also survey some of the recent work on ray tracing in the first section.2.1Rendering complex scenes[PKGH97]presented a set of algorithms to improve the data localities of ray trac-ing algorithm,such that they can render more complex scenes with given memory capacity.They organize geometry cache in a two-level regular voxel grids with dif-ferent granularities.The coarse one is termed geometry grids which containfiner grids called acceleration grids inside.We use a similar scheme to theirs,except their goal is to manage the access pattern between disk to main memory while we deal with the system memory-GPU memory management.In addition,they use regular grid at both level but in our scheme either level can be any data structure.In[WPS+03],they reported that realtime ray tracing has already been achieved on different hardware architectures,including shared memory systems,PC clusters, programmable GPUs,and custom hardware.They also presented interactive global illumination algorithms as one of the application utilizing realtime ray tracing.[WDS04]visualizes an extra-ordinary complex model,’Boeing777’which con-tains350million individual triangles.They use a combination of several techniques including a realtime ray tracer,a low-level out of core caching and demand load-ing strategy,and a hierarchical approximation scheme for geometry not yet loaded. 
Whey render the full model at several frames per second on a single PC.[WSS05]presents an architecture of a programmable ray tracing hardware, which combines theflexibility of general purpose CPUs and the efficiency of GPUs.They implemented a prototype FPGA version running at66MHz and can achieve rendering images at interactive rate.The limitation of their system is similar to other ray tracers pursuing performance,that is the scene must be static and contains only triangles.2.2General purpose computation on graphics hard-ware[OLG+05]surveyed the development of the new research domain,general purpose computation on graphics processors,or GPGPU.They described the techniques used in mapping general-purpose computation to graphics hardware,and they sur-veyed a broad range of GPGPU applications.They also pointed the trend of next generation GPGPU research.[GRH+05]presented an improved bitonic sorting network algorithm on GPU. They analyzed the cache-efficiency of algorithm and get better data access pattern. The performance of their implementation was higher than a CPU quick sorting rou-tine.[FSH04]analyzed the suitability of GPUs for dense matrix-matrix multiplica-tion.They implemented an near-optimized algorithm but found that the utilization of the arithmetic resources was relatively low compared to CPU counterpart.The bottleneck was due to the limiting bandwidth.Thus they concluded that currently GPU was not a suitable platform for such computation.2.3Ray tracing on graphics hardware[CHH02]was one of thefirst work on ray tracing with programmable graphics hardware.They implemented only ray-triangle intersection on the GPU,and let the CPU do the ray generation and shading.Ray data were stored in texture mem-ory,and vertex data were sent through graphics pipeline as drawing primitives.In terms of intersections per second,their implementation exceeded the fasted CPU-based ray tracer at that time.The main bottleneck of their implementation is the slow readback of data from GPU to main memory,which is required each time a computation invoked.[PBMH02]mapped the entire ray tracing algorithm on GPU,from ray genera-tion,acceleration data structure traversal,ray-triangle intersection,to shading.They choose uniform grid as acceleration data structure.All data,including ray,geome-try,and grid were stored in texture memory.Still,they suffered from low utilizationrate of computation resources.They also presented the concept that a GPU can be viewed as a stream processor[Pur04].Under this structure,fragment programs are the kernels,and data is stream.Their work inspired several researches including [Chr05]and[KL05],who follow their framework.[WSE04]used GPU to implement a nonlinear ray tracing algorithm,which ex-tend ray tracing to handle curved light rays and can be used to visualize gravitational phenomena.[TF05]implemented a kd-tree traversal algorithm on GPU.Their modified al-gorithm can traverse a kd-tree without a stack,thus suitable for GPU,which lack of the capability of such complex data structure.They also use it in a ray tracer that beat the performance of one use uniform grid.[TS05]compared how performance of a GPU-based ray tracer can be affected with different accelerating data structures.They chose three structures to com-pare:uniform grid,kd-tree,and bounding volume hierarchies.They found that the bounding volume hierarchies is the most efficient in most cases.[CHCH06]uses a threaded bounding volume hieararchy built from a geometry image,which can be efficiently traversed and constructed 
entirely on the GPU and allows for dynamic geometry and level of detail.NVIDIA Gelato[NVI05]is a hardware-accelerated non-real-time high-quality renderer which leverages the NVIDIA GPU as afloating point math processor to render images faster than comparable renderers.Chapter3AlgorithmIn this chapter,we describe in detail the algorithms we use.Before we start,we list some of the design decisions we made.Design Decision•CPU/GPU job divisionAlthough we utilize the computation power of GPU,we do not aim for de-signing a system that completely runs on GPU.Instead we offload some of the tasks to CPU and make our experiments focus on the speedup of ray-scene intersection.GPU–Traversal of level2structuresCPU–Generating camera rays–Shading–Spawn secondary rays•Scene geometryWe choose triangles to be the only type of geometric primitive.This simplifys the problem and let us focus on the specific part of the system.Also by limiting the number of primitive to one.•Scene complexityWe do not aim for arbitrary complex scenes.The main subject of this thesis is on memory management between GPU and host,not disk to memory.So the data size of scenes cannot exceed the capacity of main memory.•Illumination model Since we compute the shading at CPU side,we do not want the cost of shading to be too expensive.Besides,realistic illumination is not the main objective,so we choose the simple Whitted’s ray tracing as the illumination model.3.1Ray-scene intersectionThe most important component in our system is the ray-scene intersection.This is the part where we mainly utilize the computation power of GPU.The ideal interface of ray-scene intersection on GPU is shown infigure3.1.Note that there are multiple rays fed into GPU at the same time.This is due to the SIMD nature of the fragment processors on GPU:it gets the most efficiency when processing multiple data at the same time.This interface resembles the ray engine in[CHH02].A more detailed framework of our GPU intersection block is shown in3.2.We implemented the methods described in[TS05].The scene data isfirst converted into an acceleration data structure and then stored in texture memory of GPU as textures.The rays are also stored in textures.The traversal of acceleration structure and intersection test written in shader code are executed in fragment processors.The intersection result are output to framebuffer,which has a texture attached,and then readback to main memory.In our implemtation,wefind the performance of BVH is the best out of the3 GPU acceleration structures in[TS05],which is consistent with their conclusion. 
Therefore we choose BVH as the underlying GPU intersection engine in our sys-tem.To implement the BVH ray tracer on GPU,several hardware and software features are required.First the GPU must support Shading Model3.0since dy-namic branching and looping instructions are used in the fragment shaders.To store data in textures,the support offloating point precision in textures must be present such as in OpenGL extension GL ARB texturefloat.To output data to tex-tures in fragment shader,we use the FrameBufferObject of OpenGL which supports render-to-texture function.We also use OcclusionQuery to determine whether the computation on GPU is st we use Early Z-Culling which is a optimization in graphics pipeline to skip the pixels whose computation are already over from entering the fragment processors again.All the above techniques are described in [TS05]and[GPG05].rayssceneintersections Figure3.1:Interface of ray-scene intersection on GPUrays sceneintersectionsGPUFigure3.2:Details of ray-scene intersection on GPU3.2Complex scenesThe scheme described in the last section worksfine with scenes that arefit into texture memory.We call the method in the last section the single-texture approach. However,as the scene gets more complex in geometry,two problems may be con-fronted:1.A single texture may exceed the maximum texture size2.Textures may exceed the capacity of the on-board texture memory of GPUIn the OpenGL specification,the maximum texture size is4096by4096.In our implementation,we use32bitfloating point texture to store data.As a result, the largest single texture will then be256MB.Take BVH,one of our implemented acceleration structure on GPU,for example,it tooks some where around64to67 bytes per triangle.By calculation,the256MB limit will be reached at around4 million triangles converted into a single texture representation of BVH structure. That is,if we use the simple BVH implementation,we can handle scenes with at most4million triangles.Figure3.3:Multi-texture approachAnother limitation is the capacity of texture memory.The latest GPU has at most512MB of texture memory,that is,we can have two256MB textures.How-ever,with the PCI-Express technology,the bandwidth between GPU and host is greatly improved compared to the last generation AGP bus.The graphics manufac-turers thus provide driver level mapping from graphics memory to main memory. That is,we can use part of the main memory as texture memory just like the virtual memory using hard disk as memory.For example,a GPU may mave only64MB memory on board.But to an user program,it may seems there are256MB tex-ture memory since the driver transparently swaps in and out texture data between GPU and host.These features are quite useful in low-end graphics products,since the cost of memory can be reduced while the performance comparable to products with same amount of actual on-board memory.However,by our experiment,the high-end GPU we use(nVidia GeForce7800)also has this feature.The available memory is256MB,but we can allocate more than1GB of textures without error. 
So this feature should be implemented in software and independent to hardware design.In sum,the limit on GPU memory size is resolved by the driver.The real limit still remaining is the size of a single texture.To get rid of this limit,a direct solution is to tile smaller textures into a larger one.For example we have data storing in an array that is represented as a texture of size8192by8192,we can divide it into4 textures sized4096by4096.And then we override the texture look-up function in shader code as in algorithm1.In this way,what we call multi-texture approach,by mapping the texture coordi-nates of a larger texture to that of multiple smaller textures,we can virtually accessAlgorithm1MyTex2D(tex;x;y)Return tex2D(tex0;x;y)else if x 4096and y<4096thenReturn tex2D(tex1;x 4096;y)else if x<4096and y 4096thenReturn tex2D(tex2;x;y 4096)elseReturn tex2D(tex3;x 4096;y 4096)end ifFigure3.4:The two-level algorithma texture that is larger than the maximum size in the specification.And by using this function,we can easily make a GPU intersection code capable of handling complex scenes by just sending larger textures.Nevertheless,this approach will not perform as efficient as it looks like.In frag-ment processors,where these codes are executed,instructions are issued in SIMD. Which means each time we call the MyTex2D function in a fragment shader,there will be4texture fetch commands executed.And3out of the4texels will be dis-carded by the conditional branching.Plus blocks of the4textures are potentially loaded to the texture memory and consume the bandwidth of the PCI-Express bus. The more tiles a texture are split into,the worse the performance.Overall,multi-texture is not a practical solution.So we come up with another solution,the two-level acceleration structure,which will be introduced in the fol-lowing section.3.3The two-level acceleration structureThe main idea of the proposed algorithm is to build a two-level hierarchies of ac-celeration data structures,which is composed of a coarser structure at top level,and finer structures at each leaf node of the coarser one.The coarser level(level1)of structure will be traversed by CPU,while thefiner level(level2)will be traversed by GPU,as shown infigure3.4.Through this partition,we can split large scene data into smaller chunks that canfit into the limited memory of GPU.This method is similar to[PKGH97],where they choose both the coarser andfiner level to be uniform grid.In theory,any of the acceleration data structure can be used in either level in this algorithm.AssumptionsWe make the following assumptions on the scene:1.Scenes are composed purely by triangles2.Scene data is larger than texture memory on GPU but smaller than systemmain memoryFor thefirst point,we handle only one type of primitive.Because GPU uses SIMD architecture,and two intersection codes means two times slower in GPU programming.Besides,all other primitive can be subdivided into triangles.Second,from viewpoint of GPU computation,the memory hierarchies are GPU L1cache,GPU L2cache,texture memory,system memory,and disk storage.The main target of this research is on texture memory to system memory.So we assume that our data canfit into main memory and do not explicitly handle the memory management from memory to disk.ConstructionGiven a set of triangles T,its2-level accelerating data structure(AS L1;AS i L2) can be computed with:To build a level1structure,one can use a top-down algorithm such as KD-tree and set the maximum triangles in a leaf node to a desired threshold.Or we can 
merge leaf nodes from bottom to top until the size of memory or the number of nodes meet the requirement from a complete accelerating structure or structure built with bottom-up algorithm such as the bounding volume hierarchies(BVH)[GS87].1:Build AS L1on T2:for each leaf node L i of AS L1do3:build AS i L2on triangles of L i4:end forFigure3.5:Partition of a level1structure using KD-tree of the bunny modelFig3.5is the visualization of a level1structure.Each color patch corresponds to a leaf node of the KD-tree.As thefigure shows,the partition is rather coarse,so each leaf node contains many triangles compared to a normal KD-tree used in an ordinary ray tracing algorithm.A level2structure has no difference from an acceleration data structure used in other GPU based ray tracers.The level2structures are built with the triangles of the leaf nodes of level1structures,each leaf node has one level2structure associated. The constructed level2structures AS i L2must be converted to the texture form that can be traced by a GPU ray tracer.And the texture representation of a level2 structure must not exceed the size limitation of a single texture.In OpenGL,this limitation is4096by4096texels,which can store up to256MB of data if32-bit floating point texture is used.So when building the level1structure,the size of leaf node must be choosed so that its level2structures are not too large.3.4Traversal of two-level structureHere we give the algorithm to traverse a2-level acceleration structure with the same interface of the ray-scene intersection described in section3.1.That is,given a queue of rays R and a2-level structure,compute and return the intersection of eachray in R.Algorithm3Intersect(R;AS L1;AS i L2)1:2:Add r to Q L13:end for4:while Q L1is not empty do5:for each ray r in Q L1do6:p=intersect L1(r;AS L1)7:if p is a leaf node Q i L2then8:Add r to Q i L29:else10:Intersect[r]=NULL11:Add r to I12:end if13:Delete r from Q L114:end for15:for each non-empty Q i L2do16:S=Intersect L2(Q i L2;AS i L2)17:for each ray s in S do18:if s is a hit then19:Add s in I20:else21:Add s in Q L122:end if23:end for24:end for25:end while26:Return IWe maintain a ray queue for the level1and each level2structures.That is, if the level1structure has n leaf nodes,there will be n+1ray queues.First all rays input for intersection query are put into the level1ray queue.Then each ray in level1ray queue is traversed in level1structure to test which is the nearest leaf node hit by the ray.If a ray hit some leaf node,we move that ray to the ray queue of the level2structure that node corresponds to;if the ray does not hit any thing,its traversal ends here.After this step,the level1ray queue should be empty and some of the level2queues would have rays in.These level2structures with non-empty ray queues are traversed and tested intersection in GPU.For the rays reported hit during the traversal of the level2structure,we update its states of intersection and it can stop the traversal if we use a level1structure that traverses along the direction of ray such as uniform grid or KD-tree.Otherwise if a ray misses in the level2traversal,it should continue its traversal in the level1structure,only not starts from root but from the next leaf node of current one.To do this we have to modified the level1traversal,we describe it in the following.Traversal of level1data structureAs stated above,the traversal of level1structure must continue from the last position of the previously visited node,or it will be an infinite loop.Our solution is to store 
afield of current position for each ray,and during traversal,we skip all the nodes that precedes to the current node.This method is used by[TF05]to remove the recursion of KD-tree traversal and apply on their GPU ray tracer,which they call’KD-Restart’.We use it in a CPU algorithm,though.This method adds extra cost to the traversal of level1structure,but since in our system,a level1structure are usually very small,containing only tens of leaf nodes,so we do not care these additional computational costs.Algorithm4Intersect L1(r;AS L1)p=nearest leaf node from position[r]along rposition[r]=next leaf node of p along rReturn pWe take into concern the effects of’restart’of traversal to compare3accelera-tion structures as level1structure.The candidates are uniform grid,KD-tree,and BVH.•Uniform grid:To store the’restart’position,the xyz index of a grid cell must be saved per ray and can be encoded into one integer.The traversal order is along the direction of the ray.The space partition is regular which can generate empty nodes or extremely uneven sizes of node.•KD-Tree:The’restart’position consists of2floating point number,the t min and t max of a node along the direction of ray.The space partition is adap-tive and each leaf node has similar size.The traversal order is along the ray direction,which is a plus.•BVH:A BVH is usuallyflatten from the original tree structure into an array by somefixed search order such as depth-first when traversed.We save the index of the array as’restart’position.A BVH is not traversed along the ray direction.The space partition is adaptive.From these comparisons,we choose KD-Tree as the level1structure since it has the properties of ray order traversal and adaptiveness.3.5RenderingTraditionally,a ray tracing algorithm is usually recursive.In our system,since the interface of the ray-scene intersection is multiple rays instead of a single ray,it is difficult to apply in a recursive ray tracing algorithm.Here we use the classic Whit-ted ray tracer as example and give two algorithms that utilize the2-level structure. One is called’batched’which interfaces the2-level structure through the ray-scene intersection block and is thus unaware of the underlying acceleration data structure. The other is called’mixed’which integrates the rendering with the traversal of the 2-level structure to get better utilization.BatchedAlgorithm5Rendering-batched(R)1:i02:while i<max iterations do3:i i+14:R intersect(R)5:for each ray r in R do6:if r is a hit then7:Add the shaded color to the pixel associated with r8:Add each spawn rays into S9:else10:Add the background color to the pixel associated with r11:end if12:end for13:R S14:S empty queue15:end whileIn algorithm5,we basically maintain2ray queues R and S.R stores the rays for current iteration while S stores the secondary rays for the next iteration.At each iteration,the ray-scene intersection is called to get the intersection of R.For each ray that hit,we compute its shading and spawned secondary rays such as reflection, refraction and add these new rays in S.Then we move the rays in S to R andcontinue the next iteration.The rendering process continues until the predefined maximum number of iteration is reached or there are no rays to process.Note that this algorithm is not aware of the underlying acceleration structure. 
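The batched driver loop of Algorithm 5 can be sketched as follows. The `Ray`, `Intersection`, and `Film` types, the `intersect` callback, and the `shadeAndSpawn` helper are placeholders invented for this illustration, not the thesis code; the point is only the two-queue structure, in which the intersection routine is a black box that could equally be the CPU KD-tree, the single-texture GPU tracer, or the 2-level structure.

```cpp
// Illustrative two-queue driver loop in the spirit of Algorithm 5.
// 'intersect' is a black-box batch intersector (CPU KD-tree, GPU BVH, or 2-level).
#include <functional>
#include <utility>
#include <vector>

struct Ray { float origin[3], dir[3]; int pixel; };
struct Intersection { bool hit; Ray ray; /* hit point, normal, material omitted */ };
struct Film {
    void add(int pixel, float r, float g, float b) {
        (void)pixel; (void)r; (void)g; (void)b;      // trivial stand-in: accumulate color
    }
};

// Placeholder shading: adds a local color to the film and returns spawned
// secondary rays (reflection/refraction); a real shader does Whitted shading.
std::vector<Ray> shadeAndSpawn(const Intersection& isect, Film& film) {
    film.add(isect.ray.pixel, 1.0f, 1.0f, 1.0f);
    return {};                                       // no secondary rays in this stand-in
}

void renderBatched(std::vector<Ray> R, Film& film, int maxIterations,
                   const std::function<std::vector<Intersection>(const std::vector<Ray>&)>& intersect)
{
    for (int i = 0; i < maxIterations && !R.empty(); ++i) {
        std::vector<Intersection> results = intersect(R);   // one batched query per iteration
        std::vector<Ray> S;                                  // rays for the next iteration
        for (const Intersection& isect : results) {
            if (isect.hit) {
                std::vector<Ray> spawned = shadeAndSpawn(isect, film);
                S.insert(S.end(), spawned.begin(), spawned.end());
            } else {
                film.add(isect.ray.pixel, 0.0f, 0.0f, 0.0f); // background color
            }
        }
        R = std::move(S);                                    // secondary rays become current
    }
}
```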
The intersect procedure can be either a CPU implementation,a single-texture GPU implementation,or a2-level structure we described here.We also use this algorithm on the CPU KD-tree implementation in the experiments and results in the next chap-ter.MixedAlgorithm6Rendering-mixed(R)1:2:Initialize position[r]3:Add reference[r]to Q L14:end for5:while at least one of Q L1and all Q i L2is not empty do6:for each ray r in Q L1do7:p trace L1(r;AS L1)8:if p is a leaf node AS i L2of AS L1then9:position[r]next(r;p;AS L1)10:Add r to Q i L211:end if12:Delete r from Q L113:end for14:choose a AS i L2with non-empty Q i L215:S trace L2(Q i L2;AS i L2)16:for each ray S i in S do17:if S i is a hit then18:Process the intersection19:Spawn secondary rays T20:Add T to Q L121:else22:Add S i to Q L123:end if24:Delete S i from Q i L225:end for26:end whileIn algorithm6,we integrate the rendering of algorithm5with the traversal of2-level structure in algorithm3.The key difference is that after we get the intersection from a level2structure,we do not proceed to the intersection of the next non-。