Load Balancing for Spatial-Grid-Based Parallel Numeric Simulations on Clusters of SMPs
- 格式:pdf
- 大小:215.50 KB
- 文档页数:8
Load Balancing for Spatial-grid-based Parallel Numeric Simulations on Clusters
of SMPs
Huaien Gao Andreas Schmidt Amitava Gupta Peter Luksch
Technical University, Munich, Germany
Emails : {gao, schmiand, gupta, Peter.Luksch}@in.tum.de
Abstract
Load distribution is an essential factor to parallel efficiency of numerical simulations that are based on spatial grids, especially on clusters of symmetric multiprocessors (SMPs). This paper presents a method of mapping spatial grid nodes to processors that combines two load balancing methodologies, graph partitioning and graph matching, to achieve maximum parallel efficiency on SMP clusters. The method has been successfully applied to load distribution in a parallel Computational Fluid Dynamics (CFD) simulation. Test runs on a PC cluster prove the effectiveness of our method.
1. Introduction
Scientific simulations in many domains, e.g., CFD, Computational Structural Mechanics (CSM), VLSI simulations, are based on the concept of a spatial grid. A natural way to execute this type of applications in parallel is to follow the Single Program Multiple Data (SPMD) approach, i.e., to distribute the spatial grid onto multiple processes, each of which is assigned a partition of the grid.
Load balancing is a key factor in achieving high parallel efficiency, especially on platforms with a large number of nodes. The amount of work assigned to each processor has to be determined such that the turnaround time is minimized. For a parallel application running on a large number of processors, the turnaround time is defined as the maximum of all the times taken by the individual processors to complete the task. For a given problem, there also is an optimal number of processors for which the turnaround time is minimized. Knowing this number is important in order to maximize throughput. For instance, crash simulations usually involve parameter studies, where a large number of simulation jobs are executed with the same model, each with different parameter settings. Using more than the optimal number of processors for a single job would result in a waste of resources, because faster execution would be possible with fewer processors. With fewer processors per job, more jobs can run in parallel on a large cluster.
Approaches to task assignment in distributed systems can be classified into three broad categories, namely graph-theoretic ([1], [2] as examples), mathematical programming ([3], [4]) and heuristic [5]. Graph-theoretic algorithms view the task as a graph representing the inter-modular dependencies and apply graph partitioning methodologies to obtained equal partitions of the task with the inter-node communication or the volume of such communication minimized. A concise tool for this purpose is Metis [6]. With the mathematical programming approach the problem is viewed as an optimisation problem and solved using mathematical programming techniques. Heuristic methods provide fast but often sub-optimal solutions within a finite time, where an optimal solution cannot be obtained within a finite time.
Shen and Tsai propose a method in [7], where the problem of load balancing by optimal task assignment is viewed as a graph matching problem. The task is represented as a task graph with each module represented by a vertex and the communication between these modules represented by edges. The weights associated with the vertices represent the computation cost associated with each module and the weights associated with each edge represents the communication cost for the interaction between two adjacent vertices of the task graph. Both these costs are expressed in terms of time. The connectivity between the processors is viewed as a