OpenMP Parallel Programming

Format: PPT
Size: 696.00 KB
Pages: 77


Multicore-Based OpenMP Parallel Program Design

[Technical R&D]
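Peng Xi, Gu Binggen, Li Zhantao (College of Information Science and Engineering, Guilin University of Technology, Guilin, Guangxi 541004)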
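Abstract: This paper introduces the emergence of multicore computing and OpenMP, a multithreaded parallel programming interface for shared-memory multiprocessors. It then uses an example to show how to design parallel programs with OpenMP on multicore processors. The computed speedup shows that programming with OpenMP significantly improves execution efficiency.
OpenMP is a multithreaded parallel programming API for shared-memory multiprocessors; threads pass data and results to each other through shared variables. The OpenMP standard took shape in 1997. It is an API for writing portable multithreaded applications. The OpenMP programming model provides a set of platform-independent compiler directives, pragmas, library function calls, and environment variables. These explicitly tell the compiler how and when to exploit the parallelism in an application. With OpenMP, you insert directive annotations into existing serial code and make the necessary modifications, which lets you parallelize quickly; the compiler interprets the annotations. C, C++, and Fortran all support OpenMP, and all OpenMP parallelization is achieved through compiler directives embedded in C, C++, or Fortran source code.
All OpenMP compiler directives have the form #pragma omp directive [clause ...]. Most of them apply to the structured block that follows. The directive part can be parallel, for, parallel for, section, sections, single, master, critical, flush, ordered, and others.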
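Below is a minimal C sketch of several of these directives working together. It assumes an OpenMP-enabled compiler (e.g. `gcc -fopenmp`); without one, the pragmas are ignored and the code runs serially. The struct `DirectiveCounts` and the function `directive_demo` are illustrative names. The function counts how often each block executes, which shows the following:
- each `section` runs once;
- a `single` block runs on one thread;
- a `master` block runs only on thread 0;
- a `critical` block runs once per thread, one thread at a time.

```c
#include <assert.h>

/* Hypothetical demo type: how many times each directive's block ran. */
typedef struct {
    int section_runs;  /* two sections, each executed exactly once -> 2 */
    int single_runs;   /* single: executed by one thread of the team -> 1 */
    int master_runs;   /* master: executed by thread 0 only          -> 1 */
    int team_size;     /* every thread passes the critical block once */
} DirectiveCounts;

DirectiveCounts directive_demo(void)
{
    DirectiveCounts c = {0, 0, 0, 0};   /* shared by the whole team */

#pragma omp parallel
    {
#pragma omp sections
        {
#pragma omp section
            {
#pragma omp atomic
                c.section_runs++;
            }
#pragma omp section
            {
#pragma omp atomic
                c.section_runs++;
            }
        }                     /* implicit barrier at the end of sections */

#pragma omp single
        c.single_runs++;      /* implicit barrier at the end of single */

#pragma omp master
        c.master_runs++;      /* no barrier is implied for master */

#pragma omp critical
        c.team_size++;        /* one thread at a time */
    }                         /* implicit barrier, then join */

    /* These invariants hold for any team size, including a serial build. */
    assert(c.section_runs == 2 && c.single_runs == 1 && c.master_runs == 1);
    return c;
}
```

Called from `main`, `directive_demo().team_size` reports how many threads the team had. OpenMP 5.1 deprecates `master` in favour of `masked`.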

Using OpenMP correctly with Intel Visual Fortran in Visual Studio

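This section covers OpenMP programming with IVF inside Visual Studio.

Part 1: configure the project so that OpenMP can be used. My setup is IVF 11.1, Visual Studio 2008, and OpenMP 3. After opening the code view, set the project properties (right-click the project > Properties): Fortran > Language > Process OpenMP Directives = Generate Parallel Code.

I spent a long time on this parallelization problem. Before you can run in parallel, make sure of the following:
1. Your computer has two or more cores (otherwise there is no speedup).
2. The operating system is 64-bit, e.g. Windows XP 64-bit. Most current CPUs use a 64-bit architecture, so a 64-bit system is preferable, although a 32-bit system also works.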

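The key is to make sure your CPU matches the IVF build you use.
3. Your IVF installation includes the 64-bit components, i.e. the Intel 64 (EM64T) component is installed during setup.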

(You can see this component being installed during setup.)
4. Configure IVF: Project > Properties > Fortran > Language > Process OpenMP Directives > Generate Parallel Code (/Qopenmp).
5. Your program must be parallelizable. It must contain parts whose steps have no logical dependence on each other.

Once these points are clear, you are ready for simple parallel computations. The following test program by John Burkardt estimates pi:

```fortran
program main

!*****************************************************************************80
!
!! MAIN is the main program for TEST_OMP.
!
!  Discussion:
!
!    TEST_OMP estimates the value of PI.
!
!    This program uses Open MP parallelization directives.
!
!    However, the parallel version computes the sum in a different
!    order than the serial version; some of the quantities added are
!    quite small, and so this will affect the accuracy of the results.
!
!  Author:
!
!    John Burkardt
!
  use omp_lib      ! OpenMP runtime routines (build with /Qopenmp)

  implicit none

  integer, parameter :: r4_logn_max = 9
  integer id
  integer nthreads

  call timestamp ( )

  write ( *, '(a)' ) ' '
  write ( *, '(a)' ) 'TEST_OMP'
  write ( *, '(a)' ) '  FORTRAN90 version'
  write ( *, '(a)' ) ' '
  write ( *, '(a)' ) '  Estimate the value of PI by summing a series.'
  write ( *, '(a)' ) ' '
  write ( *, '(a)' ) '  This program includes Open MP directives, which'
  write ( *, '(a)' ) '  may be used to run the program in parallel.'
  write ( *, '(a)' ) ' '
  write ( *, '(a)' ) '  The number of processors available:'
  write ( *, '(a,i8)' ) '  OMP_GET_NUM_PROCS () = ', omp_get_num_procs ( )

  nthreads = 4

  write ( *, '(a)' ) ' '
  write ( *, '(a,i8,a)' ) '  Call OMP_SET_NUM_THREADS, and request ', &
    nthreads, ' threads.'

  call omp_set_num_threads ( nthreads )
!
!  Note that the call to OMP_GET_NUM_THREADS will always return 1
!  if called outside a parallel region!
!
!$OMP parallel private ( id )

  id = omp_get_thread_num ( )

  write ( *, '(a,i3)' ) '  This is process ', id

  if ( id == 0 ) then
    write ( *, '(a)' ) ' '
    write ( *, '(a)' ) '  Calling OMP_GET_NUM_THREADS inside a '
    write ( *, '(a)' ) '  parallel region, we get the number of'
    write ( *, '(a,i3)' ) '  threads is ', omp_get_num_threads ( )
    write ( *, '(a)' ) ' '
  end if

!$OMP end parallel

  call r4_test ( r4_logn_max )

  write ( *, '(a)' ) ' '
  write ( *, '(a)' ) 'TEST_OMP'
  write ( *, '(a)' ) '  Normal end of execution.'
  write ( *, '(a)' ) ' '

  call timestamp ( )

  stop
end
subroutine r4_test ( logn_max )

!*****************************************************************************80
!
!! R4_TEST estimates the value of PI using single precision.
!
!  Discussion:
!
!    PI is estimated using N terms.  N is increased from 10^2 to 10^LOGN_MAX.
!    Each estimate uses the Open MP enabled code.
!    Wall clock time is measured by calling SYSTEM_CLOCK.
!
!  Modified:
!
!    06 January 2003
!
!  Author:
!
!    John Burkardt
!
  implicit none

  integer clock_max
  integer clock_rate
  integer clock_start
  integer clock_stop
  real error
  real estimate
  integer logn
  integer logn_max
  character ( len = 3 ) mode
  integer n
  real r4_pi
  real time

  write ( *, '(a)' ) ' '
  write ( *, '(a)' ) 'R4_TEST:'
  write ( *, '(a)' ) '  Estimate the value of PI,'
  write ( *, '(a)' ) '  using single precision arithmetic.'
  write ( *, '(a)' ) ' '
  write ( *, '(a)' ) '  N = number of terms computed and added;'
  write ( *, '(a)' ) '  ESTIMATE = the computed estimate of PI;'
  write ( *, '(a)' ) '  ERROR = ( the computed estimate - PI );'
  write ( *, '(a)' ) '  TIME = elapsed wall clock time;'
  write ( *, '(a)' ) ' '
  write ( *, '(a)' ) '  Note that you can''t increase N forever, because:'
  write ( *, '(a)' ) '  A) ROUNDOFF starts to be a problem, and'
  write ( *, '(a)' ) '  B) maximum integer size is a problem.'
  write ( *, '(a)' ) ' '
  write ( *, '(a,i12)' ) '  The maximum integer:' , huge ( n )
  write ( *, '(a)' ) ' '
  write ( *, '(a)' ) '           N  Mode      Estimate         Error          Time'
  write ( *, '(a)' ) ' '

  do logn = 2, logn_max

    n = 10 ** logn          ! N = 10^2, ..., 10^LOGN_MAX (no overflow after the last pass)
    mode = 'OMP'

    call system_clock ( clock_start, clock_rate, clock_max )
    call r4_pi_est_omp ( n, estimate )
    call system_clock ( clock_stop, clock_rate, clock_max )

    time = real ( clock_stop - clock_start ) / real ( clock_rate )
    error = abs ( estimate - r4_pi ( ) )

    write ( *, '( i12, 2x, a3, 2x, f14.10, 2x, g14.6, 2x, g14.6 )' ) &
      n, mode, estimate, error, time

  end do

  return
end
subroutine r4_pi_est_omp ( n, estimate )

!*****************************************************************************80
!
!! R4_PI_EST_OMP estimates the value of PI, using Open MP.
!
!  Discussion:
!
!    The calculation is based on the formula for the indefinite integral:
!
!      Integral 1 / ( 1 + X**2 ) dx = Arctan ( X )
!
!    Hence, the definite integral
!
!      Integral ( 0 <= X <= 1 ) 1 / ( 1 + X**2 ) dx
!
!    equals Arctan ( 1 ) - Arctan ( 0 ) = PI / 4.
!
!    A standard way to approximate an integral uses the midpoint rule.
!    If we create N equally spaced intervals of width 1/N, then the
!    midpoint of the I-th interval is
!
!      X(I) = (2*I-1)/(2*N).
!
!    The approximation for the integral is then:
!
!      Sum ( 1 <= I <= N ) (1/N) * 1 / ( 1 + X(I)**2 )
!
!    In order to compute PI, we multiply this by 4; we also can pull out
!    the factor of 1/N, so that the formula you see in the program looks like:
!
!      ( 4 / N ) * Sum ( 1 <= I <= N ) 1 / ( 1 + X(I)**2 )
!
!    Until roundoff becomes an issue, greater accuracy can be achieved by
!    increasing the value of N.
!
!  Modified:
!
!    06 January 2003
!
!  Author:
!
!    John Burkardt
!
!  Parameters:
!
!    Input, integer N, the number of terms to add up.
!
!    Output, real ESTIMATE, the estimated value of pi.
!
  implicit none

  real estimate
  real h
  integer i
  integer n
  real sum2
  real x

  h = 1.0E+00 / real ( 2 * n )

  sum2 = 0.0E+00

!$OMP parallel do private ( x ) shared ( h ) reduction ( + : sum2 )
  do i = 1, n
    x = h * real ( 2 * i - 1 )
    sum2 = sum2 + 1.0E+00 / ( 1.0E+00 + x**2 )
  end do
!$OMP end parallel do

  estimate = 4.0E+00 * sum2 / real ( n )

  return
end
function r4_pi ( )

!*****************************************************************************80
!
!! R4_PI returns the value of pi.
!
!  Modified:
!
!    02 February 2000
!
!  Author:
!
!    John Burkardt
!
!  Parameters:
!
!    Output, real R4_PI, the value of pi.
!
  implicit none

  real r4_pi

  r4_pi = 3.14159265358979323846264338327950288419716939937510E+00

  return
end
subroutine timestamp ( )

!*****************************************************************************80
!
!! TIMESTAMP prints the current YMDHMS date as a time stamp.
!
!  Example:
!
!    May 31 2001   9:45:54.872 AM
!
!  Modified:
!
!    31 May 2001
!
!  Author:
!
!    John Burkardt
!
  implicit none

  character ( len = 8 ) ampm
  integer d
  character ( len = 8 ) date
  integer h
  integer m
  integer mm
  character ( len = 9 ), parameter, dimension(12) :: month = (/ &
    'January  ', 'February ', 'March    ', 'April    ', &
    'May      ', 'June     ', 'July     ', 'August   ', &
    'September', 'October  ', 'November ', 'December ' /)
  integer n
  integer s
  character ( len = 10 ) time
  integer values(8)
  integer y
  character ( len = 5 ) zone

  call date_and_time ( date, time, zone, values )

  y = values(1)
  m = values(2)
  d = values(3)
  h = values(5)
  n = values(6)
  s = values(7)
  mm = values(8)

  if ( h < 12 ) then
    ampm = 'AM'
  else if ( h == 12 ) then
    if ( n == 0 .and. s == 0 ) then
      ampm = 'Noon'
    else
      ampm = 'PM'
    end if
  else
    h = h - 12
    if ( h < 12 ) then
      ampm = 'PM'
    else if ( h == 12 ) then
      if ( n == 0 .and. s == 0 ) then
        ampm = 'Midnight'
      else
        ampm = 'AM'
      end if
    end if
  end if

  write ( *, '(a,1x,i2,1x,i4,2x,i2,a1,i2.2,a1,i2.2,a1,i3.3,1x,a)' ) &
    trim ( month(m) ), d, y, h, ':', n, ':', s, '.', mm, trim ( ampm )

  return
end
```

The program above can be copied as-is and runs successfully.
[Figure: console output of the program]
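For readers working in C, here is a sketch of the core of R4_PI_EST_OMP in double precision. The name `pi_est_omp` is ours, not Burkardt's, and the code assumes a C99 compiler, with `-fopenmp` or equivalent to run in parallel. The directive `!$OMP parallel do ... reduction(+: sum2)` becomes `#pragma omp parallel for reduction(+ : sum)`.

```c
#include <assert.h>

/* Midpoint-rule estimate of pi = 4 * integral_0^1 dx / (1 + x^2),
   the same formula as R4_PI_EST_OMP, in double precision. */
double pi_est_omp(long n)
{
    assert(n > 0);
    const double h = 1.0 / (2.0 * (double) n);   /* half the interval width */
    double sum = 0.0;

    /* Each thread accumulates a private copy of sum; the copies are
       added together when the loop ends. */
#pragma omp parallel for reduction(+ : sum)
    for (long i = 1; i <= n; i++) {
        double x = h * (double) (2 * i - 1);      /* midpoint of the i-th interval */
        sum += 1.0 / (1.0 + x * x);
    }
    return 4.0 * sum / (double) n;
}
```

As with the Fortran version, the parallel sum is added in a different order than a serial loop, so the last few digits may differ from run to run.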

Parallel programming with OpenMP in Fortran

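I have recently been writing a hydrodynamics program. The system is so large that it can only be computed in parallel, so I had to find material on parallel programming and learn it.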

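There is no need for me to start a parallel programming tutorial on my blog, since such tutorials are everywhere online. I will just jot down some important notes.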

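These notes are mainly about OpenMP.

1. Critical sections and reductions

When parallelizing with OpenMP, what needs the most attention is the shared variables inside the parallel region, especially variables that must be reduced. Consider this code:

```fortran
program main
  implicit none
  include 'omp_lib.h'
  integer N, M, i
  real(kind=8) t

  N = 20000
  t = 0.0
!$OMP PARALLEL DO
  do i = 1, N
     t = t + float(i)            ! data race: t is shared by all threads
     M = OMP_get_num_threads()
  enddo
  write(*, "('t = ', F20.5, ' running on ', I3, ' threads.')") t, M
  pause
  stop
end
```

The serial code easily gets the correct result:
t = 200010000.00000 running on 1 threads.
Unfortunately, in parallel you may get a different result every time:
t = 54821260.00000 running on 8 threads.
t = 54430262.00000 running on 8 threads.
....
The reason is simple. Suppose the do loop is split between two threads, A1 and A2, and both read and update t. While one thread is accessing t, the other modifies it, so some of the additions to t are "lost".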

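There are two solutions. The first is a "critical section", which locks t:

```fortran
!$OMP PARALLEL DO
do i = 1, N
!$OMP CRITICAL
   t = t + float(i)
!$OMP END CRITICAL
   M = OMP_get_num_threads()
enddo
```

Now only one thread at a time can access t. The result is correct, but every update of t is serialized.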
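The second remedy is not shown above. Given this part's heading (critical sections and reductions), it is presumably the reduction clause, written in Fortran as `!$OMP PARALLEL DO REDUCTION(+:t)`. Below is a C sketch of that remedy, plus `atomic`, a lighter-weight alternative to `critical` for a single scalar update. The function names are hypothetical. The loops mirror the Fortran example, summing 1..N, and assume an OpenMP-enabled C99 compiler.

```c
#include <assert.h>

/* Fix 2: reduction. Each thread sums into its own private t, and the
   private copies are combined once at the end: no contention in the loop. */
double sum_reduction(int n)
{
    assert(n >= 0);
    double t = 0.0;
#pragma omp parallel for reduction(+ : t)
    for (int i = 1; i <= n; i++)
        t += (double) i;
    return t;
}

/* Alternative to CRITICAL for one scalar update: atomic.
   Correct, but every iteration still serializes on t. */
double sum_atomic(int n)
{
    assert(n >= 0);
    double t = 0.0;
#pragma omp parallel for
    for (int i = 1; i <= n; i++) {
#pragma omp atomic
        t += (double) i;
    }
    return t;
}
```

Both `sum_reduction(20000)` and `sum_atomic(20000)` return 200010000.0, matching the serial result. All partial sums here are integers far below 2^53, so the order of addition does not change the value. The reduction version scales, because threads synchronize only once at the end.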

Implementing parallel computing and parallel algorithms in C++

Parallel computing and parallel algorithms improve computational efficiency by running multiple computational tasks at the same time.

In C++, parallel computing and parallel algorithms can be implemented with tools such as multithreading, OpenMP, and MPI.

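1. Multithreading: C++ (since C++11) supports multithreaded programming, and std::thread from the <thread> header can be used to create and manage threads.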

With multithreading, a computational task is divided into subtasks that run simultaneously in several threads, which improves efficiency.

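The following simple example illustrates multithreading:

```cpp
#include <iostream>
#include <thread>

// Function executed by each worker thread
void task(int id) {
    std::cout << "Thread " << id << " is running" << std::endl;
}

int main() {
    const int numThreads = 4;
    std::thread threads[numThreads];

    // Create the threads, each with a different subtask
    for (int i = 0; i < numThreads; ++i) {
        threads[i] = std::thread(task, i);
    }

    // Wait for all threads to finish
    for (int i = 0; i < numThreads; ++i) {
        threads[i].join();
    }
    return 0;
}
```

Running this code shows four threads executing concurrently. The order of the lines varies between runs. Because writes to std::cout from different threads are not grouped into whole lines, text from different threads may also interleave.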

2. OpenMP: OpenMP is a parallel programming interface that can be used from C++ to implement parallel computing.

OpenMP provides a set of directives and functions that parallelize code at the level of loops, functions, and code sections.

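Here is an example of a parallel loop implemented with OpenMP:

```cpp
#include <iostream>
#include <omp.h>

int main() {
    const int size = 100;
    int arr[size];

    // Parallelize the initialization loop with OpenMP
    #pragma omp parallel for
    for (int i = 0; i < size; ++i) {
        arr[i] = i;
    }

    // Print the array contents (serially)
    for (int i = 0; i < size; ++i) {
        std::cout << arr[i] << " ";
        if (i % 10 == 9) {
            std::cout << std::endl;
        }
    }
    return 0;
}
```

The output lists the elements 0 to 99 in order. That ordering comes from the serial printing loop, not from the initialization. Each iteration writes a different element, so the result is the same no matter which thread runs which iteration. The output therefore does not show that the loop ran in parallel; it shows that parallelizing the loop did not change the result.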
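To see how the iterations of such a loop are actually distributed, you can record which thread initialized each element. Below is a C sketch of that idea. The `owner` array, `MAX_TRACKED`, and `init_and_record` are illustrative names, and `omp_get_thread_num` is only called when `_OPENMP` is defined.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

#define MAX_TRACKED 256   /* assumption: only thread ids below 256 are counted */

/* Fill arr[i] = i in parallel, record in owner[i] which thread wrote it,
   and return how many distinct threads took part. */
int init_and_record(int *arr, int *owner, int size)
{
    assert(size > 0);

#pragma omp parallel for
    for (int i = 0; i < size; ++i) {
        arr[i] = i;
#ifdef _OPENMP
        owner[i] = omp_get_thread_num();
#else
        owner[i] = 0;            /* serial build: everything runs on thread 0 */
#endif
    }

    /* Count distinct thread ids (serially, after the parallel loop). */
    int seen[MAX_TRACKED] = {0};
    int distinct = 0;
    for (int i = 0; i < size; ++i) {
        int t = owner[i];
        if (t < MAX_TRACKED && !seen[t]) {
            seen[t] = 1;
            distinct++;
        }
    }
    return distinct;
}
```

With the default schedule in GCC's libgomp, iterations are typically split into contiguous chunks, so `owner` reads like 0 0 ... 1 1 ... 2 2 .... The contents of `arr` are identical either way.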

OpenMP vs. MPI

Nested parallel execution model
OpenMP uses the fork-join parallel execution model.

When a thread encounters a parallel construct, it creates a team consisting of itself and some number of additional threads (possibly zero).

The thread that encountered the parallel construct becomes the master thread of the new team.

The other threads in the team are called the team's worker threads.

All team members execute the code inside the parallel construct.

When a thread finishes its work inside the parallel construct, it waits at the implicit barrier at the end of the construct.

Once all team members have reached the barrier, the threads can leave it.

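The master thread continues executing the user code after the parallel construct, while the worker threads wait to be summoned into other teams.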
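Barriers are what make it safe for one thread to read data that another thread wrote. This includes the implicit barrier at the end of a parallel construct and any explicit `#pragma omp barrier` inside one. Below is a C sketch that requests a team of four threads; `TEAM` and `barrier_demo` are illustrative names.
- In phase 1, every thread writes its own slot.
- After the barrier, every thread reads its neighbour's slot.

Without the barrier, a thread could read a slot before its neighbour had written it.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

#define TEAM 4   /* requested team size for this sketch */

/* Phase 1: each thread writes its own slot. Barrier. Phase 2: each thread
   reads its neighbour's slot. Returns the sum of the values read and
   stores the actual team size in *team_out. */
int barrier_demo(int *team_out)
{
    int slot[TEAM], got[TEAM];
    int team = 1;

#pragma omp parallel num_threads(TEAM)
    {
        int id = 0, nth = 1;
#ifdef _OPENMP
        id = omp_get_thread_num();
        nth = omp_get_num_threads();   /* the runtime may grant fewer than TEAM */
#endif
        if (id == 0)
            team = nth;                    /* a single writer */
        slot[id] = id + 1;                 /* phase 1: write own slot */
#pragma omp barrier
        got[id] = slot[(id + 1) % nth];    /* phase 2: safe only after the barrier */
    }                                      /* implicit barrier: the join */

    assert(team >= 1 && team <= TEAM);
    int total = 0;
    for (int i = 0; i < team; i++)
        total += got[i];
    *team_out = team;
    return total;
}
```

For a team of n threads, the returned sum is always 1 + 2 + ... + n, because every slot is read exactly once after all of them have been written.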

OpenMP parallel regions can be nested inside one another.

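If nested parallelism is disabled, a thread that encounters a parallel construct inside a parallel region creates a new team that contains only itself.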

If nested parallelism is enabled, the new team can contain more than one thread.

The OpenMP runtime library maintains a pool of threads that can serve as worker threads in parallel regions.

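When a thread encounters a parallel construct and needs to create a team of more than one thread, it checks the pool and takes idle threads from it to serve as the team's worker threads.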

If there are not enough idle threads in the pool, the master thread may get fewer worker threads than it requested.

When the team finishes executing the parallel region, the worker threads return to the pool.
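Nested parallelism is switched on or off through the runtime. Below is a minimal C sketch; `count_inner_threads` is an illustrative name. It assumes an OpenMP 3.0 or later runtime for `omp_set_max_active_levels`. VS2010's OpenMP 2.0 offers only the older `omp_set_nested`. The function counts how many threads execute the body of an inner parallel region.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Counts how many threads execute the body of an inner parallel region
   nested inside an outer one. max_levels = 1 disables nesting;
   max_levels = 2 allows one nested level. */
int count_inner_threads(int outer, int inner, int max_levels)
{
    assert(outer > 0 && inner > 0 && max_levels > 0);
    int count = 0;
#ifdef _OPENMP
    omp_set_max_active_levels(max_levels);   /* changes a global runtime setting */
#else
    (void) max_levels;                       /* serial build: one thread only */
#endif

#pragma omp parallel num_threads(outer)
    {
#pragma omp parallel num_threads(inner)
        {
#pragma omp atomic
            count++;
        }
    }
    return count;
}
```

When the pool supplies enough threads, `count_inner_threads(2, 2, 1)` typically returns 2, because each inner team contains only the thread that encountered it. Under the same condition, `count_inner_threads(2, 2, 2)` typically returns 4. A serial build returns 1 in both cases.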

An introduction to parallel programming models in high-performance computing

High-performance computing (HPC) uses large-scale computer systems to compute efficiently and to solve complex problems.

In HPC, parallel programming models are indispensable for improving computational efficiency and handling large-scale data.

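A parallel programming model is a way of executing code simultaneously on multiple processing units, such as CPUs and GPUs. It decomposes tasks and runs them concurrently, which increases computation speed and overall system performance.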

The main parallel programming models are the shared-memory model, the distributed-memory model, and the hybrid model.

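In the shared-memory model, multiple processing units share one memory space, and all of them can access and modify the data in shared memory at the same time.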

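The main advantage of the shared-memory model is its simplicity: no data has to be moved explicitly between processing units, so the programmer mainly has to handle synchronization and mutual exclusion when accessing shared data.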

Common shared-memory programming models include OpenMP and POSIX threads.

OpenMP (Open Multi-Processing) is an API for parallel programming; code is parallelized by adding special directives to it.

With OpenMP, programmers can turn serial code into parallel code with relatively little effort.

The main OpenMP directives include #pragma omp parallel, #pragma omp for, and #pragma omp critical.

Respectively, these directives make a code block execute in parallel, parallelize a loop, and protect a critical section.

OpenMP is designed for shared-memory systems and scales well on multicore CPUs and SMP (Symmetric Multi-Processing) systems.
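Below is a small C sketch that uses all three directives named above. The function `argmax` is a hypothetical example, not from the text.
- `parallel` forks the team.
- `for` splits the loop among the team's threads.
- `critical` protects the shared result while each thread merges its local candidate.

```c
#include <assert.h>

/* Return the index of the largest element (the lowest such index on ties). */
int argmax(const double *a, int n)
{
    assert(n > 0);
    int best = 0;                     /* shared result */

#pragma omp parallel
    {
        int local_best = 0;           /* private: declared inside the region */
#pragma omp for
        for (int i = 0; i < n; i++)
            if (a[i] > a[local_best])
                local_best = i;

#pragma omp critical
        {
            /* one thread at a time merges its candidate; ties keep the lower index */
            if (a[local_best] > a[best] ||
                (a[local_best] == a[best] && local_best < best))
                best = local_best;
        }
    }
    return best;
}
```

Each thread does most of its work on private data and enters the critical section only once. This is the usual way to keep a critical section from serializing the whole loop.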

POSIX threads (Pthreads) is a standard shared-memory parallel programming interface for creating and managing threads in a multithreaded environment.

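Pthreads provides functions such as pthread_create, pthread_join, pthread_mutex_lock, and pthread_mutex_unlock. These create threads, wait for threads to finish, and implement mutual exclusion and synchronization.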

A parallel program written with Pthreads can use several CPU cores at once, which effectively increases its execution speed.
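Below is a minimal Pthreads sketch using the functions named above. The worker layout, `NTHREADS`, and `pthread_sum` are illustrative; on older glibc versions, link with `-pthread`.
- Four threads each sum a strided share of 1..n into a local variable.
- Each thread then adds its local sum to a shared total under a mutex.
- The creating thread waits for all workers with `pthread_join`.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NTHREADS 4

typedef struct {
    int id;     /* worker index 0..NTHREADS-1 */
    long n;     /* sum the integers 1..n */
} WorkerArg;

static long long total = 0;                                /* shared result */
static pthread_mutex_t total_lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *p)
{
    const WorkerArg *arg = p;
    long long local = 0;
    /* strided share: worker id takes id+1, id+1+NTHREADS, ... */
    for (long i = arg->id + 1; i <= arg->n; i += NTHREADS)
        local += i;

    pthread_mutex_lock(&total_lock);                       /* enter critical section */
    total += local;
    pthread_mutex_unlock(&total_lock);                     /* leave critical section */
    return NULL;
}

/* Returns 1 + 2 + ... + n, or -1 if a thread could not be created. */
long long pthread_sum(long n)
{
    assert(n >= 0);
    pthread_t tid[NTHREADS];
    WorkerArg args[NTHREADS];
    total = 0;

    for (int t = 0; t < NTHREADS; t++) {
        args[t].id = t;
        args[t].n = n;
        if (pthread_create(&tid[t], NULL, worker, &args[t]) != 0) {
            for (int k = 0; k < t; k++)      /* join the threads already started */
                pthread_join(tid[k], NULL);
            return -1;
        }
    }
    for (int t = 0; t < NTHREADS; t++)
        pthread_join(tid[t], NULL);          /* wait for every worker */
    return total;
}
```

`pthread_sum(1000)` returns 500500. Because `total` is a file-scope variable, the function is not reentrant; a real program would pass the accumulator through the argument struct.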

In the distributed-memory model, each processing unit has its own local memory, and processing units share data by passing messages.

Linux OpenMP example programs

1. Introduction to OpenMP
OpenMP is a parallel programming model for parallel computing on shared-memory systems.

It uses compiler directives to turn serial code into parallel code, which makes computation more efficient.

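2. Hello World program
Below is a simple OpenMP program that prints "Hello World":

```c
#include <stdio.h>
#include <omp.h>

int main() {
    #pragma omp parallel
    {
        int thread_id = omp_get_thread_num();
        printf("Hello World from thread %d\n", thread_id);
    }
    return 0;
}
```

The program uses the `#pragma omp parallel` directive to create the threads and the `omp_get_thread_num()` function to get each thread's ID. With GCC, compile it with `gcc -fopenmp`.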

3. Parallel for loops
OpenMP makes it easy to parallelize for loops.

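The following example computes a sum in parallel, of the integers 0 to 99:

```c
#include <stdio.h>
#include <omp.h>

int main() {
    int sum = 0;
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < 100; i++) {
        sum += i;
    }
    printf("Sum: %d\n", sum);
    return 0;
}
```

Here the `#pragma omp parallel for` directive parallelizes the for loop. The `reduction(+:sum)` clause tells OpenMP to give each thread a private partial sum and to add the partial sums into the global `sum` at the end. The program prints `Sum: 4950`.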
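The effect of `reduction(+:sum)` can also be written out by hand. Below is a C sketch of the idea; it is not how any particular compiler implements the clause, and the name `manual_reduction_sum` is illustrative. Each thread keeps a private partial sum initialized to 0, the identity of `+`. The partial sums are then combined into the shared result one thread at a time.

```c
#include <assert.h>

/* Hand-written equivalent of reduction(+:sum) over the integers 0..n-1. */
int manual_reduction_sum(int n)
{
    assert(n >= 0);
    int sum = 0;                      /* shared result */
#pragma omp parallel
    {
        int partial = 0;              /* private copy, starts at the identity of + */
#pragma omp for
        for (int i = 0; i < n; i++)
            partial += i;
#pragma omp critical
        sum += partial;               /* combine step: one thread at a time */
    }
    return sum;
}
```

`manual_reduction_sum(100)` returns 4950, the same value as the reduction version. The clause simply saves writing the private copy and the combine step yourself.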

4. Parallel matrix multiplication
OpenMP can also be used to parallelize matrix multiplication.

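Here is a simple matrix multiplication example:

```c
#include <stdio.h>
#include <omp.h>

#define N 100

void matrix_multiply(int A[N][N], int B[N][N], int C[N][N]) {
    /* Rows of C are independent; j and k are private because they are
       declared inside the parallelized loop. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            C[i][j] = 0;
            for (int k = 0; k < N; k++) {
                C[i][j] += A[i][k] * B[k][j];
            }
        }
    }
}

int main() {
    int A[N][N];
    int B[N][N];
    int C[N][N];

    // Initialize A and B (example values: A is all ones, B is the identity)
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            A[i][j] = 1;
            B[i][j] = (i == j);
        }
    }

    matrix_multiply(A, B, C);

    // Print part of the result: since B is the identity, C equals A
    printf("C[0][0] = %d, C[N-1][N-1] = %d\n", C[0][0], C[N - 1][N - 1]);
    return 0;
}
```

Here the `#pragma omp parallel for` directive parallelizes the outer loop, so different rows of C are computed by different threads. This speeds up the multiplication.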

Parallel programming: hybrid MPI/OpenMP programming

When running in parallel across many nodes, the volume of inter-node communication grows quadratically, so bandwidth quickly becomes insufficient.

One way to keep the program's efficiency scaling roughly linearly is therefore to write the parallel part as a hybrid of MPI and OpenMP.

Once you understand MPI and OpenMP, this part is relatively easy.

The general idea is to assign 1-2 MPI processes to each node and to have each MPI process run several OpenMP threads.

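The OpenMP part needs no inter-process communication: its threads exchange information directly through shared memory rather than over the network. This can significantly reduce the amount of communication the program needs.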

Fortran:

```fortran
Program hello
  use mpi
  use omp_lib
  Implicit None
  Integer :: myid, numprocs, rc, ierr
  Integer :: tid

  Call MPI_INIT(ierr)
  Call MPI_COMM_RANK(MPI_COMM_WORLD, myid, ierr)
  Call MPI_COMM_SIZE(MPI_COMM_WORLD, numprocs, ierr)

!$OMP PARALLEL PRIVATE(tid)
  tid = OMP_GET_THREAD_NUM()
  write(*,*) 'hello from', tid, 'of process', myid
!$OMP END PARALLEL

  Call MPI_FINALIZE(rc)
  Stop
End Program hello
```

C++:

```cpp
#include <iostream>
#include "mpi.h"
#include "omp.h"
using namespace std;

int main(int argc, char *argv[])
{
    int myid;
    int nprocs;
    int this_thread;

    // MPI-2 C++ bindings, as in the original. They were removed in MPI-3.0;
    // there, use the C calls MPI_Init, MPI_Comm_rank, MPI_Comm_size, MPI_Finalize.
    MPI::Init();
    myid = MPI::COMM_WORLD.Get_rank();
    nprocs = MPI::COMM_WORLD.Get_size();

#pragma omp parallel private(this_thread)
    {
        this_thread = omp_get_thread_num();
        cout << this_thread << " thread from " << myid << " is ok\n";
    }

    MPI::Finalize();
    return 0;
}
```

Note that compiling directly with mpif90/mpicxx seemed to produce library errors, so I compiled with:

```
icc -openmp hello.cpp -o hello -DMPICH_IGNORE_CXX_SEEK -L/Path/to/mpi/lib/ -lmpi_mt -lmpiic -I/path/to/mpi/include
```

Here -DMPICH_IGNORE_CXX_SEEK works around a duplicate-definition problem in the MPI-2 C++ bindings, whose SEEK_SET, SEEK_CUR, and SEEK_END clash with the macros of the same name in <stdio.h>. To guarantee thread safety, the thread-safe mpi_mt library must be used. With Intel's mpirun, add -env I_MPI_PIN_DOMAIN omp after mpirun. Each MPI process is then given a pinning domain sized for its OpenMP threads, taken from OMP_NUM_THREADS.

OpenMP

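OpenMP is a multithreaded programming technique for shared memory. It is a standard jointly defined by several internationally influential software and hardware vendors, and its specification was initiated by SGI. It is an application programming interface (API) of compiler directives that explicitly guide multithreaded, shared-memory parallel programming, and it targets both shared-memory and distributed shared-memory multiprocessors. OpenMP is highly portable, supports several programming languages, and runs on many platforms, including most Unix-like systems and Windows NT.

OpenMP was originally designed for shared-memory multiprocessor architectures. It therefore differs greatly from message-passing programming models: it handles the case where multiple processors share one memory device and access memory through the same address space. SMP is a shared-memory architecture. Distributed shared-memory (DSM) systems are also shared-memory multiprocessors: DSM virtualizes the memory of several machines into one unified address space used by the processors of all of them, and OpenMP offers some support for such machines as well.

The OpenMP programming model is thread-based. Compiler directives, a new kind of statement added to the program, guide parallelization explicitly and give programmers full control over it. OpenMP executes in the Fork-Join model:
- At the start, only one thread, the "master thread", is running.
- When the master thread reaches a part that needs parallel computation, it forks threads to carry out the parallel tasks, and the master and forked threads work together.
- When the parallel code ends, the forked threads exit or are suspended, and control returns to the master thread alone.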

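OpenMP's functionality is provided in two forms, compiler directives and runtime library functions, and program execution can be controlled flexibly through environment variables. OpenMP and MPI are the two main tools of parallel programming, and they compare as follows:
∙ OpenMP: thread-level parallel granularity; shared memory; implicit data distribution; limited scalability.
∙ MPI: process-level parallel granularity; distributed memory; explicit data distribution; good scalability.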
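The Fork-Join model can be observed through the runtime library, using the C sketch below; the function name `fork_join_counts` is illustrative. `omp_get_num_threads` reports 1 in the serial part before the region, the team size inside it, and 1 again after the join. The team size itself can be set with the `OMP_NUM_THREADS` environment variable or with `omp_set_num_threads`. In a build without OpenMP, all three values are 1.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Records the number of threads before, inside, and after a parallel region. */
void fork_join_counts(int *before, int *inside, int *after)
{
    assert(before && inside && after);
    *before = *inside = *after = 1;      /* values for a serial (non-OpenMP) build */
#ifdef _OPENMP
    *before = omp_get_num_threads();     /* serial part: only the master thread */
#pragma omp parallel
    {
#pragma omp single
        *inside = omp_get_num_threads(); /* fork: size of the team */
    }
    *after = omp_get_num_threads();      /* after the join: master thread alone */
#endif
}
```

Run as `OMP_NUM_THREADS=4 ./a.out`, a `main` that prints the three values would typically show 1, 4, 1.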

OpenMP shared-memory parallel programming explained

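Test platform: Windows 7, VS2010

1. Introduction

Parallel computers can be roughly divided into shared-memory and distributed-memory machines. In a shared-memory machine, multiple cores share one memory. Today's PCs are of this type: whether they have one multicore CPU or several CPUs, they have multiple cores and one memory. Large computers generally combine both structures, with shared memory inside each compute node and distributed memory between nodes.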

Parallel programming is a prerequisite for good performance on these parallel computers.

The popular approach today is to use MPI on distributed-memory machines and Pthreads or OpenMP on shared-memory machines.

Here we focus on shared-memory parallel computers, since the machine used to write this article is one of them (an ordinary desktop).

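OpenMP is simpler than Pthreads. Some of us care mainly about algorithms and need only the most basic control over the relationships between threads, such as synchronization and mutual exclusion. For them, OpenMP is ideal.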

This article briefly discusses OpenMP parallel programming in the Visual Studio development environment on Windows.

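This article draws on the following sources:
(1) the Wikipedia article on OpenMP;
(2) the OpenMP Specification;
(3) the MSDN articles on OpenMP;
(4) Michael J. Quinn, Parallel Programming in C with MPI and OpenMP, Chapter 17, Chinese translation by Chen Wenguang et al., Tsinghua University Press, 2004.
At the time of writing, the latest OpenMP version was 4.0.0, while VS2010 supports only OpenMP 2.0 (the 2002 version), so this article also covers OpenMP 2.0. The article focuses on using OpenMP to obtain a speedup close to the number of cores, and OpenMP 2.0 is sufficient for that.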

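2. The first OpenMP program

Step 1: Create a console application.
Step 2: In the project properties, for all configurations, set Configuration Properties > C/C++ > Language > OpenMP Support to Yes (/openmp).
Step 3: Add the following code:

```cpp
#include <omp.h>
#include <iostream>

int main()
{
    std::cout << "parallel begin:\n";
#pragma omp parallel
    {
        std::cout << omp_get_thread_num();
    }
    std::cout << "\n parallel end.\n";
    std::cin.get();
    return 0;
}
```

Step 4: Run the program.
[Figure: console output of the program]
The output shows that my computer has 8 cores (strictly speaking, 8 hardware threads). It is our lab's small workstation, which supports up to 24 cores.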
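Since the article's goal is a speedup close to the number of cores, it helps to measure it. Below is a minimal C99 sketch. It times the same loop serially and in parallel; `pi_kernel`, `measure_speedup`, and `now` are illustrative names.
- The serial run uses the `if` clause to force a team of one thread.
- With OpenMP enabled, wall-clock time comes from `omp_get_wtime`.
- In a serial build, the code falls back to `clock()`.

Speedup is the serial time divided by the parallel time.

```c
#include <assert.h>
#include <time.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Wall-clock seconds with OpenMP; processor time from clock() otherwise. */
static double now(void)
{
#ifdef _OPENMP
    return omp_get_wtime();
#else
    return (double) clock() / CLOCKS_PER_SEC;
#endif
}

/* Midpoint-rule pi estimate; the if clause runs it on one thread when par == 0. */
static double pi_kernel(long n, int par)
{
    const double step = 1.0 / (double) n;
    double sum = 0.0;
#pragma omp parallel for reduction(+ : sum) if (par)
    for (long i = 0; i < n; i++) {
        double x = ((double) i + 0.5) * step;
        sum += 4.0 / (1.0 + x * x);
    }
    return sum * step;
}

/* Returns serial_time / parallel_time (0 if the parallel run was too fast
   to time) and sets *agree to 1 when both results match up to rounding. */
double measure_speedup(long n, int *agree)
{
    assert(n > 0);
    double t0 = now();
    double serial = pi_kernel(n, 0);
    double t1 = now();
    double parallel = pi_kernel(n, 1);
    double t2 = now();

    double diff = serial - parallel;           /* summation order differs */
    *agree = (diff < 1e-9 && diff > -1e-9);
    double tp = t2 - t1;
    return tp > 0.0 ? (t1 - t0) / tp : 0.0;
}
```

On an 8-thread machine like the one above, you would hope for a value approaching 8 when `n` is large. In practice it is usually lower, and for small `n` the cost of starting the team can dominate.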


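DSM (distributed shared-memory parallel machine): built from nodes (usually SMP systems) interconnected by a high-speed message-passing network. The memory system is physically distributed, and each node has its own independent address space, but logically the memory system is shared. Characteristics: distributed shared memory, shared variables, single address space, NUMA.
Cluster (NOW, COW): a cluster system connects individual processors, SMP nodes, or DSM machines with Ethernet, Myrinet, Quadrics, InfiniBand, or switches. Characteristics: multiple address spaces, distributed non-shared memory, message passing, NORMA.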
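OpenMP Parallel Programming
Presenter: Zhao Yonghua, Supercomputing Center, Computer Network Information Center, Chinese Academy of Sciences, yhzhao@
Outline: introduction to OpenMP programming; OpenMP directives; OpenMP library functions; OpenMP environment variables; OpenMP computing examples.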
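Parallel machine architectures and communication mechanisms (review)
PVP (parallel vector processor): centralized shared memory, shared variables, single address space, UMA.
SMP (shared-memory parallel machine): multiple processors connected to memory through a crossbar or a switch. Centralized shared memory, shared variables, single address space, UMA.
MPP (distributed-memory parallel machine): nodes interconnected by a message-passing network. Multiple address spaces, distributed non-shared memory, message passing, NORMA.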
Introduction to OpenMP programming
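OpenMP overview
OpenMP is a parallel programming model for shared-memory architectures, suited to SMP shared-memory multiprocessor systems and to multicore processors.
∙ Originated from the ANSI X3H5 standard
∙ Simple, portable, and scalable
∙ Provides an API and specification for Fortran and C/C++
∙ Consists of compiler directives, runtime library routines (run-time routines), and environment variables
∙ An industry standard
An OpenMP program starts with a single master thread, which executes serially until it reaches the first parallel region; the region is then executed in parallel. The process is as follows:
Fork: the master thread creates a team of parallel threads, and the code in the parallel region is executed in parallel on the different threads.
Join: when the parallel region has finished, the threads are synchronized or terminated, and only the master thread continues executing.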
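Directive sentinels
A directive is special source code that is meaningful only to compilers that support it. A directive is identified by a sentinel at the start of the line. The OpenMP directive sentinels are:
Fortran: !$OMP (or C$OMP or *$OMP in fixed-form source)
C/C++: #pragma omp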
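Compilers without OpenMP support ignore the unknown `#pragma omp` lines, and Fortran compilers treat `!$OMP` lines as comments, so the directives alone do not stop a program from building serially. Calls to runtime routines do. OpenMP-enabled C/C++ compilers define the macro `_OPENMP`, which can guard such calls; the Fortran counterpart is the `!$ ` conditional-compilation sentinel. Below is a C sketch; the two function names are illustrative.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Upper bound on the team size of the next parallel region; 1 in a serial build. */
int max_threads_or_one(void)
{
#ifdef _OPENMP
    int n = omp_get_max_threads();   /* runtime call: compiled only with OpenMP */
    assert(n >= 1);
    return n;
#else
    return 1;                        /* no OpenMP runtime linked in */
#endif
}

/* _OPENMP expands to the yyyymm date of the supported specification; 0 without OpenMP. */
long openmp_version(void)
{
#ifdef _OPENMP
    return _OPENMP;
#else
    return 0;
#endif
}
```

The same source then builds and runs both with and without the OpenMP compiler option.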
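Parallel region directive
A parallel region is a block of code that can be executed in parallel by multiple threads.
Fortran:

```fortran
!$OMP PARALLEL [clauses]
   BLOCK
!$OMP END PARALLEL
```

C/C++:

```c
#pragma omp parallel [clauses]
{
   BLOCK
}
```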
Computing the value of Pi

```c
/* Serial code */
#include <stdio.h>
#include <time.h>

static long num_steps = 100000;
double step;

int main(void)
{
    int i;
    double x, pi, sum = 0.0;
    clock_t start_time, end_time;

    step = 1.0 / (double) num_steps;
    start_time = clock();
    for (i = 1; i <= num_steps; i++) {
        x = (i - 0.5) * step;              /* midpoint of the i-th interval */
        sum = sum + 4.0 / (1.0 + x * x);
    }
    pi = step * sum;
    end_time = clock();
    printf("Pi=%f\nRunning time: %f s\n", pi,
           (double) (end_time - start_time) / CLOCKS_PER_SEC);
    return 0;
}
```
Supported by DEC, Intel, IBM, HP, Sun, SGI, and other vendors, on many operating system platforms including Linux, UNIX, and Windows NT.
OpenMP parallel programming model
OpenMP is a thread-based parallel programming model. It uses the Fork-Join parallel execution mode.
OpenMP program parallel framework
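[Figure: the master thread runs a serial part, FORKs into a parallel region, JOINs back into a serial part, FORKs into a second parallel region, and JOINs into a final serial part]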
A simple "Hello, world" OpenMP parallel program

```c
/* Hello World in OpenMP/C */
#include <stdio.h>
#include <omp.h>

int main(int argc, char *argv[])
{
    int nthreads, tid;

    /* Fork a team of threads */
#pragma omp parallel private(nthreads, tid)
    {
        /* Obtain and print thread id */
        tid = omp_get_thread_num();
        printf("Hello, world from OpenMP thread %d\n", tid);

        /* Only the master thread does this */
        if (tid == 0) {
            nthreads = omp_get_num_threads();
            printf(" Number of threads %d\n", nthreads);
        }
    }
    return 0;
}
```
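Notes
There is an implicit synchronization (barrier) at the end of a parallel region. Clauses specify additional information about the parallel region. In Fortran, clauses are separated by commas or spaces. In C/C++, clauses are separated by spaces, and later versions of the specification also accept commas.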
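Parallel region structure: example diagram
[Figure: master thread, then a team of threads ending at a barrier, then the master thread, then another team of threads ending at a barrier, then the master thread]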
Compile and run:

```
icc -openmp -o HelloWorld HelloWorld.c
./HelloWorld
```

Output (the order of the lines varies from run to run):

```
Hello, world from OpenMP thread 2
Hello, world from OpenMP thread 0
 Number of threads 4
Hello, world from OpenMP thread 3
Hello, world from OpenMP thread 1
```
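Parallelism over a parallel region (SPMD parallel mode)

```c
#include <stdio.h>
#include <omp.h>

static long num_steps = 100000;
double step;
#define NUM_THREADS 4

int main(void)
{
    int nthreads = 1;
    double pi, sum[NUM_THREADS], start_time, end_time;

    step = 1.0 / (double) num_steps;
    omp_set_num_threads(NUM_THREADS);
    start_time = omp_get_wtime();
#pragma omp parallel
    {
        int i, id, nthrds;          /* declared inside the region: private to each thread */
        double x;
        id = omp_get_thread_num();
        nthrds = omp_get_num_threads();   /* the runtime may grant fewer threads */
        if (id == 0)
            nthreads = nthrds;
        /* cyclic distribution: thread id handles i = id, id+nthrds, ... */
        for (i = id, sum[id] = 0.0; i < num_steps; i = i + nthrds) {
            x = (i + 0.5) * step;
            sum[id] += 4.0 / (1.0 + x * x);
        }
    }
    pi = 0.0;
    for (int i = 0; i < nthreads; i++)
        pi += sum[i] * step;
    end_time = omp_get_wtime();
    printf("Pi=%f\nRunning time: %f s\n", pi, end_time - start_time);
    return 0;
}
```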
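C/C++ OpenMP program structure:

```c
#include <omp.h>

int main(void)
{
    int var1, var2, var3;
    /* ... serial code ... */
#pragma omp parallel private(var1, var2) shared(var3)
    {
        /* ... parallel region: each thread has its own var1 and var2; var3 is shared ... */
    }
    /* ... serial code ... */
    return 0;
}
```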
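The shared and private clauses
Variables in a parallel region can be declared shared or private with clauses. When writing multithreaded programs, deciding which data are shared and which are private is very important, because it affects both the performance and the correctness of the program.
Fortran: SHARED(list), PRIVATE(list), DEFAULT(SHARED|PRIVATE|NONE)
C/C++: shared(list), private(list), default(shared|none); OpenMP 5.1 also allows default(private) and default(firstprivate) in C/C++.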
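The choice between shared and private matters most for temporaries. Below is a C sketch; `scale_and_count` is an illustrative name.
- It uses `default(none)`, so the compiler rejects any variable in the region whose sharing is not stated explicitly.
- The array and the parameters are `shared`.
- The temporary `tmp` is `private`, because each thread needs its own copy.
- The counter is combined with a reduction.

The loop variable of an `omp for` loop is private automatically.

```c
#include <assert.h>

/* Scale a[0..n-1] by factor and return how many results exceed limit. */
int scale_and_count(double *a, int n, double factor, double limit)
{
    assert(n >= 0);
    int count = 0;
    double tmp;        /* must be private: each thread needs its own temporary */
    int i;             /* loop variable: private automatically */

#pragma omp parallel for default(none) shared(a, n, factor, limit) private(tmp) reduction(+ : count)
    for (i = 0; i < n; i++) {
        tmp = a[i] * factor;   /* if tmp were shared, threads would overwrite each other's value */
        a[i] = tmp;
        if (tmp > limit)
            count++;
    }
    return count;
}
```

Removing `private(tmp)` makes the compiler report an error under `default(none)`. That is exactly the point of the clause: it catches a missing data-sharing decision at compile time instead of as a race at run time.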
OpenMP program structure
Fortran OpenMP program structure:

```fortran
PROGRAM PROG_NAME
  INTEGER VAR1, VAR2, VAR3
  ! ... serial code ...
!$OMP PARALLEL PRIVATE(VAR1, VAR2) SHARED(VAR3)
  ! ... parallel region ...
!$OMP END PARALLEL
  ! ... serial code ...
END
```