Speeding up Java Programs by calling native methods

Studienarbeit

Institut für elektrische Nachrichtentechnik

RWTH-Aachen

Felix Engel

Matriculation number: 222750

Supervisor: Holger Crysandt

July 21, 2004

Contents

1 Introduction
2 The Java Native Interface
2.1 Declare a Java method as native
2.2 Compile the Java class
2.3 Create a C header file from the compiled Java class
2.4 Implement the functions declared in the header file
2.5 Create a shared library
2.6 Load the library into the Java class
3 Test Procedure
3.1 Tests involved
4 Analysing Test Results
4.1 Description of the plots
4.2 Representative cases
4.2.1 Worst case order n: copy
4.2.2 Best case order n: dot
4.2.3 Order n^2: gemv
4.2.4 Order n^3: gemm
5 Conclusions
References
A Test routines
B Hardware configurations


Chapter 1

Introduction

The popularity and widespread use of the Java Programming Language are constantly increasing. The main reasons for this development are:

• Cross-platform availability

• Pure object orientation

• A garbage collector which automatically frees memory that is no longer referenced

• A wide set of tested "off the shelf" libraries for a vast number of tasks, provided by the Java Runtime Environment (JRE) and third-party vendors

However, Java bytecode cannot be as performant as highly optimized C or FORTRAN code. Especially for numerical computations, hardware vendors like SUN Microsystems [1], Intel [2] and Silicon Graphics provide libraries which generally include the "Basic Linear Algebra Subprograms" (BLAS) [3]. The BLAS standardize FORTRAN subroutines and functions for basic vector and matrix operations.

Java specifies an interface for calling native methods written in any language from within Java classes: the "Java Native Interface" (JNI) [4].

Using the JNI is therefore a promising way to benefit from vendor-supplied libraries and thereby speed up numerical computations in Java.

This paper analyses the potential speed gain that can be achieved by calling optimized BLAS routines via the Java Native Interface.

A similar analysis was done by Bik and Gannon [5] in 1997. This paper will show that, due to the rapid development the Java platform has experienced since then, their results are now outdated.


Chapter 2

The Java Native Interface

The Java specification by SUN Microsystems includes the "Java Native Interface", a C/C++ API to

• Call native methods from within Java classes

• Load a Java virtual machine into a running C program and thereby call Java software from C code

In this paper, the first option is used: calling native methods from Java. This chapter describes the steps necessary to call native code from a Java class.

2.1 Declare a Java method as native

By using the keyword native, a method is declared as native and its body is not implemented:

public static final native void scal(int n, float alpha, float[] x, int off_x, int inc_x);
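As a minimal sketch of how such a declaration can sit inside a class, the following example pairs a native declaration with a pure Java fallback. The class name `ScalSketch`, the library name `blasjni` and the fallback method are hypothetical and are not taken from the original sources. Only `System.loadLibrary` and the `native` keyword are standard Java.

```java
/** Sketch: a native declaration next to a pure Java fallback (all names are illustrative). */
final class ScalSketch {
    /** True only if the hypothetical shared library "blasjni" could be loaded. */
    static final boolean NATIVE_AVAILABLE = tryLoad("blasjni");

    /** Declared native: no body here, the implementation lives in the shared library. */
    static native void scalNative(int n, float alpha, float[] x, int offX, int incX);

    /** Pure Java reference: scales n elements of x, starting at offX with stride incX. */
    static void scalJava(int n, float alpha, float[] x, int offX, int incX) {
        for (int i = 0, k = offX; i < n; i++, k += incX) {
            x[k] *= alpha;
        }
    }

    /** Uses the native routine when the library was loaded, otherwise the Java loop. */
    static void scal(int n, float alpha, float[] x, int offX, int incX) {
        if (NATIVE_AVAILABLE) {
            scalNative(n, alpha, x, offX, incX);
        } else {
            scalJava(n, alpha, x, offX, incX);
        }
    }

    private static boolean tryLoad(String name) {
        try {
            System.loadLibrary(name);
            return true;
        } catch (UnsatisfiedLinkError | SecurityException e) {
            return false; // library not found or not permitted: fall back to Java
        }
    }
}
```

Calling `ScalSketch.scal(2, 10f, x, 1, 2)` on a five-element array scales the elements at indices 1 and 3. Without the shared library on the library path, the call silently takes the Java path.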

2.2 Compile the Java class

To compile the class, a command like the following can be used:

javac de/smurflord/BlasJNI/BlasL1.java

This command has to be called from the top-level directory, otherwise the linker cannot properly resolve the names of the native methods.
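The package path matters because the name of the C function that implements a native method is derived from the fully qualified class name. The sketch below reproduces the JNI short-name rule for non-overloaded methods: the prefix `Java_`, then the escaped class name, an underscore, and the escaped method name. The helper class `JniNames` is hypothetical, and the package `de.smurflord.BlasJNI` is inferred from the path in the compile command above.

```java
/** Sketch: derive the C symbol name the JVM looks up for a non-overloaded native method. */
final class JniNames {
    /** Escapes one name according to the JNI name-mangling rules. */
    static String mangle(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c == '.' || c == '/') {
                sb.append('_');            // package separator
            } else if (c == '_') {
                sb.append("_1");           // literal underscore
            } else if (c == ';') {
                sb.append("_2");
            } else if (c == '[') {
                sb.append("_3");
            } else if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9')) {
                sb.append(c);
            } else {
                sb.append(String.format("_0%04x", (int) c)); // other Unicode characters
            }
        }
        return sb.toString();
    }

    /** Short JNI name: "Java_" + escaped class name + "_" + escaped method name. */
    static String shortName(String className, String methodName) {
        return "Java_" + mangle(className) + "_" + mangle(methodName);
    }

    public static void main(String[] args) {
        System.out.println(shortName("de.smurflord.BlasJNI.BlasL1", "scal"));
    }
}
```

For the class compiled above this yields `Java_de_smurflord_BlasJNI_BlasL1_scal`. A class compiled under a different package would therefore look for a different symbol in the shared library.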

2 SDK: Software Development Kit


Chapter 3

Test Procedure

3.1 Tests involved

In order to evaluate the factors which contribute to the total execution time of JNI-wrapped functions, benchmarks were run on seven BLAS routines. The tested routines were nrm2, copy, scal, axpy and dot from the Level 1 BLAS, gemv from the Level 2 BLAS and gemm from the Level 3 BLAS.

Time measurements were done by saving the system time before and after the calls to an operation and then taking the difference. Since most architectures do not contain a high-precision clock, the Level 1 BLAS timings were taken for 800 iterations, while the Level 2 and Level 3 timings were taken for 1 iteration. The corresponding code samples are listed in figures A.1 and A.2.
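The measurement scheme described above can be sketched as follows. This is not the code from figures A.1 and A.2, which are not reproduced here. The class `TimingSketch`, the use of `Runnable` and the choice of `System.currentTimeMillis` as the system clock are assumptions made for illustration.

```java
/** Sketch: wall-clock timing of an operation repeated a fixed number of times. */
final class TimingSketch {
    /** Saves the system time before and after the loop and returns the difference in ms. */
    static long timeMillis(Runnable operation, int iterations) {
        long start = System.currentTimeMillis();
        for (int i = 0; i < iterations; i++) {
            operation.run();
        }
        return System.currentTimeMillis() - start;
    }

    public static void main(String[] args) {
        float[] x = new float[100_000];
        // 800 repetitions, as used for the Level 1 routines, to outlast the clock resolution.
        long total = timeMillis(() -> {
            for (int i = 0; i < x.length; i++) {
                x[i] = x[i] * 1.0001f + 1.0f;
            }
        }, 800);
        System.out.println("total ms: " + total + ", per call ms: " + (double) total / 800);
    }
}
```

Repeating a short operation many times and dividing by the iteration count is what makes a coarse millisecond clock usable for routines that finish in microseconds.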

For all tested functions the following execution times were taken:

• A native function call to a vendor-supplied library. No Java code is used here.

• The operation written in pure Java

• The native function from a vendor-supplied library called via the JNI

• A function which only copies the data from the JVM's heap to C memory and back (see figure A.3).

For the Level 2 and 3 BLAS (GEMV and GEMM) the following additional timings were measured:

• A pure C implementation (Fig. A.4 and A.5)
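To illustrate what "the operation written in pure Java" can look like for one routine of each BLAS level, here is a minimal sketch. The class name, the simplified signatures (no offsets, increments or scaling factors) and the row-major layout are assumptions. The actual test routines are listed in appendix A of the original paper.

```java
/** Sketch: straightforward Java loops for dot, gemv and gemm (simplified signatures). */
final class NaiveBlasSketch {
    /** Level 1: returns the dot product of the first n elements of x and y. */
    static float dot(int n, float[] x, float[] y) {
        float sum = 0f;
        for (int i = 0; i < n; i++) {
            sum += x[i] * y[i];
        }
        return sum;
    }

    /** Level 2: y = A * x for an n x n matrix A stored row by row. */
    static void gemv(int n, float[] a, float[] x, float[] y) {
        for (int i = 0; i < n; i++) {
            float sum = 0f;
            for (int j = 0; j < n; j++) {
                sum += a[i * n + j] * x[j];
            }
            y[i] = sum;
        }
    }

    /** Level 3: C = A * B for n x n matrices stored row by row. */
    static void gemm(int n, float[] a, float[] b, float[] c) {
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                float sum = 0f;
                for (int k = 0; k < n; k++) {
                    sum += a[i * n + k] * b[k * n + j];
                }
                c[i * n + j] = sum;
            }
        }
    }
}
```

The loop nesting depth mirrors the operation count of each level: one loop for dot, two for gemv and three for gemm.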


Chapter 4

Analysing Test Results

4.1 Description of the plots

For each test the results are plotted in a diagram combining four test series for the scalar functions and six for the Level 2 and 3 BLAS (tables 4.1 and 4.2). Since the results were the same for different members of the same architecture (for example AMD and Intel processors) and for different operating systems on the same machine, only a few representative cases are discussed here. The complete set of results is given in the appendix. Note that the scalar functions are plotted using a logarithmic scale on both axes, whereas gemm and gemv are plotted using a linear scale on the vector size axis and a logarithmic scale on the time axis.

Title            Description
Java             A naive Java implementation
Native C
                 The native C algorithm wrapped via the JNI
Library          An optimized library called from C
JNI              An optimized library called from Java
JNI to C copy    The time it takes to copy the data from the JVM to C memory and back (if necessary)

Table 4.2: Test series plotted for Level 2 and 3 BLAS

Function    Complexity
            n
            2n
            2n
            2n
            3n
            n + n^2
            4n^2

1 SIMD: Single Instruction Multiple Data


Chapter 5

Conclusions

The tests have shown that invoking native functions always imposes an overhead, because the native code works on copies of the original data. These copies have to be synchronised with the original data, which is an expensive operation. Furthermore, for the straightforward operations tested, Java code has proven to be slower than native code by a factor of only 1.5 to 2. As long as the complexity of the calculation is of the same order as that of the copy operations that have to be done, this speed advantage is consumed by the copy operations, and the best choice is to use a Java method. Where the complexity of the operation itself is higher than that of the copy operation, a speedup of approximately 1.5 to 3 can be obtained by replacing Java methods with calls to native methods.
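The argument about complexity orders can be made concrete by comparing the arithmetic work with the number of array elements that have to be copied. The sketch below uses rough counts that are assumptions, not figures from the measurements: about 2n operations and 2n copied elements for dot, and about 2n^3 operations (schoolbook algorithm) and 4n^2 copied elements for gemm on n x n matrices.

```java
/** Sketch: arithmetic work per copied array element (all counts are rough assumptions). */
final class CopyRatioSketch {
    /** Floating-point operations divided by elements copied between the JVM and C. */
    static double ratio(double flops, double copiedElements) {
        return flops / copiedElements;
    }

    /** dot: about 2n operations, two vectors of length n copied. */
    static double dot(long n) {
        return ratio(2.0 * n, 2.0 * n);
    }

    /** gemm: about 2n^3 operations, about 4n^2 elements copied (A, B and C in, C back out). */
    static double gemm(long n) {
        return ratio(2.0 * n * n * n, 4.0 * n * n);
    }

    public static void main(String[] args) {
        for (long n : new long[] {10, 100, 1000}) {
            System.out.printf("n=%d dot=%.1f gemm=%.1f%n", n, dot(n), gemm(n));
        }
    }
}
```

Under these assumptions the ratio stays at 1 for dot regardless of n, so the copy cost never amortises, whereas for gemm it grows as n/2, so the copy cost becomes negligible for large matrices.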

A notable speed gain of an order of magnitude could, however, only be achieved for the matrix-matrix multiplication, a calculation where the main speedup can be attributed to algorithms which provide reduced complexity.

These results demonstrate the rapid improvement the Java platform has undergone since Bik and Gannon [5] did their tests in 1997. The speed of the Java platform has greatly improved. The advantages provided by the Java environment, most notably the excellent portability, will be lost if the Java Native Interface is used. Due to the comparatively small speedup for most calculations, the Java Native Interface should be used with care. In most cases the better solution will be to provide efficient algorithms in Java.
