Speeding up Java Programs by calling native methods
Studienarbeit
Institut für elektrische Nachrichtentechnik
RWTH-Aachen
Felix Engel
Matriculation number: 222750
Supervisor: Holger Crysandt
July 21, 2004
Contents
1 Introduction
2 The Java Native Interface
2.1 Declare a Java method as native
2.2 Compile the Java class
2.3 Create a C header file from the compiled Java class
2.4 Implement the functions declared in the header file
2.5 Create a shared library
2.6 Load the library into the Java class
3 Test Procedure
3.1 Tests involved
4 Analysing Test Results
4.1 Description of the plots
4.2 Representative cases
4.2.1 Worst case order n: copy
4.2.2 Best case order n: dot
4.2.3 Order n^2: gemv
4.2.4 Order n^3: gemm
5 Conclusions
References
A Test routines
B Hardware configurations
Chapter 1
Introduction
The popularity and widespread use of the Java Programming Language are constantly increasing. The main reasons for this development are:
• Cross-platform availability
• Pure object orientation
• A garbage collector which automatically frees memory that is no longer referenced
• A wide set of tested "off the shelf" libraries for a vast number of tasks, provided by the Java Runtime Environment (JRE) and third-party vendors
However, Java bytecode cannot match the performance of highly optimized C or FORTRAN code. Especially for numerical computations, hardware vendors like Sun Microsystems [1], Intel [2] and Silicon Graphics provide libraries which generally include the "Basic Linear Algebra Subprograms" (BLAS) [3]. The BLAS standardize FORTRAN subroutines and functions for basic vector and matrix operations.
Java specifies an interface to call native methods written in any language from within Java classes, using the "Java Native Interface" (JNI) [4].
Using the JNI is therefore a promising way to benefit from vendor-supplied libraries and thereby speed up numerical computations in Java.
This paper analyses the potential speed gain which can be achieved by calling optimized BLAS routines via the Java Native Interface.
A similar analysis was done by Bik and Gannon [5] in 1997. This paper will show that, due to the rapid development the Java platform has experienced since then, their results are now outdated.
Chapter 2
The Java Native Interface
The Java specification by Sun Microsystems includes the "Java Native Interface", a C/C++ API to
• Call native methods from within Java classes
• Load a Java virtual machine into a running C program and thereby call Java software from C code
In this paper, the first option is used: calling native methods from Java. This chapter describes the steps necessary to call native code from a Java class.
2.1 Declare a Java method as native
By using the keyword native, a method is declared as native and its body is not implemented.
public static final native void scal(int n, float alpha, float[] x, int off_x, int inc_x);
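The following is a minimal sketch of how such a declaration behaves at run time. The class name BlasL1Sketch and the fallback method scalJava are hypothetical and used only for illustration. The fallback assumes the usual BLAS scal semantics (each selected element is multiplied by alpha). As long as no shared library providing scal has been loaded, calling the native method throws an UnsatisfiedLinkError.

```java
import java.util.Arrays;

// Hypothetical class for illustration; not the class used in the paper.
final class BlasL1Sketch {
    // Native declaration as in the text: no body, implemented in a shared library.
    public static final native void scal(int n, float alpha,
                                         float[] x, int off_x, int inc_x);

    // Pure-Java counterpart (assumed semantics: x[off_x + i*inc_x] *= alpha).
    static void scalJava(int n, float alpha, float[] x, int offX, int incX) {
        for (int i = 0, ix = offX; i < n; i++, ix += incX) {
            x[ix] *= alpha;
        }
    }

    public static void main(String[] args) {
        float[] x = {1f, 2f, 3f, 4f};
        try {
            scal(2, 2f, x, 0, 2);
        } catch (UnsatisfiedLinkError e) {
            // No native library was loaded in this sketch, so fall back to Java.
            scalJava(2, 2f, x, 0, 2);
        }
        System.out.println(Arrays.toString(x)); // prints [2.0, 2.0, 6.0, 4.0]
    }
}
```

With an increment of 2, only every second element is scaled, which is why the second and fourth entries are unchanged.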
2.2 Compile the Java class
To compile the class, a command like the following can be used:
javac de/smurflord/BlasJNI/BlasL1.java
This command has to be called from the top-level directory, otherwise the linker cannot properly resolve the names of the native methods.
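One reason the package path matters is that the JNI derives the name of the C function from the fully qualified class name and the method name. The sketch below reproduces the naming rule from the JNI specification for non-overloaded methods. The helper class JniNames and its methods are hypothetical and exist only for this illustration.

```java
// Hypothetical helper for illustration; it mirrors the JNI naming rule
// "Java_" + mangled class name + "_" + mangled method name.
final class JniNames {
    static String mangle(String name) {
        StringBuilder sb = new StringBuilder();
        for (char c : name.toCharArray()) {
            if (c == '.' || c == '/') {
                sb.append('_');                 // package separator
            } else if (c == '_') {
                sb.append("_1");                // escaped underscore
            } else if (c == ';') {
                sb.append("_2");
            } else if (c == '[') {
                sb.append("_3");
            } else if (c < 128 && Character.isLetterOrDigit(c)) {
                sb.append(c);                   // ASCII letters and digits
            } else {
                sb.append(String.format("_0%04x", (int) c)); // other characters
            }
        }
        return sb.toString();
    }

    static String nativeSymbol(String className, String methodName) {
        return "Java_" + mangle(className) + "_" + mangle(methodName);
    }

    public static void main(String[] args) {
        // prints Java_de_smurflord_BlasJNI_BlasL1_scal
        System.out.println(nativeSymbol("de.smurflord.BlasJNI.BlasL1", "scal"));
    }
}
```

If the class were compiled without its package, the expected symbol would lack the package prefix and would no longer match the function in the library.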
Chapter 3
Test Procedure
3.1 Tests involved
In order to evaluate the factors which contribute to the total execution time of JNI-wrapped functions, benchmarks were run on seven BLAS routines. The tested routines were nrm2, copy, scal, axpy and dot from the Level 1 BLAS, gemv from the Level 2 BLAS and gemm from the Level 3 BLAS.
Time measurements were done by saving the system time before and after the calls to an operation and then taking the difference. Since most architectures do not contain a high-precision clock, the Level 1 BLAS timings were taken over 800 iterations, while the Level 2 and Level 3 timings were taken for 1 iteration. The corresponding code samples are listed in figures A.1 and A.2.
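The measurement scheme described above can be sketched as follows. This is not the code from figures A.1 and A.2. The class TimingSketch, the helper timeMillis and the vector length are assumptions chosen for illustration, and System.arraycopy merely stands in for the operation under test.

```java
import java.util.Arrays;

// Hypothetical timing helper for illustration only.
final class TimingSketch {
    // Runs op 'iterations' times and returns the elapsed system time in ms.
    static long timeMillis(Runnable op, int iterations) {
        long start = System.currentTimeMillis();
        for (int i = 0; i < iterations; i++) {
            op.run();
        }
        return System.currentTimeMillis() - start;
    }

    public static void main(String[] args) {
        final int n = 100_000;          // assumed vector length
        final float[] x = new float[n];
        final float[] y = new float[n];
        Arrays.fill(x, 1.0f);

        // 800 repetitions, as used for the Level 1 routines in the text.
        long total = timeMillis(() -> System.arraycopy(x, 0, y, 0, n), 800);
        System.out.println("total ms: " + total
                + ", ms per call: " + total / 800.0);
    }
}
```

Repeating a short operation many times and dividing by the iteration count compensates for a clock whose resolution is coarser than a single call.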
For all tested functions the following execution times were taken:
• A native function call to a vendor-supplied library. No Java code is used here.
• The operation written in pure Java
• The native function from a vendor-supplied library called via the JNI
• A function which only copies the data from the JVM's heap to C memory and back (see figure A.3)
For the Level 2 and 3 BLAS (GEMV and GEMM) the following additional timings were measured:
• A pure C implementation (figures A.4 and A.5)
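To make the "pure Java" test series concrete, the following is a minimal sketch of what naive Java versions of dot and gemv might look like. It is not the code measured in the paper. The class name PureJavaBlas, the row-major matrix layout and the simplified gemv parameter list are assumptions.

```java
import java.util.Arrays;

// Hypothetical naive implementations for illustration only.
final class PureJavaBlas {
    // Dot product of n elements with BLAS-style offsets and increments.
    static float dot(int n, float[] x, int offX, int incX,
                     float[] y, int offY, int incY) {
        float sum = 0f;
        for (int i = 0, ix = offX, iy = offY; i < n; i++, ix += incX, iy += incY) {
            sum += x[ix] * y[iy];
        }
        return sum;
    }

    // y = alpha*A*x + beta*y for an m-by-n matrix A stored row-major (assumption).
    static void gemv(int m, int n, float alpha, float[] a,
                     float[] x, float beta, float[] y) {
        for (int i = 0; i < m; i++) {
            float sum = 0f;
            for (int j = 0; j < n; j++) {
                sum += a[i * n + j] * x[j];
            }
            y[i] = alpha * sum + beta * y[i];
        }
    }

    public static void main(String[] args) {
        float d = dot(3, new float[] {1, 2, 3}, 0, 1, new float[] {4, 5, 6}, 0, 1);
        System.out.println(d); // prints 32.0

        float[] y = {1f, 1f};
        gemv(2, 2, 2f, new float[] {1, 2, 3, 4}, new float[] {1, 1}, 1f, y);
        System.out.println(Arrays.toString(y)); // prints [7.0, 15.0]
    }
}
```

The dot product touches each element once, while gemv performs on the order of m*n multiplications on m*n + m + n input values.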
Chapter 4
Analysing Test Results
4.1 Description of the plots
For each test the results are plotted in a diagram combining four test series for the scalar functions and six for the Level 2 and 3 BLAS (tables 4.1 and 4.2). Since the results were the same for different members of the same architecture (for example AMD and Intel processors) and for different operating systems on the same machine, only a few representative cases are discussed here. The complete set of results is given in the appendix. Note that the scalar functions are plotted using a logarithmic scale on both axes, whereas gemm and gemv are plotted using a linear scale on the vector size axis and a logarithmic scale on the time axis.
Title            Description
Java             A naive Java implementation
Native C         An optimized library called from C
JNI              The native C algorithm wrapped via the JNI
Library          An optimized library called from Java
JNI to C copy    The time it takes to copy the data from the JVM to C memory and back (if necessary)

Table 4.2: Test series plotted for Level 2 and 3 BLAS
Function Complexity
n
2n
2n
2n
3n
n + n^2
4n^2
Chapter 5
Conclusions
The tests have shown that invoking native functions always imposes an overhead, because the native code works on copies of the original data. These copies have to be synchronised with the original data, which is an expensive operation. Furthermore, for the straightforward operations tested, Java code has proven to be slower than native code by only a factor of 1.5 to 2. As long as the complexity of the calculation is of the same order as the number of copy operations that have to be done, this speed advantage is consumed by the copy operations and the best choice is to use a Java method. Where the complexity of the operation itself is higher than that of the copy operation, a speed gain of approximately 1.5 to 3 can be obtained by replacing Java methods with calls to native methods.
A notable speed gain of an order of magnitude could, however, only be achieved for the matrix-matrix multiplication, a calculation where the main speedup can be attributed to algorithms which provide reduced complexity.
These results demonstrate the rapid improvement the Java platform has undergone since Bik and Gannon [5] did their tests in 1997. The speed of the Java platform has greatly improved. The advantages provided by the Java environment, most notably the excellent portability, are lost if the Java Native Interface is used. Due to the comparatively small speedup for most calculations, the Java Native Interface should be used with care. In most cases the better solution will be to provide efficient algorithms in Java.
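The break-even argument in this chapter can be made explicit with a simple cost model. The model is an illustration, not a result of the measurements, and all symbols are assumptions: f(n) is the number of arithmetic operations, g(n) the number of elements copied between the JVM and C memory, a the Java cost per operation, c the cost per copied element, and s the speed advantage of native code over Java.

```latex
\[
T_{\mathrm{Java}}(n) = a\,f(n), \qquad
T_{\mathrm{JNI}}(n) = \frac{a}{s}\,f(n) + c\,g(n)
\]
\[
T_{\mathrm{JNI}}(n) < T_{\mathrm{Java}}(n)
\iff
c\,g(n) < a\left(1 - \frac{1}{s}\right) f(n)
\]
```

For s between 1.5 and 2, the factor 1 - 1/s lies between 1/3 and 1/2. If f(n) and g(n) both grow linearly in n, as for the Level 1 routines, the JNI call only pays off when copying an element is considerably cheaper than processing it in Java. For gemm, f(n) grows like n^3 while g(n) grows like n^2, so the condition is met for sufficiently large n.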