Profiling on Power6
This page describes ways to obtain performance information about your program. If you have only a vague idea of where your program spends its CPU time, you should "profile" your code.
Profiling with tprof
tprof allows profiling of your program without recompiling. In flat profile mode the tprof command could look like this:
tprof -usz -p progname -x poe path_to_progname
Find more information about tprof with "man tprof".
Profiling with gprof
Profiling with gprof is done in three steps:
- Recompile the program with the -p or -pg flag. -pg is like -p, but it produces more extensive statistics. For example:
  - Fortran: mpxlf90_r -pg mysource.f90 -o myprog
  - C: mpxlc_r -pg mysource.c -o myprog
- Run your program. This should produce output files gmon.out.nn, where nn is the rank of the MPI task.
- Analyze the output file with gprof:
  gprof myprog gmon.out.nn
This produces a list of subroutines, together with the CPU usage in those subroutines (i.e. what percentage of the total CPU time was spent in each subroutine).
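With many MPI tasks, the gprof call has to be repeated for every gmon.out.nn file. The loop below is a minimal sketch of how this could be automated. It assumes that the executable is ./myprog and that the profile files are in the current directory; the report file names gprof_report.nn and the helper rank_of are made up for this illustration.

```shell
#!/bin/sh
# Sketch: write one gprof report per MPI rank.
# Assumptions: the executable name and the report file names are placeholders.
prog=./myprog

# rank_of FILE: print the suffix after the last dot, e.g. gmon.out.12 -> 12
rank_of() {
    printf '%s\n' "${1##*.}"
}

for f in gmon.out.*; do
    # If no profile files exist, the pattern stays unexpanded; skip it.
    [ -e "$f" ] || continue
    rank=$(rank_of "$f")
    gprof "$prog" "$f" > "gprof_report.$rank"
done
```

Comparing the reports of a few ranks may show whether all tasks spend their time in the same routines.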
Simple performance library
After you have identified the routines consuming most of the CPU time, you can use the performance library libperf.a to further narrow down the hot spots of your program. Using simple calls you can instrument regions in your program to get information about runtimes and Mflop/s in those regions. By linking with libperfhpm.a this instrumentation can also be used to call the HPM library.
This performance library can be called from both Fortran and C.
HPC toolkit (HPCT)
The IBM High Performance Computing Toolkit (HPCT) is installed on Power6 vip as a module:
module load hpct
See Bits & Bytes no. 184 for an introductory article on HPCT.
The main components of HPCT are:
- hpccount: command for measuring the overall performance of an application
- Xprof: a GUI
- peekperf: GUI based, interactive instrumentation of executables and graphical analysis of performance data
- hpcInst: command for non-interactive instrumentation of executables (no source-code modification necessary)
- libhpc: a library which can be used to explicitly instrument source code
- libmpitrace: a library for profiling and tracing MPI function calls
- a library for I/O profiling
hpccount command:
With the hpccount utility you can measure the number of hardware operations executed by your program. You can apply the hpccount command to your binary directly, for example as follows:
hpccount ./a.out
or
module load hpct
poe hpccount ./a.out -procs 4
in a LoadLeveler command file.
HPC library (libhpc):
In order to investigate particular parts of your program, you can instrument it with calls to libhpc and rebuild your executable:
mpxlf90_r -I$IHPCT_HOME/include my_prog.F -L$IHPCT_HOME/lib -lhpc -llicense
Please note the uppercase suffix .F: the file has to be passed through the preprocessor so that the line
#include <f_hpc.h>
is processed (to enforce preprocessing of .f files, use -qsuffix=cpp=f).
The output of the instrumented run can provide valuable clues as to whether your program (or the libraries it uses) has already reached the maximum Mflop/s rate that the hardware can deliver. In that case you cannot make the program faster, except with a different algorithm that requires fewer calculations. More often, however, the CPU cannot access memory as quickly as it could process the data. In that case you could, for example, check the libhpc output for cache misses and optimize the use of the cache.
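To judge how close a measured rate is to the hardware limit, it helps to express it as a percentage of the theoretical peak. The following sketch shows the arithmetic. Both numbers in the call are assumptions for illustration: a peak of 18800 Mflop/s per core corresponds to a 4.7 GHz core completing four floating-point operations per cycle (check the actual figures for your system), and 940 Mflop/s is a made-up measurement, not a libhpc result. The function name percent_of_peak is likewise hypothetical.

```shell
# percent_of_peak MEASURED PEAK
# Print MEASURED as a percentage of PEAK (both in Mflop/s), one decimal place.
percent_of_peak() {
    awk -v m="$1" -v p="$2" 'BEGIN { printf "%.1f\n", 100 * m / p }'
}

# Example with assumed values: 940 Mflop/s measured, 18800 Mflop/s peak.
percent_of_peak 940 18800    # prints 5.0
```

A low percentage suggests that the code is limited by something other than floating-point throughput, which is where the cache-miss counters become relevant.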
Find more about the HPC library in $IHPCT_HOME/doc/HPM_ug.pdf.
MPI trace library (libmpitrace):
To find out in which MPI routines most of the communication time is spent, you can link your program against the MPI trace library mpitrace, which is also part of the HPC toolkit. The following example command line links your program with mpitrace:
mpxlf90_r -o my_prog my_prog.F \
  -L$IHPCT_HOME/lib -lmpitrace -llicense \
  -L/bgsys/drivers/ppcfloor/ppc/gnu-linux/powerpc-bgp-linux/lib -lgfortran
You must also link the gfortran (GNU Fortran) library, since mpitrace needs some of its subroutines.
Once an application has been properly linked with the mpitrace library it may be run as usual and will collect and output MPI tracing and profiling information. By default the library collects a time summary and time history of MPI calls for the first 256 MPI process ranks. The environment variables listed below may be used to control some of the run-time behavior of the mpitrace library:
TRACE_ALL_TASKS = yes or no
  yes - enables tracing of all MPI ranks. Does not affect the output of profiling information.
  no - enables tracing of only the first 256 ranks (default)
TRACE_ALL_EVENTS = yes or no
  yes - enables automatic tracing (default)
  no - disables automatic tracing (tracing through API calls is still possible)
MAX_TRACE_RANK = maximum rank
  specifies the number of MPI tasks to trace
TRACE_SEND_PATTERN = yes or no
  no - default
  yes - outputs information on point-to-point communication as a matrix of the number of bytes sent between ranks and enables calculation of torus hops. The sent-byte information is written to the file sent-bytes.matrix. Point-to-point communication information is only available for the ranks for which tracing information has been collected.
TRACEBACK_LEVEL = level number
  specifies the depth to which the call stack should be unwound to identify the routine that initiated the MPI call
OUTPUT_ALL_RANKS = yes or no
  yes - outputs profile information for all ranks
  no - outputs profile and trace information only for rank 0 and the ranks with maximum, minimum, and median total MPI time
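As an illustration, the variables could be set in the job script before the program is started. The values below are arbitrary examples, not recommendations, and the commented-out poe line only mirrors the hpccount example on this page.

```shell
# Sketch: restrict tracing and request the send-pattern matrix.
# All values are arbitrary examples.
export TRACE_ALL_TASKS=no        # trace only the first 256 ranks (default)
export TRACE_SEND_PATTERN=yes    # write sent-bytes.matrix
export TRACEBACK_LEVEL=1         # unwind the call stack by one level
export OUTPUT_ALL_RANKS=no       # profiles only for rank 0, max, min, median

# Start the instrumented program as usual afterwards, for example:
# poe ./my_prog -procs 4
```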
Several types of output files are generated by running an application linked to the mpitrace library (nn stands for the MPI rank):
- mpi_profile.nn - text files containing MPI profile information. By default, profile information is created for only 4 MPI ranks: rank 0 and the ranks with the maximum, minimum, and median MPI times.
- mpi_profile.nn.viz - contains profile information that may be viewed with the IHPC Toolkit peekperf utility.
- single_trace - contains tracing information for those MPI ranks for which tracing was enabled; it may be viewed with the IHPC Toolkit peekview utility.
Both peekperf and peekview are in the $IHPCT_HOME/bin directory. To use them, you must be able to display X Window applications on your local machine: X Window software has to be installed there, and X forwarding has to be enabled, for instance by logging in to the Blue Gene/P system with "ssh -X". In addition, the directory $IHPCT_HOME/lib must be added to the LD_LIBRARY_PATH variable in your environment.
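The library path could be extended as sketched below. The fallback value /path/to/hpct is only a placeholder so that the snippet is self-contained; on the system itself IHPCT_HOME is expected to be set already, for example after "module load hpct".

```shell
# Placeholder fallback; normally IHPCT_HOME is expected to be set already.
IHPCT_HOME=${IHPCT_HOME:-/path/to/hpct}

# Prepend the toolkit library directory. The ${VAR:+...} expansion adds the
# separating colon only if LD_LIBRARY_PATH was non-empty before.
export LD_LIBRARY_PATH="$IHPCT_HOME/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
```

Prepending rather than appending means that the toolkit libraries are found first if another directory in the path contains libraries of the same name.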
More details on the mpitrace library may be found in the IBM Redpaper "IBM System Blue Gene Solution: High Performance Computing Toolkit for Blue Gene/P".
