The pgdbg, TotalView, and gdb debuggers should each work with parallel codes. This page describes using gdb (or ddd, a graphical front end to gdb) on a VCL node.
MPI (Message Passing Interface) codes are typically written for distributed memory machines, in which each processor has its own memory and communicates with other processors by sending and receiving messages. For debugging with gdb, we've implemented a shared memory version of MPI, in which the messages are passed within a single shared memory node. It's possible to start more processes than there are physical cores.
The mpirun command starts a single process. When that root process reaches its MPI_Init call, it starts up the additional processes so that the requested number are running. By putting a pause (a loop that spins until a flag is reset) right after MPI_Init, we can identify the new processes and attach each of them to a gdb session. Then we can step through each process individually.
Here's an example. From one of the 32-bit login nodes,
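This sets up the mpif77, mpicc, and mpicxx commands to use the gnu32 library. Compile a simple MPI code, e.g., the monte.f code from the MPI short course, with debugging information (the -g compiler flag) so that gdb can see variables such as iflag. An excerpt follows; the do while loop right after MPI_INIT is the pause, in which every process spins until iflag is reset from the debugger.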
      real*8 ans(10), ans2(10)
      real*8 startim, entim, sum, sindex
c
c     function
      integer string_len
      iflag = 1
c
      call MPI_INIT(ierr)
      do while (iflag.eq.1)
      end do
c
      call MPI_COMM_SIZE(MPI_COMM_WORLD, p, ierr)
      call MPI_COMM_RANK(MPI_COMM_WORLD, my_rank, ierr)
*     print*,' I am ', my_rank
c
      if (my_rank.eq.0) then
cc       print*,'input random seed'
Run it with two processes:

>mpirun -np 2 ./monte
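While the processes are spinning in the loop, find their process IDs:

[gwhowell@login02 ~]$ ps -ef | grep monte
gwhowell 25532 25386 0 14:39 pts/64 00:00:00 /bin/sh /home/gwhowell/mpiches/mpich-1.2.7p1/gnu32/ch_shmem/bin/mpirun -np 2 ./monte
gwhowell 25560 25532 6 14:39 pts/64 00:01:42 /home/gwhowell/ppmpi_f/chap03/./monte
gwhowell 25561 25560 10 14:39 pts/64 00:02:40 /home/gwhowell/ppmpi_f/chap03/./monte

The first line is the mpirun shell script. The other two are the MPI processes: 25560 is the root process started by mpirun, and 25561 is the process that the root started (its parent PID is 25560). These are the two PIDs to attach to.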
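Reading the listing by eye works for two processes; with more, it can help to let a script pick out everything started underneath mpirun. The shell function below is a minimal sketch of that idea, not part of the original setup: descendants is a made-up name, and it assumes the standard ps -ef column order (UID PID PPID C STIME TTY TIME CMD). It follows each process's parent PID upward and prints the processes whose chain reaches the given root PID.

```shell
#!/bin/sh
# descendants ROOT_PID   (hypothetical helper name, for illustration only)
# Reads `ps -ef` output on stdin and prints, one per line, the PID of every
# process descended from ROOT_PID (children, grandchildren, ...).
# Assumes the usual `ps -ef` columns: UID PID PPID C STIME TTY TIME CMD,
# so field 2 is the PID and field 3 is the parent PID.
descendants() {
  awk -v root="$1" '
    {
      parent[$2] = $3       # remember each process parent
      order[++n] = $2       # keep the listing order for output
    }
    END {
      for (i = 1; i <= n; i++) {
        pid = order[i]
        p = parent[pid]
        steps = 0
        # Climb the parent chain until we reach ROOT or run out of known
        # parents; the step limit only guards against malformed input.
        while (p != "" && p != root && steps < 64) {
          p = parent[p]
          steps++
        }
        if (p == root) print pid
      }
    }'
}
```

For the listing above, ps -ef | descendants 25532 prints 25560 and 25561, the two PIDs to attach to. Walking the parent chain, rather than grepping for the program name, also skips the grep process itself and the mpirun script.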
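Attach a gdb session to each process, each in its own window, and release it from the loop by setting iflag to 0:

gdb
gdb> attach 25560
gdb> set iflag = 0

gdb
gdb> attach 25561
gdb> set iflag = 0

Each process is now stopped under its own gdb session, and you can step through it with the usual gdb commands (next, step, continue).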
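If you would rather not type the same commands in every session, gdb can also take them on its command line: -p attaches to a running process and -ex runs a command once gdb has started, after which gdb stays at its prompt. The helper below is only a convenience sketch. gdb_release_cmds is a made-up name, and it assumes each process is stopped in the frame that contains iflag, which is the usual case while it spins in the loop. It uses set var rather than plain set, which gdb recommends when a variable name could clash with one of gdb's own set subcommands.

```shell
#!/bin/sh
# gdb_release_cmds PID...   (hypothetical helper name, for illustration only)
# Prints one gdb command line per PID. Each line attaches gdb to that
# process (-p) and runs "set var iflag = 0" (-ex) so the process can leave
# its busy-wait loop; gdb then stays at its prompt for stepping.
# "iflag" is the flag variable from the monte.f excerpt; change it to match
# your own code.
gdb_release_cmds() {
  for pid in "$@"; do
    printf "gdb -p %s -ex 'set var iflag = 0'\n" "$pid"
  done
}
```

For example, gdb_release_cmds 25560 25561 prints two command lines; paste each into its own terminal window.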
You may prefer to start one or more of the gdb sessions with ddd, a graphical front end to gdb, whose GUI can be useful. For example, when I tried
TotalView debugging can be done in much the same way as outlined here (but so far it works only for Fortran and C codes, not for C++). For the TotalView debugger, try