Message Passing Interface (MPI) is a standard interface for writing parallel applications using the message passing paradigm (as opposed to the shared memory paradigm). Local Area Multicomputer (LAM) is a nice MPI implementation that can create virtual clusters on simple local area networks. Debugging parallel code, however, is tough and painful, and C code almost always needs some debugging. I was writing a few parallel applications for my senior seminar (which resulted in a paper) and had to learn about debugging on LAM. By far the most useful technique has been using shell script wrappers to run applications remotely in gdb.
LAM sets certain environment variables before executing your code remotely.
Simple shell script wrappers can be used to start debugging tools such as
gdb or valgrind instead of the executable.
However, LAM maps stdin to /dev/null on remote
nodes, so you will not be able to use these tools interactively.
Instead, you probably want to set up batch debugging scripts.
Here's a simple bash script that runs gdb with a command file
on one node and discards standard output from all other nodes:
#!/bin/bash
# Write the debug commands you want to be run on gdb into the file
# "debugcmds"
# USAGE: ./debugrank.sh [rank of node to debug] [program] "[parameters]"
# Runs given program inside gdb on given node (specified by rank) and executes
# commands inside file "debugcmds" inside gdb.
# On all other nodes, the program is run normally with output redirected to
# /dev/null
#-------------------
# Configuration. You can change these
DEBUGFILE=debugcmds   # this file contains the gdb commands that are run on
                      # debugged node
REDIR=/dev/null
#-------------------
RANK=$1    # rank of the node to debug
PRG=$2     # program to run
PARAM=$3   # quoted parameter string; left unquoted below so it splits into words

if [ "$LAMRANK" = "$RANK" ]; then
    # --args hands the parameters to the program when gdb executes "run"
    gdb -x "$DEBUGFILE" --args "$PRG" $PARAM
else
    "$PRG" $PARAM >"$REDIR"
fi
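The script expects a gdb command file named "debugcmds" in the working directory. As a minimal sketch of what that file might contain, the snippet below writes one that runs the program, prints a backtrace and the local variables if it stops (for example on a segfault), and then quits. The choice of commands is only an illustration, not something LAM or gdb requires, and ./myprog in the usage comment is a hypothetical program name.

```shell
# Create an example gdb command file for the wrapper script to pass to "gdb -x".
# The commands are an illustrative choice; edit them to suit your program.
# If the wrapper does not forward the program's parameters, put them after "run".
cat > debugcmds <<'EOF'
run
backtrace
info locals
quit
EOF

# Assumed invocation (./myprog and its arguments are hypothetical):
#   mpirun -np 4 ./debugrank.sh 0 ./myprog "arg1 arg2"
```

Note that gdb stops reading a command file at the first command that fails, so if the program exits cleanly the backtrace step reports that there is no stack and the remaining commands are skipped.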
The official LAM FAQ outlines this and several other debugging methods.
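The same wrapper idea carries over to valgrind, mentioned earlier. The following is a rough sketch rather than a tested recipe: it assumes valgrind is installed on every node, and the script name memcheckrank.sh, the function name memcheck_rank, the log file name, and the VALGRIND override variable are all hypothetical names chosen for illustration. Because valgrind writes its report to a log file, nothing has to be typed on the remote node.

```shell
#!/bin/bash
# memcheckrank.sh (hypothetical name): run one rank under valgrind.
# USAGE: ./memcheckrank.sh [rank to check] [program] [parameters...]

memcheck_rank() {
    local rank=$1 prg=$2
    shift 2
    if [ "$LAMRANK" = "$rank" ]; then
        # The chosen rank runs under valgrind, which writes its report to a
        # per-rank log file. VALGRIND is an assumed override for the binary.
        "${VALGRIND:-valgrind}" --leak-check=full \
            --log-file="valgrind.rank$rank.log" "$prg" "$@"
    else
        # All other ranks run normally with standard output discarded.
        "$prg" "$@" > /dev/null
    fi
}

# Only run when at least a rank and a program were given.
if [ "$#" -ge 2 ]; then
    memcheck_rank "$@"
fi
```

Unlike the gdb wrapper, this one takes the program's parameters as separate arguments instead of one quoted string, so arguments containing spaces survive intact.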