Using the Grid/Job Advanced MPI script


MPI Job

Introduction

MPI jobs on the Grid are slightly different from ordinary jobs: each job runs on a single site, and you can request a number of nodes per job. On the Life Science Grid the clusters have 16 nodes, so when you request more nodes your job can only run on the larger clusters; currently those are the cluster in Groningen and Gina. Several flavors of MPI are available, but MPICH and OpenMPI are the most common on the Grid.

To request four CPUs for an MPICH job, add JobType = "MPICH"; and NodeNumber = 4; to the JDL file.
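For example, a minimal JDL fragment (the complete file is shown under Example files below) would contain:

JobType = "MPICH";
NodeNumber = 4;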

Submission

You can submit the three files from the Example below with:

user$ startGridSession lsgrid
user$ glite-wms-job-submit -d $USER -o myjobs test-mpi.jdl

Use the usual commands to check the status, retrieve the output and, if needed, cancel your job:

user$ glite-wms-job-status -i myjobs
user$ glite-wms-job-output --dir ./my_output -i myjobs
user$ glite-wms-job-cancel -i myjobs 

Example files

file: test-mpi.jdl

Type = "Job";
JobType = "MPICH";
NodeNumber = 4;
Executable = "test-mpi.sh";
Arguments = "test-mpi";
StdOutput = "test-mpi.out";
StdError = "test-mpi.err";
InputSandbox = {"test-mpi.sh","test-mpi.c"};
OutputSandbox = {"test-mpi.err","test-mpi.out","mpiexec.out"};
Requirements = Member("MPICH", other.GlueHostApplicationSoftwareRunTimeEnvironment);
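The Requirements attribute makes sure the job is only matched to sites that advertise the MPICH runtime environment. If a site offers OpenMPI instead, a sketch of the corresponding requirement would look as follows; this assumes the site advertises an "OPENMPI" tag in the same Glue attribute, and the exact tag name may differ per site:

Requirements = Member("OPENMPI", other.GlueHostApplicationSoftwareRunTimeEnvironment);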

file: test-mpi.sh

#!/bin/sh -x

# the binary to execute
EXE=$1 

echo "***********************************************************************" 
echo "Running on: $HOSTNAME" 
echo "As:       " `whoami` 
echo "***********************************************************************" 

echo "***********************************************************************" 
echo "Compiling binary: $EXE" 
echo mpicc -o ${EXE} ${EXE}.c
mpicc -o ${EXE} ${EXE}.c
echo "*************************************" 

if [ "x$PBS_NODEFILE" != "x" ] ; then 
  echo "PBS Nodefile: $PBS_NODEFILE" 
  HOST_NODEFILE=$PBS_NODEFILE 
fi

if [ "x$LSB_HOSTS" != "x" ] ; then 
  echo "LSF Hosts: $LSB_HOSTS" 
  HOST_NODEFILE=`pwd`/lsf_nodefile.$$ 
  for host in ${LSB_HOSTS} 
  do 
    echo $host >> ${HOST_NODEFILE} 
  done 
fi

if [ "x$HOST_NODEFILE" = "x" ]; then
  echo "No hosts file defined.  Exiting..."
  exit
fi 

echo "***********************************************************************" 
CPU_NEEDED=`cat $HOST_NODEFILE | wc -l` 
echo "Node count: $CPU_NEEDED"
echo "Nodes in $HOST_NODEFILE: "
cat $HOST_NODEFILE
echo "***********************************************************************" 

echo "***********************************************************************" 
CPU_NEEDED=`cat $HOST_NODEFILE | wc -l` 
echo "Checking ssh for each node:"
NODES=`cat $HOST_NODEFILE`
for host in ${NODES}
do
  echo "Checking $host..." 
  ssh $host "hostname; set;ls -l `which mpirun`;rpm -qf `which mpirun`;rpm -qa | grep mpi;hostname"
done
echo "***********************************************************************" 

echo "***********************************************************************" 
echo "Executing $EXE with mpiexec" 
chmod 755 $EXE 
mpiexec `pwd`/$EXE > mpiexec.out 2>&1 
echo "***********************************************************************" 

file: test-mpi.c

/*  test-mpi.c
 *
 *  Simple "Hello World" program in MPI.
 *
 */
   
#include "mpi.h"
#include <stdio.h>
int main(int argc, char *argv[])
{
  int numprocs;  /* Number of processors */
  int procnum;   /* Processor number */
  /* Initialize MPI */
  MPI_Init(&argc, &argv);
  /* Find this processor number */
  MPI_Comm_rank(MPI_COMM_WORLD, &procnum);
  /* Find the number of processors */
  MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
  printf ("Hello world! from processor %d out of %d\n", procnum, numprocs);
  /* Shut down MPI */
  MPI_Finalize();
  return 0;
}
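With NodeNumber = 4, mpiexec.out should contain one line per process, in no particular order, along the lines of:

Hello world! from processor 0 out of 4
Hello world! from processor 1 out of 4
Hello world! from processor 2 out of 4
Hello world! from processor 3 out of 4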