Difference between revisions of "User:Dennisvd@nikhef.nl/mpi"

From PDP/Grid Wiki

Latest revision as of 16:40, 12 June 2009

OK, so maybe not heroic. But my attempts at getting MPI running are much like an uphill battle.

So far I have managed only partly, so here are some notes:

  • I used passwordless host-based ssh logins between the nodes
  • I used Torque as the batch system
  • I used the RHEL4-provided openmpi (which does not include the tm module, hence the ssh stuff)
  • It worked with Torque and mpirun
  • ... but not with mpi-start.
  • It does not yet work with CREAM, as all the processes run on the same node.
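
One way to check whether the processes really spread over the nodes is to launch `hostname` under mpirun and tally what comes back. The sketch below only does the tallying; the function name `tally_hosts` and the sample host names are made up for illustration and do not come from my setup.

```python
from collections import Counter


def tally_hosts(output: str) -> Counter:
    """Count how many processes reported each host name.

    `output` is expected to hold one host name per line, for example
    the captured output of `mpirun hostname`.
    """
    hosts = [line.strip() for line in output.splitlines() if line.strip()]
    return Counter(hosts)


if __name__ == "__main__":
    # Hypothetical output of a 4-process run that landed on a single node.
    sample = "node01\nnode01\nnode01\nnode01\n"
    counts = tally_hosts(sample)
    for host, n in sorted(counts.items()):
        print(f"{host}: {n} process(es)")
    if len(counts) == 1:
        print("all processes ran on the same node")
```

If the tally shows a single host for a multi-node job, the work was not distributed.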

One thing I had forgotten was to set the MPI_*_PATH variables that mpi-start needs. Without them it fell back to /opt/i2g/openmpi, and that did not distribute the work according to plan. After I straightened this out, I still got the same result: everything ran on the same node.
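
To avoid forgetting these variables again, a small check like the following could be run in the job environment before calling mpi-start. It is only a sketch: it matches any variable of the form MPI_*_PATH without assuming specific names, and the helper name `find_mpi_paths` is my own invention.

```python
import fnmatch
import os


def find_mpi_paths(environ):
    """Return the MPI_*_PATH variables found in `environ`.

    Each entry maps the variable name to a (value, exists) pair, where
    `exists` says whether the value is an existing directory.
    """
    found = {}
    for name, value in environ.items():
        if fnmatch.fnmatchcase(name, "MPI_*_PATH"):
            found[name] = (value, os.path.isdir(value))
    return found


if __name__ == "__main__":
    paths = find_mpi_paths(os.environ)
    if not paths:
        print("no MPI_*_PATH variable is set; mpi-start may use its fallback")
    for name, (value, exists) in sorted(paths.items()):
        status = "ok" if exists else "missing directory"
        print(f"{name}={value} ({status})")
```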

mpi-start is supposed to do the right thing, but it does not. The debug message I got clarified things:

found openmpi and PBS, don't set machinefile

which means that the call to mpirun (or mpiexec) does not include the -machinefile option that it needs. You wouldn't need it if this openmpi came with the PBS startup support (the tm module mentioned above), but in this case that is not true.
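
As a workaround, the node file can be passed explicitly. The sketch below builds such a command line from the node file that Torque provides in $PBS_NODEFILE, starting one process per listed slot. It is not what mpi-start does internally; `build_mpirun_command` and `./my_mpi_program` are placeholder names for illustration.

```python
import os
import shlex


def build_mpirun_command(nodefile, program, mpirun="mpirun"):
    """Build an mpirun command line that passes the node file explicitly.

    One process is started per non-empty line in `nodefile`, which is
    how Torque lists the allocated slots in $PBS_NODEFILE.
    """
    with open(nodefile) as fh:
        slots = [line.strip() for line in fh if line.strip()]
    if not slots:
        raise ValueError(f"{nodefile} lists no nodes")
    return [mpirun, "-np", str(len(slots)), "-machinefile", nodefile, program]


if __name__ == "__main__":
    nodefile = os.environ.get("PBS_NODEFILE")
    if nodefile is None:
        print("PBS_NODEFILE is not set; run this inside a Torque job")
    else:
        # ./my_mpi_program is a placeholder for the real executable.
        cmd = build_mpirun_command(nodefile, "./my_mpi_program")
        print(" ".join(shlex.quote(part) for part in cmd))
```

The script only prints the command; running it is left to the job script, so the result can be inspected first.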

References